• Python Spark random forest: an introductory demo


    class pyspark.mllib.tree.RandomForest

    Learning algorithm for a random forest model for classification or regression.

    New in version 1.2.0.

    supportedFeatureSubsetStrategies = ('auto', 'all', 'sqrt', 'log2', 'onethird')
    classmethod trainClassifier(data, numClasses, categoricalFeaturesInfo, numTrees, featureSubsetStrategy='auto', impurity='gini', maxDepth=4, maxBins=32, seed=None)

    Train a random forest model for binary or multiclass classification.

    Parameters:
    • data – Training dataset: RDD of LabeledPoint. Labels should take values {0, 1, ..., numClasses-1}.
    • numClasses – Number of classes for classification.
    • categoricalFeaturesInfo – Map storing arity of categorical features. An entry (n -> k) indicates that feature n is categorical with k categories indexed from 0: {0, 1, ..., k-1}.
    • numTrees – Number of trees in the random forest.
    • featureSubsetStrategy – Number of features to consider for splits at each node. Supported values: “auto”, “all”, “sqrt”, “log2”, “onethird”. If “auto” is set, this parameter is set based on numTrees: if numTrees == 1, set to “all”; if numTrees > 1 (forest) set to “sqrt”. (default: “auto”)
    • impurity – Criterion used for information gain calculation. Supported values: “gini” or “entropy”. (default: “gini”)
    • maxDepth – Maximum depth of tree (e.g. depth 0 means 1 leaf node, depth 1 means 1 internal node + 2 leaf nodes). (default: 4)
    • maxBins – Maximum number of bins used for splitting features. (default: 32)
    • seed – Random seed for bootstrapping and choosing feature subsets. Set as None to generate seed based on system time. (default: None)
    Returns:

    RandomForestModel that can be used for prediction.
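To make a few of the parameters above more concrete, here is a minimal pure-Python sketch (it does not call Spark). The helper names `features_per_split`, `gini`, `entropy`, and `max_nodes` are hypothetical and exist only for illustration. The rounding used for "sqrt", "log2", and "onethird", and the base-2 logarithm in the entropy, are assumptions about how these options are typically resolved, not something stated in the excerpt.

```python
import math


def features_per_split(strategy, num_features, num_trees):
    """Resolve a featureSubsetStrategy name to a feature count (classification).

    Assumption: fractional results are rounded up.
    """
    if strategy == "auto":
        # Per the parameter description: one tree uses "all", a forest uses "sqrt".
        strategy = "all" if num_trees == 1 else "sqrt"
    if strategy == "all":
        return num_features
    if strategy == "sqrt":
        return math.ceil(math.sqrt(num_features))
    if strategy == "log2":
        return max(1, math.ceil(math.log2(num_features)))
    if strategy == "onethird":
        return math.ceil(num_features / 3.0)
    raise ValueError("unsupported strategy: %s" % strategy)


def gini(class_counts):
    """Gini impurity of a node, given the number of samples per class."""
    total = sum(class_counts)
    return 1.0 - sum((c / total) ** 2 for c in class_counts)


def entropy(class_counts):
    """Entropy of a node in bits (base-2 logarithm assumed)."""
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)


def max_nodes(max_depth):
    """Upper bound on nodes in one binary tree of the given depth."""
    return 2 ** (max_depth + 1) - 1


print(features_per_split("auto", 16, 3))  # a forest resolves "auto" to "sqrt"
print(gini([2, 2]), entropy([2, 2]))      # impurity of an evenly split two-class node
print(max_nodes(0), max_nodes(1))         # matches the maxDepth description: 1 leaf, then 1 + 2
```

With the default maxDepth=4, this bound gives at most 31 nodes per tree; actual trees are usually smaller because splitting stops once a node is pure.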

    Example usage:

    >>> from pyspark.mllib.regression import LabeledPoint
    >>> from pyspark.mllib.tree import RandomForest
    >>>
    >>> data = [
    ...     LabeledPoint(0.0, [0.0]),
    ...     LabeledPoint(0.0, [1.0]),
    ...     LabeledPoint(1.0, [2.0]),
    ...     LabeledPoint(1.0, [3.0])
    ... ]
    >>> model = RandomForest.trainClassifier(sc.parallelize(data), 2, {}, 3, seed=42)
    >>> model.numTrees()
    3
    >>> model.totalNumNodes()
    7
    >>> print(model)
    TreeEnsembleModel classifier with 3 trees
    
    >>> print(model.toDebugString())
    TreeEnsembleModel classifier with 3 trees
    
      Tree 0:
        Predict: 1.0
      Tree 1:
        If (feature 0 <= 1.0)
         Predict: 0.0
        Else (feature 0 > 1.0)
         Predict: 1.0
      Tree 2:
        If (feature 0 <= 1.0)
         Predict: 0.0
        Else (feature 0 > 1.0)
         Predict: 1.0
    
    >>> model.predict([2.0])
    1.0
    >>> model.predict([0.0])
    0.0
    >>> rdd = sc.parallelize([[3.0], [1.0]])
    >>> model.predict(rdd).collect()
    [1.0, 0.0]
    

    New in version 1.2.0.

    Excerpted from: https://spark.apache.org/docs/latest/api/python/pyspark.mllib.html#pyspark.mllib.tree.DecisionTree

  • Original post: https://www.cnblogs.com/bonelee/p/7150484.html