• kaggle


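    A Pipeline chains several estimators together, for example feature extraction, normalization, and classification, into a typical machine learning workflow. This brings two main benefits:
    1. A single call to fit or predict trains or runs every step in the pipeline.
    2. It can be combined with grid search to select parameters.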
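
The second benefit can be sketched roughly as follows. This is a minimal illustration rather than code from the original post: the toy records, the labels, and the parameter grid are made up for the example. The one thing that is not an assumption is the naming rule: a parameter of a pipeline step is addressed as `<step name>__<parameter>`.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy records (not the Titanic data), one dict per sample.
records = [
    {'pclass': '1st', 'age': 29.0, 'sex': 'female'},
    {'pclass': '1st', 'age': 35.0, 'sex': 'female'},
    {'pclass': '2nd', 'age': 24.0, 'sex': 'female'},
    {'pclass': '2nd', 'age': 8.0, 'sex': 'male'},
    {'pclass': '3rd', 'age': 40.0, 'sex': 'male'},
    {'pclass': '3rd', 'age': 22.0, 'sex': 'male'},
    {'pclass': '1st', 'age': 61.0, 'sex': 'male'},
    {'pclass': '3rd', 'age': 30.0, 'sex': 'male'},
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # made-up outcomes, 1 = survived

pipe = Pipeline([
    ('vec', DictVectorizer(sparse=False)),
    ('dtc', DecisionTreeClassifier(random_state=0)),
])

# Keys follow the "<step name>__<parameter>" convention.
param_grid = {
    'dtc__max_depth': [1, 2, 3],
    'dtc__criterion': ['gini', 'entropy'],
}

# cv=2 only because the toy data set is tiny.
search = GridSearchCV(pipe, param_grid, cv=2)
search.fit(records, labels)

print(search.best_params_)
print(search.best_score_)
```

Because the whole pipeline is refit on every fold, the vectorizer is also fitted only on that fold's training data, which avoids leaking information from the held-out part.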
     
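    In the example below, we use a decision tree model to predict whether Titanic passengers survived. We first convert the non-numeric data to numeric data, then classify with a decision tree: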
    import pandas as pd
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.metrics import classification_report
    # train_test_split lives in sklearn.model_selection (sklearn.cross_validation was removed)
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.tree import DecisionTreeClassifier

    titanic = pd.read_csv('http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic.txt')
    titanic.head()
    titanic.info()

    # Copy the selected columns so that filling values does not touch a view of `titanic`
    X = titanic[['pclass', 'age', 'sex']].copy()
    y = titanic['survived']
    # Fill missing ages with the mean age
    X['age'] = X['age'].fillna(X['age'].mean())
    X.info()

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=33)
    # DictVectorizer expects a list of dicts, one per row
    X_train = X_train.to_dict(orient='records')
    X_test = X_test.to_dict(orient='records')

    # Convert non-numeric data to numeric data, then classify
    clf = Pipeline([('vecd', DictVectorizer(sparse=False)), ('dtc', DecisionTreeClassifier())])
    clf.fit(X_train, y_train)
    y_predict = clf.predict(X_test)

    print(clf.score(X_test, y_test))
    # classification_report takes the true labels first, then the predictions
    print(classification_report(y_test, y_predict, target_names=['died', 'survived']))
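
To see what the conversion step does on its own, here is a small sketch that runs DictVectorizer outside the pipeline. The two rows are hypothetical and only mimic the shape of the Titanic records. String-valued features are expanded into one 0/1 column per value, while numeric features pass through unchanged.

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical rows shaped like the records produced by to_dict(orient='records').
rows = [
    {'pclass': '1st', 'age': 29.0, 'sex': 'female'},
    {'pclass': '3rd', 'age': 40.0, 'sex': 'male'},
]

vec = DictVectorizer(sparse=False)
encoded = vec.fit_transform(rows)

# Column names, sorted alphabetically by DictVectorizer.
print(vec.feature_names_)
# ['age', 'pclass=1st', 'pclass=3rd', 'sex=female', 'sex=male']

print(encoded)
# [[29.  1.  0.  1.  0.]
#  [40.  0.  1.  0.  1.]]
```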
  • Original article: https://www.cnblogs.com/gwzz/p/13254993.html