Filling in missing (NaN) values
Imputer
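from sklearn.preprocessing import Imputer  # removed in scikit-learn 0.22; newer versions use sklearn.impute.SimpleImputer
# Replace each NaN with the mean of its column (axis=0)
imp = Imputer(missing_values='NaN', strategy='mean', axis=0)
imp.fit(X_train)  # learn the column means from the training set only
X_train = imp.transform(X_train)
X_test = imp.transform(X_test)  # reuse the training means on the test set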
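For newer scikit-learn versions, where Imputer no longer exists, the same step can probably be written with SimpleImputer. This is only a sketch: the toy arrays and the names X_train_filled / X_test_filled are made up for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data (hypothetical): np.nan marks a missing entry.
X_train = np.array([[1.0, 2.0], [np.nan, 4.0], [3.0, np.nan]])
X_test = np.array([[np.nan, 1.0], [5.0, np.nan]])

# SimpleImputer always works column-wise, so it takes no axis argument.
imp = SimpleImputer(missing_values=np.nan, strategy='mean')
imp.fit(X_train)  # learns the training column means: [2.0, 3.0]

X_train_filled = imp.transform(X_train)
X_test_filled = imp.transform(X_test)  # test NaNs get the training means
```

Fitting on X_train only and reusing the fitted object on X_test keeps test-set statistics out of the preprocessing.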
Data normalization (feature scaling)
Normalize, min_max
http://www.tuicool.com/articles/JzMjeyi
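A small sketch of what "Normalize" and "min_max" might look like in scikit-learn, assuming they refer to preprocessing.normalize and MinMaxScaler (the linked article may use different calls). The arrays are made-up toy data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, normalize

# Toy data (hypothetical).
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[2.0, 40.0]])

# Min-max scaling: map each column of the training set onto [0, 1].
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
# The test set reuses the training min/max, so values can leave [0, 1].
X_test_scaled = scaler.transform(X_test)

# normalize works per row (sample): each row is rescaled to unit L2 norm.
X_unit = normalize(np.array([[3.0, 4.0]]), norm='l2')
```

Note the difference: MinMaxScaler rescales each feature (column), while normalize rescales each sample (row).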
Splitting the dataset
cross_validation (sklearn.model_selection in scikit-learn 0.18 and later) & metrics AUC
http://blog.csdn.net/u010414589/article/details/51166798
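One possible way to combine a train/test split, cross-validation, and AUC, written against sklearn.model_selection. The synthetic data, the LogisticRegression model, and the split sizes are all assumptions chosen for illustration, not taken from the linked post.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic binary classification data (hypothetical stand-in for real data).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Hold out 25% as a test set, keeping the class ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000)

# 5-fold cross-validated AUC on the training part only.
cv_auc = cross_val_score(clf, X_train, y_train, cv=5, scoring='roc_auc')

# Final check on the held-out test set; AUC needs scores, not hard labels.
clf.fit(X_train, y_train)
test_auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
```

cv_auc holds one AUC per fold; its mean is a common summary when comparing models.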
Dimensionality reduction with PCA
http://doc.okbase.net/u012162613/archive/120946.html
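A minimal sketch of PCA with scikit-learn, assuming sklearn.decomposition.PCA is the tool in question (the linked article may implement PCA differently). The random data and n_components=2 are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.decomposition import PCA

# Random toy data (hypothetical): 5 features per sample.
rng = np.random.RandomState(0)
X_train = rng.rand(100, 5)
X_test = rng.rand(20, 5)

# Keep the 2 directions of largest variance found in the training set.
pca = PCA(n_components=2)
X_train_2d = pca.fit_transform(X_train)
X_test_2d = pca.transform(X_test)  # project with the training components

# Fraction of the total variance captured by each kept component.
explained = pca.explained_variance_ratio_
```

As with imputation and scaling, the projection is fitted on the training set and only applied to the test set.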
A few lessons from Kaggle experience
This one is very good; it introduces many practical models.
http://www.cnblogs.com/DjangoBlog/p/6648035.html