使用train_test_split、GridSearchCV 对load_wine 进行KNN分类

原创转载请注明出处：https://www.cnblogs.com/agilestyle/p/12678281.html

准备数据

import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

samples = load_wine()

samples

# sklearn.utils.Bunch
type(samples)

# dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names'])
samples.keys()

print(samples['DESCR'])

# (178, 13)
samples['data'].shape

# (178,)
samples['target'].shape

分割数据，将20%的数据作为测试集，其余作为训练集

X_train, X_test, y_train, y_test = train_test_split(samples['data'], samples['target'], test_size=0.2, random_state=2)

# (142, 13)
X_train.shape

# (36, 13)
X_test.shape

# (142,)
y_train.shape

# (36,)
y_test.shape

建模训练

# KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
#                     metric_params=None, n_jobs=None, n_neighbors=5, p=2,
#                     weights='uniform')
knn_clf = KNeighborsClassifier()
knn_clf.fit(X_train, y_train)

评价模型

score = knn_clf.score(X_test, y_test)

# 0.75
score

网格搜索+交叉验证

Note: A parameter grid is a dictionary with the parameter setting you would like to try.

param_grid = {
    'n_neighbors': [2, 3, 4, 5, 6, 7, 8],
    'p': [1, 2],
    'weights': ['uniform', 'distance']
}

grid_search = GridSearchCV(knn_clf, param_grid=param_grid, cv=10)
grid_search.fit(X_train, y_train)

# 0.8333333333333334
grid_search.score(X_test, y_test)

# {'n_neighbors': 2, 'p': 1, 'weights': 'distance'}
grid_search.best_params_

new_wine = np.random.randint(1, 100, (1, 13))
# (1, 13)
new_wine.shape

pred_result = grid_search.predict(new_wine)
# array([1])
pred_result

# array(['class_1'], dtype='<U7')
samples['target_names'][pred_result]

Reference

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html

相关阅读:
flask Apache 部署（值得参考）
《金刚经》
gin短暂停机反向代理
operationaltransformation 算法源码分析
读研日记悲伤的一天
springboot集成Swagger
聊一聊Java8 Optional，让你的代码更加优雅
oracleora02391 sessions_per_user limit错误
oracle归档满数据库不能使用问题处理
git远程建立仓库后，将本地项目推到远程报错 fatal: refusing to merge unrelated histories

原文地址：https://www.cnblogs.com/agilestyle/p/12678281.html