• Splitting a dataset with sklearn's train_test_split


    sklearn.model_selection.train_test_split randomly splits a dataset into a training set and a test set of a chosen proportion.

    1. Usage:

    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(train_data, train_target, test_size=0.2, random_state=0)

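    2. Parameters and return values:

    train_data: the feature set of the samples

    train_target: the label set of the samples

    test_size: the size of the test set. A float between 0 and 1 is the fraction of the dataset assigned to the test set; an integer is the absolute number of test samples

    random_state: the seed for the random number generator. On the same dataset, the same seed always produces the same split, while different seeds produce different splits

    X_train, y_train: together form the training set

    X_test, y_test: together form the test set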
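
    The behaviour of test_size and random_state described above can be checked with a small sketch. The toy data and the variable names (X_te_a, X_te_b, and so on) are made up for illustration and are not part of the original post.

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 50 samples with 2 features each, plus 50 labels
X = [[i, i * 2] for i in range(50)]
y = [i % 2 for i in range(50)]

# Float test_size: 20% of the 50 samples go to the test set
X_tr_a, X_te_a, y_tr_a, y_te_a = train_test_split(X, y, test_size=0.2, random_state=0)

# Integer test_size: exactly 5 samples go to the test set
X_tr_b, X_te_b, y_tr_b, y_te_b = train_test_split(X, y, test_size=5, random_state=0)

# Same data and same seed as the first call: the split is reproduced
X_tr_c, X_te_c, y_tr_c, y_te_c = train_test_split(X, y, test_size=0.2, random_state=0)

print("float test_size:", len(X_tr_a), len(X_te_a))
print("int test_size:", len(X_tr_b), len(X_te_b))
print("same seed, same split:", X_te_a == X_te_c and y_te_a == y_te_c)
```

    Leaving random_state unset should give a different split on each run, which is usually not what you want when comparing models.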

    3. Example:

    Generate a dataset of 100 samples and randomly split off 20% of it as the test set

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Tested with Python 3.6

    # sklearn.cross_validation was removed in scikit-learn 0.20; import from model_selection instead
    from sklearn.model_selection import train_test_split

    # Generate 100 records: 100 two-dimensional feature vectors with 100 matching labels
    X = [["feature ", "one "]] * 50 + [["feature ", "two "]] * 50
    y = [1] * 50 + [2] * 50

    # Randomly draw 20% of the data as the test set
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
    print("train:", len(X_train), "test:", len(X_test))

    # Inspect the samples that ended up in the test set
    for features, label in zip(X_test, y_test):
        print("".join(features), label)

    '''
    train: 80 test: 20
    feature two  2
    feature two  2
    feature one  1
    feature two  2
    feature two  2
    feature one  1
    feature one  1
    feature two  2
    feature two  2
    feature two  2
    feature two  2
    feature one  1
    feature two  2
    feature two  2
    feature two  2
    feature one  1
    feature one  1
    feature one  1
    feature two  2
    feature one  1
    '''
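
    In the output above, the test set holds 8 samples with label 1 and 12 with label 2, even though the full dataset is balanced 50/50. A purely random split does not preserve class proportions. The original post does not cover this, but train_test_split accepts a stratify argument that keeps them. A minimal sketch, reusing the same toy data (test_counts is a name introduced here for illustration):

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Same toy data as the example: 50 samples of each class
X = [["feature ", "one "]] * 50 + [["feature ", "two "]] * 50
y = [1] * 50 + [2] * 50

# stratify=y asks for the label proportions of y to be kept in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y
)

# Count how many samples of each label landed in the test set
test_counts = Counter(y_test)
print("train:", len(X_train), "test:", len(X_test))
print("test label counts:", dict(test_counts))
```

    With a balanced dataset and a 20-sample test set, each label should appear 10 times in the test set.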
  • Original article: https://www.cnblogs.com/cnXuYang/p/8342364.html