LightGBM: Dataset

I recently used LightGBM's Dataset, so here are some notes:

1. Description: class lightgbm.Dataset(data, label=None, reference=None, weight=None, group=None, init_score=None, silent=False, feature_name='auto', categorical_feature='auto', params=None, free_raw_data=True)

Bases: object

Dataset in LightGBM.

Construct Dataset.

Parameters:

data (string, numpy array, pandas DataFrame, scipy.sparse or list of numpy arrays) – Data source of Dataset. If string, it represents the path to txt file.
label (list, numpy 1-D array, pandas one-column DataFrame/Series or None, optional (default=None)) – Label of the data.
reference (Dataset or None, optional (default=None)) – If this is Dataset for validation, training data should be used as reference.
weight (list, numpy 1-D array, pandas Series or None, optional (default=None)) – Weight for each instance.
group (list, numpy 1-D array, pandas Series or None, optional (default=None)) – Group/query size for Dataset.
init_score (list, numpy 1-D array, pandas Series or None, optional (default=None)) – Init score for Dataset.
silent (bool, optional (default=False)) – Whether to print messages during construction.
feature_name (list of strings or 'auto', optional (default="auto")) – Feature names. If ‘auto’ and data is pandas DataFrame, data columns names are used.
categorical_feature (list of strings or int, or 'auto', optional (default="auto")) – Categorical features. If list of int, interpreted as indices. If list of strings, interpreted as feature names (need to specify feature_name as well). If ‘auto’ and data is pandas DataFrame, pandas categorical columns are used. All values in categorical features should be less than int32 max value (2147483647). All negative values in categorical features will be treated as missing values.
params (dict or None, optional (default=None)) – Other parameters.
free_raw_data (bool, optional (default=True)) – If True, raw data is freed after constructing inner Dataset.

The result is a Dataset object.

2. Usage:

Following the description above, pass in your own data. In my case both data and label were pandas DataFrames.

Original article: https://www.cnblogs.com/demo-deng/p/9613259.html