Learning XGBoost

1. Theory

https://www.cnblogs.com/zhouxiaohui888/p/6008368.html

2. Practice

The more important XGBoost parameters:

(1) Learning rate (learning_rate): usually set low, at 0.1 or below.

(2) Tree parameters:

max_depth

min_child_weight

subsample

colsample_bytree

gamma

(3) Regularization parameters:

lambda

alpha

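(1) objective [default=reg:linear]: defines the learning task and the corresponding learning objective. The available objective functions are:
- "reg:linear": linear regression (newer XGBoost releases name this objective "reg:squarederror").
- "reg:logistic": logistic regression.
- "binary:logistic": logistic regression for binary classification; the output is a probability.
- "binary:logitraw": logistic regression for binary classification; the output is the raw score wTx before the logistic transformation.
- "count:poisson": Poisson regression for count data; the output is the mean of the Poisson distribution. In Poisson regression, max_delta_step defaults to 0.7 (used to safeguard optimization).
- "multi:softmax": makes XGBoost use the softmax objective for multiclass classification; the parameter num_class (number of classes) must also be set.
- "multi:softprob": same as softmax, but the output is a vector of ndata * nclass values, which can be reshaped into a matrix with ndata rows and nclass columns. Each row holds the probabilities of that sample belonging to each class.
- "rank:pairwise": sets XGBoost to do a ranking task by minimizing the pairwise loss.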
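The difference between "multi:softmax" and "multi:softprob" can be illustrated without XGBoost itself. The following is a minimal NumPy sketch, assuming hypothetical raw class scores (the numbers are made up for illustration); it only mimics the output shapes described above and is not the library's implementation.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax over raw class scores."""
    # Subtract the row maximum for numerical stability
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Hypothetical raw scores for ndata=3 samples and nclass=4 classes,
# stored as a flat vector of ndata * nclass values
ndata, nclass = 3, 4
flat_scores = np.array([2.0, 0.5, 0.1, 0.1,
                        0.2, 0.2, 3.0, 0.1,
                        0.0, 1.5, 0.3, 1.4])

# "multi:softprob" style: reshape to (ndata, nclass), one probability per class
probs = softmax(flat_scores.reshape(ndata, nclass))

# "multi:softmax" style: keep only the most likely class per sample
labels = probs.argmax(axis=1)

print(probs.shape)
print(labels)
```

Each row of `probs` sums to 1, and `labels` keeps only the index of the largest probability in each row.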
(2) eval_metric: the evaluation metric. The choices are listed below:
- "rmse": root mean square error
- "logloss": negative log-likelihood
- "error": binary classification error rate. It is calculated as #(wrong cases)/#(all cases). For the predictions, the evaluation regards instances with a prediction value larger than 0.5 as positive instances, and the others as negative instances.
- "merror": multiclass classification error rate. It is calculated as #(wrong cases)/#(all cases).
- "mlogloss": multiclass logloss
- "auc": area under the curve, for ranking evaluation.
- "ndcg": Normalized Discounted Cumulative Gain
- "map": Mean Average Precision
- "ndcg@n", "map@n": n can be assigned as an integer to cut off the top positions in the lists for evaluation.
- "ndcg-", "map-", "ndcg@n-", "map@n-": in XGBoost, NDCG and MAP evaluate the score of a list without any positive samples as 1. Adding "-" to the metric name makes XGBoost evaluate these scores as 0, to be consistent under some conditions.
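To make the definitions of "error" and "logloss" concrete, here is a small NumPy sketch that computes both by hand. The helper names `binary_error` and `binary_logloss` and the sample data are hypothetical, chosen for illustration; they are not XGBoost functions.

```python
import numpy as np

def binary_error(y_true, y_prob, threshold=0.5):
    """#(wrong cases) / #(all cases), predicting positive when prob > threshold."""
    y_pred = (y_prob > threshold).astype(int)
    return float(np.mean(y_pred != y_true))

def binary_logloss(y_true, y_prob, eps=1e-15):
    """Negative log-likelihood averaged over all samples."""
    p = np.clip(y_prob, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

# Made-up labels and predicted probabilities
y_true = np.array([1, 0, 1, 0])
y_prob = np.array([0.9, 0.4, 0.3, 0.6])

err = binary_error(y_true, y_prob)
ll = binary_logloss(y_true, y_prob)
print(err, ll)
```

With this data the last two samples fall on the wrong side of the 0.5 threshold, so the error rate is 0.5, while logloss additionally penalizes how confident the wrong predictions were.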
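(3) lambda [default=0]: penalty coefficient for L2 regularization. (The default of 0 applies to the linear booster; for the tree booster the default is 1.)

(4) alpha [default=0]: penalty coefficient for L1 regularization.

(5) lambda_bias: L2 regularization on the bias term. The default is 0. (There is no L1 regularization on the bias, because the bias is not important under L1.)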
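As a rough picture of where these three coefficients enter, the regularized objective of a linear model can be sketched as follows. This is an assumption-laden illustration: the symbols $w$ (feature weights), $b$ (bias), and $l$ (per-sample loss) are introduced here, and the constant factors are a common convention rather than something confirmed against the library's implementation.

```latex
\mathrm{Obj}(w, b) = \sum_{i} l\bigl(y_i, \hat{y}_i\bigr)
  + \frac{\lambda}{2}\,\lVert w \rVert_2^2
  + \alpha\,\lVert w \rVert_1
  + \frac{\lambda_{\mathrm{bias}}}{2}\, b^2,
\qquad \hat{y}_i = w^{\top} x_i + b
```

Larger values of lambda and alpha pull the weights toward zero, with the L1 term tending to set some of them exactly to zero.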

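(6) eta [default=0.3] (alias: learning_rate)
Step-size shrinkage used in each update to prevent overfitting. After each boosting step, the weights of the new features are obtained directly; eta shrinks these weights to make the boosting process more conservative.
Range: [0,1]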
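The effect of eta can be sketched with a toy gradient-boosting loop. The code below is a minimal illustration, assuming squared-error loss and single-split decision stumps as weak learners; `fit_stump` and `boost` are hypothetical helpers written for this example and are far simpler than what XGBoost does internally.

```python
import numpy as np

def fit_stump(x, residual):
    """Fit a one-split regression stump to the residual by least squares."""
    best = None
    # Exclude the largest value so the right side is never empty
    for t in np.unique(x)[:-1]:
        left = x <= t
        lv, rv = residual[left].mean(), residual[~left].mean()
        sse = ((residual[left] - lv) ** 2).sum() + ((residual[~left] - rv) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda q: np.where(q <= t, lv, rv)

def boost(x, y, eta, n_rounds):
    """Additive model: each round adds eta times a stump fitted to the residual."""
    pred = np.zeros_like(y, dtype=float)
    for _ in range(n_rounds):
        stump = fit_stump(x, y - pred)
        pred += eta * stump(x)  # shrink the new learner's contribution by eta
    return pred

# Toy one-dimensional regression problem
x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x)

mse_by_eta = {}
for eta in (1.0, 0.3, 0.1):
    mse_by_eta[eta] = float(np.mean((y - boost(x, y, eta, 10)) ** 2))
    print(f"eta={eta}: training MSE after 10 rounds = {mse_by_eta[eta]:.4f}")
```

A small eta means each round corrects only a fraction of the remaining error, which is why a low learning rate is normally paired with a larger number of boosting rounds.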

(7) max_depth [default=6]: maximum depth of a tree. Range: [1,∞]

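(8) min_child_weight [default=1]
Minimum sum of instance weights required in a child node. If a split would produce a leaf node whose sum of instance weights is less than min_child_weight, the splitting process stops. In a linear regression task, this simply corresponds to the minimum number of instances required in each node. The larger this parameter, the more conservative the algorithm.
Range: [0,∞]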
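The check behind min_child_weight can be sketched as follows, assuming the "instance weight" is the second derivative (hessian) of the loss for each sample. The helper names are hypothetical and the logic is a simplified illustration, not XGBoost's actual split-finding code.

```python
import numpy as np

def hessian_squared_error(y_pred):
    """For the loss 1/2 * (y - y_pred)^2 the second derivative is 1 per sample."""
    return np.ones_like(y_pred, dtype=float)

def hessian_logistic(raw_score):
    """For logistic loss the second derivative is p * (1 - p)."""
    p = 1.0 / (1.0 + np.exp(-raw_score))
    return p * (1.0 - p)

def split_allowed(hess, left_mask, min_child_weight):
    """A split is kept only if both children carry enough total hessian."""
    left_sum = hess[left_mask].sum()
    right_sum = hess[~left_mask].sum()
    return bool(left_sum >= min_child_weight and right_sum >= min_child_weight)

# Six samples; the candidate split sends the first two to the left child
scores = np.zeros(6)
left_mask = np.array([True, True, False, False, False, False])

# Squared error: hessian sums equal the sample counts (2 and 4)
sq_ok_2 = split_allowed(hessian_squared_error(scores), left_mask, min_child_weight=2)
sq_ok_3 = split_allowed(hessian_squared_error(scores), left_mask, min_child_weight=3)

# Logistic loss at raw score 0: each hessian is 0.25, so the sums are 0.5 and 1.0
log_ok_1 = split_allowed(hessian_logistic(scores), left_mask, min_child_weight=1)

print(sq_ok_2, sq_ok_3, log_ok_1)
```

Under squared error the threshold acts as a minimum sample count, whereas under logistic loss confidently predicted samples contribute very little weight, so the same threshold is stricter.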
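```
from xgboost import XGBClassifier

# Baseline binary classifier built from the parameters described above
xgb1 = XGBClassifier(
    learning_rate=0.1,
    n_estimators=1000,
    max_depth=5,
    min_child_weight=1,
    gamma=0,
    subsample=0.8,
    colsample_bytree=0.8,
    objective='binary:logistic',
    n_jobs=4,            # scikit-learn-style name for nthread
    scale_pos_weight=1,
    random_state=27)     # scikit-learn-style name for seed
```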
3. An important XGBoost module: plot_importance (displays feature importance)
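```
from xgboost import XGBClassifier
from xgboost import plot_importance
from matplotlib import pyplot

# X is the training feature matrix and Y the labels; both must be defined beforehand
model = XGBClassifier()
model.fit(X, Y)
plot_importance(model)
pyplot.show()
# The resulting plot shows the importance of each feature
```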
Original post: https://www.cnblogs.com/Lee-yl/p/9248664.html
