Examined ensemble methods
- Averaging (or blending)
- Weighted averaging
- Conditional averaging
- Bagging
- Boosting
- Stacking
- StackNet
Averaging ensemble methods
For example, suppose we have a variable called age that we are trying to predict, and we have two models:
- below 50, the first model performs better
- above 50, the second model performs better
What happens if we try to combine them?
Averaging (or blending)
- (model1 + model2) / 2
R^2 rises to 0.95, an improvement over either model alone. The combined model is not better in the regions where each single model already does well; nevertheless, on average it performs better. Perhaps there is an even better combination? Let's try weighted averaging.
Weighted averaging
- (model1 * 0.7 + model2 * 0.3)
It does not look as good as the simple average.
Conditional averaging
- take the part where each model performs better
Ideally, this is the kind of result we would like to get.
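A minimal sketch of the three combining schemes on the hypothetical age example (the prediction arrays and the threshold of 50 are illustrative assumptions):

import numpy as np

# illustrative predictions from the two hypothetical models
age = np.array([20, 35, 48, 55, 70, 85])
preds1 = np.array([21, 34, 47, 61, 78, 95])  # better below 50
preds2 = np.array([14, 28, 42, 56, 69, 86])  # better above 50

# averaging (blending)
blend = (preds1 + preds2) / 2

# weighted averaging
weighted = preds1 * 0.7 + preds2 * 0.3

# conditional averaging: take each model where it performs better
conditional = np.where(age < 50, preds1, preds2)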
Bagging
Why Bagging
There are two main sources of error in modeling:
- 1. Error due to bias (underfitting)
- 2. Error due to variance (overfitting)
By averaging slightly different versions of the same model, we make sure the predictions do not have very high variance. This generally makes them more generalizable.
Parameters that control bagging?
- Changing the seed
- Row (sub)sampling or bootstrapping
- Shuffling
- Column (sub)sampling
- Model-specific parameters
- Number of models (or bags)
- (Optionally) parallelism
Examples of bagging
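A minimal sketch of hand-rolled bagging, assuming numpy arrays; the base model, n_bags, and the sampling rate are illustrative choices (sklearn's BaggingRegressor implements the same idea, including column subsampling via max_features):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bag_predictions(X_train, y_train, X_test, n_bags=10, subsample=0.8):
    # average the predictions of n_bags models, each fit on a random row sample
    n_rows = int(len(X_train) * subsample)
    all_preds = []
    for seed in range(n_bags):
        rng = np.random.RandomState(seed)                      # changing the seed
        idx = rng.choice(len(X_train), n_rows, replace=True)   # row sampling / bootstrapping
        model = DecisionTreeRegressor(random_state=seed)
        model.fit(X_train[idx], y_train[idx])
        all_preds.append(model.predict(X_test))
    return np.mean(all_preds, axis=0)                          # average over all bags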
Boosting
Boosting is a form of weighted averaging of models in which each model is built sequentially, taking the performance of the previous models into account.
Weight based boosting
Suppose we have a tabular dataset with four features, which we call x0, x1, x2, and x3, and we want to use these features to predict a target variable y.
Let's call the predictions pred; these predictions have a certain error. We can compute the absolute errors, |y - pred|, and generate a new column (or vector) from them: here we create a weight column as 1 plus the absolute error. There are of course different ways to calculate this weight; this is just one example.
All that remains is to fit a new model on the same features, but each time also include this weight column. This is how models are added sequentially.
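A minimal sketch of one weight based boosting step, assuming a model that accepts sample weights (the data and model are illustrative; the 1 + |y - pred| weighting is the example formula above):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.rand(100, 4)                    # four features x0..x3, as in the example
y = X @ np.array([1.0, 2.0, 0.5, 0.1]) + rng.randn(100) * 0.1

# fit the first model and compute its predictions
model0 = DecisionTreeRegressor(max_depth=3)
model0.fit(X, y)
pred = model0.predict(X)

# weight column: 1 plus the absolute error, as described above
weights = 1 + np.abs(y - pred)

# the next model is fit on the same features, with the weight column applied
model1 = DecisionTreeRegressor(max_depth=3)
model1.fit(X, y, sample_weight=weights)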
Weight based boosting parameters
- Learning rate (or shrinkage or eta)
  - Trust each model only a little:
    predictionN = pred0*eta + pred1*eta + ... + predN*eta
- Number of estimators
  - If you double the number of estimators, halve eta (and vice versa)
- Input model - can be anything that accepts weights
- Sub boosting type:
  - AdaBoost - good implementation in sklearn (Python); a usage sketch follows below
  - LogitBoost - good implementation in Weka (Java)
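A minimal AdaBoost usage sketch with sklearn (the data and hyperparameter values are illustrative):

from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor

X, y = make_regression(n_samples=200, n_features=4, random_state=0)

# learning_rate plays the role of eta above; n_estimators is the number of models
model = AdaBoostRegressor(n_estimators=100, learning_rate=0.1, random_state=0)
model.fit(X, y)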
Residual based boosting
We use the same dataset and do the same thing. After computing the predictions pred, we calculate the errors, y - pred.
We then fit a new model using the error as the new target, which gives new predictions new_pred.
Taking Rownum=1 as an example: final prediction = 0.75 + 0.20 = 0.95, which is closer to the true value of 1.
This method works very well and is a powerful way to reduce the error.
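A minimal sketch of two rounds of residual based boosting (the data and model are illustrative; the second model fits the errors of the first, mirroring the worked example above):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.rand(100, 4)
y = X.sum(axis=1) + rng.randn(100) * 0.1

# round 1: fit the target directly
model0 = DecisionTreeRegressor(max_depth=3).fit(X, y)
pred = model0.predict(X)

# round 2: fit the errors as the new target
error = y - pred
model1 = DecisionTreeRegressor(max_depth=3).fit(X, error)
new_pred = model1.predict(X)

# the final prediction adds the correction, e.g. 0.75 + 0.20 = 0.95 in the example
final = pred + new_pred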
Residual based boosting parameters
- Learning rate (or shrinkage or eta)
  - predictionN = pred0 + pred1*eta + ... + predN*eta
  - In the previous example, if eta is 0.1, then prediction = 0.75 + 0.2*(0.1) = 0.77
- Number of estimators
- Row (sub)sampling
- Column (sub)sampling
- Input model - better be trees.
- Sub boosting type:
- Full gradient based
- Dart
Residual based favourite implementations
- XGBoost
- LightGBM
- H2O's GBM
- CatBoost
- sklearn's GBM
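A minimal usage sketch with LightGBM, mapping the parameters listed above (the data and values are illustrative; boosting_type='dart' selects the Dart variant):

from sklearn.datasets import make_regression
import lightgbm as lgb

X, y = make_regression(n_samples=500, n_features=10, random_state=0)

model = lgb.LGBMRegressor(
    learning_rate=0.05,    # eta / shrinkage
    n_estimators=300,      # number of estimators
    subsample=0.8,         # row (sub)sampling per iteration
    subsample_freq=1,      # apply row sampling every iteration
    colsample_bytree=0.8,  # column (sub)sampling per tree
    boosting_type='gbdt',  # use 'dart' for the Dart sub boosting type
)
model.fit(X, y)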
Stacking
Methodology
- Wolpert in 1992 introduced stacking. It involves:
  - Splitting the train set into two disjoint sets.
  - Training several base learners on the first part.
  - Making predictions with the base learners on the second (validation) part.
  - Using those predictions as inputs to train a higher level (meta) learner.
Concrete steps
Suppose we have three datasets A, B, and C, where the target variable y is known for A and B. Then:
- algorithm 0 fits A, predicts B and C, and saves pred0 to B1 and C1
- algorithm 1 fits A, predicts B and C, and saves pred1 to B1 and C1
- algorithm 2 fits A, predicts B and C, and saves pred2 to B1 and C1
- algorithm 3 fits B1, predicts C1, and produces the final predictions preds3
Stacking example
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
import numpy as np
from sklearn.model_selection import train_test_split
train = ''  # your training set
y = ''      # your target variable
test = ''   # your test set
# split train data in 2 parts: training and validation
training, valid, ytraining, yvalid = train_test_split(train, y, test_size=0.5)
# specify models
model1 = RandomForestRegressor()
model2 = LinearRegression()
# fit models
model1.fit(training, ytraining)
model2.fit(training, ytraining)
# make predictions for validation
preds1 = model1.predict(valid)
preds2 = model2.predict(valid)
# make predictions for test data
test_preds1 = model1.predict(test)
test_preds2 = model2.predict(test)
# form a new dataset for valid and test by stacking the predictions
stacked_predictions = np.column_stack((preds1, preds2))
stacked_test_predictions = np.column_stack((test_preds1, test_preds2))
# specify meta model and fit it on the stacked validation predictions
meta_model = LinearRegression()
meta_model.fit(stacked_predictions, yvalid)
# make predictions on the stacked predictions of the test data
final_predictions = meta_model.predict(stacked_test_predictions)
Stacking (past) example
As you can see, the result is very similar to what we got with conditional averaging, except it does not do as well around 50. That makes sense: the meta model cannot see the target variable, so it cannot identify the exact gap at 50; it can only infer it from the inputs of the base models.
Things to be mindful of
- With time sensitive data - respect time
- If your data has a time element, you need to set up your stacking so that it respects time.
- Diversity as important as performance
- Single-model performance matters, but model diversity matters just as much. When a model is bad or weak, you don't need to worry too much: stacking can extract the best from each prediction. What you really need to focus on is what information a model brings, even if it is generally weak.
- Diversity may come from:
- Different algorithms
- Different input features
- Performance plateauing after N models
- Meta model is normally modest
StackNet
https://github.com/kaz-Anova/StackNet
Ensembling Tips and Tricks
1st level tips
- Diversity based on algorithms:
- 2-3 gradient boosted trees (lightgbm, xgboost, H2O, catboost)
- 2-3 Neural nets (keras, pytorch)
- 1-2 ExtraTrees/RandomForest (sklearn)
- 1-2 linear models as in logistic/ridge regression, linear svm (sklearn)
- 1-2 knn models (sklearn)
- 1 Factorization machine (libfm)
- 1 svm with nonlinear kernel (like RBF) if size/memory allows (sklearn)
- Diversity based on input data:
- Categorical features: One hot, label encoding, target encoding, likelihood encoding, frequency or counts
- Numerical features: outliers, binning, derivatives, percentiles, scaling
- Interactions: col1*/+-col2, groupby, unsupervised
2nd level tips
- Simpler (or shallower) algorithms:
  - gradient boosted trees with small depth (like 2 or 3)
  - Linear models with high regularization
  - Extra Trees (just don't make them too big)
  - Shallow networks (as in 1 hidden layer, with not that many hidden neurons)
  - knn with BrayCurtis distance
  - Brute forcing a search for best linear weights based on cv
- Feature engineering (see the sketch after this list):
  - pairwise differences between meta features
  - row-wise statistics like averages or stds
  - Standard feature selection techniques
- For every 7.5 models in the previous level, add 1 in the meta level (a rule of thumb from experience)
- Be mindful of target leakage
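A minimal sketch of the meta feature engineering ideas above (the stacked prediction matrix is illustrative; in practice it would hold the 1st level predictions):

import numpy as np

rng = np.random.RandomState(0)
meta = rng.rand(100, 3)  # one column per 1st level model (illustrative data)

# pairwise differences between meta features
diffs = np.column_stack([meta[:, i] - meta[:, j]
                         for i in range(meta.shape[1])
                         for j in range(i + 1, meta.shape[1])])

# row-wise statistics like averages or stds
row_mean = meta.mean(axis=1, keepdims=True)
row_std = meta.std(axis=1, keepdims=True)

meta_extended = np.hstack([meta, diffs, row_mean, row_std])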