Differences between the training set, validation set, and test set

training set: used to train the model (fit its parameters)
validation set: used for model selection
test set: used to evaluate the actual performance of the selected model
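As a minimal sketch, assuming a simple shuffled 60/20/20 split and a hypothetical helper named `train_val_test_split` (the fractions and seed are illustrative, and real projects may also need stratification), the three sets can be carved out of one dataset like this:

```python
import random

def train_val_test_split(data, val_frac=0.2, test_frac=0.2, seed=0):
    """Hypothetical helper: shuffle a copy of `data` and cut it into train/val/test parts."""
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("val_frac and test_frac must be non-negative and sum to less than 1")
    items = list(data)
    random.Random(seed).shuffle(items)  # fixed seed so the split is reproducible
    n = len(items)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]  # everything left over is used for training
    return train, val, test

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 60 20 20
```

The three parts are disjoint, so no example used to fit parameters is ever used to select or evaluate a model.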

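Before training, we must first choose the form of the model to be trained: a linear model (y = wx + b) or a nonlinear model (kernel SVM, decision tree, neural network, ...). Only after the model has been chosen do we start training. The goal of training is to determine the model's parameters, and this is usually done by designing a loss function and then optimizing it.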
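To make "design a loss function, then optimize it" concrete, here is a sketch for the linear model y = wx + b, assuming mean squared error as the loss and plain gradient descent as the optimizer. The helper `fit_linear`, the learning rate, and the epoch count are illustrative choices, not taken from the original.

```python
def fit_linear(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w*x + b by gradient descent on the mean squared error loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of L(w, b) = (1/n) * sum((w*x + b - y)^2) with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the loss
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]  # noiseless data generated from y = 2x + 1
w, b = fit_linear(xs, ys)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

Training only touches the parameters w and b; the choice of model form (here, linear) was made beforehand.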

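In many cases, however, we do not know in advance which model is suitable, so we often train several models and end up with several trained results, from which we want to pick the most suitable one. We do this by evaluating every trained model on the validation set and choosing the one with the lowest error rate.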

So the validation set is mainly used to select the model.
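The selection procedure described above can be sketched as a toy example. It assumes hypothetical 1-D data (label 1 when x > 0.5, with some labels flipped) and three hypothetical candidate models: a constant majority-class model, a decision stump, and a 1-nearest-neighbour classifier. Each is trained on the training set only, the candidates are compared by error rate on the validation set, and only the winner is evaluated on the test set.

```python
import random

def make_data(n, noise=0.1, seed=0):
    """Hypothetical data: label is 1 when x > 0.5, flipped with probability `noise`."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        data.append((x, 1 - y if rng.random() < noise else y))
    return data

def train_majority(train):
    """Constant model: always predict the most common training label."""
    label = int(2 * sum(y for _, y in train) >= len(train))
    return lambda x: label

def train_stump(train):
    """Decision stump: choose the threshold t minimizing training error of 'predict 1 if x > t'."""
    best_t, best_err = 0.0, float("inf")
    for t, _ in train:
        err = sum(int(x > t) != y for x, y in train)
        if err < best_err:
            best_t, best_err = t, err
    return lambda x: int(x > best_t)

def train_1nn(train):
    """1-nearest-neighbour: copy the label of the closest training point."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]

def error_rate(model, data):
    return sum(model(x) != y for x, y in data) / len(data)

data = make_data(300)
train, val, test = data[:180], data[180:240], data[240:]

candidates = {"majority": train_majority, "stump": train_stump, "1-nn": train_1nn}
models = {name: fit(train) for name, fit in candidates.items()}     # training set: fit parameters
val_err = {name: error_rate(m, val) for name, m in models.items()}  # validation set: compare models
best = min(val_err, key=val_err.get)                                 # lowest validation error rate wins
print(best, val_err)
print("test error of selected model:", error_rate(models[best], test))  # test set: used once, at the end
```

Because the test set plays no part in choosing `best`, its error remains an unbiased estimate for the selected model; reusing it to compare candidates would effectively turn it into another validation set.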

The main trick here is to "hold out" a portion of our data from training and use the model's performance on that subset of the data as a proxy for the true risk.

This data is known as "validation" data. It differs from test data because its values (including the labels) are known at model design time. However, unlike training data, we do not use it to fit our model.

This means that the error measured on it does not exhibit the same optimistic bias that the empirical risk (the error on the training data) does when used to estimate the true risk.
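The bias mentioned above is easy to see with a model that memorizes its training data. In this sketch (hypothetical 1-D data with 20% label noise; the 1-nearest-neighbour classifier is chosen only because it makes the effect extreme), the empirical risk on the training data is zero, while the error on the held-out data gives a more honest estimate of the true risk.

```python
import random

def noisy_data(n, noise, rng):
    """Hypothetical data: label is 1 when x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        data.append((x, 1 - y if rng.random() < noise else y))
    return data

def one_nn(train):
    """1-nearest-neighbour classifier: effectively memorizes the training set."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]

def error_rate(model, data):
    return sum(model(x) != y for x, y in data) / len(data)

rng = random.Random(0)
data = noisy_data(400, noise=0.2, rng=rng)
train, held_out = data[:300], data[300:]  # hold out the last 100 points from training

model = one_nn(train)
empirical_risk = error_rate(model, train)      # 0.0: each training point is its own nearest neighbour
validation_risk = error_rate(model, held_out)  # proxy for the true risk, not biased by fitting
print(empirical_risk, validation_risk)
```

Judged by its training error alone, this model would look perfect; the held-out error reveals how much of that was memorized noise.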


Original article: https://www.cnblogs.com/focusonoutput/p/12208102.html