# Passing class_weight

class_weight : {dict, 'balanced'}, optional
    Set the parameter C of class i to class_weight[i]*C for SVC. If not given, all classes are supposed to have weight one. The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies as ``n_samples / (n_classes * np.bincount(y))``

When a dict is used, it takes the form described in the docs: Weights associated with classes in the form ``{class_label: weight}``. For example, {0: 1, 1: 1} gives class 0 a weight of 1 and class 1 a weight of 1.

# Passing sample_weight

sample_weight : array-like, shape (n_samples,)
    Per-sample weights. Rescale C per sample. Higher weights force the classifier to put more emphasis on these points.
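To make the relationship between the two parameters concrete, here is a minimal sketch that expands per-class weights into per-sample weights. It assumes scikit-learn's `compute_sample_weight` helper and reuses the 16-label toy vector from later in this post; the variable names are only illustrative.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight, compute_sample_weight

# Toy labels: 8 samples of class 0, 6 of class 1, 2 of class 2
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2])
classes = np.unique(y)

# One weight per class
class_w = compute_class_weight("balanced", classes=classes, y=y)

# One weight per sample: every sample receives the weight of its class
sample_w = compute_sample_weight("balanced", y)

# The labels here are 0..2, so they can index class_w directly
print(np.allclose(sample_w, class_w[y]))
```

An array such as `sample_w` is the kind of value that can be passed as `sample_weight` to `fit`, for instance `SVC().fit(X, y, sample_weight=sample_w)`.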
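1. The source code for the computation can be found in compute_class_weight, which is imported with: from sklearn.utils.class_weight import compute_class_weight
2. Besides passing the weights as a dict, you can also set class_weight = 'balanced'. For example, with an SVM classifier: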
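    from sklearn.svm import SVC

    # X_train and y_train are the training features and labels, assumed to be defined already
    clf = SVC(kernel='linear', class_weight='balanced', decision_function_shape='ovr')
    clf.fit(X_train, y_train)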
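The snippet above leaves the training data undefined. As a self-contained sketch, the version below generates a synthetic imbalanced 3-class dataset (the dataset and its parameters are assumptions for illustration, not part of the original post) and checks that the fitted `class_weight_` attribute agrees with what `compute_class_weight` returns for the same labels.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.utils.class_weight import compute_class_weight

# Synthetic, deliberately imbalanced data (illustrative values only)
X, y = make_classification(
    n_samples=300,
    n_features=6,
    n_informative=4,
    n_classes=3,
    weights=[0.6, 0.3, 0.1],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

clf = SVC(kernel="linear", class_weight="balanced", decision_function_shape="ovr")
clf.fit(X_train, y_train)

# Weights the fitted model applied to C, one per class
print(clf.class_weight_)

# They should agree with the standalone helper on the same labels
expected = compute_class_weight("balanced", classes=clf.classes_, y=y_train)
print(np.allclose(clf.class_weight_, expected))
```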
3. So how are the 'balanced' weights computed? Consider this example:
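    import numpy as np
    from sklearn.utils.class_weight import compute_class_weight

    y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2]  # labels, 16 samples in total
    a = np.bincount(y)  # array([8, 6, 2]): number of samples in each class
    aa = 1 / a          # reciprocals: array([0.125, 0.16666667, 0.5])
    print(aa)

    class_weight = 'balanced'
    classes = np.array([0, 1, 2])  # class labels
    # classes and y must be passed by keyword in recent scikit-learn releases
    weight = compute_class_weight(class_weight, classes=classes, y=y)
    print(weight)  # [0.66666667 0.88888889 2.66666667]

    print(0.66666667 * 8)  # 5.33333336
    print(0.88888889 * 6)  # 5.33333334
    print(2.66666667 * 2)  # 5.33333334
    # These three values are nearly identical: each is about n_samples / n_classes = 16 / 3.
    # The 'balanced' weights even things out, so that each class's weight times
    # its sample count is the same and the penalty matches the class size.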
4. Now for the key step: recall the formula quoted above from the scikit-learn docs for the case class_weight='balanced':
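    # weight = n_samples / (n_classes * np.bincount(y))
    # where
    #   n_samples is 16
    #   n_classes is 3
    #   np.bincount(y) is the number of samples in each class
    print(16 / (3 * 8))  # prints 0.6666666666666666
    print(16 / (3 * 6))  # prints 0.8888888888888888
    print(16 / (3 * 2))  # prints 2.6666666666666665
    # These match the weights returned by compute_class_weight in step 3.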
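The arithmetic above can be wrapped in a small function and checked against scikit-learn. This is only a sketch: `balanced_weights` is a hypothetical helper written for this post, not a library function, and it uses `np.unique` rather than `np.bincount` so that it would also accept labels that are not small non-negative integers.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight


def balanced_weights(y):
    """Hypothetical helper: n_samples / (n_classes * count of each class)."""
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return classes, weights


y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2]

classes, manual = balanced_weights(y)
reference = compute_class_weight("balanced", classes=classes, y=y)

print(manual)
print(np.allclose(manual, reference))

# Weight times class count is the same for every class: n_samples / n_classes
print(manual * np.bincount(y))
```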
5. Finally, here is what happens when a dict is passed instead:
    import numpy as np
    from sklearn.utils.class_weight import compute_class_weight

    y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2]  # labels, 16 samples in total
    class_weight = {0: 1, 1: 3, 2: 5}  # {class_label_1: weight_1, class_label_2: weight_2, class_label_3: weight_3}
    classes = np.array([0, 1, 2])  # class labels
    # classes and y must be passed by keyword in recent scikit-learn releases
    weight = compute_class_weight(class_weight, classes=classes, y=y)
    print(weight)  # prints [1. 3. 5.], i.e. exactly the values set in the dict
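Since the docstring quoted at the top says the parameter C of class i is set to class_weight[i]*C for SVC, the dict weights can be read as per-class multipliers of C. The sketch below just carries out that multiplication; the base value `C = 2.0` is an arbitrary assumption chosen for illustration.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

C = 2.0  # assumed base penalty, chosen only for illustration

y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2]
classes = np.array([0, 1, 2])
weight = compute_class_weight({0: 1, 1: 3, 2: 5}, classes=classes, y=y)

# Per-class penalty following the docstring rule class_weight[i] * C
effective_C = C * weight
for label, c_i in zip(classes, effective_C):
    print(f"class {label}: C = {c_i}")
```

With these numbers, mistakes on the rarest class (label 2) are penalised five times as heavily as mistakes on class 0.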