to_categorical

Build the test target data:

from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
y = y + 1  # shift the labels from 0..2 to 1..3
y

Output:

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3])

For a neural network, one-hot encode y before training:

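# Import from the public Keras API rather than the internal tensorflow.python path
from tensorflow.keras.utils import to_categorical
# Labels run 1..3, so num_classes must be at least 4 (column 0 goes unused)
y_encoder = to_categorical(y, 4)
y_encoder[:3]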

Output:

array([[0., 1., 0., 0.],
       [0., 1., 0., 0.],
       [0., 1., 0., 0.]], dtype=float32)

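As the output shows, for each label to_categorical first initializes an all-zero vector of length num_classes, then treats the label value as an index and sets that position to 1. For example, if the largest value in a dataset's label column is 5 and one label has the value 1, that label is converted as follows: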

Initialize an array of length 5 + 1: [0,0,0,0,0,0]
The label value is 1, so use 1 as the index and set that position to 1, giving [0,1,0,0,0,0]
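
The two steps above can be sketched in plain NumPy. This is only an illustration of the idea, not the Keras implementation, and the helper name one_hot is hypothetical:

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical helper: one-hot encode non-negative integer labels."""
    labels = np.asarray(labels, dtype=int)
    # Step 1: an all-zero array, one row per label and num_classes columns.
    encoded = np.zeros((labels.shape[0], num_classes), dtype="float32")
    # Step 2: use each label value as a column index and set that position to 1.
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

# Largest label is 5, so 5 + 1 columns are needed; the label to encode is 1.
result = one_hot([1], 5 + 1)
print(result)
```

Because the label is used directly as an index, any label equal to or larger than num_classes falls outside the array.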

If the largest label is 3 but the data contains only 3 distinct values, can num_classes be set to 3?

y = [1,2,3]
to_categorical(y, 3)

Output:

IndexError: index 3 is out of bounds for axis 1 with size 3

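The num_classes argument of to_categorical must be strictly greater than the largest value in y, that is, at least max(y) + 1. If it is larger than that, the result is still correct: only the position indexed by the label is 1, and every other position is 0.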
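
Two ways to stay clear of the IndexError are sketched below, using NumPy only so the snippet runs without TensorFlow. The variable names are illustrative and not part of any library:

```python
import numpy as np

y = np.array([1, 2, 3])

# Option 1: keep the labels as they are and size the encoding from the
# largest label. This gives 4 columns, and column 0 is never set.
num_classes = int(y.max()) + 1

# Option 2: remap the labels to consecutive indices 0..k-1, so that exactly
# one column per distinct label is needed.
classes, y_index = np.unique(y, return_inverse=True)
k = len(classes)

# Each row of the k x k identity matrix is a one-hot vector, so indexing
# it with y_index yields one encoded row per label.
encoded = np.eye(k, dtype="float32")[y_index]
print(num_classes, k)
print(encoded)
```

With these values, to_categorical(y, num_classes) and to_categorical(y_index, k) should both run without the error. Option 2 yields the more compact encoding, but the classes array has to be kept to map predictions back to the original labels.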

Original article: https://www.cnblogs.com/oaks/p/14230373.html