感知机(转)

感知机(转)
原文：http://www.hankcs.com/ml/the-perceptron.html

《统计学习方法》系列笔记的第一篇，对应原著第二章。大量引用原著讲解，加入了自己的理解。对书中算法采用Python实现，并用Matplotlib可视化了动画出来，应该算是很硬派了。一套干货下来，很是辛苦，要是能坚持下去就好。

概念

感知机是二分类模型，输入实例的特征向量，输出实例的±类别。

感知机模型

定义

假设输入空间是，输出空间是，x和y分属这两个空间，那么由输入空间到输出空间的如下函数：

称为感知机。其中，w和b称为感知机模型参数，叫做权值或权值向量，叫做偏置，w·x表示向量w和x的内积。sign是一个函数：

感知机的几何解释是，线性方程

将特征空间划分为正负两个部分：

这个平面（2维时退化为直线）称为分离超平面。

感知机学习策略

数据集的线性可分性

定义

给定数据集

其中如果存在某个超平面S

能够完全正确地将正负实例点全部分割开来，则称T线性可分，否则称T线性不可分。

感知机学习策略

假定数据集线性可分，我们希望找到一个合理的损失函数。

一个朴素的想法是采用误分类点的总数，但是这样的损失函数不是参数w，b的连续可导函数，不可导自然不能把握函数的变化，也就不易优化（不知道什么时候该终止训练，或终止的时机不是最优的）。

另一个想法是选择所有误分类点到超平面S的总距离。为此，先定义点x₀到平面S的距离：

分母是w的L₂范数，所谓L₂范数，指的是向量各元素的平方和然后求平方根（长度）。这个式子很好理解，回忆中学学过的点到平面的距离：

此处的点到超平面S的距离的几何意义就是上述距离在多维空间的推广。

又因为，如果点i被误分类，一定有

成立，所以我们去掉了绝对值符号，得到误分类点到超平面S的距离公式：

假设所有误分类点构成集合M，那么所有误分类点到超平面S的总距离为

分母作用不大，反正一定是正的，不考虑分母，就得到了感知机学习的损失函数：

感知机学习算法

原始形式

感知机学习算法是对以下最优化问题的算法：

感知机学习算法是误分类驱动的，先随机选取一个超平面，然后用梯度下降法不断极小化上述损失函数。损失函数的梯度由：

给出。所谓梯度，是一个向量，指向的是标量场增长最快的方向，长度是最大变化率。所谓标量场，指的是空间中任意一个点的属性都可以用一个标量表示的场（个人理解该标量为函数的输出）。

随机选一个误分类点i，对参数w，b进行更新：

上式是学习率。损失函数的参数加上梯度上升的反方向，于是就梯度下降了。所以，上述迭代可以使损失函数不断减小，直到为0。于是得到了原始形式的感知机学习算法：

对于此算法，使用下面的例子作为测试数据：

给出Python实现和可视化代码如下：

感知机算法代码

终于到了最激动人心的时刻了，有了上述知识，就可以完美地可视化这个简单的算法：
1. # -*- coding:utf-8 -*-
2. # Filename: train2.1.py
3. # Author：hankcs
4. # Date: 2015/1/30 16:29
5. import copy
6. from matplotlib import pyplot as plt
7. from matplotlib import animation
9. training_set = [[(3, 3), 1], [(4, 3), 1], [(1, 1), -1]]
10. w = [0, 0]
11. b = 0
12. history = []
15. def update(item):
16. """
17. update parameters using stochastic gradient descent
18. :param item: an item which is classified into wrong class
19. :return: nothing
20. """
21. global w, b, history
22. w[0] += 1 * item[1] * item[0][0]
23. w[1] += 1 * item[1] * item[0][1]
24. b += 1 * item[1]
25. print w, b
26. history.append([copy.copy(w), b])
27. # you can uncomment this line to check the process of stochastic gradient descent
30. def cal(item):
31. """
32. calculate the functional distance between 'item' an the dicision surface. output yi(w*xi+b).
33. :param item:
34. :return:
35. """
36. res = 0
37. for i in range(len(item[0])):
38. res += item[0][i] * w[i]
39. res += b
40. res *= item[1]
41. return res
44. def check():
45. """
46. check if the hyperplane can classify the examples correctly
47. :return: true if it can
48. """
49. flag = False
50. for item in training_set:
51. if cal(item) <= 0:
52. flag = True
53. update(item)
54. # draw a graph to show the process
55. if not flag:
56. print "RESULT: w: " + str(w) + " b: " + str(b)
57. return flag
60. if __name__ == "__main__":
61. for i in range(1000):
62. if not check(): break
64. # first set up the figure, the axis, and the plot element we want to animate
65. fig = plt.figure()
66. ax = plt.axes(xlim=(0, 2), ylim=(-2, 2))
67. line, = ax.plot([], [], 'g', lw=2)
68. label = ax.text([], [], '')
70. # initialization function: plot the background of each frame
71. def init():
72. line.set_data([], [])
73. x, y, x_, y_ = [], [], [], []
74. for p in training_set:
75. if p[1] > 0:
76. x.append(p[0][0])
77. y.append(p[0][1])
78. else:
79. x_.append(p[0][0])
80. y_.append(p[0][1])
82. plt.plot(x, y, 'bo', x_, y_, 'rx')
83. plt.axis([-6, 6, -6, 6])
84. plt.grid(True)
85. plt.xlabel('x')
86. plt.ylabel('y')
87. plt.title('Perceptron Algorithm (www.hankcs.com)')
88. return line, label
91. # animation function. this is called sequentially
92. def animate(i):
93. global history, ax, line, label
95. w = history[i][0]
96. b = history[i][1]
97. if w[1] == 0: return line, label
98. x1 = -7
99. y1 = -(b + w[0] * x1) / w[1]
100. x2 = 7
101. y2 = -(b + w[0] * x2) / w[1]
102. line.set_data([x1, x2], [y1, y2])
103. x1 = 0
104. y1 = -(b + w[0] * x1) / w[1]
105. label.set_text(history[i])
106. label.set_position([x1, y1])
107. return line, label
109. # call the animator. blit=true means only re-draw the parts that have changed.
110. print history
111. anim = animation.FuncAnimation(fig, animate, init_func=init, frames=len(history), interval=1000, repeat=True,
112. blit=True)
113. plt.show()
114. anim.save('perceptron.gif', fps=2, writer='imagemagick')
可视化

可见超平面被误分类点所吸引，朝着它移动，使得两者距离逐步减小，直到正确分类为止。通过这个动画，是不是对感知机的梯度下降算法有了更直观的感悟呢？

算法的收敛性

记输入向量加进常数1的拓充形式，其最大长度为，记感知机的参数向量，设满足条件的超平面可以将数据集完全正确地分类，定义最小值伽马：

则误分类次数k满足：

证明请参考《统计学习方法》P31。

感知机学习算法的对偶形式

对偶指的是，将w和b表示为测试数据i的线性组合形式，通过求解系数得到w和b。具体说来，如果对误分类点i逐步修改wb修改了n次，则w，b关于i的增量分别为，这里，则最终求解到的参数分别表示为：

于是有算法2.2：

感知机对偶算法代码

涉及到比较多的矩阵计算，于是用NumPy比较多：
1. # -*- coding:utf-8 -*-
2. # Filename: train2.2.py
3. # Author：hankcs
4. # Date: 2015/1/31 15:15
5. import numpy as np
6. from matplotlib import pyplot as plt
7. from matplotlib import animation
9. # An example in that book, the training set and parameters' sizes are fixed
10. training_set = np.array([[[3, 3], 1], [[4, 3], 1], [[1, 1], -1]])
12. a = np.zeros(len(training_set), np.float)
13. b = 0.0
14. Gram = None
15. y = np.array(training_set[:, 1])
16. x = np.empty((len(training_set), 2), np.float)
17. for i in range(len(training_set)):
18. x[i] = training_set[i][0]
19. history = []
21. def cal_gram():
22. """
23. calculate the Gram matrix
24. :return:
25. """
26. g = np.empty((len(training_set), len(training_set)), np.int)
27. for i in range(len(training_set)):
28. for j in range(len(training_set)):
29. g[i][j] = np.dot(training_set[i][0], training_set[j][0])
30. return g
33. def update(i):
34. """
35. update parameters using stochastic gradient descent
36. :param i:
37. :return:
38. """
39. global a, b
40. a[i] += 1
41. b = b + y[i]
42. history.append([np.dot(a * y, x), b])
43. # print a, b # you can uncomment this line to check the process of stochastic gradient descent
46. # calculate the judge condition
47. def cal(i):
48. global a, b, x, y
50. res = np.dot(a * y, Gram[i])
51. res = (res + b) * y[i]
52. return res
55. # check if the hyperplane can classify the examples correctly
56. def check():
57. global a, b, x, y
58. flag = False
59. for i in range(len(training_set)):
60. if cal(i) <= 0:
61. flag = True
62. update(i)
63. if not flag:
65. w = np.dot(a * y, x)
66. print "RESULT: w: " + str(w) + " b: " + str(b)
67. return False
68. return True
71. if __name__ == "__main__":
72. Gram = cal_gram() # initialize the Gram matrix
73. for i in range(1000):
74. if not check(): break
76. # draw an animation to show how it works, the data comes from history
77. # first set up the figure, the axis, and the plot element we want to animate
78. fig = plt.figure()
79. ax = plt.axes(xlim=(0, 2), ylim=(-2, 2))
80. line, = ax.plot([], [], 'g', lw=2)
81. label = ax.text([], [], '')
83. # initialization function: plot the background of each frame
84. def init():
85. line.set_data([], [])
86. x, y, x_, y_ = [], [], [], []
87. for p in training_set:
88. if p[1] > 0:
89. x.append(p[0][0])
90. y.append(p[0][1])
91. else:
92. x_.append(p[0][0])
93. y_.append(p[0][1])
95. plt.plot(x, y, 'bo', x_, y_, 'rx')
96. plt.axis([-6, 6, -6, 6])
97. plt.grid(True)
98. plt.xlabel('x')
99. plt.ylabel('y')
100. plt.title('Perceptron Algorithm 2 (www.hankcs.com)')
101. return line, label
104. # animation function. this is called sequentially
105. def animate(i):
106. global history, ax, line, label
108. w = history[i][0]
109. b = history[i][1]
110. if w[1] == 0: return line, label
111. x1 = -7.0
112. y1 = -(b + w[0] * x1) / w[1]
113. x2 = 7.0
114. y2 = -(b + w[0] * x2) / w[1]
115. line.set_data([x1, x2], [y1, y2])
116. x1 = 0.0
117. y1 = -(b + w[0] * x1) / w[1]
118. label.set_text(str(history[i][0]) + ' ' + str(b))
119. label.set_position([x1, y1])
120. return line, label
122. # call the animator. blit=true means only re-draw the parts that have changed.
123. anim = animation.FuncAnimation(fig, animate, init_func=init, frames=len(history), interval=1000, repeat=True,
124. blit=True)
125. plt.show()
126. # anim.save('perceptron2.gif', fps=2, writer='imagemagick')
可视化

与算法1的结果相同，我们也可以将数据集改一下：
1. training_set = np.array([[[3, 3], 1], [[4, 3], 1], [[1, 1], -1], [[5, 2], -1]])
会得到一个复杂一些的结果：

读后感

通过最简单的模型，学习到ML中的常用概念和常见流程。

另外本文只是个人笔记，服务于个人备忘用，对质量和后续不做保证。还是那句话，博客只做补充，要入门，还是得看经典著作。

Reference

文中部分代码参考了OldPanda的实现。
相关阅读:
POJ 1703 Find them, Catch them
POJ 2236 Wireless Network
POJ 2010 Moo University
POJ 2184 Cow Exhibition
POJ 3280 Cheapest Palindrome
POJ 3009 Curling 2.0
POJ 3669 Meteor Shower
POJ 2718 Smallest Difference
POJ 3187 Backward Digit Sums
POJ 3050 Hopscotch
原文地址：https://www.cnblogs.com/wyuzl/p/7640011.html

概念

定义

感知机学习策略

定义

感知机学习算法

原始形式

感知机算法代码

可视化

算法的收敛性

感知机学习算法的对偶形式

感知机对偶算法代码

可视化

读后感

Reference