Gradient Descent
Setup: a fixed learning rate; the method is built from two functions and three variables (the update rule itself is sketched right after this list).
Function 1: the function to be optimized, f(x);
Function 2: the derivative of the function to be optimized, g(x);
Variable x: the variable of the function; it is updated repeatedly during optimization until the minimum is found;
Variable grad: the gradient value at the current point x;
Variable step: the step size taken along the descent direction, also called the learning rate (LearningRate); it stays fixed throughout the optimization.
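Concretely, each iteration applies the update x ← x − step · g(x). A minimal one-step sketch, using the same starting point (x = 5, step = 0.1) as the example below:

# One gradient-descent update for f(x) = x*x - 2*x + 1, whose derivative is g(x) = 2*x - 2
x, step = 5, 0.1
grad = 2 * x - 2        # g(5) = 8
x = x - step * grad     # 5 - 0.1 * 8 = 4.2
print(grad, x)          # 8 4.2, matching epoch 0 of the output below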
Gradient descent code:
#coding:utf-8
import numpy as np
import matplotlib.pyplot as plt

# Gradient descent: starting from x_start, repeatedly step along the negative gradient g(x)
def gd(x_start, step, g):
    x = x_start
    for i in range(20):
        grad = g(x)
        x -= grad * step  # x = x - grad * step
        print('[epoch {0} ] grad={1}, x={2}'.format(i, grad, x))
        if abs(grad) < 1e-6:
            break
    return x

# The function to be optimized
def f(x):
    return x * x - 2 * x + 1

# The derivative of the function to be optimized
def g(x):
    return 2 * x - 2

# Visualize the function to be optimized
x = np.linspace(-5, 7, 100)
y = f(x)
plt.plot(x, y)
plt.show()

# Run gradient descent
gd(5, 0.1, g)
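For reference, a minimal usage sketch, assuming the script above has already been run (the name x_min is introduced here only for illustration):

x_min = gd(5, 0.1, g)   # final x after at most 20 iterations
print(f(x_min))         # value of f at the returned point, close to the minimum value 0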
Output:
Analysis: from the parabola plotted by the code, we can see that x = 1 is the minimum point of the function.
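A quick check by completing the square: f(x) = x² − 2x + 1 = (x − 1)², so f(x) ≥ 0 with equality only at x = 1; equivalently, setting the derivative g(x) = 2x − 2 to zero gives x = 1.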
[epoch 0 ] grad=8, x=4.2      # epoch 0: grad = 2*5 - 2 = 8, x = 5 - grad*0.1 = 4.2
[epoch 1 ] grad=6.4, x=3.56   # epoch 1: grad = 2*4.2 - 2 = 6.4, x = 4.2 - grad*0.1 = 3.56
[epoch 2 ] grad=5.12, x=3.048
[epoch 3 ] grad=4.096, x=2.6384
[epoch 4 ] grad=3.2767999999999997, x=2.31072
[epoch 5 ] grad=2.6214399999999998, x=2.0485759999999997
[epoch 6 ] grad=2.0971519999999995, x=1.8388607999999997
[epoch 7 ] grad=1.6777215999999995, x=1.6710886399999998
[epoch 8 ] grad=1.3421772799999996, x=1.536870912
[epoch 9 ] grad=1.0737418239999998, x=1.4294967295999998
[epoch 10 ] grad=0.8589934591999997, x=1.34359738368
[epoch 11 ] grad=0.6871947673599998, x=1.274877906944
[epoch 12 ] grad=0.5497558138879999, x=1.2199023255552
[epoch 13 ] grad=0.4398046511103999, x=1.17592186044416
[epoch 14 ] grad=0.35184372088831983, x=1.1407374883553278
[epoch 15 ] grad=0.2814749767106557, x=1.1125899906842622
[epoch 16 ] grad=0.22517998136852446, x=1.0900719925474098
[epoch 17 ] grad=0.18014398509481966, x=1.0720575940379278
[epoch 18 ] grad=0.14411518807585555, x=1.0576460752303423
[epoch 19 ] grad=0.11529215046068453, x=1.0461168601842739
Analysis: starting from the initial value x = 5, the gradient keeps shrinking; after 20 iterations x is still not exactly 1, but every iteration brings it closer to the optimum x = 1.
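The steady approach to x = 1 can also be checked in closed form: since g(x) = 2(x − 1), each update gives x_new − 1 = (1 − 2·step)·(x − 1) = 0.8·(x − 1), so after 20 iterations x = 1 + 4·0.8^20 ≈ 1.0461. A small sketch verifying this against the last line of the log:

# Closed-form value of x after 20 updates from x = 5 with step = 0.1
x_20 = 1 + (5 - 1) * (1 - 2 * 0.1) ** 20
print(x_20)  # ~1.0461168601842739, matching epoch 19 of the log up to floating-point rounding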