When the number of features is large, logistic regression becomes hard to use effectively for classification; this is where neural networks come in.
The origins of neural networks
Origins: algorithms that try to mimic the brain.
Very widely used in the 80s and early 90s; popularity diminished in the late 90s.
Recent resurgence: a state-of-the-art technique for many applications (one reason for the resurgence is the large increase in available computing power).
A simple neuron model: the logistic unit
The green neuron in the middle of the figure can be thought of as performing the following operations:
1. Collect the input values into a vector x
\[x = \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix}\]
2. Multiply x by θ, pass the result through the sigmoid function, and output it
\[\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \\ \theta_3 \end{bmatrix}\]
\[h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}\]
Note: x0 is not drawn in the figure; its value is always 1, and it is usually called the "bias unit".
The sigmoid function from logistic regression is called the "activation function" here:
\[g(z) = \frac{1}{1 + e^{-z}}\]
The θ from logistic regression is called the "weights" here.
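As a sanity check, the computation of this single logistic unit can be written in a few lines of NumPy. This is only an illustrative sketch; the input and weight values below are made-up examples, not from the course.

```python
import numpy as np

def sigmoid(z):
    """Activation function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical example values; x already includes the bias unit x0 = 1.
x = np.array([1.0, 2.0, 0.5, -1.0])       # [x0, x1, x2, x3]
theta = np.array([0.1, 0.3, -0.2, 0.4])   # weights [theta0, theta1, theta2, theta3]

# The logistic unit outputs h_theta(x) = g(theta^T x).
print(sigmoid(theta @ x))
```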
A standard neural network
- 1. This is a three-layer neural network: an input layer, a hidden layer, and an output layer.
- 2. By convention, the "bias units" x0 and a0 are added.
Notation:
\[a_i^{(j)}\]
"activation" of unit i in layer j
\[\Theta^{(j)}\]
matrix of weights controlling the function mapping from layer j to layer j+1
For the neural network shown in the figure above, we have
\[\begin{array}{l}
a_1^{(2)} = g\left( \Theta_{10}^{(1)} x_0 + \Theta_{11}^{(1)} x_1 + \Theta_{12}^{(1)} x_2 + \Theta_{13}^{(1)} x_3 \right)\\
a_2^{(2)} = g\left( \Theta_{20}^{(1)} x_0 + \Theta_{21}^{(1)} x_1 + \Theta_{22}^{(1)} x_2 + \Theta_{23}^{(1)} x_3 \right)\\
a_3^{(2)} = g\left( \Theta_{30}^{(1)} x_0 + \Theta_{31}^{(1)} x_1 + \Theta_{32}^{(1)} x_2 + \Theta_{33}^{(1)} x_3 \right)\\
h_\Theta(x) = a_1^{(3)} = g\left( \Theta_{10}^{(2)} a_0^{(2)} + \Theta_{11}^{(2)} a_1^{(2)} + \Theta_{12}^{(2)} a_2^{(2)} + \Theta_{13}^{(2)} a_3^{(2)} \right)
\end{array}\]
If the network has s_j units in layer j and s_{j+1} units in layer j+1, then Θ^(j) will have dimension s_{j+1} × (s_j + 1) (the extra column accounts for the bias unit).
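To make the indexing and the s_{j+1} × (s_j + 1) dimensions concrete, here is a small NumPy sketch of the layer-2 computation above. The weight values are random placeholders, purely for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Layer sizes for the network above: s1 = 3 inputs, s2 = 3 hidden units, s3 = 1 output.
# Theta1 is therefore s2 x (s1 + 1) = 3 x 4 and Theta2 is s3 x (s2 + 1) = 1 x 4.
rng = np.random.default_rng(0)
Theta1 = rng.normal(size=(3, 4))   # maps layer 1 -> layer 2
Theta2 = rng.normal(size=(1, 4))   # maps layer 2 -> layer 3

x = np.array([1.0, 2.0, 0.5, -1.0])   # [x0 = 1, x1, x2, x3]

# Unit-by-unit computation of a_i^(2), matching the equations above.
a2 = np.array([sigmoid(Theta1[i] @ x) for i in range(3)])

# Add the bias unit a0^(2) = 1 and compute the single output unit.
a2 = np.concatenate(([1.0], a2))
h = sigmoid(Theta2[0] @ a2)
print(a2.shape, h)   # (4,) and a number between 0 and 1
```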
Forward propagation: vectorized implementation
Define
\[z_1^{(2)} = \Theta_{10}^{(1)} x_0 + \Theta_{11}^{(1)} x_1 + \Theta_{12}^{(1)} x_2 + \Theta_{13}^{(1)} x_3\]
Then
\[a_1^{(2)} = g\left( z_1^{(2)} \right)\]
Similarly,
\[\begin{array}{l}
a_2^{(2)} = g\left( z_2^{(2)} \right)\\
a_3^{(2)} = g\left( z_3^{(2)} \right)\\
a_1^{(3)} = g\left( z_1^{(3)} \right)
\end{array}\]
Define
\[x = \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix} = a^{(1)}, \qquad z^{(2)} = \begin{bmatrix} z_1^{(2)} \\ z_2^{(2)} \\ z_3^{(2)} \end{bmatrix}\]
Then
\[\begin{array}{l}
z^{(2)} = \Theta^{(1)} a^{(1)}\\
a^{(2)} = g\left( z^{(2)} \right)\\
\text{(add the bias unit } a_0^{(2)} = 1\text{)}\\
z^{(3)} = \Theta^{(2)} a^{(2)}\\
h_\Theta(x) = a^{(3)} = g\left( z^{(3)} \right)
\end{array}\]
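Written exactly as the vectorized equations above, the same computation might look like this in NumPy (again a sketch with placeholder weights):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
Theta1 = rng.normal(size=(3, 4))   # Theta^(1): layer 1 -> layer 2
Theta2 = rng.normal(size=(1, 4))   # Theta^(2): layer 2 -> layer 3

a1 = np.array([1.0, 2.0, 0.5, -1.0])   # a^(1) = x, with bias unit x0 = 1

z2 = Theta1 @ a1                        # z^(2) = Theta^(1) a^(1)
a2 = sigmoid(z2)                        # a^(2) = g(z^(2))
a2 = np.concatenate(([1.0], a2))        # add the bias unit a0^(2) = 1
z3 = Theta2 @ a2                        # z^(3) = Theta^(2) a^(2)
h = sigmoid(z3)                         # h_theta(x) = a^(3) = g(z^(3))
print(h)
```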
A closer look
If you cover the left half of the network, what remains is exactly logistic regression, except that the original inputs x are replaced by the activations a^(2),
and a^(2) is in turn determined by the previous layer's values a^(1) = x.
Other neural network architectures
A neural network can have many layers: the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers.
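The vectorized forward pass generalizes to any number of layers with a simple loop over the weight matrices. A minimal sketch, assuming one weight matrix per layer transition and random placeholder values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, thetas):
    """Forward propagation through a network of arbitrary depth.

    x      : input feature vector (without the bias unit)
    thetas : list of weight matrices Theta^(1), Theta^(2), ...;
             Theta^(j) has shape s_{j+1} x (s_j + 1)
    """
    a = x
    for Theta in thetas:
        a = np.concatenate(([1.0], a))   # prepend the bias unit
        a = sigmoid(Theta @ a)           # a^(j+1) = g(Theta^(j) a^(j))
    return a                             # activations of the output layer

# Hypothetical 3-5-5-1 network with random placeholder weights.
rng = np.random.default_rng(0)
thetas = [rng.normal(size=(5, 4)),
          rng.normal(size=(5, 6)),
          rng.normal(size=(1, 6))]
print(forward(np.array([2.0, 0.5, -1.0]), thetas))
```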