• Learning Logistic Regression


    Logistic regression is a classification method for binary (two-class) problems. Its basic idea is:

    1. Choose a suitable hypothesis function, i.e. the classification function, to predict the outcome for an input;
    2. Construct a loss function that measures the deviation between the predicted output and the actual class labels in the training data;
    3. Minimize the loss function to obtain the optimal model parameters.

    First, look at the sigmoid function:

    \(g(x)=\frac{1}{1+e^{-x}}\)

    Its graph is the familiar S-shaped curve rising from 0 to 1 (the figure is not included here).

    The hypothesis function (classification function) in logistic regression:

    \(h_{\theta}(x)=g(\theta^{T}x)=\frac{1}{1+e^{-\theta^{T}x}}\)
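As a minimal sketch (assuming NumPy; the names `sigmoid` and `hypothesis` are ours, not from the original post), the two functions above can be written as:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """Hypothesis h_theta(x) = g(theta^T x)."""
    return sigmoid(np.dot(theta, x))
```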

    Notation:

    \(\theta\) — the parameter vector we will solve for later;

    \(T\) — vector transpose; by default all vectors are column vectors;

    \(\theta^{T}x\) — the column vector \(\theta\) is transposed and then dotted with \(x\), for example:

    \(\begin{bmatrix}1\\ -1\\ 3\end{bmatrix}^{T}\begin{bmatrix}1\\ 1\\ -1\end{bmatrix} = \begin{bmatrix}1 & -1 & 3\end{bmatrix}\begin{bmatrix}1\\ 1\\ -1\end{bmatrix}=1\times 1+(-1)\times 1+3\times(-1) = -3\)
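The same dot product can be checked with NumPy (variable names are our own):

```python
import numpy as np

theta = np.array([1, -1, 3])   # column vector theta
x = np.array([1, 1, -1])       # input vector x
print(theta @ x)               # theta^T x = 1*1 + (-1)*1 + 3*(-1) = -3
```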

    Logistic regression can use either a linear or a nonlinear decision boundary:

    The linear boundary has the form: \(\theta_{0}+\theta_{1}x_{1}+\cdots+\theta_{n}x_{n}=\sum_{i=0}^{n}\theta_{i}x_{i}=\theta^{T}x\)

    A nonlinear boundary has a form such as: \(\theta_{0}+\theta_{1}x_{1}+\theta_{2}x_{2}+\theta_{3}x_{1}^{2}+\theta_{4}x_{2}^{2}\)

    The probabilities that an input \(x\) belongs to class 1 or class 0 are:

    \(P(y=1|x;\theta)=h_{\theta}(x)\)

    \(P(y=0|x;\theta)=1-h_{\theta}(x)\)

    The loss function is defined as: \(J(\theta)=\frac{1}{m}\sum_{i=1}^{m}\mathrm{cost}(h_{\theta}(x^{(i)}), y^{(i)})\)

    where:

    \(m\) is the total number of training samples;

    \(\mathrm{cost}(h_{\theta}(x), y)=\left\{\begin{matrix} -\log(h_{\theta}(x)) & \text{if } y=1\\ -\log(1-h_{\theta}(x)) & \text{if } y=0\end{matrix}\right.\)

    An equivalent single-line form of \(\mathrm{cost}\) is: \(\mathrm{cost}(h_{\theta}(x), y)=-y\times\log(h_{\theta}(x))-(1-y)\times\log(1-h_{\theta}(x))\)

    Substituting \(\mathrm{cost}\) into \(J(\theta)\) gives the loss function:

    \(J(\theta)=-\frac{1}{m}\left[\sum_{i=1}^{m}y^{(i)}\log h_{\theta}(x^{(i)})+(1-y^{(i)})\log(1-h_{\theta}(x^{(i)}))\right]\)
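This loss can be computed directly (a sketch assuming NumPy; `cost` is our own name for \(J(\theta)\)):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Cross-entropy loss J(theta): average of -y*log(h) - (1-y)*log(1-h)."""
    h = sigmoid(X @ theta)  # predictions h_theta(x^(i)) for all m samples
    m = len(y)
    return -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
```

With all parameters at zero every prediction is 0.5, so the loss equals \(\log 2\) regardless of the labels.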

    Minimizing \(J(\theta)\) with gradient descent

    The update rule for \(\theta\) is:

    \(\theta_{j}:=\theta_{j}-\alpha\frac{\partial}{\partial\theta_{j}}J(\theta),\ (j=0\cdots n)\)

    where \(\alpha\) is the learning rate (step size).

    \(\begin{align*} \frac{\partial}{\partial\theta_{j}}J(\theta) &= -\frac{1}{m}\sum_{i=1}^{m}\left( y^{(i)}\frac{1}{h_{\theta}(x^{(i)})}\frac{\partial}{\partial\theta_{j}}h_{\theta}(x^{(i)})-(1-y^{(i)})\frac{1}{1-h_{\theta}(x^{(i)})}\frac{\partial}{\partial\theta_{j}}h_{\theta}(x^{(i)}) \right) \\ &= -\frac{1}{m}\sum_{i=1}^{m}\left( y^{(i)}\frac{1}{g\left(\theta^{T}x^{(i)}\right)}-\left(1-y^{(i)}\right)\frac{1}{1-g\left(\theta^{T}x^{(i)}\right)} \right)\frac{\partial}{\partial\theta_{j}}g\left(\theta^{T}x^{(i)}\right) \\ &= -\frac{1}{m}\sum_{i=1}^{m}\left( y^{(i)}\frac{1}{g\left(\theta^{T}x^{(i)}\right)}-\left(1-y^{(i)}\right)\frac{1}{1-g\left(\theta^{T}x^{(i)}\right)} \right) g\left(\theta^{T}x^{(i)}\right)\left(1-g\left(\theta^{T}x^{(i)}\right)\right)\frac{\partial}{\partial\theta_{j}}\theta^{T}x^{(i)} \end{align*}\)

    Using \(\frac{\partial}{\partial\theta_{j}}\theta^{T}x^{(i)}=x_{j}^{(i)}\) and multiplying through:

    \(\begin{align*} \frac{\partial}{\partial\theta_{j}}J(\theta) &= -\frac{1}{m}\sum_{i=1}^{m}\left( y^{(i)}\left(1-g\left(\theta^{T}x^{(i)}\right)\right)-\left(1-y^{(i)}\right)g\left(\theta^{T}x^{(i)}\right) \right)x_{j}^{(i)} \\ &= -\frac{1}{m}\sum_{i=1}^{m}\left( y^{(i)}-g\left(\theta^{T}x^{(i)}\right) \right)x_{j}^{(i)} \\ &= -\frac{1}{m}\sum_{i=1}^{m}\left( y^{(i)}-h_{\theta}\left(x^{(i)}\right) \right)x_{j}^{(i)} \\ &= \frac{1}{m}\sum_{i=1}^{m}\left( h_{\theta}\left(x^{(i)}\right)-y^{(i)} \right)x_{j}^{(i)} \end{align*}\)

    Substituting this partial derivative into the update rule gives:

    \(\theta_{j}:=\theta_{j}-\alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)x_{j}^{(i)}\)

    Since the learning rate \(\alpha\) is usually a constant, the factor \(\frac{1}{m}\) can be absorbed into it, giving the final update rule:

    \(\theta_{j}:=\theta_{j}-\alpha\sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)x_{j}^{(i)},\ \left(j=0\cdots n\right)\)
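The per-parameter update rule above can be sketched as follows (assuming NumPy; `gradient_step` is our own name, and the loop over `j` mirrors the element-wise formula before vectorization):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(theta, X, y, alpha):
    """One update of every theta_j: theta_j -= alpha * sum_i (h - y) * x_j."""
    h = sigmoid(X @ theta)          # h_theta(x^(i)) for all samples
    new_theta = theta.copy()
    for j in range(len(theta)):
        new_theta[j] = theta[j] - alpha * np.sum((h - y) * X[:, j])
    return new_theta
```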

    Vectorizing the gradient

    The training samples in matrix form:

    \(X= \begin{bmatrix} x^{(1)}\\ x^{(2)}\\ \cdots \\ x^{(m)}\end{bmatrix}=\begin{bmatrix} x_{0}^{(1)} & x_{1}^{(1)} & \cdots & x_{n}^{(1)}\\ x_{0}^{(2)} & x_{1}^{(2)} & \cdots & x_{n}^{(2)}\\ \cdots & \cdots & \cdots & \cdots \\ x_{0}^{(m)} & x_{1}^{(m)} & \cdots & x_{n}^{(m)} \end{bmatrix},\quad Y=\begin{bmatrix} y^{(1)}\\ y^{(2)}\\ \cdots \\ y^{(m)}\end{bmatrix}\)

    The parameter vector \(\Theta\) has one component per feature (including the bias \(\theta_{0}\)):

    \(\Theta=\begin{bmatrix}\theta_{0}\\ \theta_{1}\\ \cdots \\ \theta_{n}\end{bmatrix}\)

    First compute \(X\cdot\Theta\) and call the result \(A\):

    \(A=X\cdot\Theta\) — this is just ordinary matrix multiplication.

    Then compute the vectorized error \(E\):

    \(E=h_{\Theta}(X)-Y=\begin{bmatrix} g(A^{(1)})-y^{(1)}\\ g(A^{(2)})-y^{(2)}\\ \cdots \\ g(A^{(m)})-y^{(m)}\end{bmatrix} = \begin{bmatrix} e^{(1)}\\ e^{(2)}\\ \cdots \\ e^{(m)}\end{bmatrix}\)

    For \(j=0\) the update is:

    \(\begin{align*} \theta_{0}&=\theta_{0}-\alpha\sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)x_{0}^{(i)} \\ &=\theta_{0}-\alpha\sum_{i=1}^{m}e^{(i)}x_{0}^{(i)} \\ &=\theta_{0}-\alpha\begin{bmatrix} x_{0}^{(1)} & x_{0}^{(2)} & \cdots & x_{0}^{(m)}\end{bmatrix}\cdot E \end{align*}\)

    The same holds for any \(\theta_{j}\):

    \(\theta_{j} = \theta_{j}-\alpha\begin{bmatrix} x_{j}^{(1)} & x_{j}^{(2)} & \cdots & x_{j}^{(m)}\end{bmatrix}\cdot E\)

    In matrix form:

    \(\begin{align*}\begin{bmatrix}\theta_{0}\\ \theta_{1}\\ \cdots \\ \theta_{n}\end{bmatrix} &= \begin{bmatrix}\theta_{0}\\ \theta_{1}\\ \cdots \\ \theta_{n}\end{bmatrix} - \alpha\cdot\begin{bmatrix} x_{0}^{(1)} & x_{0}^{(2)} & \cdots & x_{0}^{(m)}\\ x_{1}^{(1)} & x_{1}^{(2)} & \cdots & x_{1}^{(m)}\\ \cdots & \cdots & \cdots & \cdots \\ x_{n}^{(1)} & x_{n}^{(2)} & \cdots & x_{n}^{(m)}\end{bmatrix}\cdot E \\ &= \Theta-\alpha\cdot X^{T}\cdot E \end{align*}\)

    In summary, there are three steps:

    1. Compute the model output: \(A=X\cdot\Theta\)

    2. Apply the sigmoid and compute the error: \(E=g(A)-Y\)

    3. Update \(\Theta\) with the derived rule: \(\Theta:=\Theta-\alpha\cdot X^{T}\cdot E\), then return to step 1 and repeat.
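The three steps above can be sketched as a complete training loop (assuming NumPy; `train` and its default arguments are our own, and the toy data in the usage note assumes a bias column of ones has been prepended to \(X\)):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, alpha=0.1, iters=1000):
    """Vectorized gradient descent for logistic regression."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        A = X @ theta             # step 1: model output A = X . Theta
        E = sigmoid(A) - y        # step 2: error E = g(A) - Y
        theta -= alpha * X.T @ E  # step 3: Theta := Theta - alpha . X^T . E
    return theta
```

On a small linearly separable set such as `X = [[1,-2],[1,-1],[1,1],[1,2]]` with labels `y = [0,0,1,1]`, the learned parameters classify every training sample correctly.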

  • Original article: https://www.cnblogs.com/tuhooo/p/9296915.html