Proximal Gradient Descent for L1 Regularization

[Permalink: http://www.cnblogs.com/breezedeus/p/3426757.html. Please credit the source when reposting.]

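Suppose we want to solve the following minimization problem:
        \( \min\limits_x f(x) \).
If \( f(x) \) is differentiable, a simple approach is gradient descent (GD), which iterates
        \( x_{k+1} := x_{k} - \alpha \nabla f(x_{k}) \).
One interpretation of GD is that \( x_{k} \) takes a small step along a descent direction of the objective at the current point. As long as the step size \( \alpha \) is small enough, \( f(x_{k+1}) \leq f(x_{k}) \) is guaranteed.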
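
The GD iteration above can be sketched in a few lines of Python. This is a minimal illustration, not code from the original post: the test function \( f(x) = (x - 3)^2 \), the step size `alpha = 0.1`, and the names `gradient_descent`, `f`, and `grad_f` are all assumptions chosen for the example.

```python
def gradient_descent(grad, x0, alpha, num_iters):
    """Iterate x_{k+1} = x_k - alpha * grad(x_k) and return every iterate."""
    xs = [x0]
    for _ in range(num_iters):
        xs.append(xs[-1] - alpha * grad(xs[-1]))
    return xs


# Hypothetical test problem (an assumption): f(x) = (x - 3)^2,
# whose unique minimizer is x = 3.
def f(x):
    return (x - 3.0) ** 2


def grad_f(x):
    return 2.0 * (x - 3.0)


iterates = gradient_descent(grad_f, x0=0.0, alpha=0.1, num_iters=50)
values = [f(x) for x in iterates]

# With this small step size the objective never increases along the iterates.
is_monotone = all(values[k + 1] <= values[k] for k in range(len(values) - 1))
print(iterates[-1], is_monotone)
```

With a step size that is too large for this function (for example `alpha = 1.5`), the same loop would diverge, which is the reason for the "small enough" condition in the text.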

If \( \nabla f(x) \) is L-Lipschitz, that is,
        \( ||\nabla f(x') - \nabla f(x)|| \leq L ||x' - x|| \) for all \( x, x' \),
then near the point \( x_{k} \) we can approximate \( f(x) \) by
        \( \hat{f}(x, x_k) \doteq f(x_k) + \langle \nabla f(x_k), x - x_k \rangle + \frac{L}{2} ||x - x_k||^2 \).

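Rearranging the terms of the expression above (completing the square in \( x \)) gives
        \( \hat{f}(x, x_k) = \frac{L}{2} || x - ( x_k - \frac{1}{L} \nabla f(x_k) ) ||^2 + \varphi(x_k) \),
where \( \varphi(x_k) = f(x_k) - \frac{1}{2L} ||\nabla f(x_k)||^2 \) does not depend on \( x \).
Clearly, \( \hat{f}(x, x_k) \) attains its minimum at
        \( x_{k+1} = x_k - \frac{1}{L} \nabla f(x_k) \).
So, from this point of view, each GD iteration minimizes a quadratic approximation of the original objective.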
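
As a numerical sanity check of this viewpoint, the sketch below builds the quadratic approximation for one concrete function and compares its minimizer with the GD step of size \( 1/L \). The choice of the softplus function \( f(x) = \log(1 + e^x) \) (whose derivative is Lipschitz with \( L = 1/4 \)), the point `xk = 1.0`, and the grid search are all assumptions made for illustration.

```python
import math

L = 0.25  # Lipschitz constant of the derivative of softplus (assumed example)


def f(x):
    """Softplus: log(1 + exp(x))."""
    return math.log1p(math.exp(x))


def grad_f(x):
    """Derivative of softplus, i.e. the logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def surrogate(x, xk):
    """Quadratic approximation f(xk) + f'(xk) (x - xk) + (L/2) (x - xk)^2."""
    return f(xk) + grad_f(xk) * (x - xk) + 0.5 * L * (x - xk) ** 2


xk = 1.0
gd_step = xk - grad_f(xk) / L  # closed-form minimizer of the surrogate

# Brute-force minimization of the surrogate on a grid over [-10, 10].
grid = [-10.0 + 0.01 * i for i in range(2001)]
grid_minimizer = min(grid, key=lambda x: surrogate(x, xk))

# The surrogate should lie above f everywhere (up to rounding error).
upper_bound_holds = all(surrogate(x, xk) >= f(x) - 1e-12 for x in grid)
print(gd_step, grid_minimizer, upper_bound_holds)
```

The grid minimizer agrees with `gd_step` up to the grid spacing, and because the approximation upper-bounds \( f \) in this example, moving to its minimizer cannot increase \( f \).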


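In many minimization problems we add a non-smooth penalty term \( g(x) \) to the objective, such as the common L1 penalty \( g(x) = ||x||_1 \). GD then no longer extends directly, because \( g(x) \) need not be differentiable. The quadratic-approximation idea above, however, does carry over:
        \( x_{k+1} = \arg\min_x \{ \hat{f}(x, x_k) + g(x) \} = \arg\min_x \{ \frac{L}{2} || x - ( x_k - \frac{1}{L} \nabla f(x_k) ) ||^2 + g(x) \} \).
This is the so-called proximal gradient descent (PGD) algorithm. PGD can be used efficiently whenever, for the given \( g(x) \), the following minimization problem is easy to solve:
        \( \text{prox}_{\mu g} (z) = \arg\min_x \{ \frac{1}{2} ||x - z||^2 + \mu g(x) \} \).
In this notation the PGD update reads \( x_{k+1} = \text{prox}_{\frac{1}{L} g} ( x_k - \frac{1}{L} \nabla f(x_k) ) \).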
For example, when \( g(x) = ||x||_1 \), \( \text{prox}_{\mu g} (z) \) can be computed by so-called soft thresholding, applied to each component of \( z \):

        \( \text{prox}_{\mu g} (z) = \text{sign}(z) \max\{|z| - \mu, 0\} \).
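
To make soft thresholding and the PGD loop concrete, here is a minimal Python sketch. It is an illustration under stated assumptions rather than the original author's code: the penalty weight `lam`, the separable test objective \( f(x) = \frac{1}{2} \sum_i a_i (x_i - b_i)^2 \), and the names `soft_threshold` and `proximal_gradient_l1` are all hypothetical. The test objective was picked because its L1-penalized minimizer has a closed form to compare against.

```python
def soft_threshold(z, mu):
    """Elementwise sign(z_i) * max(|z_i| - mu, 0), the prox of mu * ||.||_1."""
    return [(1.0 if zi > 0 else -1.0) * max(abs(zi) - mu, 0.0) for zi in z]


def proximal_gradient_l1(grad_f, L, lam, x0, num_iters):
    """Approximately minimize f(x) + lam * ||x||_1 using step size 1/L."""
    x = list(x0)
    for _ in range(num_iters):
        g = grad_f(x)
        z = [xi - gi / L for xi, gi in zip(x, g)]  # gradient step on f
        x = soft_threshold(z, lam / L)             # prox step on the L1 term
    return x


# Hypothetical test problem (an assumption):
# f(x) = 0.5 * sum_i a_i * (x_i - b_i)^2, whose gradient is L-Lipschitz
# with L = max(a).
a = [1.0, 2.0, 4.0]
b = [3.0, -0.2, 1.0]
lam = 1.0
L = max(a)


def grad_f(x):
    return [ai * (xi - bi) for ai, xi, bi in zip(a, x, b)]


x_pgd = proximal_gradient_l1(grad_f, L, lam, x0=[0.0, 0.0, 0.0], num_iters=200)

# For this separable problem each coordinate can be solved exactly:
# x_i = soft_threshold(b_i, lam / a_i).
x_exact = [soft_threshold([bi], lam / ai)[0] for ai, bi in zip(a, b)]
print(x_pgd, x_exact)
```

The second coordinate ends up exactly at zero, which is the sparsity-inducing effect of the L1 penalty. The penalty weight `lam` enters only through the threshold `lam / L`, since the prox of \( \frac{1}{L} \cdot \lambda ||x||_1 \) is soft thresholding with \( \mu = \lambda / L \).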

[References]

[1] John Wright. Lecture III: Algorithms, 2013.