Why is AdaBoost equivalent to forward stagewise additive modeling using the loss function \(L(y, f(x))=\exp(-y f(x))\)?
First, we consider forward stagewise additive modeling using the exponential loss function \(L(y, f(x))=\exp(-y f(x))\).
With this loss, at each step one must solve:
\[
\left(\beta_{m}, G_{m}\right)=\arg \min _{\beta, G} \sum_{i=1}^{N} \exp \left[-y_{i}\left(f_{m-1}\left(x_{i}\right)+\beta G\left(x_{i}\right)\right)\right]
\]
We denote \(w_{i}^{(m)}=\exp \left(-y_{i} f_{m-1}\left(x_{i}\right)\right)\), then:
\[
\left(\beta_{m}, G_{m}\right)=\arg \min _{\beta, G} \sum_{i=1}^{N} w_{i}^{(m)} \exp \left(-\beta y_{i} G\left(x_{i}\right)\right)
\]
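As a quick numerical illustration (a toy sketch with made-up numbers; `f_prev`, `G`, and `beta` are arbitrary placeholders, not part of the derivation), factoring out \(w_{i}^{(m)}\) leaves the objective value unchanged:

```python
import numpy as np

# Toy check that pulling out w_i^(m) = exp(-y_i f_{m-1}(x_i)) does not change
# the exponential-loss objective.
y = np.array([1, -1, 1, -1, 1])                  # labels in {-1, +1}
f_prev = np.array([0.5, -0.2, -0.1, 0.3, 0.8])   # f_{m-1}(x_i), arbitrary values
G = np.array([1, -1, -1, -1, 1])                 # a candidate weak classifier G(x_i)
beta = 0.4

original = np.sum(np.exp(-y * (f_prev + beta * G)))
w = np.exp(-y * f_prev)                          # w_i^(m)
factored = np.sum(w * np.exp(-beta * y * G))
assert np.isclose(original, factored)
```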
Since \(y_{i} G(x_{i})=1\) when \(y_{i}=G(x_{i})\) and \(y_{i} G(x_{i})=-1\) when \(y_{i} \neq G(x_{i})\), the factor \(\exp \left(-\beta y_{i} G\left(x_{i}\right)\right)\) equals \(e^{-\beta}\) for correctly classified points and \(e^{\beta}\) for misclassified ones.
Therefore we can rewrite \(\sum_{i=1}^{N} w_{i}^{(m)} \exp \left(-\beta y_{i} G\left(x_{i}\right)\right)\) as:
\[
e^{-\beta} \cdot \sum_{y_{i}=G\left(x_{i}\right)} w_{i}^{(m)}+e^{\beta} \cdot \sum_{y_{i} \neq G\left(x_{i}\right)} w_{i}^{(m)}
\]
which is equivalent to:
\[
\left(e^{\beta}-e^{-\beta}\right) \cdot \sum_{i=1}^{N} w_{i}^{(m)} I\left(y_{i} \neq G\left(x_{i}\right)\right)+e^{-\beta} \cdot \sum_{i=1}^{N} w_{i}^{(m)}
\]
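Reusing the toy arrays from the sketch above, one can check that the split-by-correctness form and the indicator form evaluate to the same number:

```python
# The two equivalent ways of writing the weighted exponential-loss objective.
correct = (y == G)
split_form = np.exp(-beta) * w[correct].sum() + np.exp(beta) * w[~correct].sum()
indicator_form = (np.exp(beta) - np.exp(-beta)) * (w * (y != G)).sum() \
                 + np.exp(-beta) * w.sum()
assert np.isclose(split_form, indicator_form)
```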
Therefore, for any fixed \(\beta>0\), the minimizing classifier \(G\) does not depend on \(\beta\):
\[
G_{m}=\arg \min _{G} \sum_{i=1}^{N} w_{i}^{(m)} I\left(y_{i} \neq G\left(x_{i}\right)\right)
\]
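In practice, minimizing this weighted misclassification error is what fitting a weak learner with sample weights does. A minimal sketch, assuming scikit-learn's `DecisionTreeClassifier` (a depth-1 stump) as the weak learner and the toy arrays from above:

```python
from sklearn.tree import DecisionTreeClassifier

X = np.array([[0.1], [0.35], [0.4], [0.65], [0.8]])  # toy 1-D inputs, illustrative only
stump = DecisionTreeClassifier(max_depth=1)
stump.fit(X, y, sample_weight=w)        # weights w_i^(m) steer the split choice
G_m = stump.predict(X)                  # plays the role of G_m(x_i)
```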
Plugging this \(G_{m}\) into the objective function and setting its derivative with respect to \(\beta\) to zero, we have:
\[
\left(e^{\beta}+e^{-\beta}\right) \cdot \sum_{i=1}^{N} w_{i}^{(m)} I\left(y_{i} \neq G_{m}\left(x_{i}\right)\right)=e^{-\beta} \cdot \sum_{i=1}^{N} w_{i}^{(m)}
\]
\[
e^{2 \beta}+1=\frac{\sum_{i=1}^{N} w_{i}^{(m)}}{\sum_{i=1}^{N} w_{i}^{(m)} I\left(y_{i} \neq G_{m}\left(x_{i}\right)\right)}
\]
Thus:
\[
\beta_{m}=\frac{1}{2} \log \frac{1-\operatorname{err}_{m}}{\operatorname{err}_{m}}
\]
where
\[
\operatorname{err}_{m}=\frac{\sum_{i=1}^{N} w_{i}^{(m)} I\left(y_{i} \neq G_{m}\left(x_{i}\right)\right)}{\sum_{i=1}^{N} w_{i}^{(m)}}
\]
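Continuing the toy stump from above, \(\operatorname{err}_{m}\) and \(\beta_{m}\) would be computed as follows (this assumes \(0<\operatorname{err}_{m}<1/2\), i.e. the weak learner is better than chance):

```python
# Weighted error rate of the stump and the resulting beta_m.
err_m = np.sum(w * (G_m != y)) / np.sum(w)
beta_m = 0.5 * np.log((1 - err_m) / err_m)   # undefined if err_m is 0 or 1
```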
Since we use forward stagewise additive modeling,
\[
f_{m}(x)=f_{m-1}(x)+\beta_{m} G_{m}(x)
\]
we have:
\[
\begin{aligned}
w_{i}^{(m+1)} &=\exp \left(-y_{i} f_{m}\left(x_{i}\right)\right)\\
&=\exp \left(-y_{i}\left(f_{m-1}\left(x_{i}\right)+\beta_{m} G_{m}\left(x_{i}\right)\right)\right)\\
&=w_{i}^{(m)} \cdot e^{-\beta_{m} y_{i} G_{m}\left(x_{i}\right)}
\end{aligned}
\]
Since \(-y_{i} G_{m}\left(x_{i}\right)=2 \cdot I\left(y_{i} \neq G_{m}\left(x_{i}\right)\right)-1\), we can rewrite the above equation as
\[
w_{i}^{(m+1)}=w_{i}^{(m)} \cdot e^{\alpha_{m} I\left(y_{i} \neq G_{m}\left(x_{i}\right)\right)} \cdot e^{-\beta_{m}}
\]
where \(\alpha_{m}=2 \beta_{m}\).
We can ignore the factor \(e^{-\beta_{m}}\), since it multiplies all the weights by the same value:
\[
w_{i}^{(m+1)}=w_{i}^{(m)} \cdot e^{\alpha_{m} I\left(y_{i} \neq G_{m}\left(x_{i}\right)\right)}
\]
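In code, this is the familiar AdaBoost re-weighting step (again on the toy example; the common factor \(e^{-\beta_{m}}\) is simply dropped):

```python
# Only misclassified points are up-weighted; correctly classified weights stay put.
alpha_m = 2 * beta_m
w_next = w * np.exp(alpha_m * (G_m != y))
```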
Also, we have:
\[
\alpha_{m}=\log \frac{1-\operatorname{err}_{m}}{\operatorname{err}_{m}}
\]
Comparing these updates with AdaBoost, we see that the weight update and the classifier weight \(\alpha_{m}\) are exactly those of AdaBoost, so forward stagewise additive modeling with the exponential loss is the same algorithm as AdaBoost.
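Putting the pieces together, a bare-bones sketch of the resulting algorithm might look like this (assuming scikit-learn stumps as weak learners and \(0<\operatorname{err}_{m}<1/2\) at every round; an illustration, not a production implementation):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_sketch(X, y, M=10):
    """Forward stagewise additive modeling with exponential loss, i.e. AdaBoost."""
    N = len(y)
    w = np.full(N, 1.0 / N)                        # initial weights w_i^(1)
    learners, alphas = [], []
    for m in range(M):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)           # G_m minimizes the weighted error
        pred = stump.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)  # err_m
        alpha = np.log((1 - err) / err)            # alpha_m = 2 * beta_m
        w = w * np.exp(alpha * (pred != y))        # derived weight update
        w /= w.sum()                               # normalization does not affect the argmin
        learners.append(stump)
        alphas.append(alpha)
    # Final classifier: sign of the weighted vote sum_m alpha_m G_m(x)
    return learners, alphas
```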