The EM Algorithm
The EM algorithm is mainly used to find maximum-likelihood estimates of the parameters of a probability density function. It converts the problem $\arg\max_{\theta_{1}} \sum_{i=1}^{n} \ln p\left(x_{i} | \theta_{1}\right)$ into the easier-to-handle $\sum_{i=1}^{n} \ln p\left(x_{i}, \theta_{2} | \theta_{1}\right)$, where $\theta_{2}$ may follow an arbitrary prior distribution $q(\theta_{2})$. The derivation of the EM algorithm is as follows:$$\begin{aligned} \ln p\left(x | \theta_{1}\right) &=\int q\left(\theta_{2}\right) \ln p\left(x | \theta_{1}\right) d\theta_{2}=\int q\left(\theta_{2}\right) \ln \frac{p\left(x, \theta_{2} | \theta_{1}\right)}{p\left(\theta_{2} | x, \theta_{1}\right)} d\theta_{2}=\int q\left(\theta_{2}\right) \ln \frac{p\left(x, \theta_{2} | \theta_{1}\right) q\left(\theta_{2}\right)}{p\left(\theta_{2} | x, \theta_{1}\right) q\left(\theta_{2}\right)} d\theta_{2} \\ &=\underbrace{\int q\left(\theta_{2}\right) \ln \frac{p\left(x, \theta_{2} | \theta_{1}\right)}{q\left(\theta_{2}\right)} d\theta_{2}}_{\text{define this to be } \mathcal{L}\left(x, \theta_{1}\right)}+\underbrace{\int q\left(\theta_{2}\right) \ln \frac{q\left(\theta_{2}\right)}{p\left(\theta_{2} | x, \theta_{1}\right)} d\theta_{2}}_{\text{Kullback-Leibler divergence}} \end{aligned}$$By the convexity of $-\ln$ (Jensen's inequality), $\text{KL divergence}=E\left[-\ln \frac{p\left(\theta_{2} | x, \theta_{1}\right)}{q\left(\theta_{2}\right)}\right] \geq -\ln E\left[\frac{p\left(\theta_{2} | x, \theta_{1}\right)}{q\left(\theta_{2}\right)}\right]=-\ln 1=0$, with equality if and only if $q\left(\theta_{2}\right)=p\left(\theta_{2} | x, \theta_{1}\right)$.
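The non-negativity of the KL term can be verified numerically. Below is a small sketch of my own (not from the source), for discrete distributions, where the integral becomes a sum:

```python
# Numeric sanity check: KL(q || p) >= 0, with equality exactly when q = p.
import numpy as np

def kl_divergence(q, p):
    """sum_i q_i * ln(q_i / p_i) for strictly positive discrete distributions."""
    q, p = np.asarray(q, dtype=float), np.asarray(p, dtype=float)
    return float(np.sum(q * np.log(q / p)))

q = np.array([0.2, 0.5, 0.3])
p = np.array([0.4, 0.4, 0.2])
```

Evaluating `kl_divergence(q, p)` for distinct `q` and `p` gives a strictly positive value, while `kl_divergence(q, q)` is zero, matching the equality condition above.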
Based on the derivation above, the EM algorithm proceeds as follows:
Given an initial value $\theta_1^{(0)}$, iterate the following two steps until convergence (step $t+1$ shown):
- E-step: set $q_{t}\left(\theta_{2}\right)=p\left(\theta_{2} | x, \theta_{1}^{(t)}\right)$, so that $\mathcal{L}_{t}\left(x, \theta_{1}\right)=\int q_{t}\left(\theta_{2}\right) \ln p\left(x, \theta_{2} | \theta_{1}\right) d\theta_{2}-\underbrace{\int q_{t}\left(\theta_{2}\right) \ln q_{t}\left(\theta_{2}\right) d\theta_{2}}_{\text{independent of } \theta_{1}\text{, can be ignored}}$
- M-step: set $\theta_{1}^{(t+1)}=\arg\max_{\theta_{1}} \mathcal{L}_{t}\left(x, \theta_{1}\right)$
Explanation of the algorithm (each full iteration can only increase the log-likelihood):
$$
\begin{aligned} \ln p\left(x | \theta_{1}^{(t)}\right) &=\mathcal{L}_{t}\left(x, \theta_{1}^{(t)}\right)+\underbrace{KL\left(q_{t}\left(\theta_{2}\right) \,\|\, p\left(\theta_{2} | x, \theta_{1}^{(t)}\right)\right)}_{=0 \text{ by setting } q=p} \quad \leftarrow \text{E-step} \\ & \leq \mathcal{L}_{t}\left(x, \theta_{1}^{(t+1)}\right) \quad \leftarrow \text{M-step} \\ & \leq \mathcal{L}_{t}\left(x, \theta_{1}^{(t+1)}\right)+\underbrace{KL\left(q_{t}\left(\theta_{2}\right) \,\|\, p\left(\theta_{2} | x, \theta_{1}^{(t+1)}\right)\right)}_{\geq 0 \text{ because } q \neq p} \\ &=\ln p\left(x | \theta_{1}^{(t+1)}\right) \end{aligned}
$$
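As a minimal concrete instance of the E-step/M-step loop above (an illustrative example of my own, not from the source), consider a mixture of two biased coins: each entry of `heads` is the number of heads in `m` tosses of one of two coins, and which coin was used is hidden (it plays the role of $\theta_2$):

```python
# EM for a two-coin mixture: estimate the two head probabilities pA, pB
# and the mixing weight w from head counts with hidden coin identities.
import numpy as np

def em_two_coins(heads, m, iters=100):
    pA, pB, w = 0.6, 0.4, 0.5                      # initial parameter guesses
    heads = np.asarray(heads)
    for _ in range(iters):
        # E-step: posterior probability that each session used coin A
        # (the binomial coefficient cancels in the ratio, so it is omitted)
        likA = w * pA**heads * (1 - pA)**(m - heads)
        likB = (1 - w) * pB**heads * (1 - pB)**(m - heads)
        rA = likA / (likA + likB)
        # M-step: responsibility-weighted maximum-likelihood updates
        w = rA.mean()
        pA = (rA * heads).sum() / (rA.sum() * m)
        pB = ((1 - rA) * heads).sum() / ((1 - rA).sum() * m)
    return pA, pB, w
```

On simulated data with true head probabilities 0.8 and 0.3, the loop recovers both parameters to within sampling error, even though no session is ever labeled with its coin.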
Gaussian Mixture Model (GMM)
The Gaussian mixture model is a probabilistic model for clustering. For any point $\vec{x}_i$ in the data $\vec{x}_1, \vec{x}_2, \cdots, \vec{x}_n$, the variable $c_i$ indicates that $\vec{x}_i$ is assigned to cluster $c_i$, with $c_i \in \{1, 2, \cdots, K\}$. The model is defined as follows:
- Prior cluster assignment: $c_{i} \stackrel{\text{iid}}{\sim} \text{Discrete}(\vec{\pi}) \Rightarrow \operatorname{Prob}\left(c_{i}=k | \vec{\pi}\right)=\pi_{k}$
- Generate observation: $\vec{x}_i \sim N\left(\vec{\mu}_{c_{i}}, \Sigma_{c_{i}}\right)$
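The two-stage generative process just defined can be sketched directly (the 2-D setting, $K=3$, and all parameter values below are illustrative assumptions, not from the source):

```python
# Sample from a GMM: first draw c_i ~ Discrete(pi), then draw
# x_i ~ N(mu_{c_i}, Sigma_{c_i}).
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.5, 0.3, 0.2])                    # prior cluster probabilities
mu = np.array([[0., 0.], [4., 4.], [-4., 4.]])    # cluster means
Sigma = np.stack([np.eye(2)] * 3)                 # cluster covariances

n = 1000
c = rng.choice(3, size=n, p=pi)                   # prior cluster assignment
x = np.stack([rng.multivariate_normal(mu[k], Sigma[k]) for k in c])
```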
The quantities to be estimated are the prior probabilities $\vec{\pi}=(\pi_1, \pi_2, \cdots, \pi_K)$, the cluster means $\{\vec{\mu}_1, \vec{\mu}_2, \cdots, \vec{\mu}_K\}$, and the covariance matrices $\{\Sigma_1, \Sigma_2, \cdots, \Sigma_K\}$. They are found by maximum likelihood; the objective function to maximize is
$$\sum_{i=1}^{n} \ln p\left(\vec{x}_{i} | \vec{\pi}, \boldsymbol{\mu}, \boldsymbol{\Sigma}\right) \text{, where } \boldsymbol{\mu}=\{\vec{\mu}_1, \vec{\mu}_2, \cdots, \vec{\mu}_K\} \text{ and } \boldsymbol{\Sigma}=\{\Sigma_1, \Sigma_2, \cdots, \Sigma_K\}$$
To maximize this objective with the EM algorithm (the cluster assignments $c_i$ play the role of $\theta_2$ above), rewrite it as$$\sum_{i=1}^{n} \ln p\left(\vec{x}_{i} | \vec{\pi}, \boldsymbol{\mu}, \boldsymbol{\Sigma}\right)=\sum_{i=1}^{n} \underbrace{\sum_{k=1}^{K} q\left(c_{i}=k\right) \ln \frac{p\left(\vec{x}_{i}, c_{i}=k | \vec{\pi}, \boldsymbol{\mu}, \boldsymbol{\Sigma}\right)}{q\left(c_{i}=k\right)}}_{\mathcal{L}}+\sum_{i=1}^{n} \underbrace{\sum_{k=1}^{K} q\left(c_{i}=k\right) \ln \frac{q\left(c_{i}=k\right)}{p\left(c_{i}=k | \vec{x}_{i}, \vec{\pi}, \boldsymbol{\mu}, \boldsymbol{\Sigma}\right)}}_{\text{KL divergence}}$$
- E-step: by Bayes' rule, set $q_{t}\left(c_{i}=k\right)=p\left(c_{i}=k | \vec{x}_{i}, \vec{\pi}^{(t)}, \boldsymbol{\mu}^{(t)}, \boldsymbol{\Sigma}^{(t)}\right) \propto p\left(c_{i}=k | \vec{\pi}^{(t)}\right) p\left(\vec{x}_{i} | c_{i}=k, \boldsymbol{\mu}^{(t)}, \boldsymbol{\Sigma}^{(t)}\right)$. Normalizing over $k$ gives$$q_{t}\left(c_{i}=k\right)=\frac{\pi_{k}^{(t)} N\left(\vec{x}_{i} | \vec{\mu}_{k}^{(t)}, \Sigma_{k}^{(t)}\right)}{\sum_{j=1}^{K} \pi_{j}^{(t)} N\left(\vec{x}_{i} | \vec{\mu}_{j}^{(t)}, \Sigma_{j}^{(t)}\right)}$$
- M-step: solve$$\arg\max_{\vec{\pi}, \boldsymbol{\mu}, \boldsymbol{\Sigma}} \sum_{i=1}^{n} \sum_{k=1}^{K} q_{t}\left(c_{i}=k\right) \ln p\left(\vec{x}_{i}, c_{i}=k | \vec{\pi}, \boldsymbol{\mu}, \boldsymbol{\Sigma}\right)=\arg\max_{\vec{\pi}, \boldsymbol{\mu}, \boldsymbol{\Sigma}} \sum_{i=1}^{n} \sum_{k=1}^{K} q_{t}\left(c_{i}=k\right)\left[\ln \pi_{k}+\ln N\left(\vec{x}_{i} | \vec{\mu}_{k}, \Sigma_{k}\right)\right]$$Setting the derivatives to zero (with a Lagrange multiplier for the constraint $\sum_{k}\pi_{k}=1$) yields the closed-form updates$$\pi_{k}^{(t+1)}=\frac{\sum_{i=1}^{n} q_{t}\left(c_{i}=k\right)}{\sum_{j=1}^{K} \sum_{i=1}^{n} q_{t}\left(c_{i}=j\right)}=\frac{\sum_{i=1}^{n} q_{t}\left(c_{i}=k\right)}{n}, \quad \vec{\mu}_{k}^{(t+1)}=\frac{\sum_{i=1}^{n} q_{t}\left(c_{i}=k\right) \vec{x}_{i}}{\sum_{i=1}^{n} q_{t}\left(c_{i}=k\right)}, \quad \Sigma_{k}^{(t+1)}=\frac{\sum_{i=1}^{n} q_{t}\left(c_{i}=k\right)\left(\vec{x}_{i}-\vec{\mu}_{k}^{(t+1)}\right)\left(\vec{x}_{i}-\vec{\mu}_{k}^{(t+1)}\right)^{T}}{\sum_{i=1}^{n} q_{t}\left(c_{i}=k\right)}$$
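The E-step and M-step updates above can be sketched in plain NumPy. This is a sketch under my own assumptions, not a definitive implementation: the function and variable names are mine, the means are initialized from random data points, and a small $10^{-6} I$ regularizer (not part of the derivation) keeps each $\Sigma_k$ invertible:

```python
# GMM fitted by EM: responsibilities in the E-step, closed-form
# pi / mu / Sigma updates in the M-step.
import numpy as np

def gaussian_pdf(X, mu, Sigma):
    """Multivariate normal density N(x | mu, Sigma) for each row of X."""
    d = X.shape[1]
    diff = X - mu
    quad = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

def gmm_em(X, K, iters=50, seed=0):
    n, d = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)
    mu = X[rng.choice(n, K, replace=False)]       # init means: random points
    Sigma = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * K)
    log_liks = []
    for _ in range(iters):
        # E-step: responsibilities q_t(c_i = k), normalized over k
        dens = np.stack([pi[k] * gaussian_pdf(X, mu[k], Sigma[k])
                         for k in range(K)], axis=1)
        log_liks.append(np.log(dens.sum(axis=1)).sum())
        q = dens / dens.sum(axis=1, keepdims=True)
        # M-step: closed-form updates derived above
        Nk = q.sum(axis=0)
        pi = Nk / n
        mu = (q.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (q[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
    return pi, mu, Sigma, q, np.array(log_liks)
```

The `log_liks` trace makes the monotonicity argument from the first section observable: each EM iteration can only increase the log-likelihood (up to the tiny regularization of $\Sigma_k$).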