Hung-yi Lee's Machine Learning Course, Lecture 2: Classification

1.Classification

classification: x->function->class n

How to do classification?

Training data for classification:

$(x^1,\hat{y}^1)$ $(x^2,\hat{y}^2)$ $(x^3,\hat{y}^3)$ $(x^4,\hat{y}^4)$

ideal alternatives:

*function (model):

　　f(x): if g(x) > 0, output class 1; otherwise, output class 2

*loss function

      $L(f)=\sum_n \delta(f(x^n)\neq\hat{y}^n)$    the number of times f gets an incorrect result on the training data

*find the best function

Examples: perceptron, SVM
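The three steps above can be sketched in a few lines of Python. This is only an illustration: the linear form of g(x), the weights, and the toy training set are assumptions made for the example and do not come from the lecture.

```python
def g(x, w, b):
    """Hypothetical linear score g(x) = w . x + b (an assumption for this sketch)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def f(x, w, b):
    """Output class 1 if g(x) > 0, otherwise class 2."""
    return 1 if g(x, w, b) > 0 else 2


def zero_one_loss(data, w, b):
    """L(f): the number of training examples that f classifies incorrectly."""
    return sum(1 for x, y_hat in data if f(x, w, b) != y_hat)


# Toy training set of (x, y_hat) pairs, made up for illustration.
train = [((2.0, 1.0), 1), ((1.5, 2.5), 1), ((-1.0, -0.5), 2), ((0.5, -3.0), 2)]

loss_a = zero_one_loss(train, (1.0, 1.0), 0.0)  # candidate function A
loss_b = zero_one_loss(train, (1.0, 0.0), 0.0)  # candidate function B
print(loss_a, loss_b)
```

Because this loss is a count, it is not differentiable, which is why methods such as the perceptron and SVM are needed to search for the best function.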

2.Gaussian distribution

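*Gaussian distribution function      $f_{\mu,\Sigma}(x)=\frac{1}{(2\pi)^{D/2}}\frac{1}{|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right)$, where D is the dimension of x

input: vector x     output: probability density of sampling x

The shape of the function is determined by the mean vector μ and the covariance matrix Σ.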
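As a minimal sketch, the density can be evaluated directly from the formula. The version below is restricted to two dimensions (D = 2) so that the determinant and inverse of Σ can be written out by hand; the function name and the sample values are assumptions for illustration.

```python
import math


def gaussian_pdf_2d(x, mu, sigma):
    """Density of a 2-D Gaussian; sigma is a 2x2 covariance matrix as nested lists."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    # Closed-form inverse of a 2x2 matrix.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu).
    quad = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    dim = 2
    norm = (2 * math.pi) ** (dim / 2) * math.sqrt(det)
    return math.exp(-0.5 * quad) / norm


identity = [[1.0, 0.0], [0.0, 1.0]]
at_mean = gaussian_pdf_2d((0.0, 0.0), (0.0, 0.0), identity)
far_away = gaussian_pdf_2d((3.0, 3.0), (0.0, 0.0), identity)
print(at_mean, far_away)
```

The density is highest at the mean and falls off as x moves away from it, at a rate set by Σ.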

*maximum likelihood

A Gaussian with any mean μ and covariance matrix Σ can generate these points, but with different likelihoods.

The likelihood of a Gaussian with mean μ and covariance matrix Σ = the probability that the Gaussian samples $x^1,x^2,x^3,\dots,x^n$

likelihood function    $L(\mu,\Sigma)=f_{\mu,\Sigma}(x^1)f_{\mu,\Sigma}(x^2)f_{\mu,\Sigma}(x^3)\cdots f_{\mu,\Sigma}(x^n)$

find the best parameters    $\mu^*,\Sigma^*=\arg\max_{\mu,\Sigma}L(\mu,\Sigma)$     $\mu^*=\frac{1}{n}\sum_{i=1}^{n}x^i$     $\Sigma^*=\frac{1}{n}\sum_{i=1}^{n}(x^i-\mu^*)(x^i-\mu^*)^T$
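The closed-form maximum likelihood estimates are simply the sample mean and the sample covariance with a 1/n factor. A small sketch follows; the function name and the three sample points are made up for illustration.

```python
def fit_gaussian(points):
    """Maximum likelihood estimates (mu*, Sigma*) for a list of equal-length vectors."""
    n = len(points)
    dim = len(points[0])
    # mu* = (1/n) * sum of the points
    mu = [sum(p[k] for p in points) / n for k in range(dim)]
    # Sigma* = (1/n) * sum of (x - mu*)(x - mu*)^T, computed entry by entry
    sigma = [[sum((p[r] - mu[r]) * (p[c] - mu[c]) for p in points) / n
              for c in range(dim)]
             for r in range(dim)]
    return mu, sigma


points = [(1.0, 2.0), (3.0, 4.0), (5.0, 9.0)]  # hypothetical samples
mu, sigma = fit_gaussian(points)
print(mu)
print(sigma)
```

Note the divisor is n, not n - 1: this is the maximum likelihood estimate, not the unbiased sample covariance.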

*classification with Gaussian distribution

Bayes' rule     $P(c_1|x)=\frac{P(x|c_1)P(c_1)}{P(x|c_1)P(c_1)+P(x|c_2)P(c_2)}$

$P(x|c_1)=f_{\mu^{c_1},\Sigma^{c_1}}(x)$             $P(x|c_2)=f_{\mu^{c_2},\Sigma^{c_2}}(x)$
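To make the posterior concrete, here is a minimal sketch that uses one-dimensional Gaussians as the class-conditional densities. The one-dimensional simplification, the function names, and all parameter values are assumptions for illustration.

```python
import math


def gaussian_pdf_1d(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)


def posterior_c1(x, mu1, var1, mu2, var2, prior1):
    """P(c1 | x) by Bayes' rule, with P(c2) = 1 - P(c1)."""
    joint1 = gaussian_pdf_1d(x, mu1, var1) * prior1
    joint2 = gaussian_pdf_1d(x, mu2, var2) * (1 - prior1)
    return joint1 / (joint1 + joint2)


# Hypothetical classes centred at 0 and 2 with equal variance and equal priors.
p_mid = posterior_c1(1.0, mu1=0.0, var1=1.0, mu2=2.0, var2=1.0, prior1=0.5)
p_left = posterior_c1(0.0, mu1=0.0, var1=1.0, mu2=2.0, var2=1.0, prior1=0.5)
print(p_mid, p_left)
```

A point halfway between the two means is equally likely to belong to either class, while a point nearer the class 1 mean gets a posterior above 0.5 and would be assigned to class 1.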

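*Modifying model

Use different means $\mu^{c_1},\mu^{c_2}$ but share the same covariance matrix for both classes. This gives fewer parameters, because the number of parameters in Σ is proportional to the square of the dimension of x.

Shared covariance           $\Sigma_{new}=\frac{m}{m+n}\Sigma^{c_1}+\frac{n}{m+n}\Sigma^{c_2}$, where m and n are the numbers of training examples in class 1 and class 2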
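The shared covariance is a weighted average of the two per-class covariance matrices. A small sketch, with matrices and class sizes invented for the example:

```python
def shared_covariance(sigma1, m, sigma2, n):
    """Average of two covariance matrices, weighted by class sizes m and n."""
    total = m + n
    return [[(m * a + n * b) / total for a, b in zip(row1, row2)]
            for row1, row2 in zip(sigma1, sigma2)]


sigma_c1 = [[2.0, 0.0], [0.0, 2.0]]  # hypothetical covariance of class 1 (m = 3 examples)
sigma_c2 = [[4.0, 1.0], [1.0, 6.0]]  # hypothetical covariance of class 2 (n = 1 example)
sigma_new = shared_covariance(sigma_c1, 3, sigma_c2, 1)
print(sigma_new)
```

The larger class contributes more to the shared matrix, so here the result stays closer to `sigma_c1`.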

*model flaw

A Naive Bayes classifier assumes that all the dimensions of x are independent, which may not hold for the data.

*posterior probability:

$P(c_1|x)=\frac{P(x|c_1)P(c_1)}{P(x|c_1)P(c_1)+P(x|c_2)P(c_2)}=\frac{1}{1+\frac{P(x|c_2)P(c_2)}{P(x|c_1)P(c_1)}}=\frac{1}{1+\exp(-z)}=\sigma(z)$, the sigmoid function

$z=\ln\frac{P(x|c_1)P(c_1)}{P(x|c_2)P(c_2)}$
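A quick numerical check of this identity: applying the sigmoid to the log-odds z gives the same value as the posterior computed directly. The two joint probabilities below are arbitrary numbers chosen for illustration.

```python
import math


def sigmoid(z):
    """sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


joint1 = 0.03  # hypothetical value of P(x|c1) P(c1)
joint2 = 0.01  # hypothetical value of P(x|c2) P(c2)

posterior = joint1 / (joint1 + joint2)  # Bayes' rule directly
z = math.log(joint1 / joint2)           # log-odds
print(posterior, sigmoid(z))
```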

*mathematical derivation

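When the two classes share the same covariance matrix, z simplifies to a linear function of x:     $z=w\cdot x+b$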
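The derivation can be reconstructed as follows, assuming Gaussian class-conditional densities with means $\mu^{c_1},\mu^{c_2}$ and a single shared covariance matrix $\Sigma$. This is a sketch of the standard argument, not a transcription of the lecture slides.

```latex
\begin{aligned}
z &= \ln\frac{P(x|c_1)}{P(x|c_2)} + \ln\frac{P(c_1)}{P(c_2)} \\
  &= -\tfrac{1}{2}(x-\mu^{c_1})^T\Sigma^{-1}(x-\mu^{c_1})
     +\tfrac{1}{2}(x-\mu^{c_2})^T\Sigma^{-1}(x-\mu^{c_2})
     + \ln\frac{P(c_1)}{P(c_2)} \\
  &= \underbrace{(\mu^{c_1}-\mu^{c_2})^T\Sigma^{-1}}_{w^T}\,x
     \;\underbrace{-\tfrac{1}{2}(\mu^{c_1})^T\Sigma^{-1}\mu^{c_1}
     +\tfrac{1}{2}(\mu^{c_2})^T\Sigma^{-1}\mu^{c_2}
     +\ln\frac{P(c_1)}{P(c_2)}}_{b}
\end{aligned}
```

The normalization constants cancel in the second line because both densities use the same $\Sigma$, and the terms quadratic in x cancel in the third line for the same reason. Only a linear term in x and a constant remain.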

3.Logistic Regression

$P_{w,b}(c_1|x)=\sigma(z)$    $z=\ln\frac{P(x|c_1)P(c_1)}{P(x|c_2)P(c_2)}=w\cdot x+b$    $\sigma(z)=\frac{1}{1+\exp(-z)}$

*step 1 function set:     $f_{w,b}(x)=P_{w,b}(c_1|x)$

*step 2 loss function of Logistic Regression

training data    x:            $x^1$   $x^2$   $x^3$   $x^4$   ...   $x^n$

                 class:        $c_1$   $c_2$   $c_1$   $c_1$   ...   $c_2$

                 $\hat{y}$:    1       0       1       1       ...   0

Assume the data is generated based on f_w,b(x)=P_w,b(c₁|x)

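$L(w,b)=f_{w,b}(x^1)\,(1-f_{w,b}(x^2))\,f_{w,b}(x^3)\,f_{w,b}(x^4)\cdots(1-f_{w,b}(x^n))$

$L(w,b)=\prod_i f_{w,b}(x^i)^{\hat{y}^i}\,(1-f_{w,b}(x^i))^{1-\hat{y}^i}$    $w^*,b^*=\arg\max_{w,b}L(w,b)=\arg\min_{w,b}\left(-\ln L(w,b)\right)$

$-\ln L(w,b)=-\ln f_{w,b}(x^1)-\ln(1-f_{w,b}(x^2))-\ln f_{w,b}(x^3)-\ln f_{w,b}(x^4)-\cdots-\ln(1-f_{w,b}(x^n))$

              $=\sum_i-\left[\hat{y}^i\ln f_{w,b}(x^i)+(1-\hat{y}^i)\ln(1-f_{w,b}(x^i))\right]$    the sum of cross entropies between two Bernoulli distributions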
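The equivalence between the negative log-likelihood and the summed cross entropy can be checked numerically. In the sketch below, the model outputs f(x) are simply made-up probabilities, not the output of a trained model.

```python
import math


def cross_entropy_loss(predictions, targets):
    """Sum over examples of -[y ln f + (1 - y) ln(1 - f)]."""
    return sum(-(y * math.log(f) + (1 - y) * math.log(1 - f))
               for f, y in zip(predictions, targets))


predictions = [0.9, 0.2, 0.8, 0.7]  # hypothetical values of f(x^1) .. f(x^4)
targets = [1, 0, 1, 1]              # y_hat: 1 for class c1, 0 for class c2

loss = cross_entropy_loss(predictions, targets)
# The likelihood multiplies f for class-1 examples and (1 - f) for class-2 examples.
likelihood = 0.9 * (1 - 0.2) * 0.8 * 0.7
print(loss, -math.log(likelihood))
```

Both printed values agree, which is why maximizing the likelihood and minimizing the cross entropy select the same w and b.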

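*step 3 find the best function

$\frac{\partial\ln f_{w,b}(x^n)}{\partial w_i}=(1-\sigma(z))\,x_i^n$

$\frac{\partial\ln(1-f_{w,b}(x^n))}{\partial w_i}=-\sigma(z)\,x_i^n$

$\frac{\partial(-\ln L(w,b))}{\partial w_i}=\sum_n-(\hat{y}^n-f_{w,b}(x^n))\,x_i^n$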
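With this gradient, the parameters can be found by plain gradient descent. The sketch below is one possible implementation. The learning rate, step count, zero initialization, and toy data are assumptions, and the bias gradient is derived the same way as the weight gradient, with the input x_i replaced by 1.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, w, b):
    """f_{w,b}(x) = sigma(w . x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logistic_regression(xs, ys, lr=0.1, steps=2000):
    """Batch gradient descent on -ln L(w, b)."""
    dim = len(xs[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y_hat in zip(xs, ys):
            error = -(y_hat - predict_proba(x, w, b))  # -(y_hat^n - f(x^n))
            for i in range(dim):
                grad_w[i] += error * x[i]
            grad_b += error
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical, linearly separable toy data; 1 = class c1, 0 = class c2.
xs = [(0.0, 0.0), (0.0, 1.0), (2.0, 2.0), (3.0, 2.0)]
ys = [0, 0, 1, 1]
w, b = train_logistic_regression(xs, ys)
predictions = [1 if predict_proba(x, w, b) > 0.5 else 0 for x in xs]
print(predictions)
```

The size of each update is proportional to the gap between the target and the current output, so examples that are already classified confidently contribute little.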

4.Multi-class classification

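*softmax

c1: $w^1,b_1$     $z_1=w^1\cdot x+b_1$     -> $e^{z_1}/\sum_j e^{z_j}$

c2: $w^2,b_2$     $z_2=w^2\cdot x+b_2$     -> $e^{z_2}/\sum_j e^{z_j}$

c3: $w^3,b_3$     $z_3=w^3\cdot x+b_3$     -> $e^{z_3}/\sum_j e^{z_j}$

softmax: $z_i$ -> $y_i=e^{z_i}/\sum_j e^{z_j}$

outputs of softmax: $0<y_i<1$   $\sum_i y_i=1$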
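A minimal softmax sketch follows. Subtracting the maximum before exponentiating is a common numerical-stability trick that is not part of the notes; it leaves the result unchanged because the common factor cancels in the ratio. The input scores are chosen for illustration.

```python
import math


def softmax(zs):
    """Map scores z_i to y_i = exp(z_i) / sum_j exp(z_j)."""
    shift = max(zs)  # stability shift; cancels out in the ratio
    exps = [math.exp(z - shift) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


ys = softmax([3.0, 1.0, -3.0])  # hypothetical scores z_1, z_2, z_3
print([round(y, 2) for y in ys])
```

The exponential widens the gaps between scores, so the largest z takes most of the probability mass while every output stays strictly between 0 and 1.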

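x -> $z_1,z_2,z_3$ -> softmax -> $y_1,y_2,y_3$     <->     target $\hat{y}_1,\hat{y}_2,\hat{y}_3$

loss function: cross entropy between y and $\hat{y}$     $-\sum_i\hat{y}_i\ln y_i$

targets: class 1: $\hat{y}=[1\ 0\ 0]^T$     class 2: $\hat{y}=[0\ 1\ 0]^T$     class 3: $\hat{y}=[0\ 0\ 1]^T$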
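The multi-class cross entropy with a one-hot target reduces to the negative log of the probability assigned to the correct class. A short sketch, with output probabilities invented for the example:

```python
import math


def cross_entropy(y, y_hat):
    """-sum_i y_hat_i * ln(y_i); terms with y_hat_i = 0 are skipped since they contribute nothing."""
    return -sum(t * math.log(p) for p, t in zip(y, y_hat) if t > 0)


y = [0.7, 0.2, 0.1]  # hypothetical softmax output
loss_if_class1 = cross_entropy(y, [1, 0, 0])
loss_if_class3 = cross_entropy(y, [0, 0, 1])
print(loss_if_class1, loss_if_class3)
```

The loss is small when the model puts high probability on the true class and grows as that probability shrinks.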

*feature transformation: once the features have been transformed, logistic regression can be applied to them

*cascading logistic regression models

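$(x_1,x_2)$ -> $z_1$ -> sigmoid -> $x_1'$

$(x_1,x_2)$ -> $z_2$ -> sigmoid -> $x_2'$

$(x_1',x_2')$ -> $z_3$ -> sigmoid -> y

The first two units perform the feature transformation and the last unit performs the classification. Each unit is called a neuron, and the whole model is a neural network.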
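As an illustration of cascading, the sketch below wires three logistic regression units together to separate XOR-style data, which a single logistic regression cannot do. All weights are hand-picked assumptions for the example, not learned values and not values from the lecture.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def unit(inputs, w, b):
    """One logistic regression unit: sigma(w . inputs + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, inputs)) + b)


def cascade(x):
    # Feature transformation: two units map (x1, x2) to (x1', x2').
    x1_new = unit(x, (20.0, -20.0), -10.0)  # close to 1 only for input (1, 0)
    x2_new = unit(x, (-20.0, 20.0), -10.0)  # close to 1 only for input (0, 1)
    # Classification: a third unit works on the transformed features.
    return unit((x1_new, x2_new), (20.0, 20.0), -10.0)


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
outputs = {x: round(cascade(x)) for x in inputs}
print(outputs)
```

In the transformed space the points (0, 1) and (1, 0) land near different corners than (0, 0) and (1, 1), so a single linear boundary in (x1', x2') is enough for the last unit.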
Original source: https://www.cnblogs.com/SAM-CJM/p/13932096.html