Binomial LR
The binomial LR model
\[
\begin{array}{l}
P(Y = 1 | x) = \dfrac{e^{w \cdot x}}{1 + e^{w \cdot x}} = \sigma(w \cdot x) \\[1.5ex]
P(Y = 0 | x) = \dfrac{1}{1 + e^{w \cdot x}} = 1 - \sigma(w \cdot x)
\end{array}
\]
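As a quick sanity check, here is a minimal Python sketch (the values of `w` and `x` are made-up toy numbers) that evaluates both probabilities and confirms they sum to 1:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Toy values for illustration only
w = np.array([0.5, -0.25, 0.1])
x = np.array([1.0, 2.0, 1.0])

p1 = sigmoid(np.dot(w, x))   # P(Y=1|x)
p0 = 1 - p1                  # P(Y=0|x)
print(p1, p0, p1 + p0)       # the two probabilities sum to 1
```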
Loss function
The LR loss function can be derived by maximum likelihood estimation and is equivalent to the cross-entropy loss (here $t \in \{0, 1\}$ is the label):
\[
L(w) = -t \ln(\sigma(w \cdot x)) - (1 - t) \ln(1 - \sigma(w \cdot x))
\]
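For completeness: for a single sample the likelihood under the model above is $\sigma(w \cdot x)^t (1 - \sigma(w \cdot x))^{1-t}$, and maximizing it is the same as minimizing its negative log, which is exactly the cross-entropy:

\[
L(w) = -\ln\left[ \sigma(w \cdot x)^t (1 - \sigma(w \cdot x))^{1-t} \right] = -t \ln(\sigma(w \cdot x)) - (1 - t) \ln(1 - \sigma(w \cdot x))
\]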
Deriving the weight update
Gradient computation
\[
\begin{array}{l}
\dfrac{dL(w)}{d\sigma(w \cdot x)} = -\dfrac{t}{\sigma(w \cdot x)} + \dfrac{1 - t}{1 - \sigma(w \cdot x)} \\[2ex]
\dfrac{dL(w)}{d(w \cdot x)} = \dfrac{dL(w)}{d\sigma(w \cdot x)} \cdot \dfrac{d\sigma(w \cdot x)}{d(w \cdot x)} = \left( -\dfrac{t}{\sigma(w \cdot x)} + \dfrac{1 - t}{1 - \sigma(w \cdot x)} \right) \sigma(w \cdot x)\left(1 - \sigma(w \cdot x)\right) = \sigma(w \cdot x) - t \\[2ex]
\dfrac{dL(w)}{dw} = \dfrac{dL(w)}{d\sigma(w \cdot x)} \cdot \dfrac{d\sigma(w \cdot x)}{d(w \cdot x)} \cdot \dfrac{d(w \cdot x)}{dw} = \left( \sigma(w \cdot x) - t \right) \cdot x
\end{array}
\]
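A quick numerical check of the closed-form gradient $(\sigma(w \cdot x) - t) \cdot x$ against central finite differences (toy values, invented for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss(w, x, t):
    s = sigmoid(np.dot(w, x))
    return -t * np.log(s) - (1 - t) * np.log(1 - s)

w = np.array([0.3, -0.2])
x = np.array([1.5, 2.0])
t = 1.0

analytic = (sigmoid(np.dot(w, x)) - t) * x

# Central finite differences, one coordinate at a time
eps = 1e-6
numeric = np.zeros_like(w)
for j in range(len(w)):
    e = np.zeros_like(w)
    e[j] = eps
    numeric[j] = (loss(w + e, x, t) - loss(w - e, x, t)) / (2 * eps)

print(analytic, numeric)  # the two gradients should agree closely
```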
Weight update
Descending along the negative gradient gives the per-sample update:
\[
w \mathrel{+}= \mathrm{learning\_rate} \cdot (t - \sigma(w \cdot x)) \cdot x
\]
The connection between LR and linear regression
Odds of an event: $\frac{p}{1 - p}$
Log-odds of LR: $\log \frac{P(Y = 1|x)}{1 - P(Y = 1|x)} = w \cdot x$
LR is a classification model; it can be viewed as linear regression whose output is passed through a sigmoid function, and it belongs to the family of generalized linear models.
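Substituting the model definition verifies the log-odds identity directly, showing that LR models the log-odds as a linear function of $x$:

\[
\log \frac{P(Y = 1|x)}{1 - P(Y = 1|x)} = \log \frac{e^{w \cdot x} / (1 + e^{w \cdot x})}{1 / (1 + e^{w \cdot x})} = \log e^{w \cdot x} = w \cdot x
\]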
Multinomial LR
\[
\begin{array}{l}
P(Y = k | x) = \dfrac{\exp(w_k \cdot x)}{1 + \sum\limits_{i = 1}^{K - 1} \exp(w_i \cdot x)}, \quad k = 1, 2, \cdots, K - 1 \\[2.5ex]
P(Y = K | x) = \dfrac{1}{1 + \sum\limits_{i = 1}^{K - 1} \exp(w_i \cdot x)}
\end{array}
\]
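A minimal sketch (toy weights, assuming K = 3 classes and 2-dimensional input) that evaluates these probabilities and checks they sum to 1:

```python
import numpy as np

# Toy setup for illustration: class K has no weight vector of its own
W = np.array([[0.4, -0.1],   # w_1
              [-0.3, 0.2]])  # w_2
x = np.array([1.0, 2.0])

scores = np.exp(W @ x)                  # exp(w_k . x) for k = 1..K-1
denom = 1 + scores.sum()
probs = np.append(scores, 1.0) / denom  # P(Y=k|x) for k = 1..K
print(probs, probs.sum())               # probabilities sum to 1
```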
Applications of LR
Strengths and weaknesses of LR
Strengths:
(1) Low memory footprint, easy to parallelize, and highly efficient in both time and space.
(2) Highly interpretable: it is easy to analyze how each feature influences the prediction.
Weaknesses:
(1) Prone to underfitting, so classification accuracy is often not high.
(2) Heavily dependent on feature engineering: the model cannot extract higher-order feature interactions on its own; FM (Factorization Machines) was proposed precisely to remedy this.
How LR handles feature multicollinearity
(1) Use PCA (principal component analysis) to remove collinearity among features.
(2) Apply L2 regularization. (Both remedies are sketched below.)
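A hedged sketch of both remedies in sklearn (the pipeline and its parameter values below are illustrative assumptions, not a prescribed recipe): PCA decorrelates the inputs, and `LogisticRegression` applies L2 regularization by default (`penalty='l2'`, strength controlled by `C`):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

# (1) PCA removes linear correlations among features;
# (2) L2 regularization shrinks the inflated weights that
#     collinearity would otherwise produce.
clf = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95),  # keep components explaining 95% of variance
    LogisticRegression(penalty='l2', C=1.0),
)
# clf.fit(X_train, y_train)
```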
Advantages of LR with discretized features
1. Logistic regression is a generalized linear model with limited expressive power; discretizing a single variable into N binary variables gives each one its own weight, which introduces nonlinearity and improves the model's expressive power.
2. Discretized features are robust to outliers. For example, take the feature "age > 30 → 1, else 0": without discretization, an anomalous record like "age = 300" would badly distort the model (illustrated in the sketch after this list).
3. Inner products with sparse vectors are fast to compute, and the results are compact to store.
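A minimal illustration of point 2 (the ages are hypothetical; the threshold of 30 is taken from the example above). The outlier becomes just another 1 in the discretized feature, while it dominates the raw feature:

```python
import numpy as np

ages = np.array([22, 35, 28, 41, 300])  # 300 is an anomalous record

raw = ages                               # raw feature: the outlier dominates
binned = (ages > 30).astype(int)         # discretized: age > 30 -> 1, else 0

print(raw)     # [ 22  35  28  41 300]
print(binned)  # [0 1 0 1 1] -- the outlier is just another 1
```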
LR implementation in Python
```python
import numpy as np


class LogisticRegressionClassifier:
    def __init__(self, max_iter=200, learning_rate=0.01):
        self.max_iter = max_iter
        self.learning_rate = learning_rate

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def data_matrix(self, X):
        # Prepend a constant 1.0 so the bias term folds into the weights
        data_mat = []
        for d in X:
            data_mat.append([1.0, *d])
        return data_mat

    def fit(self, X, y):
        data_mat = self.data_matrix(X)  # m * n
        self.weights = np.zeros((len(data_mat[0]), 1), dtype=np.float32)
        for iter_ in range(self.max_iter):
            for i in range(len(X)):
                result = self.sigmoid(np.dot(data_mat[i], self.weights))
                error = y[i] - result
                # Per-sample SGD step: w += lr * (t - sigma(w.x)) * x
                self.weights += self.learning_rate * error * np.transpose([data_mat[i]])
        print('LogisticRegression Model(learning_rate={},max_iter={})'.format(
            self.learning_rate, self.max_iter))

    def score(self, X_test, y_test):
        right = 0
        X_test = self.data_matrix(X_test)
        for x, y in zip(X_test, y_test):
            result = np.dot(x, self.weights)
            # w.x = 0 is the decision boundary (sigma = 0.5)
            if (result > 0 and y == 1) or (result < 0 and y == 0):
                right += 1
        return right / len(X_test)
```
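A quick usage sketch on a tiny made-up, linearly separable dataset (the numbers below are invented purely to exercise the class):

```python
# Tiny made-up dataset: two features, linearly separable
X_train = [[3.0, 3.5], [3.5, 4.0], [1.0, 1.2], [1.5, 1.0]]
y_train = [1, 1, 0, 0]

clf = LogisticRegressionClassifier(max_iter=200, learning_rate=0.01)
clf.fit(X_train, y_train)
print(clf.score(X_train, y_train))  # expect 1.0 on this separable toy set
```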
LR in sklearn
```python
from sklearn.linear_model import LogisticRegression

# Assumes X_train, y_train, X_test, y_test come from a prior train/test split
clf = LogisticRegression(max_iter=200)
clf.fit(X_train, y_train)
clf.score(X_test, y_test)
```