Machine Learning Basics - Logistic Regression 2

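Stochastic gradient ascent updates the regression coefficients using only one sample point at a time. Because the classifier can be updated incrementally as new samples arrive, it is an online learning algorithm.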

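Gradient ascent has to traverse the entire dataset every time it updates the regression coefficients. That is acceptable for a dataset of around 100 samples, but with billions of samples and thousands of features the computational cost becomes too high.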
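
To make the cost difference concrete, here is a minimal sketch that puts one full gradient ascent step next to one single-sample step. The function names `batch_step` and `single_sample_step`, the toy data, and the step sizes are assumptions made for illustration; they are not part of the original code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def batch_step(data, labels, weights, alpha=0.001):
    """One full gradient ascent step: reads all m rows of data."""
    error = labels - sigmoid(data @ weights)      # m errors, one per sample
    return weights + alpha * (data.T @ error)     # roughly m * n multiplications

def single_sample_step(x, label, weights, alpha=0.01):
    """One stochastic step: reads a single row x of length n."""
    error = label - sigmoid(np.dot(x, weights))   # one scalar error
    return weights + alpha * error * x            # roughly n multiplications

# Hypothetical toy data: 4 samples, 3 features (first column is a constant 1).
data = np.array([[1.0, 2.0, -1.0],
                 [1.0, -1.5, 0.5],
                 [1.0, 0.3, 0.8],
                 [1.0, -0.7, -1.2]])
labels = np.array([1.0, 0.0, 1.0, 0.0])
weights = np.ones(3)

weights_batch = batch_step(data, labels, weights)
# Online use: fold in one newly arrived sample without revisiting old data.
weights_online = single_sample_step(np.array([1.0, 0.9, 0.1]), 1.0, weights)
```

The batch step costs grow with the number of samples m, while the single-sample step only depends on the number of features n, which is what makes incremental updates practical on very large datasets.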

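Pseudocode for stochastic gradient ascent:

Initialize all regression coefficients to 1

For each sample in the dataset:

Compute the gradient for that sample

Update the regression coefficients by alpha * gradient

Return the regression coefficients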
```
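from numpy import shape, ones, exp

def sigmoid(inX):
    return 1.0 / (1 + exp(-inX))

def stocGradAscent0(dataMatrix, classLabels):
    # dataMatrix must be a NumPy array (not a list) so that * is element-wise
    m, n = shape(dataMatrix)
    alpha = 0.01                 # fixed step size
    weights = ones(n)            # initialize all coefficients to 1
    for i in range(m):           # a single pass, one sample per update
        h = sigmoid(sum(dataMatrix[i] * weights))   # predicted probability
        error = classLabels[i] - h                  # scalar error for this sample
        weights = weights + alpha * error * dataMatrix[i]
    return weights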
```
The regression coefficients take a large number of iterations to reach stable values, and even then they still fluctuate locally.
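
One way to observe this behavior is to record the coefficients after every update and inspect how they evolve. The following is a minimal sketch under stated assumptions: the helper `sgd_weight_history`, the synthetic data, the number of passes, and the random seed are all hypothetical choices for illustration, not values from the original experiment.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_weight_history(data, labels, num_passes=200, alpha=0.01):
    """Repeat the fixed-alpha, in-order update and keep every intermediate weight vector."""
    m, n = data.shape
    weights = np.ones(n)
    history = np.empty((num_passes * m, n))
    step = 0
    for _ in range(num_passes):
        for i in range(m):                      # samples are always visited in the same order
            h = sigmoid(np.dot(data[i], weights))
            weights = weights + alpha * (labels[i] - h) * data[i]
            history[step] = weights
            step += 1
    return history

# Synthetic, noisy two-feature data (an assumption made only for this sketch).
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 2))
data = np.column_stack([np.ones(100), features])   # x0 = 1 acts as the intercept term
noise = rng.normal(scale=0.5, size=100)
labels = (features[:, 0] + features[:, 1] + noise > 0).astype(float)

history = sgd_weight_history(data, labels)
print(history[::2000])   # a coarse view of how the three coefficients drift
```

Plotting each column of `history` against the update index makes the slow drift and the small oscillations visible at the same time.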

These problems with the stochastic gradient algorithm can be addressed with an improved stochastic gradient ascent algorithm.
```
from numpy import shape, ones, random

# assumes sigmoid(z) = 1/(1+exp(-z)) is already defined
def stocGradAscent1(dataMatrix, classLabels, numIter=150):
    m, n = shape(dataMatrix)
    weights = ones(n)   # initialize all coefficients to 1
    for j in range(numIter):
        dataIndex = list(range(m))   # list() so entries can be deleted in Python 3
        for i in range(m):
            # alpha decreases with iteration but never reaches 0 because of the constant
            alpha = 4/(1.0+j+i)+0.0001
            randIndex = int(random.uniform(0, len(dataIndex)))  # pick a remaining sample at random
            index = dataIndex[randIndex]
            h = sigmoid(sum(dataMatrix[index]*weights))
            error = classLabels[index] - h
            weights = weights + alpha * error * dataMatrix[index]
            del(dataIndex[randIndex])   # do not reuse this sample in the current pass
    return weights
```
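Improvements:

1. alpha is adjusted on every iteration, which dampens the fluctuations in the coefficients, including high-frequency ones. Although alpha keeps shrinking as the number of iterations grows, it never reaches 0, which ensures that new data still has some influence after many iterations.

2. The sample used for each update is chosen at random, which reduces periodic fluctuations.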
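
Both points can be checked numerically. The sketch below evaluates the step-size formula from `stocGradAscent1` at a few positions and shows one alternative way to visit every sample exactly once per pass in random order. The helper name `alpha_schedule`, the dataset size `m = 100`, and the use of `permutation` in place of the delete-from-list approach are assumptions for illustration.

```python
import numpy as np

def alpha_schedule(j, i):
    """Step size used by stocGradAscent1 at pass j, inner step i."""
    return 4 / (1.0 + j + i) + 0.0001

m = 100                                   # hypothetical dataset size
first = alpha_schedule(0, 0)              # largest value, at the very first update
end_of_pass = alpha_schedule(0, m - 1)    # smallest value within pass 0
next_pass = alpha_schedule(1, 0)          # i resets to 0, so alpha jumps back up
last = alpha_schedule(149, m - 1)         # final update with numIter=150

# A random visiting order for one pass: every index appears exactly once.
rng = np.random.default_rng(0)
order = rng.permutation(m)
```

Note that alpha does not shrink strictly monotonically: it falls within a pass and rises again when `i` resets at the start of the next pass, while the constant 0.0001 keeps it above zero throughout.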
Original article: https://www.cnblogs.com/ryuham/p/4236065.html