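Logistic Regression in Practice

To get started, download the data set ex4Data.zip.

Suppose the data set describes students from one high school: 40 who were admitted to college and 40 who were not.

Each training example (x⁽ⁱ⁾, y⁽ⁱ⁾) contains a student's scores on two standardized exams and a label indicating whether the student was admitted.

The goal is to build a classification model that estimates a student's probability of admission from the two exam scores.

Plot the data: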

x = load('ex4x.dat');
y = load('ex4y.dat');

[m, n] = size(x);

% Add the intercept term (a column of ones), because one parameter is the constant term
x = [ones(m, 1), x];

figure
pos = find(y); neg = find(y == 0);
plot(x(pos, 2), x(pos,3), '+')
hold on
plot(x(neg, 2), x(neg, 3), 'o')
hold on
xlabel('Exam 1 score')
ylabel('Exam 2 score')

Newton's method

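Hypothesis function:

$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$

Cost function:

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left[ -y^{(i)} \log h_\theta(x^{(i)}) - \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$

Parameter update rule:

$$\theta^{(t+1)} = \theta^{(t)} - H^{-1} \nabla_\theta J$$

(t is the iteration index)

Gradient and Hessian:

$$\nabla_\theta J = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$$

$$H = \frac{1}{m} \sum_{i=1}^{m} h_\theta(x^{(i)}) \left( 1 - h_\theta(x^{(i)}) \right) x^{(i)} \left( x^{(i)} \right)^T$$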
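As a sanity check, the sketch below compares the per-example summation form of the gradient and Hessian with the vectorized expressions used in the full script. The matrix X, the labels y, and theta are made-up values chosen only for illustration; they are not the ex4 data.

```matlab
% Illustrative data (an assumption, not the ex4 data set).
% The first column of X is the intercept term.
X = [1 2.0 1.0;
     1 0.5 3.0;
     1 1.5 2.5;
     1 3.0 0.5];
y = [1; 0; 1; 0];
theta = [0.1; -0.2; 0.3];
m = size(X, 1);

g = @(z) 1.0 ./ (1.0 + exp(-z));   % sigmoid function
h = g(X * theta);                  % hypothesis for every example

% Summation form: accumulate one training example at a time
grad_sum = zeros(3, 1);
H_sum = zeros(3, 3);
for i = 1:m
    xi = X(i, :)';                                   % column vector x^(i)
    grad_sum = grad_sum + (h(i) - y(i)) * xi;
    H_sum = H_sum + h(i) * (1 - h(i)) * (xi * xi');
end
grad_sum = grad_sum / m;
H_sum = H_sum / m;

% Vectorized form, written the same way as in the full script
grad_vec = (1/m) .* X' * (h - y);
H_vec = (1/m) .* X' * diag(h) * diag(1 - h) * X;

% Both differences should be at the level of floating-point round-off
max_grad_diff = max(abs(grad_sum - grad_vec))
max_H_diff = max(max(abs(H_sum - H_vec)))
```

The two forms compute the same quantities; the vectorized one simply avoids the explicit loop over training examples.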

The complete MATLAB code follows (based on Andrew Ng's machine learning course):

clear all; close all; clc

x = load('ex4x.dat'); 
y = load('ex4y.dat');

[m, n] = size(x);

% Add intercept term to x
x = [ones(m, 1), x]; 

% Plot the training data
% Use different markers for positives and negatives
figure
pos = find(y); neg = find(y == 0);
plot(x(pos, 2), x(pos,3), '+')
hold on
plot(x(neg, 2), x(neg, 3), 'o')
hold on
xlabel('Exam 1 score')
ylabel('Exam 2 score')


% Initialize fitting parameters
theta = zeros(n+1, 1);

% Define the sigmoid function
g = @(z) 1.0 ./ (1.0 + exp(-z));  % anonymous function; inline() is discouraged in current MATLAB

% Newton's method
MAX_ITR = 7;
J = zeros(MAX_ITR, 1);

for i = 1:MAX_ITR
    % Calculate the hypothesis function
    z = x * theta;
    h = g(z);
    
    % Calculate gradient and hessian.
    % The formulas below are equivalent to the summation formulas
    % given in the lecture videos.
    grad = (1/m).*x' * (h-y);
    H = (1/m).*x' * diag(h) * diag(1-h) * x;
    
    % Calculate J (for testing convergence)
    J(i) =(1/m)*sum(-y.*log(h) - (1-y).*log(1-h));
    
    % Newton update: H\grad solves H * step = grad
    theta = theta - H\grad;
end
% Display theta
theta

% Calculate the probability that a student with
% Score 20 on exam 1 and score 80 on exam 2 
% will not be admitted
prob = 1 - g([1, 20, 80]*theta)

% Plot Newton's method result
% Only need 2 points to define a line, so choose two endpoints
plot_x = [min(x(:,2))-2,  max(x(:,2))+2];
% Calculate the decision boundary line
plot_y = (-1./theta(3)).*(theta(2).*plot_x +theta(1));
plot(plot_x, plot_y)
legend('Admitted', 'Not admitted', 'Decision Boundary')
hold off

% Plot J
figure
plot(0:MAX_ITR-1, J, 'o--', 'MarkerFaceColor', 'r', 'MarkerSize', 8)
xlabel('Iteration'); ylabel('J')
% Display J
J
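The script above runs a fixed number of Newton iterations. One possible variation, sketched here, stops as soon as the Newton step becomes negligibly small. The data set, the tolerance tol, and the cap max_itr are illustrative assumptions, not values from the original exercise.

```matlab
% Illustrative, non-separable 1-D data (an assumption, not the ex4 data set)
X = [ones(6, 1), [1; 2; 3; 4; 5; 6]];   % intercept column plus one feature
y = [0; 0; 1; 0; 1; 1];
m = size(X, 1);

g = @(z) 1.0 ./ (1.0 + exp(-z));        % sigmoid function
theta = zeros(2, 1);

tol = 1e-10;     % hypothetical threshold on the size of the Newton step
max_itr = 50;    % hypothetical safety cap on the number of iterations

for itr = 1:max_itr
    h = g(X * theta);
    grad = (1/m) .* X' * (h - y);
    H = (1/m) .* X' * diag(h .* (1 - h)) * X;

    step = H \ grad;          % solve H * step = grad without forming inv(H)
    theta = theta - step;

    if norm(step) < tol       % stop once the update no longer changes theta
        break
    end
end

itr
theta
% Gradient norm at the final theta; it should be close to zero
grad_norm = norm((1/m) .* X' * (g(X * theta) - y))
```

The backslash operator solves the linear system directly, which is generally preferable to computing the inverse of H explicitly. Note that on perfectly separable data the parameters grow without bound, so a stopping rule like this one assumes the classes overlap.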


Original article: https://www.cnblogs.com/90zeng/p/logistic_regression_newton.html