• Classification and logistic regression


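    Logistic regression

    1. Problem:

    The regression problems discussed above all produce continuous results. But what if we need to classify, that is, the result takes discrete values?

    2. Solution:

    • Hypothesis: hθ(x) = g(θᵀx)
      where: g(z) = 1 / (1 + e^(−z))
      The graph of g(z) is as follows:
      [image]
      From this, when hθ(x) < 0.5 we can treat the output as 0, and otherwise as 1, which turns it into discrete data.

    • Deriving the update rule:

      • Derive it with probability theory: find the type of distribution the samples follow, then use maximum likelihood to solve for the corresponding θ
      • [image]
      • Therefore: [image]
        [image]
    • Result: θ_j := θ_j + α (y^(i) − hθ(x^(i))) x_j^(i), where α is the step size

    • Note: the update rule here is the incremental form (one sample at a time)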
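The derivation outlined above can be written out as follows. This is a sketch of the standard maximum-likelihood argument, assuming each label follows a Bernoulli distribution with parameter h_θ(x). The notation (α for the step size, superscript (i) for the i-th sample) is an assumption and may differ from the original figures.

```latex
\begin{align*}
h_\theta(x) &= g(\theta^T x), \qquad g(z) = \frac{1}{1 + e^{-z}} \\
P(y \mid x; \theta) &= h_\theta(x)^{y}\,\bigl(1 - h_\theta(x)\bigr)^{1-y}, \qquad y \in \{0, 1\} \\
L(\theta) &= \prod_{i=1}^{m} h_\theta(x^{(i)})^{y^{(i)}} \bigl(1 - h_\theta(x^{(i)})\bigr)^{1 - y^{(i)}} \\
l(\theta) &= \log L(\theta) = \sum_{i=1}^{m} \Bigl[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\bigl(1 - h_\theta(x^{(i)})\bigr) \Bigr] \\
\frac{\partial l(\theta)}{\partial \theta_j} &= \sum_{i=1}^{m} \bigl(y^{(i)} - h_\theta(x^{(i)})\bigr)\, x_j^{(i)} \\
\theta_j &:= \theta_j + \alpha \bigl(y^{(i)} - h_\theta(x^{(i)})\bigr)\, x_j^{(i)} \qquad \text{(incremental update, one sample } i \text{ at a time)}
\end{align*}
```

The gradient step uses the identity g′(z) = g(z)(1 − g(z)), which is what makes the derivative collapse to the simple (y − h) x form.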

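    Newton's method:

    1. Problem:

    The update rule above converges very slowly. When solving the maximum-likelihood problem we can use Newton's method instead, i.e. θ := θ − f(θ)/f′(θ)

    2. Solution:

    • Derivation:

      • Newton's method finds a θ such that f(θ) = 0. This is exactly what we need, because the log-likelihood is maximized where l′(θ) = 0
      • So Newton's method can be rewritten as: θ := θ − l′(θ)/l′′(θ)
    • Definitions:

      • where: ∇θ l(θ) = [image]
      • [image]
        Therefore the matrix H plays the role of l′′(θ), i.e. H⁻¹ corresponds to 1/l′′(θ)
      • So: θ := θ − H⁻¹ ∇θ l(θ)
    • When to use:

      • Cases with relatively few features; otherwise computing H⁻¹ is very expensive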
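The Newton update can be sketched in a few lines of MATLAB/Octave. This is an illustration under stated assumptions, not the original author's code: the toy data set is made up, and the gradient X'(y − h) and Hessian −X' diag(h(1 − h)) X are the standard expressions for the logistic log-likelihood.

```matlab
% Toy 1-D data with overlapping classes (made up for this sketch)
x = [0.5; 1.0; 1.5; 2.0; 2.5; 3.0; 3.5; 4.0];
y = [0; 0; 1; 0; 1; 0; 1; 1];
m = size(x, 1);
X = [ones(m, 1), x];                      % prepend the intercept column
theta = zeros(2, 1);
sigmoid = @(z) 1 ./ (1 + exp(-z));

for iter = 1:10
    h = sigmoid(X * theta);               % predictions, m x 1
    grad = X' * (y - h);                  % gradient of l(theta)
    H = -X' * diag(h .* (1 - h)) * X;     % Hessian of l(theta)
    theta = theta - H \ grad;             % Newton step: theta - inv(H) * grad
end

grad_norm = norm(X' * (y - sigmoid(X * theta)));
fprintf('theta = [%f, %f], |grad| = %g\n', theta(1), theta(2), grad_norm);
```

Solving the linear system with `H \ grad` avoids forming H⁻¹ explicitly, but the cost still grows quickly with the number of features, which is the limitation noted above.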

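    Logistic 0/1 classification:

    1. Setting the number of iterations yourself

      Write the loop yourself, specify the number of iterations and the step size alpha, and run gradient descent.


    Main functions and their roles:

    • Logistic_Regression: acts as the main function
    • gradientDecent: gradient-descent function that updates θ
    • computeCost: function that computes the cost J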

    Logistic_Regression

    
    %%  part0: preparation
    data = load('ex2data1.txt');
    x = data(:,[1,2]);
    y = data(:,3);
    pos = find(y==1);
    neg = find(y==0);
    
    x1 = x(:,1);
    x2 = x(:,2);
    plot(x(pos,1),x(pos,2),'r*',x(neg,1),x(neg,2),'co');
    pause;
    
    
    %% part1: GradientDecent and compute cost of J
    [m,n] = size(x);
    x = [ones(m,1),x];
    theta = zeros(3,1);
    J = computeCost(x,y,theta);
    
    theta = gradientDecent(x, y, theta);
    X = 25:100;
    Y = -(theta(1,1) + theta(2,1)*X)/theta(3,1);  % boundary: theta1 + theta2*x1 + theta3*x2 = 0
    plot(x(pos,2),x(pos,3),'r*',x(neg,2),x(neg,3),'co', X, Y, 'b');
    pause;
    

    gradientDecent

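    function theta = gradientDecent(x, y, theta)
    
    %% Update theta by batch gradient ascent on the log-likelihood
    %% (equivalent to gradient descent on the cost J)
    m = size(x,1);
    alph = 0.001;   % step size
    for iter = 1:150000
        dec = zeros(size(theta));
        for i = 1:m
            % accumulate (y - h) * x over all samples using the current theta
            dec = dec + (y(i) - sigmoid(x(i,:)*theta)) * x(i,:)';
        end
        % update all components of theta simultaneously
        theta = theta + alph*dec/m;
    end
    end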
    

    sigmoid

    function g = sigmoid(z)
    
    %% SIGMOID Compute sigmoid function
    
    g = 1 ./ (1 + exp(-z));   % element-wise, so z may be a scalar, vector or matrix
    
    end
    

    computeCost

    function J = computeCost(x, y, theta)
    
    %% compute cost: J
    
    m = size(x,1);
    J = 0;
    for i = 1:m
       J =  J + y(i)*log(sigmoid(x(i,:)*theta)) + (1 - y(i))*log(1 - sigmoid(x(i,:)*theta));
    end
    J = (-1/m)*J;
    end

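    Results:

    Scatter plot of the initial data

    [image]

    2. Using the fminunc function:

      Provide how to compute the cost J and its gradient with respect to θ, then call fminunc to compute the optimal solution.

    Main functions and their roles:

    • Logistics_Regression: acts as the main function
    • computeCost: provides the computation of J and its gradient with respect to θ
    • sigmoid: the sigmoid function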

    Logistics_Regression

    
    %%  part0: preparation
    data = load('ex2data1.txt');
    x = data(:,[1,2]);
    y = data(:,3);
    pos = find(y==1);
    neg = find(y==0);
    
    x1 = x(:,1);
    x2 = x(:,2);
    plot(x(pos,1),x(pos,2),'r*',x(neg,1),x(neg,2),'co');
    pause;
    
    
    %% part1: GradientDecent and compute cost of J
    [m,n] = size(x);
    x = [ones(m,1),x];
    theta = zeros(3,1);
    options = optimset('GradObj', 'on', 'MaxIter', 400);
    
    %  Run fminunc to obtain the optimal theta
    %  This function will return theta and the cost 
    [theta, cost] = ...
        fminunc(@(t)(computeCost(x,y,t)), theta, options);
    X = 25:100;
    Y = -(theta(1,1) + theta(2,1)*X)/theta(3,1);  % boundary: theta1 + theta2*x1 + theta3*x2 = 0
    plot(x(pos,2),x(pos,3),'r*',x(neg,2),x(neg,3),'co', X, Y, 'b');
    pause;
    

    sigmoid

    function g = sigmoid(z)
    
    %% SIGMOID Compute sigmoid function
    
    g = zeros(size(z));
    g = 1.0 ./ (1.0 + exp(-z));
    
    end
    

    computeCost

    function [J,grad] = computeCost(x, y, theta)
    
    %% compute cost: J
    
    m = size(x,1);
    grad = zeros(size(theta));
    hx = sigmoid(x * theta);  
     J = (1.0/m) * sum(-y .* log(hx) - (1.0 - y) .* log(1.0 - hx));  
     grad = (1.0/m) .* x' * (hx - y);
    end
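Because fminunc is told to trust the supplied gradient ('GradObj' set to 'on'), it can help to compare that gradient against finite differences before optimizing. The sketch below is an assumption-laden illustration: it restates the cost and gradient formulas of computeCost as anonymous functions (costFn and gradFn are names introduced here) and uses a small made-up data set.

```matlab
sigmoid = @(z) 1.0 ./ (1.0 + exp(-z));
% Same cost and gradient formulas as computeCost, written as anonymous functions
costFn = @(x, y, t) (1.0/size(x,1)) * sum(-y .* log(sigmoid(x*t)) - (1.0 - y) .* log(1.0 - sigmoid(x*t)));
gradFn = @(x, y, t) (1.0/size(x,1)) * x' * (sigmoid(x*t) - y);

% Small made-up data set: intercept column plus two features
x = [1 0.5 1.2; 1 1.5 0.3; 1 2.0 2.2; 1 3.1 0.8];
y = [0; 0; 1; 1];
theta = [0.1; -0.2; 0.3];

% Central finite differences, one coordinate of theta at a time
epsilon = 1e-5;
numgrad = zeros(size(theta));
for j = 1:numel(theta)
    e = zeros(size(theta));
    e(j) = epsilon;
    numgrad(j) = (costFn(x, y, theta + e) - costFn(x, y, theta - e)) / (2*epsilon);
end
max_diff = max(abs(numgrad - gradFn(x, y, theta)));
fprintf('largest difference between numerical and analytic gradient: %g\n', max_diff);
```

A second quick sanity check: with theta set to all zeros every prediction is 0.5, so the cost equals log(2) regardless of the data.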

    Results

    [image]

    [image]

    Logistic multi_class

    1. Setup

    • Hand-made data:
    1,5,1
    1,6,1
    1.5,3.5,1
    2.5,3.5,1
    2,6,1
    3,7,1
    4,6,1
    3.5,4.5,1
    2,4,1
    2,5,1
    4,4,1
    5,5,1
    6,4,1
    5,3,1
    4,2,1
    4,3,2
    5,3,2
    5,2,2
    5,1.5,2
    7,1.5,2
    5,2.5,2
    6,2.5,2
    5.5,2.5,2
    5,1,2
    6,2,2
    6,3,2
    5,4,2
    7,5,2
    7,2,2
    8,1,2
    8,3,2
    7,4,3
    7,5,3
    8.5,5.5,3
    9,4,3
    8,5.5,3
    8,4.5,3
    9.5,5.5,3
    8,4.5,3
    8.5,4.5,3
    7,6,3
    6,5,3
    9,5,3
    9,6,3
    8,6,3
    8,7,3
    10,6,3
    10,4,3
    
    
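    • Scatter plot of the data:

      [image]

    2. Algorithm derivation

    • Cost J:
      [image]

    • Updating θ:
      [image]

    • Idea of the algorithm (also known as one_vs_all):

      [image]

      Suppose the samples fall into K classes. We then train K sets of θ: take each class in turn and treat all the remaining samples as a single class, which separates that class from the rest. The y values of the class under consideration are set to 1 and all others to 0. This yields K sets of θ values.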
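The relabeling step described above can be illustrated in isolation. This is a minimal sketch with made-up labels; the variable names are assumptions chosen to mirror the code that follows.

```matlab
y = [1; 1; 2; 3; 2; 3];    % made-up class labels
K = 3;                     % number of classes
for k = 1:K
    L = double(y == k);    % 1 for class k, 0 for all other classes
    fprintf('class %d vs rest: %s\n', k, mat2str(L'));
end
```

Each vector L is then used as the target of an ordinary 0/1 logistic regression, giving one set of θ per class.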

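    3. Code implementation:

    Implemented here with the fminunc function

    1. Overview of the functions:

    • Logistic_Regression: acts as the main function
    • one_vs_all: a loop that computes the K sets of θ in turn, calling the cost function through fminunc
    • computeCost: computes J and the gradient used to update θ

    2. Code:

    • Logistic_Regression: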
    
    %%  part0: preparation
    data = load('data.txt');
    x = data(:,[1,2]);
    y = data(:,3);
    y1 = find(y==1);
    y2 = find(y==2);
    y3 = find(y==3);
    
    plot(x(y1,1),x(y1,2),'r*',x(y2,1),x(y2,2),'c+',x(y3,1),x(y3,2),'bo');
    pause;
    
    
    %% part1: GradientDecent and compute cost of J
    [m,n] = size(x);
    x = [ones(m,1),x];
    theta = zeros(3,3);
    
    %  Run fminunc to obtain the optimal theta
    %  This function will return theta and the cost 
    
    [thetas,cost]= one_vs_all(x,y,theta);
    X = 1:10;
    Y1 = -(thetas(1,1) + thetas(2,1)*X)/thetas(3,1);
    Y2 = -(thetas(1,2) + thetas(2,2)*X)/thetas(3,2);
    Y3 = -(thetas(1,3) + thetas(2,3)*X)/thetas(3,3);
    plot(x(y1,2),x(y1,3),'r*',x(y2,2),x(y2,3),'c+',x(y3,2),x(y3,3),'bo');
    hold on
    plot(X,Y1,'r',X,Y2,'g',X,Y3,'c');
    
    
    
    • one_vs_all:
    function [theta,cost] = one_vs_all(x, y, theta)
    
    %% train one binary classifier per class with fminunc
    
    options = optimset('GradObj', 'on', 'MaxIter', 400);
    num_labels = 3;
    cost = zeros(num_labels,1);   % one final cost per classifier
    for i = 1:num_labels
        L = logical(y==i);        % true for class i, false for every other class
        [theta(:,i), cost(i,1)] = ...
        fminunc(@(t)(computeCost(x,L,t)), theta(:,i), options);
    end
    end
    
    • computeCost:
    function [J,grad] = computeCost(x, y, thetas)
    
    %% compute cost: J
    
    m = size(x,1);
    grad = zeros(size(thetas));
    hx = sigmoid(x * thetas);  
     J = (1.0/m) * sum(-y .* log(hx) - (1 - y) .* log(1 - hx));  
     grad = (1.0/m) .* x' * (hx - y);
    end

    3. Results:

    • θ & J cost:
    
    thetas =
    
        6.3988    5.1407  -24.4266
       -2.0773    0.2173    2.1641
        0.9857   -1.9490    2.2038
    
    >> cost
    
    cost =
    
        0.1715
        0.2876
        0.1031
    
    • Plot:
      [image]

    • Note the triangle formed by the three lines: points in this region are not assigned to any class.
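One common way to avoid unassigned points, which the original post does not use, is to give each point the class whose classifier outputs the highest probability. The sketch below assumes that rule and reuses the thetas printed above on three points taken from the data set.

```matlab
% Parameters copied from the result above (one column per class)
thetas = [  6.3988   5.1407  -24.4266;
           -2.0773   0.2173    2.1641;
            0.9857  -1.9490    2.2038];
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Three points from the data set, with the intercept column prepended:
% (1,5) is labeled 1, (6,2) is labeled 2, (9,5) is labeled 3
pts = [1 1 5; 1 6 2; 1 9 5];
probs = sigmoid(pts * thetas);          % one row per point, one column per class
[~, predicted] = max(probs, [], 2);     % index of the most confident classifier
disp(predicted');
```

With this rule every point gets exactly one label, including points inside the triangle where no single classifier reaches 0.5.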

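    Supplement:

    1. Regularized logistic regression

    • The regularized version differs little from plain logistic regression: it only adds an extra term to the earlier results for the computation of J and for the θ update.

    [image]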
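As an illustration of the extra term, the sketch below assumes the usual L2 penalty, (lambda/(2m)) times the sum of squared parameters with the intercept left out, which matches the temp(1) = 0 hint in the lrCostFunction comments further down. The data set and the value of lambda are made up.

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));
x = [1 0.5 1.2; 1 1.5 0.3; 1 2.0 2.2; 1 3.1 0.8];  % made-up data, intercept column first
y = [0; 0; 1; 1];
theta = [0.1; -0.2; 0.3];
lambda = 1;                 % regularization strength (hypothetical value)
m = size(x, 1);

hx = sigmoid(x * theta);
temp = theta;
temp(1) = 0;                % the intercept term is not penalized

% Unregularized cost and gradient plus the penalty terms
J = (1/m) * sum(-y .* log(hx) - (1 - y) .* log(1 - hx)) + (lambda/(2*m)) * sum(temp .^ 2);
grad = (1/m) * x' * (hx - y) + (lambda/m) * temp;
```

Setting lambda to 0 recovers the unregularized cost and gradient used elsewhere in this post.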

    2.one_vs_all:

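    1. Overview:

    • In fact there is another way to do one_vs_all: treat θ as a single-hidden-layer feedforward neural network for the computation. For example, with K classes, the first class can be written as [1,0,0,0...] with K entries in total, and so on: a 1 in position i stands for class i. The computation is the same as in multi_class above.

    • The feedforward neural network model is as follows:
      [image]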
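The [1,0,0,...] encoding described above can be built in one line. This is a minimal sketch with made-up labels; bsxfun is used so that it runs on older MATLAB versions and on Octave.

```matlab
y = [1; 3; 2; 3];                 % made-up labels for K = 3 classes
K = 3;
Y = double(bsxfun(@eq, y, 1:K));  % row i has a 1 in column y(i), zeros elsewhere
disp(Y);
```

Column k of Y is exactly the 0/1 target vector that the one_vs_all loop builds with y == k.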

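    2. Code:

    • Functions:

      • one_vs_all: acts as the main function.
      • lrCostFunction: cost J and the gradient for updating θ
      • myPredict: reports the training accuracy
    • Data and the trained θ:
      click here to download

    • Training results: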

    
    Local minimum found.
    
    Optimization completed because the size of the gradient is less than
    the default value of the function tolerance.
    
    <stopping criteria details>
    
    
    Local minimum found.
    
    Optimization completed because the size of the gradient is less than
    the default value of the function tolerance.
    
    <stopping criteria details>
    
    
    Training Set Accuracy: 100.000000
    • one_vs_all:
    function [all_theta,cost] = oneVsAll(X, y, num_labels)
    %ONEVSALL trains multiple logistic regression classifiers and returns all
    %the classifiers in a matrix all_theta, where the i-th column of all_theta 
    %corresponds to the classifier for label i
    %   [all_theta, cost] = ONEVSALL(X, y, num_labels) trains num_labels
    %   logistic regression classifiers and returns each of these classifiers
    %   in a matrix all_theta, where the i-th column of all_theta corresponds 
    %   to the classifier for label i
    
    % Some useful variables
    
    m = size(X, 1);
    n = size(X, 2);
    
    % You need to return the following variables correctly 
    all_theta = zeros(n+1,num_labels);
    
    % Add ones to the X data matrix
    X = [ones(m, 1),X];
    
    % ====================== YOUR CODE HERE ======================
    % Instructions: You should complete the following code to train num_labels
    %               logistic regression classifiers with regularization
    %               parameter lambda. 
    %
    % Hint: theta(:) will return a column vector.
    %
    % Hint: You can use y == c to obtain a vector of 1's and 0's that tell use 
    %       whether the ground truth is true/false for this class.
    %
    % Note: For this assignment, we recommend using fmincg to optimize the cost
    %       function. It is okay to use a for-loop (for c = 1:num_labels) to
    %       loop over the different classes.
    %
    %       fmincg works similarly to fminunc, but is more efficient when we
    %       are dealing with large number of parameters.
    %
    % Example Code for fmincg:
    %
    %     % Set Initial theta
    %     initial_theta = zeros(n + 1, 1);
    %     
    %     % Set options for fminunc
    %     options = optimset('GradObj', 'on', 'MaxIter', 50);
    % 
    %     % Run fmincg to obtain the optimal theta
    %     % This function will return theta and the cost 
    %     [theta] = ...
    %         fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
    %                 initial_theta, options);
    %
    
    cost = zeros(num_labels,1);
    options = optimset('GradObj', 'on', 'MaxIter', 50);
    for i =1:num_labels
        L = logical(y==i);
         [all_theta(:,i),cost(i,1)] = ...
           fminunc (@(t)(lrCostFunction(t, X, L)),all_theta(:,i), options);
    end
    
    myPredict(all_theta,X,y);
    
    
    
    
    % =========================================================================
    
    
    end
    
    • lrCostFunction:
    function [J,grad] = lrCostFunction(thetas,x, y)
    %LRCOSTFUNCTION Compute cost and gradient for logistic regression
    %(no regularization term is applied in this version)
    %   [J, grad] = LRCOSTFUNCTION(thetas, x, y) computes the cost of using
    %   thetas as the parameter for logistic regression and the
    %   gradient of the cost w.r.t. the parameters. 
    
    % Initialize some useful values
    m = length(y); % number of training examples
    
    % Code used when debugging this function on its own
    %x = [ones(m,1),x];
    %theta = zeros(size(x,2),1);
    %y = logical(y==1);
    
    % ====================== YOUR CODE HERE ======================
    % Instructions: Compute the cost of a particular choice of theta.
    %               You should set J to the cost.
    %               Compute the partial derivatives and set grad to the partial
    %               derivatives of the cost w.r.t. each parameter in theta
    %
    % Hint: The computation of the cost function and gradients can be
    %       efficiently vectorized. For example, consider the computation
    %
    %           sigmoid(X * theta)
    %
    %       Each row of the resulting matrix will contain the value of the
    %       prediction for that example. You can make use of this to vectorize
    %       the cost function and gradient computations. 
    %
    % Hint: When computing the gradient of the regularized cost function, 
    %       there're many possible vectorized solutions, but one solution
    %       looks like:
    %           grad = (unregularized gradient for logistic regression)
    %           temp = theta; 
    %           temp(1) = 0;   % because we don't add anything for j = 0  
    %           grad = grad + YOUR_CODE_HERE (using the temp variable)
    %
    
    grad = zeros(size(thetas));
    hx = sigmoid(x * thetas);  
    J = (1.0/m) * sum(-y .* log(hx) - (1 - y) .* log(1 - hx));  
    grad = (1.0/m) .* x' * (hx - y);
    % =============================================================
    
    end
    
    • myPredict:
    function p = myPredict(Theta1,X,y)
    %MYPREDICT Predict labels with the trained one-vs-all classifiers
    %   p = MYPREDICT(Theta1, X, y) outputs the predicted label of each row of X
    %   given the trained weights Theta1 (one column per class) and prints the
    %   training set accuracy against the true labels y
    
    % Useful values
    m = size(X, 1);
    num_labels = size(Theta1, 2);   % one column of Theta1 per class
    
    % You need to return the following variables correctly 
    p = zeros(size(X, 1), 1);
    
    % ====================== YOUR CODE HERE ======================
    % Instructions: Complete the following code to make predictions using
    %               your learned neural network. You should set p to a 
    %               vector containing labels between 1 to num_labels.
    %
    % Hint: The max function might come in useful. In particular, the max
    %       function can also return the index of the max element, for more
    %       information see 'help max'. If your examples are in rows, then, you
    %       can use max(A, [], 2) to obtain the max for each row.
    %
    
    z_2 = X*Theta1;
    a_2 = sigmoid(z_2);
    for i = 1:m
        for j = 1:num_labels
            if a_2(i,j) >= 0.5
                p(i,1) = j;
                break;
            end
        end
    end
    fprintf('\nTraining Set Accuracy: %f\n', mean(double(p == y)) * 100);
    
    % =========================================================================
    
    
    end
    

  • Original post: https://www.cnblogs.com/yjbjingcha/p/7350957.html