• 异常检测算法的Octave仿真


    基于高斯分布的异常检测算法一文中,详细给出了异常检测算法的原理及其公式,本文为该算法的Octave仿真。实例为,根据训练样例(一组网络服务器)的吞吐量(Throughput)和延迟时间(Latency)数据,标记出异常的服务器。

    可视化的数据集如下:

    我们根据数据集X,计算其二维高斯分布的数学期望mu与方差sigma2:

    function [mu sigma2] = estimateGaussian(X)
    %ESTIMATEGAUSSIAN This function estimates the parameters of a 
    %Gaussian distribution using the data in X
    %   [mu sigma2] = estimateGaussian(X), 
    %   The input X is the dataset with each n-dimensional data point in one row
    %   The output is an n-dimensional vector mu, the mean of the data set
    %   and the variances sigma^2, an n x 1 vector
    % 
    
    % Useful variables
    [m, n] = size(X);
    
    mu = zeros(n, 1);
    sigma2 = zeros(n, 1);
    
    
    mu = sum(X,1)'/m; 
    %note:mu and sigma are both n-demension.
    for(i=1:m)
    e=(X(i,:)'-mu);
    sigma2 += e.^2;
    endfor
    
    sigma2 = sigma2/m
    
    end
    

    计算概率密度:

    function p = multivariateGaussian(X, mu, Sigma2)
    %MULTIVARIATEGAUSSIAN Computes the probability density function of the
    %multivariate gaussian distribution.
    %    p = MULTIVARIATEGAUSSIAN(X, mu, Sigma2) Computes the probability 
    %    density function of the examples X under the multivariate gaussian 
    %    distribution with parameters mu and Sigma2. If Sigma2 is a matrix, it is
    %    treated as the covariance matrix. If Sigma2 is a vector, it is treated
    %    as the sigma^2 values of the variances in each dimension (a diagonal
    %    covariance matrix)
    %
    
    k = length(mu);
    
    if (size(Sigma2, 2) == 1) || (size(Sigma2, 1) == 1)
        Sigma2 = diag(Sigma2);
    end
    
    X = bsxfun(@minus, X, mu(:)');
    p = (2 * pi) ^ (- k / 2) * det(Sigma2) ^ (-0.5) * ...
        exp(-0.5 * sum(bsxfun(@times, X * pinv(Sigma2), X), 2));
    
    end
    

    可视化后:

    根据预留的一部分已知是否异常的训练样例(CV集),来选择阈值:

    function [bestEpsilon bestF1] = selectThreshold(yval, pval)
    %SELECTTHRESHOLD Find the best threshold (epsilon) to use for selecting
    %outliers
    %   [bestEpsilon bestF1] = SELECTTHRESHOLD(yval, pval) finds the best
    %   threshold to use for selecting outliers based on the results from a
    %   validation set (pval) and the ground truth (yval).
    %
    
    bestEpsilon = 0;
    bestF1 = 0;
    F1 = 0;
    
    stepsize = (max(pval) - min(pval)) / 1000;
    for epsilon = min(pval):stepsize:max(pval)
        
    
        pred = (pval<epsilon); 
        
        p_e_1 = (pred==1);
        y_e_1 = (yval==1);
        p1 = 0;
        m = size(p_e_1,1);
        for(i=1:m)
          if((p_e_1(i)==1)&&(p_e_1(i)==y_e_1(i)))
            p1++;
          endif
        endfor
        p_12 = sum(pred);
        p_13 = sum(y_e_1);
        
        p=p1/p_12;
        r=p1/p_13;
        
        F1 = 2*p*r/(p+r);
        
    
        if F1 > bestF1
           bestF1 = F1;
           bestEpsilon = epsilon;
        end
    end
    
    end
    

     最终的标记结果:

  • 相关阅读:
    Zoundry Raven测试
    asp.net 异步加载?
    网页数据表格自动填充序号
    1.JSP
    C#基础之CLR的执行模型(二)
    C#基础之CLR的执行模型(一)
    java提供类与cglib包实现动态代理
    CSS初窥...
    Go的牛逼之处
    Go灵魂级选手之流程控制
  • 原文地址:https://www.cnblogs.com/rhyswang/p/7089316.html
Copyright © 2020-2023  润新知