• [Repost] Basic Statistics and Metrics for Sensor Analysis


    http://www.comp.lancs.ac.uk/~kristof/research/notes/basicstats/index.html


    Notations

    Say we have a system where, at each point in time, we obtain d values coming from d different sensors. We call those d values features, and define the vector containing the d features as a sample vector or feature vector x(t). Over a given period of time, we obtain several of these sample vectors and can put them in one matrix, which we call dataset D.

    A row vector of dataset D is called a time vector, while a column vector has already been defined as a sample vector. With n samples, D is therefore a d × n matrix.
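To make the orientation of D concrete, here is a minimal Python sketch (plain lists, no libraries) that stores D as d rows by n columns and extracts one sample vector and one time vector. The sensor values and the helper names `sample_vector` and `time_vector` are hypothetical, chosen only for illustration; indices are 0-based, as usual in Python.

```python
# Dataset D stored as d rows (features) by n columns (samples):
# rows are time vectors, columns are sample vectors.
# The sensor values below are made up purely for illustration.
D = [
    [20.1, 20.4, 20.9, 21.3],  # feature 1 (e.g. a temperature sensor)
    [0.50, 0.52, 0.47, 0.45],  # feature 2 (e.g. a light sensor)
    [1.0, 0.0, 1.0, 1.0],      # feature 3 (e.g. a binary motion sensor)
]


def sample_vector(D, t):
    """Return the sample vector x(t): the t-th column of D (0-based t)."""
    return [row[t] for row in D]


def time_vector(D, i):
    """Return the time vector of feature i: the i-th row of D (0-based i)."""
    return list(D[i])


d, n = len(D), len(D[0])
print(d, n)                  # 3 4
print(sample_vector(D, 0))   # [20.1, 0.5, 1.0]
print(time_vector(D, 1))     # [0.5, 0.52, 0.47, 0.45]
```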

    A timeseries plot draws each feature as a function of time and illustrates how the sensors behave over time. A cluster plot draws the sample vectors, or sub-vectors of the sample vectors, in their vector space.

    No a priori knowledge about the distribution of the observed data is available.

       

    Figure: Example of a timeseries plot over 3000 samples (approx. 48 hours)

    The mean / average / expected value

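    The mean of a feature i over n d-dimensional samples is defined as:

    $$\mu_i = \frac{1}{n} \sum_{t=1}^{n} x_i(t)$$

    where x_i(t) is the value of feature i in the sample vector x(t).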

    This is usually referred to as the arithmetic mean. Other statistics that classify as means are the arithmetic-geometric mean, the geometric mean, the harmonic mean and the quadratic mean (also called the root-mean-square).
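A minimal plain-Python sketch of the means listed above. The function names are illustrative, and the geometric and harmonic means are only implemented for strictly positive values. The arithmetic-geometric mean is shown for a pair of numbers, since it is defined for two values rather than for a whole series.

```python
import math


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def geometric_mean(xs):
    # Assumes strictly positive values; computed via logs for numerical stability.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def harmonic_mean(xs):
    # Assumes strictly positive values.
    return len(xs) / sum(1.0 / x for x in xs)


def quadratic_mean(xs):
    # Also called the root-mean-square (RMS).
    return math.sqrt(sum(x * x for x in xs) / len(xs))


def arithmetic_geometric_mean(a, b, tol=1e-12):
    # Defined for two positive numbers: repeatedly replace the pair by its
    # arithmetic and geometric means until the two values meet.
    while abs(a - b) > tol * max(a, b):
        a, b = (a + b) / 2.0, math.sqrt(a * b)
    return a


feature = [1.0, 2.0, 4.0]
print(arithmetic_mean(feature))   # 2.333...
print(geometric_mean(feature))    # 2.0 (up to rounding)
print(harmonic_mean(feature))     # 1.714...
print(quadratic_mean(feature))    # 2.645...
```

For positive data these always satisfy harmonic ≤ geometric ≤ arithmetic ≤ quadratic, which is a quick sanity check on any implementation.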

       

    The variance and standard deviation

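    The variance of a feature i over n d-dimensional samples is defined as:

    $$\mathrm{Var}(i) = \frac{1}{n} \sum_{t=1}^{n} \left( x_i(t) - \mu_i \right)^2$$

    where μ_i is the mean of feature i.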

    The square root of the variance is called the standard deviation:

    $$\mathrm{Std}(i) = \sqrt{\mathrm{Var}(i)}$$

    It measures how far the values of a feature typically spread around their mean.
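The following sketch computes the variance and standard deviation of one feature with the 1/n normalisation used in these notes, and checks the result against Python's `statistics.pvariance`, which uses the same population (1/n) convention. The example values are made up and the function names are illustrative.

```python
import math
import statistics


def variance(xs):
    """Variance of one feature using the 1/n normalisation."""
    n = len(xs)
    mu = sum(xs) / n
    return sum((x - mu) ** 2 for x in xs) / n


def std(xs):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(variance(xs))


feature = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(variance(feature))  # 4.0
print(std(feature))       # 2.0

# The standard library's population functions use the same 1/n convention.
assert math.isclose(variance(feature), statistics.pvariance(feature))
```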

     

       

    The covariance

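    The covariance of two features i and j from n d-dimensional samples in a dataset D is defined as:

    $$\mathrm{Cov}(i,j) = \frac{1}{n} \sum_{t=1}^{n} \left( x_i(t) - \mu_i \right) \left( x_j(t) - \mu_j \right)$$

    where μ_i and μ_j are the means of features i and j.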

    Several properties of the covariance are:

    1. Cov(i,i) = Std(i)² = Var(i)

    2. i and j increase together  =>  Cov(i,j) > 0

    3. i decreases while j increases  =>  Cov(i,j) < 0

    4. i and j independent  =>  Cov(i,j) ≈ 0   for a large number of samples

    5. |Cov(i,j)|  ≤ Std(i).Std(j)    or    -Std(i).Std(j)  ≤  Cov(i,j)  ≤ Std(i).Std(j)

    Sometimes the denominator is defined as n-1 instead of n, which gives the unbiased sample estimate of the covariance.
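A sketch of the covariance of two features, with an optional switch between the n and n-1 denominators mentioned above. The `ddof` parameter name is borrowed from the naming NumPy uses; `covariance` itself is a hypothetical helper, not a library call. The toy series illustrate properties 1, 2, 3 and 5.

```python
import math


def covariance(xs, ys, ddof=0):
    """Covariance of two features over the same n samples.

    ddof=0 divides by n (the definition above); ddof=1 divides by n-1.
    """
    n = len(xs)
    if n != len(ys):
        raise ValueError("features must have the same number of samples")
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - ddof)


up = [1.0, 2.0, 3.0, 4.0]
also_up = [2.0, 4.0, 6.0, 8.0]
down = [8.0, 6.0, 4.0, 2.0]

print(covariance(up, also_up))          # 2.5  (increase together -> positive)
print(covariance(up, down))             # -2.5 (one rises, one falls -> negative)
print(covariance(up, up))               # 1.25 (equals Var(up), property 1)
print(covariance(up, also_up, ddof=1))  # 3.333... (n-1 denominator)

# Property 5: |Cov(i,j)| <= Std(i) * Std(j)
bound = math.sqrt(covariance(up, up) * covariance(down, down))
print(abs(covariance(up, down)) <= bound + 1e-12)  # True
```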

     

       

    The covariance matrix

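    All possible covariances can be put together in a d × d covariance matrix Cov(D) for dataset D:

    $$\mathrm{Cov}(D) = \begin{pmatrix} \mathrm{Cov}(1,1) & \cdots & \mathrm{Cov}(1,d) \\ \vdots & \ddots & \vdots \\ \mathrm{Cov}(d,1) & \cdots & \mathrm{Cov}(d,d) \end{pmatrix}$$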
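A minimal sketch that builds Cov(D) for a dataset stored as d rows (features) by n columns (samples), again dividing by n. `covariance_matrix` is a hypothetical helper and the toy data is made up. In the output, the diagonal holds the variances of the features (property 1), and the matrix is symmetric because Cov(i,j) = Cov(j,i).

```python
def covariance_matrix(D):
    """Return the d x d covariance matrix of a dataset D with d rows (features)
    and n columns (samples), using the 1/n normalisation."""
    d, n = len(D), len(D[0])
    means = [sum(row) / n for row in D]
    centered = [[v - m for v in row] for row, m in zip(D, means)]
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / n for j in range(d)]
        for i in range(d)
    ]


D = [
    [1.0, 2.0, 3.0, 4.0],  # feature 1
    [2.0, 4.0, 6.0, 8.0],  # feature 2: rises with feature 1
    [8.0, 6.0, 4.0, 2.0],  # feature 3: falls as feature 1 rises
]
for row in covariance_matrix(D):
    print(row)
# [1.25, 2.5, -2.5]
# [2.5, 5.0, -5.0]
# [-2.5, -5.0, 5.0]
```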

     

       

    Metrics

    Let's assume that x and y are two d-dimensional vectors. The aim is to calculate how close these vectors are to each other. Several common metrics exist for this:

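    The Manhattan, Cityblock or L1 metric is defined as (where x_k and y_k denote the k-th components of x and y):

    $$L_1(x,y) = \sum_{k=1}^{d} \lvert x_k - y_k \rvert$$

    The Euclidean or L2 metric is defined as:

    $$L_2(x,y) = \sqrt{\sum_{k=1}^{d} (x_k - y_k)^2}$$

    The L∞, Chebychev or Maximum metric is defined as:

    $$L_\infty(x,y) = \max_{k=1,\dots,d} \lvert x_k - y_k \rvert$$

    The Minkowski or Lλ metric is defined as:

    $$L_\lambda(x,y) = \left( \sum_{k=1}^{d} \lvert x_k - y_k \rvert^{\lambda} \right)^{1/\lambda}$$

    Notice that λ=1 gives the cityblock distance, λ=2 gives the Euclidean distance, and letting λ→∞ gives the Chebychev distance.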
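The Minkowski family can be sketched as a single function in which λ selects the metric. This is a minimal plain-Python version under the assumption λ ≥ 1 (for λ < 1 the formula no longer satisfies the triangle inequality, so it is not a metric); the name `minkowski` and its parameters are illustrative.

```python
import math


def minkowski(x, y, lam):
    """L_lambda distance between two d-dimensional vectors (assumes lam >= 1).

    lam = 1 gives the Manhattan distance, lam = 2 the Euclidean distance, and
    lam = math.inf the Chebychev (maximum) distance.
    """
    deltas = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(lam):
        return max(deltas)
    return sum(delta ** lam for delta in deltas) ** (1.0 / lam)


x, y = [0.0, 0.0], [3.0, 4.0]
print(minkowski(x, y, 1))         # 7.0 (Manhattan)
print(minkowski(x, y, 2))         # 5.0 (Euclidean)
print(minkowski(x, y, math.inf))  # 4.0 (Chebychev)
```

As λ grows, `minkowski(x, y, lam)` approaches the Chebychev value, because the largest component difference dominates the sum.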

    The Mahalanobis metric is defined as:

    $$\mathrm{Mahalanobis}(x,y) = \sqrt{(x-y)^{T} \, \mathrm{Cov}(D)^{-1} \, (x-y)}$$

    Note that if the covariance matrix Cov(D) is the identity matrix, the Mahalanobis distance is equal to the Euclidean distance.
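Computing the Mahalanobis distance requires Cov(D)^-1. To stay dependency-free, this sketch solves Cov(D) z = (x - y) with Gaussian elimination instead of forming the inverse explicitly, which yields the same value. The helpers `solve` and `mahalanobis` are hypothetical; in practice a linear-algebra library would normally do this step. The second example uses an assumed covariance matrix with correlated dimensions to show that the distance then depends on direction.

```python
import math


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting (A is square)."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bv)] for row, bv in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def mahalanobis(x, y, cov):
    """sqrt((x-y)^T Cov^{-1} (x-y)) for a given covariance matrix."""
    diff = [a - b for a, b in zip(x, y)]
    z = solve(cov, diff)  # z = Cov^{-1} (x - y)
    return math.sqrt(sum(a * b for a, b in zip(diff, z)))


x, y = [0.0, 0.0], [3.0, 4.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(mahalanobis(x, y, identity))  # 5.0, the same as the Euclidean distance

# With positively correlated dimensions, a step along the correlation
# direction counts as shorter than an equally long step across it.
cov = [[1.0, 0.8], [0.8, 1.0]]
print(mahalanobis(x, [1.0, 1.0], cov) < mahalanobis(x, [1.0, -1.0], cov))  # True
```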

       

    Figure: Contours of constant Manhattan, Euclidean, L∞ and Mahalanobis distance (given dependent dimensions) in 2D space.

     

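    The Canberra metric is defined as:

    $$\mathrm{Canberra}(x,y) = \sum_{k=1}^{d} \frac{\lvert x_k - y_k \rvert}{\lvert x_k \rvert + \lvert y_k \rvert}$$

    The output ranges from 0 to the number of variables used (d). The Canberra distance is very sensitive to small changes near zero.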
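A plain-Python sketch of the Canberra distance. Terms where both components are zero are skipped (counted as 0), which is a common convention but an assumption here. The last two calls illustrate the sensitivity near zero: the same absolute change of 0.1 contributes far more when the values themselves are close to zero.

```python
def canberra(x, y):
    """Canberra distance: sum of |x_k - y_k| / (|x_k| + |y_k|).

    Terms where both components are zero are taken as 0, so the result
    lies between 0 and the number of components d.
    """
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total


print(canberra([1.0, 2.0, 0.0], [1.0, 2.0, 0.0]))   # 0.0
print(canberra([1.0, 0.0, 5.0], [-1.0, 3.0, 5.0]))  # 2.0 (the first two terms are each 1)

# Sensitivity near zero: the same absolute change of 0.1 weighs very differently.
print(canberra([0.1], [0.2]))      # 0.333...
print(canberra([100.1], [100.2]))  # about 0.0005
```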

    The Correlation metric is defined as:

    $$\mathrm{Corr}(x,y) = \frac{\sum_{k=1}^{d} (x_k - \bar{x})(y_k - \bar{y})}{\sqrt{\sum_{k=1}^{d} (x_k - \bar{x})^2 \; \sum_{k=1}^{d} (y_k - \bar{y})^2}}$$

    where \bar{x} and \bar{y} are the means of the components of x and y. The correlation similarity measure is the covariance divided by the product of the standard deviations, and takes values between -1 and 1. With this measure, the relative direction of the two observation vectors is important: it gives the cosine of the angle between the two observation vectors measured from their means. This measure makes more sense for time vectors than for sample vectors.
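A minimal sketch of the correlation measure applied to two time vectors. It centres both vectors on their means and then takes the cosine of the angle between them, which is the same as dividing the covariance by the product of the standard deviations. The series and the name `correlation` are hypothetical.

```python
import math


def correlation(x, y):
    """Pearson correlation: the cosine of the angle between the mean-centred
    vectors, i.e. covariance / (product of standard deviations)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cx = [a - mx for a in x]
    cy = [b - my for b in y]
    num = sum(a * b for a, b in zip(cx, cy))
    den = math.sqrt(sum(a * a for a in cx) * sum(b * b for b in cy))
    return num / den  # undefined (division by zero) if either vector is constant


# Two hypothetical time vectors compared with a third, over 5 samples.
temperature = [18.0, 19.0, 21.0, 24.0, 23.0]
fan_speed = [0.0, 1.0, 3.0, 6.0, 5.0]   # temperature minus 18
heater = [6.0, 5.0, 3.0, 0.0, 1.0]      # 24 minus temperature

print(correlation(temperature, fan_speed))  # 1.0 (same relative direction)
print(correlation(temperature, heater))     # -1.0 (opposite direction)
```

Because of the centring and the normalisation, the result is unaffected by adding a constant to either series or by scaling it by a positive factor.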

    The Autocorrelation metric of a series x_1, ..., x_N (for example one time vector) with mean \bar{x}, at lag k, is defined as:

    $$r_k = \frac{\sum_{t=1}^{N-k} (x_t - \bar{x})(x_{t+k} - \bar{x})}{\sum_{t=1}^{N} (x_t - \bar{x})^2}$$

    and is the correlation between a series and a lagged version of itself, i.e. the covariance at lag k divided by the variance. A high autocorrelation at lag k is likely to indicate a periodicity in the signal with a period of k samples.

    If the autocorrelation coefficient is calculated for all lags k = 0, 1, 2, ..., N-1, the resulting series is called the autocorrelation series or the correlogram.
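A sketch of the autocorrelation coefficient and the correlogram, tried on a synthetic sine wave with a period of 8 samples (an assumed test signal, not sensor data). The function names are illustrative. The correlogram comes out strongly positive at lag 8 (one full period) and strongly negative at lag 4 (half a period), matching the periodicity interpretation above.

```python
import math


def autocorrelation(x, k):
    """Autocorrelation coefficient r_k of series x at lag k (0 <= k < N).

    Undefined (division by zero) for a constant series.
    """
    N = len(x)
    mean = sum(x) / N
    c = [v - mean for v in x]
    num = sum(c[t] * c[t + k] for t in range(N - k))
    den = sum(v * v for v in c)
    return num / den


def correlogram(x):
    """Autocorrelation series for all lags k = 0, 1, ..., N-1."""
    return [autocorrelation(x, k) for k in range(len(x))]


# Synthetic signal with a period of 8 samples.
signal = [math.sin(2 * math.pi * t / 8) for t in range(64)]
r = correlogram(signal)
print(round(r[0], 3))  # 1.0 (lag 0 always gives 1)
print(r[4])            # strongly negative: half a period out of phase
print(r[8])            # strongly positive: one full period
```

The peak at lag 8 stays below 1 because the numerator sums over only N-k terms, so longer lags are progressively damped.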

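    The Angular metric is defined as:

    $$\mathrm{Angular}(x,y) = \frac{\sum_{k=1}^{d} x_k y_k}{\sqrt{\sum_{k=1}^{d} x_k^2 \; \sum_{k=1}^{d} y_k^2}}$$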

    which is the cosine of the angle between the two observation vectors measured from zero and takes values from -1 to 1.
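A sketch of the angular measure. Unlike the correlation measure, it does not subtract the means first, so vectors that differ only by a constant offset are generally not at angle zero. The function name `angular` is illustrative.

```python
import math


def angular(x, y):
    """Cosine of the angle between x and y, measured from the origin."""
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return dot / norms  # undefined if either vector is all zeros


print(angular([1.0, 0.0], [2.0, 0.0]))    # 1.0  (same direction, different length)
print(angular([1.0, 0.0], [0.0, 3.0]))    # 0.0  (orthogonal)
print(angular([1.0, 2.0], [-1.0, -2.0]))  # -1.0 (opposite directions)
```

For example, `angular([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])` is slightly below 1, whereas the mean-centred correlation of those two vectors is exactly 1.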

    The neighbourhood of a vector x under a given metric is the set of vectors y whose distance to x, as computed by the metric's formula, is smaller than or equal to a chosen value. It is found by keeping x fixed and varying y, the second vector in the distance formula.
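To make the idea of a neighbourhood concrete, this sketch collects all points of a small integer grid whose distance to the origin is at most a chosen radius, for three of the metrics above. The grid, the radius of 2.5 and the helper names are arbitrary choices for illustration. The counts reflect the contour shapes: the Manhattan neighbourhood is a diamond, the Euclidean one a disc and the Chebychev one a square, each containing the previous one.

```python
import math


def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def chebychev(x, y):
    return max(abs(a - b) for a, b in zip(x, y))


def neighbourhood(center, candidates, metric, radius):
    """Return the candidates whose distance to `center` is <= radius."""
    return [y for y in candidates if metric(center, y) <= radius]


# All points of a 5 x 5 integer grid around the origin.
grid = [(i, j) for i in range(-2, 3) for j in range(-2, 3)]
origin = (0, 0)
for metric in (manhattan, euclidean, chebychev):
    print(metric.__name__, len(neighbourhood(origin, grid, metric, 2.5)))
# manhattan 13  (a diamond)
# euclidean 21  (a disc)
# chebychev 25  (a square)
```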

     

       
     

    Compiled by Kristof Van Laerhoven.

    Thanks to Sergio R. Caprile for his comments.

  • Original post: https://www.cnblogs.com/gaozehua/p/2516793.html