• Testing and analyzing bimodal or multimodal data in R


    A number of tests are available to determine if a data set is distributed in a bimodal (or multimodal) fashion.

    Graphical methods: plot the distribution of the data and inspect it visually, using the mean, variance, kurtosis, and skewness as summaries.

    The 'modes' package

    The bimodality_amplitude function (a function available in R)

    How to judge whether a distribution is unimodal or bimodal

     $\text{skewness}^2 + 1 \le \text{kurtosis}$, that is, $b_2 - b_1 \ge 1$

    Here $b_2$ is the kurtosis and $b_1$ is the square of the skewness. Equality holds only for the two-point Bernoulli distribution or the sum of two different Dirac delta functions. These are the most extreme cases of bimodality possible.

    When the two points carry equal weight, the distribution is symmetrical, so the kurtosis is 1, the skewness is 0, and the difference is 1.
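    The inequality above can be checked numerically. The following is a minimal sketch in plain Python (not R), assuming population moments with no small-sample correction; the names `moment_stats` and `bimodality_gap` are made up for this illustration and do not come from the 'modes' package.

```python
def moment_stats(xs):
    """Return (b1, b2): squared skewness and (non-excess) kurtosis of xs."""
    n = len(xs)
    mean = sum(xs) / n
    # Central moments of order 2, 3 and 4 (population form, divide by n).
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return skewness ** 2, kurtosis


def bimodality_gap(xs):
    """Return b2 - b1; the inequality says this is never below 1."""
    b1, b2 = moment_stats(xs)
    return b2 - b1


# Symmetric two-point sample: the extreme bimodal case.
two_point = [0.0] * 50 + [1.0] * 50
# Unequal weights on the two points: still a two-point distribution.
skewed_two_point = [0.0] * 20 + [1.0] * 80
# A small single-peaked sample for comparison.
single_peak = [1, 2, 2, 3, 3, 3, 4, 4, 5]

print(bimodality_gap(two_point))         # 1.0 for the symmetric two-point sample
print(bimodality_gap(skewed_two_point))  # also at the lower bound, up to rounding
print(bimodality_gap(single_peak))       # clearly above 1
```

    A gap close to 1 points toward strong bimodality, while larger values are only a rough hint that the data are not at that extreme.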

    Several further tests were proposed later: one based on second central differences, one based on the F test, one based on Fisher's G test, a fourth test, one based on a likelihood ratio, etc.


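    Antimode tests: these locate the antimode (the valley between the peaks), that is, the best separation point between the two modes of bimodal data.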

    Otsu's method (maximum between-class variance method)

    This is a commonly employed method in computer graphics to determine the optimal separation between two distributions.

    The algorithm assumes that the image contains two classes of pixels following a bimodal histogram (foreground pixels and background pixels). It then calculates the optimum threshold separating the two classes so that their combined spread (intra-class variance) is minimal or, equivalently (because the sum of pairwise squared distances is constant), so that their inter-class variance is maximal.
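    The threshold search described above can be sketched for plain one-dimensional data rather than image pixels. This is a minimal illustration in plain Python, assuming the between-class variance is written as w0 * w1 * (mu0 - mu1)^2 and that candidate thresholds are the midpoints between consecutive distinct sorted values; the function name `otsu_threshold` is hypothetical.

```python
def otsu_threshold(values):
    """Return the split point that maximizes the between-class variance.

    Returns None when all values are identical (no split exists).
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    best_threshold = None
    best_between = -1.0
    left_sum = 0.0
    for i in range(1, n):
        left_sum += xs[i - 1]          # running sum of the lower class
        if xs[i] == xs[i - 1]:
            continue                   # no threshold fits between equal values
        w0 = i / n                     # weight of the lower class
        w1 = 1.0 - w0                  # weight of the upper class
        mu0 = left_sum / i
        mu1 = (total - left_sum) / (n - i)
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_between:
            best_between = between
            best_threshold = (xs[i - 1] + xs[i]) / 2
    return best_threshold


# Two well-separated groups; the chosen split falls in the gap between them.
sample = [1, 2, 2, 3, 10, 11, 11, 12]
print(otsu_threshold(sample))
```

    Maximizing the between-class variance and minimizing the within-class variance select the same split, because the two quantities add up to the fixed total variance of the data.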

    Other general tests: these check whether the data follow some distribution other than a unimodal (or bimodal) one.

    To test if a distribution is other than unimodal, several additional tests have been devised: the bandwidth test, the dip test, the excess mass test, the MAP test, the mode existence test, the runt test, the span test, and the saddle test.

    The dip test is available for use in R (for example, through the diptest package).

    P values for the dip statistic range between 0 and 1. P values less than 0.05 indicate significant multimodality, and P values greater than 0.05 but less than 0.10 suggest multimodality with marginal significance.

    Silverman's test: used to test how many modes the data contain

    Silverman introduced a bootstrap method for the number of modes. The test uses a fixed bandwidth, which reduces the power of the test and its interpretability. Undersmoothed densities may have an excessive number of modes whose count during bootstrapping is unstable.
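    The dependence of the mode count on the bandwidth can be illustrated with a small sketch. It is written in plain Python under several assumptions: a Gaussian kernel, modes counted on an evenly spaced grid, and a bisection search for the smallest bandwidth that leaves at most k modes. The bootstrap step of Silverman's test is not included, and the names `count_modes` and `critical_bandwidth` are hypothetical.

```python
import math


def kde(xs, h, grid):
    """Gaussian kernel density estimate of xs with bandwidth h on grid."""
    norm = 1.0 / (len(xs) * h * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - x) / h) ** 2) for x in xs)
            for g in grid]


def count_modes(xs, h, grid_size=512):
    """Count local maxima of the estimated density at bandwidth h."""
    lo, hi = min(xs) - 3 * h, max(xs) + 3 * h
    step = (hi - lo) / (grid_size - 1)
    dens = kde(xs, h, [lo + i * step for i in range(grid_size)])
    modes, rising = 0, False
    for prev, cur in zip(dens, dens[1:]):
        if cur > prev:
            rising = True
        elif cur < prev and rising:    # a rise followed by a fall is one mode
            modes += 1
            rising = False
    return modes


def critical_bandwidth(xs, k=1, lo=0.1, hi=None, steps=30):
    """Approximate the smallest bandwidth giving at most k modes (bisection)."""
    if hi is None:
        hi = max(xs) - min(xs)         # assumed wide enough to give one mode
    for _ in range(steps):
        mid = (lo + hi) / 2
        if count_modes(xs, mid) <= k:
            hi = mid
        else:
            lo = mid
    return hi


data = [-2.2, -2.1, -2.0, -1.9, -1.8, 1.8, 1.9, 2.0, 2.1, 2.2]
for h in (0.05, 0.5, 3.0):
    print(h, count_modes(data, h))     # fewer modes as the bandwidth grows
print(critical_bandwidth(data, k=1))
```

    A very small bandwidth produces many spurious modes, which is the undersmoothing instability mentioned above; a large critical bandwidth for k = 1 suggests the data resist being smoothed into a single mode.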

    Bajgier-Aggarwal test: a test based on kurtosis

    Bajgier and Aggarwal have proposed a test based on the kurtosis of the distribution.

    Mixture of two normal distributions: testing for a mixture of two normals

    The Kernel Mean Matching algorithm is used to decide whether a data set belongs to a single normal distribution or to a mixture of two normal distributions. The k-means method can also be used to judge whether the points lie in one normal distribution or in a mixture of two normal distributions.

    The mixtools package, available for R, can test for and estimate the parameters of a number of different distributions. A package for a mixture of two right-tailed gamma distributions is also available.

    Several other packages for R are available to fit mixture models; these include flexmix, mcclust, and mixdist.
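    To make the "one normal or a mixture of two" question concrete, here is a minimal sketch in plain Python. It uses a basic expectation-maximization (EM) fit and a BIC comparison, which is a swapped-in technique for illustration: it is not the Kernel Mean Matching procedure and not the API of mixtools or any other R package. All function names, the halves-based initialization, and the `min_sigma` floor are assumptions.

```python
import math


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_one_normal(xs):
    """Maximum-likelihood fit of a single normal; returns (mu, sigma, loglik)."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    loglik = sum(math.log(normal_pdf(x, mu, sigma)) for x in xs)
    return mu, sigma, loglik


def fit_two_normals(xs, iters=200, min_sigma=1e-3):
    """EM fit of a two-component normal mixture (no safeguards for
    degenerate data, e.g. a component that loses all its weight)."""
    n = len(xs)
    ordered = sorted(xs)
    half = n // 2
    # Assumed initialization: means of the lower and upper halves.
    mu = [sum(ordered[:half]) / half, sum(ordered[half:]) / (n - half)]
    sd = fit_one_normal(xs)[1]
    sigma, weight = [sd, sd], [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of component 0 for each point.
        resp = []
        for x in xs:
            p = [weight[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
            resp.append(p[0] / (p[0] + p[1]))
        # M-step: weighted updates of weight, mean and standard deviation.
        for k in range(2):
            r = resp if k == 0 else [1.0 - v for v in resp]
            nk = sum(r)
            weight[k] = nk / n
            mu[k] = sum(ri * x for ri, x in zip(r, xs)) / nk
            var = sum(ri * (x - mu[k]) ** 2 for ri, x in zip(r, xs)) / nk
            sigma[k] = max(math.sqrt(var), min_sigma)  # avoid collapse to 0
    loglik = sum(
        math.log(sum(weight[k] * normal_pdf(x, mu[k], sigma[k])
                     for k in range(2)))
        for x in xs)
    return weight, mu, sigma, loglik


def bic(loglik, n_params, n):
    """Bayesian information criterion; lower values are preferred."""
    return n_params * math.log(n) - 2.0 * loglik


points = [-0.3, -0.1, 0.0, 0.2, 0.4, 4.7, 4.9, 5.0, 5.2, 5.3]
_, _, ll_one = fit_one_normal(points)
weights, means, sigmas, ll_two = fit_two_normals(points)
bic_one = bic(ll_one, 2, len(points))   # parameters: mean, sigma
bic_two = bic(ll_two, 5, len(points))   # parameters: 2 means, 2 sigmas, 1 weight
print(means, bic_one, bic_two)
```

    When the two-component BIC is clearly lower, the mixture is the better description of these points; with real data the fit is sensitive to initialization and to how far apart the component means are.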

     

  • Original article: https://www.cnblogs.com/beckygogogo/p/9263619.html