• The Beta distribution and confidence estimation


    Estimating confidence

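    • Treating the success probability $p$ of a binomial distribution as unknown, $p$ follows a Beta distribution $\mathrm{Beta}(\alpha, \beta)$ with density $\frac{1}{B(\alpha, \beta)} x^{\alpha-1} (1-x)^{\beta-1}$. With a uniform prior, after observing $n_{bad}$ bad and $n_{good}$ good samples, $\alpha = n_{bad}+1$ and $\beta = n_{good}+1$.
    • Mean: $\frac{\alpha}{\alpha + \beta}$
    • Variance: $\frac{\alpha \beta}{(\alpha+\beta)^2 (\alpha+\beta+1)}$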
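    As a quick sanity check of the two formulas, the sketch below computes the closed-form mean and variance and compares them with `scipy.stats.beta.mean` and `beta.var`. The helper `beta_mean_var` is hypothetical. The counts (500 bad, 20000 good) are taken from the first test case, and the uniform prior is an assumption that matches the `+1` in the code.

```python
from scipy.stats import beta


def beta_mean_var(a, b):
    """Closed-form mean and variance of Beta(a, b), per the formulas above."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


# Illustrative counts: 500 bad and 20000 good samples, uniform Beta(1, 1) prior
n_bad, n_good = 500, 20000
a, b = n_bad + 1, n_good + 1

mean, var = beta_mean_var(a, b)
print(f"closed form: mean={mean:.6f}, var={var:.3e}")
print(f"scipy:       mean={beta.mean(a, b):.6f}, var={beta.var(a, b):.3e}")
```

    The square root of this variance is about 0.001078, which is the `v` reported for the first test case further down.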
    
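    from scipy.stats import beta
    
    def confidence(n_bad, n_good, tol=2):
        '''Return the estimated bad rate p, its standard deviation v, and the
        probability that the true bad rate lies within p ± tol*v.'''
        # Uniform Beta(1, 1) prior -> posterior Beta(n_bad + 1, n_good + 1)
        a, b = n_bad + 1, n_good + 1
        p = a / (a + b)                        # posterior mean
        v = beta.std(a, b)                     # posterior standard deviation
        up, low = min(1, p + v * tol), max(0, p - v * tol)
        d = beta.cdf(up, a, b) - beta.cdf(low, a, b)   # mass inside [low, up]
        return p, v, d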
    
    
    
    test_set = [
        (500, 20000, 2), 
        (1000, 200000, 2), 
        (2000, 200000, 2), 
        (5000, 200000, 2),
        (500,  100000, 2), 
        (1000, 100000, 2), 
        (2000, 100000, 2), 
        (5000, 100000, 2), 
        (2000, 10000, 2), 
    ]
    
    print("  bad;  total; mean p;   std v;      relative error e;  confidence")
    for (n_bad, n_good, tol) in test_set:
        p, v, d = confidence(n_bad, n_good, tol)
    
        # e = tol * v / p: half-width of the p ± tol*v interval, relative to p
        ss = ('{:5d};{:7d}; p={p:0.4f}; v={v:0.6f}; e={e:0.3f}; '
              + 'P(true p in [p - {t}v, p + {t}v]) = {d:2.2f}%'
              ).format(n_bad, n_bad + n_good, p=p, v=v, d=d * 100, t=tol, e=tol * v / p)
        print(ss)
    
    
      bad;  total; mean p;   std v;      relative error e;  confidence
      500;  20500; p=0.0244; v=0.001078; e=0.088; P(true p in [p - 2v, p + 2v]) = 95.46%
     1000; 201000; p=0.0050; v=0.000157; e=0.063; P(true p in [p - 2v, p + 2v]) = 95.46%
     2000; 202000; p=0.0099; v=0.000220; e=0.044; P(true p in [p - 2v, p + 2v]) = 95.45%
     5000; 205000; p=0.0244; v=0.000341; e=0.028; P(true p in [p - 2v, p + 2v]) = 95.45%
      500; 100500; p=0.0050; v=0.000222; e=0.089; P(true p in [p - 2v, p + 2v]) = 95.46%
     1000; 101000; p=0.0099; v=0.000312; e=0.063; P(true p in [p - 2v, p + 2v]) = 95.46%
     2000; 102000; p=0.0196; v=0.000434; e=0.044; P(true p in [p - 2v, p + 2v]) = 95.45%
     5000; 105000; p=0.0476; v=0.000657; e=0.028; P(true p in [p - 2v, p + 2v]) = 95.45%
     2000;  12000; p=0.1667; v=0.003402; e=0.041; P(true p in [p - 2v, p + 2v]) = 95.45%
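    The last column barely changes across rows. A likely reason is that, with this many samples, the Beta posterior is close to a normal distribution, and a normal distribution puts about 95.45% of its mass within ±2 standard deviations. The sketch below compares the two for the first test case using `scipy.stats.norm`. The normal approximation is an explanation added here, not a claim from the original post.

```python
from scipy.stats import beta, norm

tol = 2
# Mass of a standard normal within ±tol standard deviations
normal_mass = norm.cdf(tol) - norm.cdf(-tol)

# Same mass for the Beta posterior of the first test case (500 bad, 20000 good)
a, b = 500 + 1, 20000 + 1
p, v = beta.mean(a, b), beta.std(a, b)
beta_mass = beta.cdf(p + tol * v, a, b) - beta.cdf(p - tol * v, a, b)

print(f"normal: {normal_mass:.4%}, Beta: {beta_mass:.4%}")
```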
    

    Conclusion: with 2000 or more bad samples, the relative error of the estimated bad rate is below 5% at about 95% confidence.
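    The 2000 threshold can also be read off the formulas. Dividing the standard deviation by the mean gives $v/p = \sqrt{\beta / (\alpha(\alpha+\beta+1))}$. When the bad rate is small, $\beta \approx \alpha+\beta+1$, so $e = \mathrm{tol} \cdot v/p \approx \mathrm{tol}/\sqrt{n_{bad}+1}$. With tol = 2, the relative error therefore drops below 5% at roughly 1600 bad samples. The sketch below evaluates the exact expression. `relative_error` and `min_bad_samples` are hypothetical helpers, and holding the bad rate fixed while searching over `n_bad` is a planning assumption, not something the original post does.

```python
import math


def relative_error(n_bad, n_good, tol=2):
    """Exact e = tol * v / p for the Beta(n_bad + 1, n_good + 1) posterior.

    From the mean and variance formulas: v / p = sqrt(b / (a * (a + b + 1))).
    """
    a, b = n_bad + 1, n_good + 1
    return tol * math.sqrt(b / (a * (a + b + 1)))


def min_bad_samples(target_e, bad_rate, tol=2):
    """Smallest n_bad whose relative error is below target_e at a fixed bad rate."""
    n_bad = 1
    while True:
        # Assumption: n_good is derived from a hypothetical expected bad rate
        n_good = round(n_bad * (1 - bad_rate) / bad_rate)
        if relative_error(n_bad, n_good, tol) < target_e:
            return n_bad
        n_bad += 1


print(relative_error(2000, 100000))  # compare with e=0.044 in the table
for rate in (0.005, 0.02, 0.17):
    print(rate, min_bad_samples(0.05, rate))
```

    For larger bad rates, the factor $\beta/(\alpha+\beta+1) \approx 1-p$ lowers the error further. This is why the 12000-sample row reaches e=0.041 with the same 2000 bad samples.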

  • Original post: https://www.cnblogs.com/bregman/p/10510308.html