• Using Python to check whether the body temperatures in a dataset follow a normal distribution


    Dataset URL: http://jse.amstat.org/datasets/normtemp.dat.txt

    Dataset description: three columns in total: body temperature, gender, heart rate

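    # Code
    
    from scipy import stats as st
    import matplotlib as mpl
    import matplotlib.pyplot as plt
    import pandas as pd
    
    # Use a CJK-capable font so Chinese labels are not garbled, and keep the minus sign readable
    mpl.rcParams['font.sans-serif'] = ['SimHei']
    mpl.rcParams['axes.unicode_minus'] = False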
    
     
    
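    # Read the data
    
    # The file is whitespace-delimited, so the separator must be the regex r'\s+'
    data = pd.read_csv('http://jse.amstat.org/datasets/normtemp.dat.txt', sep=r'\s+', header=None, names=['temperature', 'Gender', 'Heart rate'])
    
    # Summary statistics
    
    data['temperature'].describe()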

    Output:

    count    130.000000
    mean      98.249231
    std        0.733183
    min       96.300000
    25%       97.800000
    50%       98.300000
    75%       98.700000
    max      100.800000
    # Check normality with four methods
    
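    # 1. Shapiro-Wilk test (st.shapiro) for normality of body temperature
    
    print(st.shapiro(data['temperature']))
    
    # (0.9865769743919373, 0.2331680953502655)  The second value is the p-value; it is greater than 0.05, so normality is not rejected at the 5% level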
    
    # 2. normaltest (st.normaltest) for normality of body temperature
    
    print(st.normaltest(data['temperature'], axis=None))
    
    # NormaltestResult(statistic=2.703801433319236, pvalue=0.2587479863488212)  The second value is the p-value; it is greater than 0.05
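st.normaltest is based on the D'Agostino-Pearson approach, which combines sample skewness and kurtosis into one statistic. As a rough illustration of those two ingredients only (not of the full test), the sketch below computes them with the biased moment estimators. The function name and the demo values are made up for this example and are not taken from the dataset.

```python
def sample_skew_kurtosis(values):
    """Return (skewness, excess kurtosis) from biased central moments."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4, each divided by n (biased estimators)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for an exact normal distribution
    return skewness, excess_kurtosis


# Hypothetical temperatures, for illustration only
demo = [97.1, 97.8, 98.0, 98.2, 98.6, 98.8, 99.4]
skewness, excess_kurtosis = sample_skew_kurtosis(demo)
print(f"skewness={skewness:.3f}, excess kurtosis={excess_kurtosis:.3f}")
```

Values of both quantities close to 0 are what a normal sample tends to produce; the test itself turns them into a p-value.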
    
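    # 3. Kolmogorov-Smirnov test (st.kstest) for normality of body temperature
    
    # The normal distribution is parameterized by the sample mean and sample standard deviation.
    # Because these are estimated from the same data, the reported p-value tends to be too large (conservative).
    u = data['temperature'].mean()
    std = data['temperature'].std()
    print(st.kstest(data['temperature'], 'norm', (u, std)))
    
    # KstestResult(statistic=0.06472685044046644, pvalue=0.645030731743997)  The second value is the p-value; it is greater than 0.05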
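To make the reported statistic less of a black box, here is a minimal sketch of how a one-sample Kolmogorov-Smirnov statistic against a normal distribution can be computed by hand: it is the largest gap between the empirical CDF and the normal CDF. It uses only the standard library; the function name and the demo values are assumptions for illustration, and it does not compute a p-value.

```python
from statistics import NormalDist, mean, stdev


def ks_statistic_normal(values, mu, sigma):
    """Largest distance between the empirical CDF and the N(mu, sigma) CDF."""
    dist = NormalDist(mu, sigma)
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = dist.cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x, so check both sides
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


# Hypothetical temperatures, for illustration only
demo = [97.1, 97.8, 98.0, 98.2, 98.6, 98.8, 99.4]
print(ks_statistic_normal(demo, mean(demo), stdev(demo)))
```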
    
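    # 4. Anderson-Darling test (st.anderson) for normality of body temperature
    
    print(st.anderson(data['temperature']))
    
    # AndersonResult(statistic=0.5201038826714353, critical_values=array([0.56 , 0.637, 0.765, 0.892, 1.061]), significance_level=array([15. , 10. ,  5. ,  2.5,  1. ])) 
    
    # The significance levels are [15. , 10. ,  5. ,  2.5,  1. ] (in percent). The statistic (0.520) is smaller than every critical value, so the test cannot reject normality at any of these levels. Failing to reject does not prove the data are normal; it only means the sample is consistent with a normal distribution.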

    Notes on the anderson method:
    https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.anderson.html#scipy.stats.anderson
    Significance levels of the returned critical values, by distribution:

    normal/exponential
    15%, 10%, 5%, 2.5%, 1%
    
    logistic
    25%, 10%, 5%, 2.5%, 1%, 0.5%
    
    Gumbel
    25%, 10%, 5%, 2.5%, 1%
    
    If the returned statistic is larger than these critical values then for the corresponding significance level, 
    the null hypothesis that the data come from the chosen distribution can be rejected.
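The decision rule quoted above can be written out explicitly. The sketch below applies it to the statistic and critical values printed earlier for the temperature column; the helper name `anderson_decisions` is hypothetical and is not part of scipy.

```python
def anderson_decisions(statistic, critical_values, significance_levels):
    """Return (level in %, critical value, rejected?) for each significance level."""
    return [
        (level, crit, statistic > crit)  # reject only if the statistic exceeds the critical value
        for level, crit in zip(significance_levels, critical_values)
    ]


# Values copied from the AndersonResult printed above
statistic = 0.5201038826714353
critical_values = [0.56, 0.637, 0.765, 0.892, 1.061]
significance_levels = [15.0, 10.0, 5.0, 2.5, 1.0]

decisions = anderson_decisions(statistic, critical_values, significance_levels)
for level, crit, rejected in decisions:
    verdict = "reject normality" if rejected else "cannot reject normality"
    print(f"{level}%: statistic {statistic:.3f} vs critical value {crit}: {verdict}")
```

For this dataset every line reports "cannot reject normality", because 0.520 is below even the 15% critical value of 0.56.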

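    # Plot the fitted normal density

    x = data['temperature']
    x = x.sort_values()            # sort so the curve is drawn from left to right
    loc, scale = st.norm.fit(x)    # maximum-likelihood estimates of the mean and standard deviation
    plt.plot(x, st.norm.pdf(x, loc, scale), 'b-', label='norm')
    plt.legend()                   # display the 'norm' label
    plt.show()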
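The curve above shows only the fitted density, not how well the data follow it. One possible complement, not part of the original post, is a normal Q-Q plot: sorted observations plotted against standard normal quantiles should fall roughly on a straight line if the data are normal. The sketch below computes only the point pairs, using the standard library. The plotting positions (i - 0.5) / n are one common convention chosen here as an assumption, and the function name and demo values are made up.

```python
from statistics import NormalDist


def normal_qq_points(values):
    """Pair each sorted observation with a standard normal quantile."""
    xs = sorted(values)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting position (i - 0.5) / n is an assumed convention; others exist
    return [(std_normal.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]


# Hypothetical temperatures, for illustration only
demo = [97.1, 97.8, 98.0, 98.2, 98.6, 98.8, 99.4]
for theoretical, observed in normal_qq_points(demo):
    print(f"{theoretical:+.3f}  {observed}")
```

The resulting pairs can be passed to plt.scatter to draw the plot.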

  • Original article: https://www.cnblogs.com/zgq25302111/p/11334044.html