• Computing averages, maximums, and minimums in Python: from beginner to advanced


    Beginner-level calculations

    1. Arithmetic mean

    # Samples:
    S = [s1, s2, s3, …, sn]
    # Arithmetic mean:
    m = (s1 + s2 + s3 + … + sn)/n
    

    In NumPy:

    m = numpy.mean(samples)  # samples: the array of sample values
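As a quick check of the formula, the sketch below computes the mean of a small sample both by hand and with numpy.mean. The sample values are made up for illustration and do not come from the article.

```python
import numpy as np

# Hypothetical sample values, chosen only for illustration
samples = [2.0, 4.0, 6.0, 8.0]

# Arithmetic mean by the definition: sum of the samples divided by their count
manual_mean = sum(samples) / len(samples)

# The same quantity computed by NumPy
numpy_mean = np.mean(samples)

print(manual_mean, numpy_mean)  # 5.0 5.0
```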
    

    2. Weighted average

    # Samples:
    S = [s1, s2, s3, …, sn]
    # Weights:
    W = [w1, w2, w3, …, wn]
    # Weighted average:
    a = (s1*w1 + s2*w2 + s3*w3 + … + sn*wn)/(w1 + w2 + w3 + … + wn)
    

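    Weighted average in NumPy

    Start with the data: the list of values to average and the list of corresponding weights.

    elements = []  # values to average (fill in your data)
    weights = []   # one weight per value, same length as elements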
    

    Compute it directly with NumPy:

    import numpy as np
    np.average(elements, weights=weights)
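To make the call concrete, here is a minimal sketch with filled-in data. The numbers are hypothetical; only np.average and its weights parameter come from the text above.

```python
import numpy as np

# Hypothetical data: three scores and their weights
elements = [70, 80, 90]
weights = [1, 1, 2]

# (70*1 + 80*1 + 90*2) / (1 + 1 + 2) = 330 / 4
weighted = np.average(elements, weights=weights)

# Without weights, np.average is the ordinary arithmetic mean
plain = np.average(elements)

print(weighted, plain)  # 82.5 80.0
```

The last value carries twice the weight of the others, so the weighted result sits above the plain mean.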
    

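    Pure-Python equivalents (both round the result to one decimal place):

    # Without NumPy, version 1: index-based
    round(sum(elements[i]*weights[i] for i in range(len(elements)))/sum(weights), 1)
    
    # Without NumPy, version 2: zip-based
    round(sum(e*w for e, w in zip(elements, weights))/sum(weights), 1)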
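The one-liners above can be wrapped in a small function that also checks its inputs. This is a sketch under stated assumptions: the name weighted_average and the two checks are hypothetical additions, not part of the original article. The length check matters because zip silently stops at the shorter list.

```python
def weighted_average(elements, weights):
    """Weighted average in pure Python (hypothetical helper, not from the article)."""
    if len(elements) != len(weights):
        # zip() would silently drop the extra items, so fail loudly instead
        raise ValueError("elements and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ZeroDivisionError("weights sum to zero")
    return sum(e * w for e, w in zip(elements, weights)) / total_weight


value = weighted_average([70, 80, 90], [1, 1, 2])
print(value)            # 82.5
print(round(value, 1))  # 82.5
```

Rounding is left to the caller here, so the function returns the same full-precision value that np.average would.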
    

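    A function that computes the average of any iterable, including generators:

    def average(seq, total=0.0):
        num = 0
        for item in seq:
            total += item  # running sum
            num += 1       # running count
        return total / num  # raises ZeroDivisionError if seq is empty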
    

    If the sequence is a list or tuple, this shorter version is enough:

    def average(seq):
        return float(sum(seq)) / len(seq)
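For comparison, the standard library's statistics module provides the same calculation. The sketch below uses made-up data and shows that an empty input raises statistics.StatisticsError rather than ZeroDivisionError.

```python
import statistics

# Hypothetical data for illustration
data = [3.5, 5.0, 7.5, 8.0]

mean_value = statistics.mean(data)    # exact arithmetic, keeps the input type
fmean_value = statistics.fmean(data)  # converts to float first, usually faster
print(mean_value, fmean_value)        # 6.0 6.0

# An empty input raises a dedicated exception
try:
    statistics.mean([])
except statistics.StatisticsError as exc:
    print("empty input:", exc)
```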
    

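    3. Maximum and minimum

    1. Maximum and minimum values
    max: returns the largest element of an array
    min: returns the smallest element of an array

    2. Element-wise extremes of two arrays
    maximum: builds an array of the element-wise maxima of two arrays
    minimum: builds an array of the element-wise minima of two arrays

    Example: numpy.maximum(a, b) compares a and b element by element and builds a new array from the larger value at each position.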
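The difference between max/min and maximum/minimum is easiest to see on fixed inputs. The arrays below are hypothetical and chosen so the results can be checked by eye.

```python
import numpy as np

a = np.array([1, 5, 3])
b = np.array([4, 2, 6])

# Reduce one array to a single value
largest, smallest = a.max(), a.min()
print(largest, smallest)  # 5 1

# Compare two arrays position by position
elementwise_max = np.maximum(a, b)
elementwise_min = np.minimum(a, b)
print(elementwise_max)  # [4 5 6]
print(elementwise_min)  # [1 2 3]

# A scalar is broadcast against every element, which clips from below
floored = np.maximum(a, 3)
print(floored)  # [3 5 3]
```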

    3. Practice

    import numpy as np
    # Maximum and minimum
    a = np.random.randint(10, 100, 9).reshape(3, 3)
    print(a)
    # print('max:', np.max(a), a.max())  # maximum value
    # print('min:', np.min(a), a.min())  # minimum value
    # print('index of max:', np.argmax(a), a.argmax())  # index of the maximum in the flattened array
    
    # maximum / minimum: element-wise comparison of two arrays
    b = np.random.randint(10, 100, 9).reshape(3, 3)
    print(b)
    print('Element-wise maximum:\n', np.maximum(a, b))
    print('Element-wise minimum:\n', np.minimum(a, b))
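On a 2-D array it is often useful to take the maximum or minimum along one axis, or to turn the flattened argmax index back into a row and column. The sketch below uses a fixed, made-up array instead of random data so the results are reproducible.

```python
import numpy as np

# Fixed 2x3 array (hypothetical values) so the results are reproducible
a = np.array([[3, 7, 1],
              [9, 2, 8]])

col_max = a.max(axis=0)  # maximum of each column
row_min = a.min(axis=1)  # minimum of each row
print(col_max)  # [9 7 8]
print(row_min)  # [1 2]

# argmax works on the flattened array: [3, 7, 1, 9, 2, 8]
flat_idx = np.argmax(a)
print(flat_idx)  # 3

# Convert the flat index back to (row, column) coordinates
row, col = np.unravel_index(flat_idx, a.shape)
print(row, col)  # 1 0
```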
    

    Advanced level

    Example 1

    Given a DataFrame df:

                 ID    wt  value
    Date                        
    01/01/2012  100  0.50     60
    01/01/2012  101  0.75     80
    01/01/2012  102  1.00    100
    01/02/2012  201  0.50    100
    01/02/2012  202  1.00     80
    

    The code that builds it:

    import numpy as np
    import pandas as pd
    index = pd.Index(['01/01/2012','01/01/2012','01/01/2012','01/02/2012','01/02/2012'], name='Date')
    df = pd.DataFrame({'ID':[100,101,102,201,202],'wt':[.5,.75,1,.5,1],'value':[60,80,100,100,80]},index=index)
    

    The average of wt, weighted by value and grouped by the index (Date), is:

    Date
    01/01/2012    0.791667
    01/02/2012    0.722222
    dtype: float64
    

    One way to compute it is with a helper function:

    def grouped_weighted_avg(values, weights, by):
        return (values * weights).groupby(by).sum() / weights.groupby(by).sum()
    grouped_weighted_avg(values=df.wt, weights=df.value, by=df.index)
    
    Date
    01/01/2012    0.791667
    01/02/2012    0.722222
    dtype: float64
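As a cross-check of these numbers, the same result can be reproduced by calling np.average once per group. This is a sketch, assuming the df defined in this example; the variable name check is hypothetical.

```python
import numpy as np
import pandas as pd

index = pd.Index(['01/01/2012', '01/01/2012', '01/01/2012',
                  '01/02/2012', '01/02/2012'], name='Date')
df = pd.DataFrame({'ID': [100, 101, 102, 201, 202],
                   'wt': [.5, .75, 1, .5, 1],
                   'value': [60, 80, 100, 100, 80]}, index=index)

# Iterate over the groups and average wt with value as the weights
check = pd.Series({date: np.average(group['wt'], weights=group['value'])
                   for date, group in df.groupby(level='Date')})
print(check.round(6))
```

For 01/01/2012 this is (0.5*60 + 0.75*80 + 1*100) / (60 + 80 + 100) = 190/240, which rounds to 0.791667.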
    

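    A more involved variant with groupby().apply(). Note that wavg swaps the roles: it averages value weighted by wt, so its result differs from the output above: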

    grouped = df.groupby('Date')
    def wavg(group):
        d = group['value']
        w = group['wt']
        return (d * w).sum() / w.sum()
    grouped.apply(wavg)
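Which column is averaged and which one supplies the weights changes the result. The sketch below, assuming the same df as in this example, computes both directions side by side; the variable names are hypothetical.

```python
import pandas as pd

index = pd.Index(['01/01/2012', '01/01/2012', '01/01/2012',
                  '01/02/2012', '01/02/2012'], name='Date')
df = pd.DataFrame({'ID': [100, 101, 102, 201, 202],
                   'wt': [.5, .75, 1, .5, 1],
                   'value': [60, 80, 100, 100, 80]}, index=index)


def wavg(group):
    # Average of value, weighted by wt
    return (group['value'] * group['wt']).sum() / group['wt'].sum()


# value weighted by wt, via apply on the two relevant columns
value_by_wt = df.groupby('Date')[['value', 'wt']].apply(wavg)

# wt weighted by value, via two grouped sums
wt_by_value = ((df['wt'] * df['value']).groupby(df.index).sum()
               / df['value'].groupby(df.index).sum())

print(value_by_wt.round(6))
print(wt_by_value.round(6))
```

Both share the numerator (190 for the first date) but divide by different totals: 2.25 (the sum of wt) in the first case and 240 (the sum of value) in the second.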
    

    Example 2

      ind  dist  diff  cas
    0  la  10.0  0.54  1.0
    1   p   5.0  3.20  2.0
    2  la   7.0  8.60  3.0
    3  la   8.0  7.20  4.0
    4   p   7.0  2.10  5.0
    5   g   2.0  1.00  6.0
    6   g   5.0  3.50  7.0
    7  la   3.0  4.50  8.0
    
    
    df = pd.DataFrame({'ind': ['la','p','la','la','p','g','g','la'],
                       'dist': [10.,5.,7.,8.,7.,2.,5.,3.],
                       'diff': [0.54,3.2,8.6,7.2,2.1,1.,3.5,4.5],
                       'cas': [1.,2.,3.,4.,5.,6.,7.,8.]})
    

    Add a weight column, using transform to get weights normalized within each group:

    df['weight'] = df['dist'] / df.groupby('ind')['dist'].transform('sum')
    df

      ind  dist  diff  cas    weight
    0  la  10.0  0.54  1.0  0.357143
    1   p   5.0  3.20  2.0  0.416667
    2  la   7.0  8.60  3.0  0.250000
    3  la   8.0  7.20  4.0  0.285714
    4   p   7.0  2.10  5.0  0.583333
    5   g   2.0  1.00  6.0  0.285714
    6   g   5.0  3.50  7.0  0.714286
    7  la   3.0  4.50  8.0  0.107143
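What makes transform suitable here is that it returns one value per original row (the group total repeated for every member), so the division lines up row by row. The sketch below, using only the ind and dist columns from this example, checks that property and confirms that the weights sum to 1 within each group.

```python
import pandas as pd

df = pd.DataFrame({'ind': ['la', 'p', 'la', 'la', 'p', 'g', 'g', 'la'],
                   'dist': [10., 5., 7., 8., 7., 2., 5., 3.]})

# transform('sum') broadcasts each group's total back onto its rows
group_totals = df.groupby('ind')['dist'].transform('sum')
print(group_totals.tolist())  # [28.0, 12.0, 28.0, 28.0, 12.0, 7.0, 7.0, 28.0]

df['weight'] = df['dist'] / group_totals

# Within every group the normalized weights add up to 1
weight_sums = df.groupby('ind')['weight'].sum()
print(weight_sums)
```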
    

    Multiply each value column by these weights, then sum within each group:

    df['wcas'], df['wdiff'] = (df[n] * df['weight'] for n in ('cas', 'diff'))
    df.groupby('ind')[['wcas', 'wdiff']].sum()
    
             wcas     wdiff
    ind                    
    g    6.714286  2.785714
    la   3.107143  4.882143
    p    3.750000  2.558333
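The same table can be obtained without helper columns by letting np.average do the weighting inside each group. This is a sketch of an alternative, not the article's method; the function name weighted_means is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ind': ['la', 'p', 'la', 'la', 'p', 'g', 'g', 'la'],
                   'dist': [10., 5., 7., 8., 7., 2., 5., 3.],
                   'diff': [0.54, 3.2, 8.6, 7.2, 2.1, 1., 3.5, 4.5],
                   'cas': [1., 2., 3., 4., 5., 6., 7., 8.]})


def weighted_means(group):
    # dist supplies the weights; np.average normalizes them internally
    return pd.Series({col: np.average(group[col], weights=group['dist'])
                      for col in ('cas', 'diff')})


result = df.groupby('ind')[['dist', 'cas', 'diff']].apply(weighted_means)
print(result.round(6))
```

Because np.average divides by the sum of the weights, the raw dist values can be passed directly and no normalized weight column is needed.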
    

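    A variant that overwrites the columns in place:

    backup = df.copy()      # keep a backup, because the next lines overwrite cas and diff
    cols = ['cas', 'diff']  # select by name; df.columns[:2] would pick ind and dist here
    df[cols] = df['weight'].values[:, None] * df[cols]
    df.groupby('ind')[cols].sum()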
    
              cas      diff
    ind                    
    g    6.714286  2.785714
    la   3.107143  4.882143
    p    3.750000  2.558333
    

    Example 4 (more intuitive)

    df = pd.DataFrame([('bird', 'Falconiformes', 389.0),
                       ('bird', 'Psittaciformes', 24.0),
                       ('mammal', 'Carnivora', 80.2),
                       ('mammal', 'Primates', np.nan),
                       ('mammal', 'Carnivora', 58)],
                      index=['falcon', 'parrot', 'lion', 'monkey', 'leopard'],
                      columns=('class', 'order', 'max_speed'))
    
    df: 
              class           order  max_speed
    falcon     bird   Falconiformes      389.0
    parrot     bird  Psittaciformes       24.0
    lion     mammal       Carnivora       80.2
    monkey   mammal        Primates        NaN
    leopard  mammal       Carnivora       58.0
    
    grouped = df.groupby('class')
    grouped.sum(numeric_only=True)  # sum only numeric columns; NaN values are skipped
    Out: 
            max_speed
    class            
    bird        413.0
    mammal      138.2
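The same grouping also gives the per-group mean, maximum, and minimum in one call. The sketch below assumes the df from this example and uses agg with a list of aggregation names; NaN values are skipped, which the count column makes visible.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([('bird', 'Falconiformes', 389.0),
                   ('bird', 'Psittaciformes', 24.0),
                   ('mammal', 'Carnivora', 80.2),
                   ('mammal', 'Primates', np.nan),
                   ('mammal', 'Carnivora', 58)],
                  index=['falcon', 'parrot', 'lion', 'monkey', 'leopard'],
                  columns=('class', 'order', 'max_speed'))

# One row per class, one column per aggregation
stats = df.groupby('class')['max_speed'].agg(['count', 'mean', 'max', 'min'])
print(stats)
```

The mammal group has three rows but a count of 2, because the monkey's NaN speed is excluded from every aggregation.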
    

    Example 5

    df = pd.DataFrame({'animal': 'cat dog cat fish dog cat cat'.split(),
          'size': list('SSMMMLL'),
          'weight': [8, 10, 11, 1, 20, 12, 12],
          'adult': [False] * 5 + [True] * 2})
    df: 
      animal size  weight  adult
    0    cat    S       8  False
    1    dog    S      10  False
    2    cat    M      11  False
    3   fish    M       1  False
    4    dog    M      20  False
    5    cat    L      12   True
    6    cat    L      12   True
    

    For each animal, list the size of the individual with the highest weight.

    df.groupby('animal').apply(lambda subf: subf['size'][subf['weight'].idxmax()])
    Out: 
    animal
    cat     L
    dog     M
    fish    M
    dtype: object
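An alternative that avoids apply is to take idxmax per group and use the resulting row labels with loc. This is a sketch assuming the df from this example; the variable names are hypothetical. When several rows tie for the maximum, idxmax returns the first one.

```python
import pandas as pd

df = pd.DataFrame({'animal': 'cat dog cat fish dog cat cat'.split(),
                   'size': list('SSMMMLL'),
                   'weight': [8, 10, 11, 1, 20, 12, 12],
                   'adult': [False] * 5 + [True] * 2})

# Row label of the heaviest individual in each animal group
heaviest_idx = df.groupby('animal')['weight'].idxmax()

# Pull those rows and keep only the size, indexed by animal
heaviest_rows = df.loc[heaviest_idx]
result = heaviest_rows.set_index('animal')['size']
print(result)
```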
    

    Other references:

    Understanding pandas transform
    https://www.jianshu.com/p/20f15354aedd
    https://www.jianshu.com/p/509d7b97088c
    https://zhuanlan.zhihu.com/p/86350553
    http://www.zyiz.net/tech/detail-136539.html

    pandas: performance comparison of the apply and transform methods
    https://www.cnblogs.com/wkang/p/9794678.html

    https://zhuanlan.zhihu.com/p/101284491?utm_source=wechat_session
    https://www.cnblogs.com/bjwu/p/8970818.html
    https://www.jianshu.com/p/42f1d2909bb6

    Examples from the official documentation
    https://pandas.pydata.org/pandas-docs/dev/user_guide/groupby.html
    https://pandas.pydata.org/pandas-docs/stable/user_guide/cookbook.html#cookbook-grouping
    https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.transform.html

    pandas data aggregation and group operations

    Getting the weighted mean and standard deviation of several columns in pandas
    https://xbuba.com/questions/48307663

    Weighted averages in pandas: I bet you don't know how to use them!
    https://blog.csdn.net/ddxygq/article/details/101351686

  • Original article: https://www.cnblogs.com/treasury-manager/p/14072025.html