• Pandas | 16 Aggregation


    Once a rolling, expanding, or ewm object has been created, several methods are available for aggregating the data.
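As a minimal sketch of the three window objects named above, the following builds each one on a small hand-made DataFrame and calls `aggregate` on it. The data values and the variable names `rolled`, `expanded`, and `smoothed` are illustrative assumptions, not part of the original article.

```python
import pandas as pd

# Small deterministic frame so the results can be checked by hand (illustrative data).
df = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0]})

# Rolling window of 2 rows; min_periods=1 lets the first row use a partial window.
rolled = df.rolling(window=2, min_periods=1).aggregate('sum')

# Expanding window: every row aggregates all rows seen so far.
expanded = df.expanding(min_periods=1).aggregate('sum')

# Exponentially weighted window: recent rows carry more weight than older ones.
smoothed = df.ewm(com=0.5).aggregate('mean')

print(rolled['A'].tolist())    # [1.0, 3.0, 5.0, 7.0]
print(expanded['A'].tolist())  # [1.0, 3.0, 6.0, 10.0]
print(smoothed['A'].round(4).tolist())
```

Passing the aggregation by name (`'sum'`, `'mean'`) is a choice made for this sketch; the examples in the article pass NumPy functions instead.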

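    Applying aggregations to a DataFrame

    You can aggregate either by passing a function to the window object built on the whole DataFrame, or by first selecting a column with the standard getitem syntax (for example r['A']).

    Applying an aggregation to the whole DataFrame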

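    import pandas as pd
    import numpy as np

    # 10 days of random data in four columns
    df = pd.DataFrame(np.random.randn(10, 4),
                      index=pd.date_range('1/1/2000', periods=10),
                      columns=['A', 'B', 'C', 'D'])
    print(df)
    print('\n')

    # Rolling window of 3 rows; min_periods=1 allows partial windows at the start
    r = df.rolling(window=3, min_periods=1)
    print(r)
    print('\n')

    # Sum over each window, column by column
    print(r.aggregate(np.sum))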

    Output:

                       A         B         C         D
    2000-01-01  1.081883 -1.133242 -0.477461  0.669900
    2000-01-02 -1.120673 -0.889724  0.232907  0.391879
    2000-01-03 -0.050530 -0.213853  0.100309  0.296723
    2000-01-04  0.165836 -0.015513 -1.008884 -1.877693
    2000-01-05  0.210501 -1.395490 -0.495589 -0.072882
    2000-01-06 -0.639261 -2.301506  0.703845 -0.867376
    2000-01-07 -0.225980  0.684229  0.985126  0.763059
    2000-01-08 -0.748013  1.274504 -0.195817  2.293899
    2000-01-09 -1.683620 -1.466185  0.491427 -1.895749
    2000-01-10  0.842794  1.598099  0.843714  0.777707


    Rolling [window=3,min_periods=1,center=False,axis=0]


                       A         B         C         D
    2000-01-01  1.081883 -1.133242 -0.477461  0.669900
    2000-01-02 -0.038790 -2.022966 -0.244553  1.061778
    2000-01-03 -0.089320 -2.236820 -0.144245  1.358501
    2000-01-04 -1.005367 -1.119090 -0.675668 -1.189091
    2000-01-05  0.325807 -1.624856 -1.404165 -1.653851
    2000-01-06 -0.262924 -3.712509 -0.800629 -2.817951
    2000-01-07 -0.654740 -3.012767  1.193381 -0.177199
    2000-01-08 -1.613253 -0.342773  1.493154  2.189581
    2000-01-09 -2.657613  0.492548  1.280736  1.161209
    2000-01-10 -1.588839  1.406418  1.139325  1.175857
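The first rows of the rolling output equal partial sums because `min_periods=1` allows windows with fewer than three observations. The sketch below, which uses hand-picked values and the hypothetical names `partial` and `strict`, contrasts that setting with the default, where `min_periods` equals the window size for an integer window.

```python
import pandas as pd

# Hand-picked values (illustrative) so each window sum is easy to verify.
df = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0, 5.0]},
                  index=pd.date_range('2000-01-01', periods=5))

# Partial windows allowed: the first two rows sum whatever is available.
partial = df.rolling(window=3, min_periods=1).aggregate('sum')

# Default min_periods: rows without a full window of 3 become NaN.
strict = df.rolling(window=3).aggregate('sum')

print(partial['A'].tolist())  # [1.0, 3.0, 6.0, 9.0, 12.0]
print(strict['A'].tolist())   # [nan, nan, 6.0, 9.0, 12.0]
```

From the third row onward both results agree, since every window then holds three observations.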
    
    

    Applying an aggregation to a single column of a DataFrame

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame(np.random.randn(10, 4),
          index = pd.date_range('1/1/2000', periods=10),
          columns = ['A', 'B', 'C', 'D'])
    print(df)
    print("\n")

    r = df.rolling(window=3, min_periods=1)
    print(r['A'].aggregate(np.sum))

    Output:

                       A         B         C         D
    2000-01-01 -1.095530 -0.415257 -0.446871 -1.267795
    2000-01-02 -0.405793 -0.002723  0.040241 -0.131678
    2000-01-03 -0.136526  0.742393 -0.692582 -0.271176
    2000-01-04  0.318300 -0.592146 -0.754830  0.239841
    2000-01-05 -0.125770  0.849980  0.685083  0.752720
    2000-01-06  1.410294  0.054780  0.297992 -0.034028
    2000-01-07  0.463223 -1.239204 -0.056420  0.440893
    2000-01-08 -2.244446 -0.516937 -2.039601 -0.680606
    2000-01-09  0.991139  0.026987 -2.391856  0.585565
    2000-01-10  0.112228 -0.701284 -1.139827  1.484032
    
    2000-01-01   -1.095530
    2000-01-02   -1.501323
    2000-01-03   -1.637848
    2000-01-04   -0.224018
    2000-01-05    0.056004
    2000-01-06    1.602824
    2000-01-07    1.747747
    2000-01-08   -0.370928
    2000-01-09   -0.790084
    2000-01-10   -1.141079
    Freq: D, Name: A, dtype: float64
    
     

    Applying an aggregation to multiple columns of a DataFrame

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame(np.random.randn(10, 4),
          index = pd.date_range('1/1/2018', periods=10),
          columns = ['A', 'B', 'C', 'D'])
    print(df)
    print("\n")

    r = df.rolling(window=3, min_periods=1)
    print(r[['A', 'B']].aggregate(np.sum))

    Output:

                       A         B         C         D
    2018-01-01  0.518897  0.988917  0.435691 -1.005703
    2018-01-02  1.793400  0.130314  2.313787  0.870057
    2018-01-03 -0.297601  0.504137 -0.951311 -0.146720
    2018-01-04  0.282177  0.142360 -0.059013  0.633174
    2018-01-05  2.095398 -0.153359  0.431514 -1.185657
    2018-01-06  0.134847  0.188138  0.828329 -1.035120
    2018-01-07  0.780541  0.138942 -1.001229  0.714896
    2018-01-08  0.579742 -0.642858  0.835013 -1.504110
    2018-01-09 -1.692986 -0.861327 -1.125359  0.006687
    2018-01-10 -0.263689  1.182349 -0.916569  0.617476
    
                       A         B
    2018-01-01  0.518897  0.988917
    2018-01-02  2.312297  1.119232
    2018-01-03  2.014697  1.623369
    2018-01-04  1.777976  0.776811
    2018-01-05  2.079975  0.493138
    2018-01-06  2.512422  0.177140
    2018-01-07  3.010786  0.173722
    2018-01-08  1.495130 -0.315777
    2018-01-09 -0.332703 -1.365242
    2018-01-10 -1.376932 -0.321836
    
     

    Applying multiple functions to a single column of a DataFrame (wrap the functions in a list)

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame(np.random.randn(10, 4),
          index = pd.date_range('2019/01/01', periods=10),
          columns = ['A', 'B', 'C', 'D'])
    print (df)
    
    print("\n")
    
    r = df.rolling(window=3,min_periods=1)
    print (r['A'].aggregate([np.sum,np.mean]))

    Output:

                       A         B         C         D
    2019-01-01  1.022641 -1.431910  0.780941 -0.029811
    2019-01-02 -0.302858  0.009886 -0.359331 -0.417708
    2019-01-03 -1.396564  0.944374 -0.238989 -1.873611
    2019-01-04  0.396995 -1.152009 -0.560552 -0.144212
    2019-01-05 -2.513289 -1.085277 -1.016419 -1.586994
    2019-01-06 -0.513179  0.823411  0.670734  1.196546
    2019-01-07 -0.363239 -0.991799  0.587564 -1.100096
    2019-01-08  1.474317  1.265496 -0.216486 -0.224218
    2019-01-09  2.235798 -1.381457 -0.950745 -0.209564
    2019-01-10 -0.061891 -0.025342  0.494245 -0.081681
    
                     sum      mean
    2019-01-01  1.022641  1.022641
    2019-01-02  0.719784  0.359892
    2019-01-03 -0.676780 -0.225593
    2019-01-04 -1.302427 -0.434142
    2019-01-05 -3.512859 -1.170953
    2019-01-06 -2.629473 -0.876491
    2019-01-07 -3.389707 -1.129902
    2019-01-08  0.597899  0.199300
    2019-01-09  3.346876  1.115625
    2019-01-10  3.648224  1.216075
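When a list of functions is applied to one column, the result is a DataFrame with one column per function, as the output above shows. A small sketch with deterministic values (the data and the name `result` are assumptions for illustration) makes the column labels and values easy to check:

```python
import pandas as pd

# Illustrative data, chosen so window sums and means are obvious.
df = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0]})

r = df.rolling(window=2, min_periods=1)

# A list of function names yields one output column per function.
result = r['A'].aggregate(['sum', 'mean'])

print(list(result.columns))     # ['sum', 'mean']
print(result['sum'].tolist())   # [1.0, 3.0, 5.0, 7.0]
print(result['mean'].tolist())  # [1.0, 1.5, 2.5, 3.5]
```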

    Applying multiple functions to multiple columns of a DataFrame

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame(np.random.randn(10, 4),
          index = pd.date_range('2020/01/01', periods=10),
          columns = ['A', 'B', 'C', 'D'])
    
    print (df)
    print("\n")

    r = df.rolling(window=3, min_periods=1)
    print(r[['A', 'B']].aggregate([np.sum, np.mean]))

    Output:

                       A         B         C         D
    2020-01-01  1.053702  0.355985  0.746638 -0.233968
    2020-01-02  0.578520 -1.171843 -1.764249 -0.709913
    2020-01-03 -0.491185  0.975212  0.200139 -3.372621
    2020-01-04 -1.331328  0.776316  0.216623  0.202313
    2020-01-05 -1.023147 -0.913686  1.457512  0.999232
    2020-01-06  0.995328 -0.979826 -1.063695  0.057925
    2020-01-07  0.576668  1.065767 -0.270744 -0.513707
    2020-01-08  0.520258  0.969043 -0.119177 -0.125620
    2020-01-09 -0.316480  0.549085  1.862249  1.091265
    2020-01-10  0.461321 -0.368662 -0.988323  0.543011
    
                       A                   B          
                     sum      mean       sum      mean
    2020-01-01  1.053702  1.053702  0.355985  0.355985
    2020-01-02  1.632221  0.816111 -0.815858 -0.407929
    2020-01-03  1.141037  0.380346  0.159354  0.053118
    2020-01-04 -1.243993 -0.414664  0.579686  0.193229
    2020-01-05 -2.845659 -0.948553  0.837843  0.279281
    2020-01-06 -1.359146 -0.453049 -1.117195 -0.372398
    2020-01-07  0.548849  0.182950 -0.827744 -0.275915
    2020-01-08  2.092254  0.697418  1.054985  0.351662
    2020-01-09  0.780445  0.260148  2.583896  0.861299
    2020-01-10  0.665099  0.221700  1.149466  0.383155
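The two-row header in the output above is a column MultiIndex of (column, function) pairs. The following sketch, using made-up values and the hypothetical names `a_mean`, `b_block`, and `flat`, shows a few ways such a result might be accessed or flattened:

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [10.0, 20.0, 30.0]})

result = df.rolling(window=2, min_periods=1)[['A', 'B']].aggregate(['sum', 'mean'])

# Columns are (source column, function name) tuples.
print(result.columns.tolist())
# [('A', 'sum'), ('A', 'mean'), ('B', 'sum'), ('B', 'mean')]

# Select one (column, function) pair as a Series.
a_mean = result[('A', 'mean')]

# Select every aggregate of one source column as a DataFrame.
b_block = result['B']

# Optionally flatten the header into single-level names such as 'A_sum'.
flat = result.copy()
flat.columns = ['_'.join(pair) for pair in result.columns]
print(list(flat.columns))  # ['A_sum', 'A_mean', 'B_sum', 'B_mean']
```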
    
     

    Applying different functions to different columns of a DataFrame

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame(np.random.randn(3, 4),
          index = pd.date_range('2020/01/01', periods=3),
          columns = ['A', 'B', 'C', 'D'])
    print(df)
    print("\n")

    r = df.rolling(window=3, min_periods=1)
    print(r.aggregate({'A': np.sum, 'B': np.mean}))

    Output:

                       A         B         C         D
    2020-01-01 -0.246302 -0.057202  0.923807 -1.019698
    2020-01-02  0.285287  1.467206 -0.368735 -0.397260
    2020-01-03 -0.163219 -0.401368  1.254569  0.580188
    
                       A         B
    2020-01-01 -0.246302 -0.057202
    2020-01-02  0.038985  0.705002
    2020-01-03 -0.124234  0.336212
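Note that the dictionary form returns only the columns named as keys; columns C and D do not appear in the output above. A short sketch with hand-picked values (illustrative assumptions, as is the name `result`) confirms this behavior:

```python
import pandas as pd

# Illustrative data with a third column that is not named in the dict.
df = pd.DataFrame({'A': [1.0, 2.0, 3.0],
                   'B': [10.0, 20.0, 30.0],
                   'C': [5.0, 5.0, 5.0]})

r = df.rolling(window=3, min_periods=1)

# Each key picks a column; each value is the function applied to that column.
result = r.aggregate({'A': 'sum', 'B': 'mean'})

print(list(result.columns))  # ['A', 'B']
print(result['A'].tolist())  # [1.0, 3.0, 6.0]
print(result['B'].tolist())  # [10.0, 15.0, 20.0]
```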
    
  • Original article: https://www.cnblogs.com/Summer-skr--blog/p/11705883.html