• Pandas Aggregation


    Once a rolling, expanding, or ewm object has been created, several methods are available for aggregating the data.
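
    As a rough illustration of the three kinds of window objects mentioned above, the sketch below builds each one on a small hand-made Series and takes a mean. The Series values, the window size of 2, and `alpha=0.5` are arbitrary choices made for this example; they do not come from the original article. The ewm object is aggregated through its `mean()` method directly.

```python
import pandas as pd

# Small deterministic series (illustrative values) so results can be checked by hand.
s = pd.Series([1.0, 2.0, 3.0, 4.0])

rolling_obj = s.rolling(window=2, min_periods=1)  # fixed-size sliding window
expanding_obj = s.expanding(min_periods=1)        # window grows from the first row
ewm_obj = s.ewm(alpha=0.5, adjust=False)          # exponentially weighted window

rolling_mean = rolling_obj.aggregate('mean')
expanding_mean = expanding_obj.aggregate('mean')
ewm_mean = ewm_obj.mean()

print(rolling_mean.tolist())    # [1.0, 1.5, 2.5, 3.5]
print(expanding_mean.tolist())  # [1.0, 1.5, 2.0, 2.5]
print(ewm_mean.tolist())        # [1.0, 1.5, 2.25, 3.125]
```

    The rest of this article works with rolling objects only.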

    Applying aggregations to a DataFrame

    Let's create a DataFrame and apply aggregations to it.

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame(np.random.randn(10, 4),
          index = pd.date_range('1/1/2019', periods=10),
          columns = ['A', 'B', 'C', 'D'])
    
    print (df)
    print("=======================================")
    r = df.rolling(window=3,min_periods=1)
    print (r)
    

    Running the code above produces output like the following (the values are random, so yours will differ):

                       A         B         C         D
    2019-01-01 -0.901602 -1.778484  0.728295 -0.758108
    2019-01-02 -0.826162  0.994140  0.976164 -0.918249
    2019-01-03  0.260841  0.905993  1.505967 -0.124883
    2019-01-04 -0.112230 -0.111885  0.702712 -0.871768
    2019-01-05 -0.239969  1.435918 -0.160140 -0.547702
    2019-01-06 -0.126897 -2.628206 -0.280658  0.167422
    2019-01-07  0.367903  0.994337 -0.529830  0.195990
    2019-01-08 -0.530872 -0.384915 -0.397150 -0.024074
    2019-01-09 -0.418925  0.049046 -0.816616  0.308107
    2019-01-10 -0.176857  2.573145  0.010211 -1.427078
    =======================================
    Rolling [window=3,min_periods=1,center=False,axis=0]
    

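    You can aggregate by passing a function to the rolling object for the whole DataFrame, or first select a column with standard item access (`[]`).

    Applying an aggregation to the whole DataFrame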

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame(np.random.randn(10, 4),
          index = pd.date_range('1/1/2020', periods=10),
          columns = ['A', 'B', 'C', 'D'])
    print (df)
    print("============================================")
    r = df.rolling(window=3,min_periods=1)
    # Rolling sum over up to 3 rows, applied to every column
    print (r.aggregate(np.sum))
    

    Running the code above produces output like the following:

                       A         B         C         D
    2020-01-01  1.069090 -0.802365 -0.323818 -1.994676
    2020-01-02  0.190584  0.328272 -0.550378  0.559738
    2020-01-03  0.044865  0.478342 -0.976129  0.106530
    2020-01-04 -1.349188 -0.391635 -0.292740  1.412755
    2020-01-05  0.057659 -1.331901 -0.297858 -0.500705
    2020-01-06  2.651680 -1.459706 -0.726023  0.294283
    2020-01-07  0.666481  0.679205 -1.511743  2.093833
    2020-01-08 -0.284316 -1.079759  1.433632  0.534043
    2020-01-09  1.115246 -0.268812  0.190440 -0.712032
    2020-01-10 -0.121008  0.136952  1.279354  0.275773
    ============================================
                       A         B         C         D
    2020-01-01  1.069090 -0.802365 -0.323818 -1.994676
    2020-01-02  1.259674 -0.474093 -0.874197 -1.434938
    2020-01-03  1.304539  0.004249 -1.850326 -1.328409
    2020-01-04 -1.113739  0.414979 -1.819248  2.079023
    2020-01-05 -1.246664 -1.245194 -1.566728  1.018580
    2020-01-06  1.360151 -3.183242 -1.316621  1.206333
    2020-01-07  3.375821 -2.112402 -2.535624  1.887411
    2020-01-08  3.033846 -1.860260 -0.804134  2.922160
    2020-01-09  1.497411 -0.669366  0.112329  1.915845
    2020-01-10  0.709922 -1.211619  2.903427  0.097785
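
    The first two rows of the result above are sums over fewer than three values, because `min_periods=1` was passed. The sketch below makes that visible on hand-checkable data; the values 1 to 5 and the variable names `partial` and `strict` are assumptions made for illustration only.

```python
import pandas as pd

# Illustrative data: 1.0 .. 5.0 on consecutive days.
df = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0, 5.0]},
                  index=pd.date_range('2020-01-01', periods=5))

# min_periods=1: a result is produced as soon as one observation is available.
partial = df.rolling(window=3, min_periods=1).aggregate('sum')

# Default min_periods (equal to the window size for an integer window):
# the first two rows hold fewer than 3 observations and become NaN.
strict = df.rolling(window=3).aggregate('sum')

print(partial['A'].tolist())  # [1.0, 3.0, 6.0, 9.0, 12.0]
print(strict['A'].tolist())   # [nan, nan, 6.0, 9.0, 12.0]
```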
    

    Applying an aggregation to a single column of the DataFrame

    Example code:

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame(np.random.randn(10, 4),
          index = pd.date_range('1/1/2000', periods=10),
          columns = ['A', 'B', 'C', 'D'])
    print (df)
    print("====================================")
    r = df.rolling(window=3,min_periods=1)
    print (r['A'].aggregate(np.sum))
    

    Running the code above produces output like the following:

                       A         B         C         D
    2000-01-01 -1.095530 -0.415257 -0.446871 -1.267795
    2000-01-02 -0.405793 -0.002723  0.040241 -0.131678
    2000-01-03 -0.136526  0.742393 -0.692582 -0.271176
    2000-01-04  0.318300 -0.592146 -0.754830  0.239841
    2000-01-05 -0.125770  0.849980  0.685083  0.752720
    2000-01-06  1.410294  0.054780  0.297992 -0.034028
    2000-01-07  0.463223 -1.239204 -0.056420  0.440893
    2000-01-08 -2.244446 -0.516937 -2.039601 -0.680606
    2000-01-09  0.991139  0.026987 -2.391856  0.585565
    2000-01-10  0.112228 -0.701284 -1.139827  1.484032
    ====================================
    2000-01-01   -1.095530
    2000-01-02   -1.501323
    2000-01-03   -1.637848
    2000-01-04   -0.224018
    2000-01-05    0.056004
    2000-01-06    1.602824
    2000-01-07    1.747747
    2000-01-08   -0.370928
    2000-01-09   -0.790084
    2000-01-10   -1.141079
    Freq: D, Name: A, dtype: float64
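
    Selecting the column from the rolling object, as above, should give the same values as selecting the column first and then rolling over it. The following sketch checks this under the assumption of a seeded random generator (the seed and the names `from_window` and `from_column` are not part of the original example).

```python
import numpy as np
import pandas as pd

# Seeded generator (an assumption for reproducibility; the article uses np.random.randn).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 4)),
                  index=pd.date_range('2000-01-01', periods=10),
                  columns=['A', 'B', 'C', 'D'])

r = df.rolling(window=3, min_periods=1)

# Column selected on the rolling object, then aggregated.
from_window = r['A'].aggregate('sum')

# Column selected on the DataFrame first, then rolled and summed.
from_column = df['A'].rolling(window=3, min_periods=1).sum()

# Compare the two results element by element.
print(np.allclose(from_window, from_column))
```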
    

    Applying an aggregation to multiple columns of the DataFrame

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame(np.random.randn(10, 4),
          index = pd.date_range('1/1/2018', periods=10),
          columns = ['A', 'B', 'C', 'D'])
    print (df)
    print ("==========================================")
    r = df.rolling(window=3,min_periods=1)
    print (r[['A','B']].aggregate(np.sum))
    

    Running the code above produces output like the following:

                       A         B         C         D
    2018-01-01  0.518897  0.988917  0.435691 -1.005703
    2018-01-02  1.793400  0.130314  2.313787  0.870057
    2018-01-03 -0.297601  0.504137 -0.951311 -0.146720
    2018-01-04  0.282177  0.142360 -0.059013  0.633174
    2018-01-05  2.095398 -0.153359  0.431514 -1.185657
    2018-01-06  0.134847  0.188138  0.828329 -1.035120
    2018-01-07  0.780541  0.138942 -1.001229  0.714896
    2018-01-08  0.579742 -0.642858  0.835013 -1.504110
    2018-01-09 -1.692986 -0.861327 -1.125359  0.006687
    2018-01-10 -0.263689  1.182349 -0.916569  0.617476
    ==========================================
                       A         B
    2018-01-01  0.518897  0.988917
    2018-01-02  2.312297  1.119232
    2018-01-03  2.014697  1.623369
    2018-01-04  1.777976  0.776811
    2018-01-05  2.079975  0.493138
    2018-01-06  2.512422  0.177140
    2018-01-07  3.010786  0.173722
    2018-01-08  1.495130 -0.315777
    2018-01-09 -0.332703 -1.365242
    2018-01-10 -1.376932 -0.321836
    

    Applying multiple functions to a single column of the DataFrame

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame(np.random.randn(10, 4),
          index = pd.date_range('2019/01/01', periods=10),
          columns = ['A', 'B', 'C', 'D'])
    print (df)
    
    print("==========================================")
    
    r = df.rolling(window=3,min_periods=1)
    print (r['A'].aggregate([np.sum,np.mean]))
    

    Running the code above produces output like the following:

                       A         B         C         D
    2019-01-01  1.022641 -1.431910  0.780941 -0.029811
    2019-01-02 -0.302858  0.009886 -0.359331 -0.417708
    2019-01-03 -1.396564  0.944374 -0.238989 -1.873611
    2019-01-04  0.396995 -1.152009 -0.560552 -0.144212
    2019-01-05 -2.513289 -1.085277 -1.016419 -1.586994
    2019-01-06 -0.513179  0.823411  0.670734  1.196546
    2019-01-07 -0.363239 -0.991799  0.587564 -1.100096
    2019-01-08  1.474317  1.265496 -0.216486 -0.224218
    2019-01-09  2.235798 -1.381457 -0.950745 -0.209564
    2019-01-10 -0.061891 -0.025342  0.494245 -0.081681
    ==========================================
                     sum      mean
    2019-01-01  1.022641  1.022641
    2019-01-02  0.719784  0.359892
    2019-01-03 -0.676780 -0.225593
    2019-01-04 -1.302427 -0.434142
    2019-01-05 -3.512859 -1.170953
    2019-01-06 -2.629473 -0.876491
    2019-01-07 -3.389707 -1.129902
    2019-01-08  0.597899  0.199300
    2019-01-09  3.346876  1.115625
    2019-01-10  3.648224  1.216075
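
    The function list can also be given as string names, and `agg` is a shorter alias for `aggregate`. A minimal sketch on illustrative data (the values and the name `stats` are assumptions for this example) shows that each function becomes one column of the result:

```python
import pandas as pd

# Illustrative data so the sums and means can be checked by hand.
df = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0]})
r = df.rolling(window=2, min_periods=1)

# One result column per function name in the list.
stats = r['A'].agg(['sum', 'mean'])

print(list(stats.columns))     # ['sum', 'mean']
print(stats['sum'].tolist())   # [1.0, 3.0, 5.0, 7.0]
print(stats['mean'].tolist())  # [1.0, 1.5, 2.5, 3.5]
```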
    

    Applying multiple functions to multiple columns of the DataFrame

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame(np.random.randn(10, 4),
          index = pd.date_range('2020/01/01', periods=10),
          columns = ['A', 'B', 'C', 'D'])
    
    print (df)
    print("==========================================")
    r = df.rolling(window=3,min_periods=1)
    print (r[['A','B']].aggregate([np.sum,np.mean]))
    

    Running the code above produces output like the following:

                       A         B         C         D
    2020-01-01  1.053702  0.355985  0.746638 -0.233968
    2020-01-02  0.578520 -1.171843 -1.764249 -0.709913
    2020-01-03 -0.491185  0.975212  0.200139 -3.372621
    2020-01-04 -1.331328  0.776316  0.216623  0.202313
    2020-01-05 -1.023147 -0.913686  1.457512  0.999232
    2020-01-06  0.995328 -0.979826 -1.063695  0.057925
    2020-01-07  0.576668  1.065767 -0.270744 -0.513707
    2020-01-08  0.520258  0.969043 -0.119177 -0.125620
    2020-01-09 -0.316480  0.549085  1.862249  1.091265
    2020-01-10  0.461321 -0.368662 -0.988323  0.543011
    ==========================================
                       A                   B          
                     sum      mean       sum      mean
    2020-01-01  1.053702  1.053702  0.355985  0.355985
    2020-01-02  1.632221  0.816111 -0.815858 -0.407929
    2020-01-03  1.141037  0.380346  0.159354  0.053118
    2020-01-04 -1.243993 -0.414664  0.579686  0.193229
    2020-01-05 -2.845659 -0.948553  0.837843  0.279281
    2020-01-06 -1.359146 -0.453049 -1.117195 -0.372398
    2020-01-07  0.548849  0.182950 -0.827744 -0.275915
    2020-01-08  2.092254  0.697418  1.054985  0.351662
    2020-01-09  0.780445  0.260148  2.583896  0.861299
    2020-01-10  0.665099  0.221700  1.149466  0.383155
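
    The result above has two header rows because its columns form a two-level index of (column, function) pairs. The sketch below, on illustrative data, shows a few ways to read such a result; the values and the names `result` and `flat` are assumptions made for this example.

```python
import pandas as pd

# Illustrative data so the output can be checked by hand.
df = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [10.0, 20.0, 30.0]})
result = df.rolling(window=2, min_periods=1)[['A', 'B']].agg(['sum', 'mean'])

# Columns are (original column, function name) pairs.
print(result.columns.tolist())
# [('A', 'sum'), ('A', 'mean'), ('B', 'sum'), ('B', 'mean')]

# Select one aggregated series with a tuple key.
print(result[('B', 'mean')].tolist())  # [10.0, 15.0, 25.0]

# Select all aggregations of one original column.
print(result['A'].columns.tolist())    # ['sum', 'mean']

# Optionally flatten the two levels into single names.
flat = result.copy()
flat.columns = ['_'.join(col) for col in result.columns]
print(flat.columns.tolist())           # ['A_sum', 'A_mean', 'B_sum', 'B_mean']
```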
    

    Applying different functions to different columns of the DataFrame

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame(np.random.randn(3, 4),
          index = pd.date_range('2020/01/01', periods=3),
          columns = ['A', 'B', 'C', 'D'])
    print (df)
    print("==========================================")
    r = df.rolling(window=3,min_periods=1)
    print (r.aggregate({'A' : np.sum,'B' : np.mean}))
    

    Running the code above produces output like the following:

                       A         B         C         D
    2020-01-01 -0.246302 -0.057202  0.923807 -1.019698
    2020-01-02  0.285287  1.467206 -0.368735 -0.397260
    2020-01-03 -0.163219 -0.401368  1.254569  0.580188
    ==========================================
                       A         B
    2020-01-01 -0.246302 -0.057202
    2020-01-02  0.038985  0.705002
    2020-01-03 -0.124234  0.336212
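
    Note that only the columns named in the dictionary appear in the result. One way to think about the dictionary form is as a per-column aggregation followed by a column-wise concatenation; the sketch below compares the two on illustrative data (the values, the extra column `C`, and the names `by_dict` and `manual` are assumptions for this example).

```python
import pandas as pd

# Illustrative data; column C is deliberately left out of the dictionary.
df = pd.DataFrame({'A': [1.0, 2.0, 3.0],
                   'B': [4.0, 6.0, 8.0],
                   'C': [0.0, 0.0, 0.0]})
r = df.rolling(window=3, min_periods=1)

# Dictionary form: sum for A, mean for B.
by_dict = r.aggregate({'A': 'sum', 'B': 'mean'})

# Equivalent manual form: aggregate each column, then join side by side.
manual = pd.concat([r['A'].sum(), r['B'].mean()], axis=1)

print(list(by_dict.columns))  # ['A', 'B']
print(by_dict['A'].tolist())  # [1.0, 3.0, 6.0]
print(by_dict['B'].tolist())  # [4.0, 5.0, 6.0]
```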
    
     
  • Original article: https://www.cnblogs.com/navysummer/p/9641157.html