• Python基本数据统计(三)---- Grouping & Merge


    6. Grouping

      6.1 统计近一年每个月的开盘天数:

    In [38]: tempdf
    Out[38]: 
                     open      close       high        low      volume  month
    2016-01-22  57.283819  54.016907  57.774347  53.114334  43783400.0      1
    2016-01-25  53.428272  53.977664  54.713455  53.114334  18498300.0      1
    ...
    2017-01-20  75.989998  76.199997  76.910004  75.389999   8382000.0      1
    
    [252 rows x 6 columns]
    
    In [39]: tempdf.groupby('month').count()
    Out[39]: 
           open  close  high  low  volume
    month                                
    1        19     19    19   19      19
    2        20     20    20   20      20
    ...
    12       21     21    21   21      21
    
    In [41]: tempdf.groupby('month').count().open
    Out[41]: 
    month
    1     19
    2     20
    3     22
    4     21
    5     21
    6     22
    7     20
    8     23
    9     21
    10    21
    11    21
    12    21
    Name: open, dtype: int64
    tempdf.groupby('month').count().open

      6.2 统计近一年每个月的总成交量:(计算总和,再选出需求列)

    In [43]: tempdf.groupby('month').sum()
    Out[43]: 
                  open        close         high          low       volume
    month                                                                 
    1      1317.435962  1311.872716  1328.947705  1301.086107  190434600.0
    2      1056.469042  1058.853017  1066.995761  1046.069857  159711400.0
    3      1281.837069  1286.438220  1293.423343  1270.692270  129864200.0
    4      1296.296570  1299.964789  1308.367777  1288.042310   99855500.0
    5      1332.088683  1333.182926  1342.942408  1323.906527   96741700.0
    6      1365.402100  1363.315079  1376.771659  1350.264324  121788800.0
    7      1244.562320  1247.524731  1254.559166  1237.280184   90064900.0
    8      1484.674795  1484.189312  1492.571208  1475.708341   77514100.0
    9      1342.668112  1341.954764  1353.150415  1332.294785   95572800.0
    10     1331.611493  1329.281184  1341.236638  1320.451196  116243400.0
    11     1459.883671  1463.727278  1471.155603  1449.657290   99527200.0
    12     1549.889790  1549.352107  1561.082060  1538.528258   75948200.0
    
    In [44]: tempdf.groupby('month').sum().volume
    Out[44]: 
    month
    1     190434600.0
    2     159711400.0
    3     129864200.0
    4      99855500.0
    5      96741700.0
    6     121788800.0
    7      90064900.0
    8      77514100.0
    9      95572800.0
    10    116243400.0
    11     99527200.0
    12     75948200.0
    Name: volume, dtype: float64
    tempdf.groupby('month').sum().volume

      6.3 前两例总结为df_object.groupby('month')后跟  .sum()  .mean()  .min()  .max()

      6.4 更高效地统计近一年每个月的总成交量:(先选出需求列,再计算和。处理大数据时效果明显)

    In [46]: tempdf.groupby('month').volume.sum()
    Out[46]: 
    month
    1     190434600.0
    2     159711400.0
    3     129864200.0
    4      99855500.0
    5      96741700.0
    6     121788800.0
    7      90064900.0
    8      77514100.0
    9      95572800.0
    10    116243400.0
    11     99527200.0
    12     75948200.0
    Name: volume, dtype: float64
    tempdf.groupby('month').volume.sum()

    7. Merge

      7.1 Append

    In [49]: quotesdf[:2].append(quotesdf['2016-12-01':'2016-12-05'])
    Out[49]: 
                     open      close       high        low      volume
    2016-01-22  57.283819  54.016907  57.774347  53.114334  43783400.0
    2016-01-25  53.428272  53.977664  54.713455  53.114334  18498300.0
    2016-12-01  72.142314  72.221976  72.460955  71.475161   4298200.0
    2016-12-02  72.212016  71.554823  72.351429  71.126648   2959100.0
    2016-12-05  72.022829  71.724100  72.331509  71.684269   3929100.0
    quotesdf[:2].append(quotesdf['2016-12-01':'2016-12-05'])

      7.2 Concat

        7.2.1 两个相同逻辑结构的对象连接

    In [52]: pieces = [tempdf[:2],tempdf[len(tempdf)-2:]]
    
    In [53]: pieces
    Out[53]: 
    [                 open      close       high        low      volume  month
     2016-01-22  57.283819  54.016907  57.774347  53.114334  43783400.0      1
     2016-01-25  53.428272  53.977664  54.713455  53.114334  18498300.0      1,
                      open      close       high        low     volume  month
     2017-01-19  77.610001  76.690002  77.779999  76.610001  8111400.0      1
     2017-01-20  75.989998  76.199997  76.910004  75.389999  8382000.0      1]
    
    In [54]: pd.concat(pieces)
    Out[54]: 
                     open      close       high        low      volume  month
    2016-01-22  57.283819  54.016907  57.774347  53.114334  43783400.0      1
    2016-01-25  53.428272  53.977664  54.713455  53.114334  18498300.0      1
    2017-01-19  77.610001  76.690002  77.779999  76.610001   8111400.0      1
    2017-01-20  75.989998  76.199997  76.910004  75.389999   8382000.0      1
    pd.concat([tempdf[:2],tempdf[len(tempdf)-2:]])

        7.2.2 两个不同逻辑结构的对象连接

    In [56]: piece1 = quotesdf[:3]
    
    In [57]: piece2 = tempdf[:3]
    
    In [58]: piece1
    Out[58]: 
                     open      close       high        low      volume
    2016-01-22  57.283819  54.016907  57.774347  53.114334  43783400.0
    2016-01-25  53.428272  53.977664  54.713455  53.114334  18498300.0
    2016-01-26  54.154255  54.046338  54.684026  53.918800  12844800.0
    
    In [59]: piece2
    Out[59]: 
                     open      close       high        low      volume  month
    2016-01-22  57.283819  54.016907  57.774347  53.114334  43783400.0      1
    2016-01-25  53.428272  53.977664  54.713455  53.114334  18498300.0      1
    2016-01-26  54.154255  54.046338  54.684026  53.918800  12844800.0      1
    
    In [60]: pd.concat([piece1,piece2],ignore_index=True)
    Out[60]: 
           close       high        low  month       open      volume
    0  54.016907  57.774347  53.114334    NaN  57.283819  43783400.0
    1  53.977664  54.713455  53.114334    NaN  53.428272  18498300.0
    2  54.046338  54.684026  53.918800    NaN  54.154255  12844800.0
    3  54.016907  57.774347  53.114334    1.0  57.283819  43783400.0
    4  53.977664  54.713455  53.114334    1.0  53.428272  18498300.0
    5  54.046338  54.684026  53.918800    1.0  54.154255  12844800.0
    
    In [61]: pd.concat([piece1,piece2],ignore_index=False)
    Out[61]: 
                    close       high        low  month       open      volume
    2016-01-22  54.016907  57.774347  53.114334    NaN  57.283819  43783400.0
    2016-01-25  53.977664  54.713455  53.114334    NaN  53.428272  18498300.0
    2016-01-26  54.046338  54.684026  53.918800    NaN  54.154255  12844800.0
    2016-01-22  54.016907  57.774347  53.114334    1.0  57.283819  43783400.0
    2016-01-25  53.977664  54.713455  53.114334    1.0  53.428272  18498300.0
    2016-01-26  54.046338  54.684026  53.918800    1.0  54.154255  12844800.0
    pd.concat([piece1,piece2],ignore_index=True)

       7.3 Join  -- 将 [AXP和KO近一年中每个月的交易总量表(包含code)] 与 [30只道琼斯股票信息] 合并

    In [190]: AKdf
    Out[190]: 
                volume code  month
    month                         
    1      190434600.0  AXP      1
    2      159711400.0  AXP      2
    3      129864200.0  AXP      3
    4       99855500.0  AXP      4
    5       96741700.0  AXP      5
    6      121788800.0  AXP      6
    7       90064900.0  AXP      7
    8       77514100.0  AXP      8
    9       95572800.0  AXP      9
    10     116243400.0  AXP     10
    11      99527200.0  AXP     11
    12      75948200.0  AXP     12
    1      244121600.0   KO      1
    2      304133400.0   KO      2
    3      347238100.0   KO      3
    4      317282000.0   KO      4
    5      230074600.0   KO      5
    6      265483400.0   KO      6
    7      235959400.0   KO      7
    8      235118300.0   KO      8
    9      251007200.0   KO      9
    10     264839100.0   KO     10
    11     316271200.0   KO     11
    12     280949400.0   KO     12
    
    In [191]: djidf
    Out[191]: 
        code                                         name lasttrade
    0   AAPL                                   Apple Inc.   120.000
    1    AXP                     American Express Company    76.200
    2     BA                           The Boeing Company   159.530
    3    CAT                             Caterpillar Inc.    94.580
    4   CSCO                          Cisco Systems, Inc.    30.100
    5    CVX                          Chevron Corporation   115.600
    6     DD         E. I. du Pont de Nemours and Company    73.030
    7    DIS                      The Walt Disney Company   107.660
    8     GE                     General Electric Company    30.530
    9     GS                The Goldman Sachs Group, Inc.   232.200
    10    HD                         The Home Depot, Inc.   135.600
    11   IBM  International Business Machines Corporation   170.550
    12  INTC                            Intel Corporation    36.940
    13   JNJ                        Johnson & Johnson   114.150
    14   JPM                     JPMorgan Chase & Co.    83.670
    15    KO                        The Coca-Cola Company    41.320
    16   MCD                       McDonald's Corporation   122.260
    17   MMM                                   3M Company   178.490
    18   MRK                        Merck & Co., Inc.    62.530
    19  MSFT                        Microsoft Corporation    62.740
    20   NKE                                   NIKE, Inc.    53.200
    21   PFE                                  Pfizer Inc.    31.770
    22    PG             The Procter & Gamble Company    87.450
    23   TRV                The Travelers Companies, Inc.   118.020
    24   UNH              UnitedHealth Group Incorporated   158.660
    25   UTX              United Technologies Corporation   110.790
    26     V                                    Visa Inc.    81.840
    27    VZ                  Verizon Communications Inc.    52.720
    28   WMT                        Wal-Mart Stores, Inc.    67.180
    29   XOM                      Exxon Mobil Corporation    85.890
    
    In [192]: pd.merge(djidf,AKdf,on='code').drop(['lasttrade'],axis=1)
    Out[192]: 
       code                      name       volume  month
    0   AXP  American Express Company  190434600.0      1
    1   AXP  American Express Company  159711400.0      2
    2   AXP  American Express Company  129864200.0      3
    3   AXP  American Express Company   99855500.0      4
    4   AXP  American Express Company   96741700.0      5
    5   AXP  American Express Company  121788800.0      6
    6   AXP  American Express Company   90064900.0      7
    7   AXP  American Express Company   77514100.0      8
    8   AXP  American Express Company   95572800.0      9
    9   AXP  American Express Company  116243400.0     10
    10  AXP  American Express Company   99527200.0     11
    11  AXP  American Express Company   75948200.0     12
    12   KO     The Coca-Cola Company  244121600.0      1
    13   KO     The Coca-Cola Company  304133400.0      2
    14   KO     The Coca-Cola Company  347238100.0      3
    15   KO     The Coca-Cola Company  317282000.0      4
    16   KO     The Coca-Cola Company  230074600.0      5
    17   KO     The Coca-Cola Company  265483400.0      6
    18   KO     The Coca-Cola Company  235959400.0      7
    19   KO     The Coca-Cola Company  235118300.0      8
    20   KO     The Coca-Cola Company  251007200.0      9
    21   KO     The Coca-Cola Company  264839100.0     10
    22   KO     The Coca-Cola Company  316271200.0     11
    23   KO     The Coca-Cola Company  280949400.0     12
    pd.merge(djidf,AKdf,on='code').drop(['lasttrade'],axis=1)

      +    =   

  • 相关阅读:
    2020.04.11【NOIP普及组】模拟赛C组25总结
    【华为云技术分享】云图说 | 容器交付流水线ContainerOps,提升持续交付效率
    【USACO 2020 January Silver】Wormhole Sort 题解
    【华为云技术分享】计算python程序执行时间
    HTAP会成为数据库的未来吗?
    【华为云技术分享】现代富文本编辑器Quill的模块化机制
    Cow Evolution 题解
    【USACO 2020 January Silver】Loan Repayment
    VS2012 黑色护眼主题
    常用的正则表达式
  • 原文地址:https://www.cnblogs.com/wnzhong/p/6340638.html
Copyright © 2020-2023  润新知