6. Grouping
6.1 统计近一年每个月的开盘天数:
In [38]: tempdf Out[38]: open close high low volume month 2016-01-22 57.283819 54.016907 57.774347 53.114334 43783400.0 1 2016-01-25 53.428272 53.977664 54.713455 53.114334 18498300.0 1 ... 2017-01-20 75.989998 76.199997 76.910004 75.389999 8382000.0 1 [252 rows x 6 columns] In [39]: tempdf.groupby('month').count() Out[39]: open close high low volume month 1 19 19 19 19 19 2 20 20 20 20 20 ... 12 21 21 21 21 21 In [41]: tempdf.groupby('month').count().open Out[41]: month 1 19 2 20 3 22 4 21 5 21 6 22 7 20 8 23 9 21 10 21 11 21 12 21 Name: open, dtype: int64
6.2 统计近一年每个月的总成交量:(计算总和,再选出需求列)
In [43]: tempdf.groupby('month').sum() Out[43]: open close high low volume month 1 1317.435962 1311.872716 1328.947705 1301.086107 190434600.0 2 1056.469042 1058.853017 1066.995761 1046.069857 159711400.0 3 1281.837069 1286.438220 1293.423343 1270.692270 129864200.0 4 1296.296570 1299.964789 1308.367777 1288.042310 99855500.0 5 1332.088683 1333.182926 1342.942408 1323.906527 96741700.0 6 1365.402100 1363.315079 1376.771659 1350.264324 121788800.0 7 1244.562320 1247.524731 1254.559166 1237.280184 90064900.0 8 1484.674795 1484.189312 1492.571208 1475.708341 77514100.0 9 1342.668112 1341.954764 1353.150415 1332.294785 95572800.0 10 1331.611493 1329.281184 1341.236638 1320.451196 116243400.0 11 1459.883671 1463.727278 1471.155603 1449.657290 99527200.0 12 1549.889790 1549.352107 1561.082060 1538.528258 75948200.0 In [44]: tempdf.groupby('month').sum().volume Out[44]: month 1 190434600.0 2 159711400.0 3 129864200.0 4 99855500.0 5 96741700.0 6 121788800.0 7 90064900.0 8 77514100.0 9 95572800.0 10 116243400.0 11 99527200.0 12 75948200.0 Name: volume, dtype: float64
6.3 前两例总结为df_object.groupby('month')后跟 .sum() .mean() .min() .max()
6.4 更高效地统计近一年每个月的总成交量:(先选出需求列,再计算和。处理大数据时效果明显)
In [46]: tempdf.groupby('month').volume.sum() Out[46]: month 1 190434600.0 2 159711400.0 3 129864200.0 4 99855500.0 5 96741700.0 6 121788800.0 7 90064900.0 8 77514100.0 9 95572800.0 10 116243400.0 11 99527200.0 12 75948200.0 Name: volume, dtype: float64
7. Merge
7.1 Append
In [49]: quotesdf[:2].append(quotesdf['2016-12-01':'2016-12-05']) Out[49]: open close high low volume 2016-01-22 57.283819 54.016907 57.774347 53.114334 43783400.0 2016-01-25 53.428272 53.977664 54.713455 53.114334 18498300.0 2016-12-01 72.142314 72.221976 72.460955 71.475161 4298200.0 2016-12-02 72.212016 71.554823 72.351429 71.126648 2959100.0 2016-12-05 72.022829 71.724100 72.331509 71.684269 3929100.0
7.2 Concat
7.2.1 两个相同逻辑结构的对象连接
In [52]: pieces = [tempdf[:2],tempdf[len(tempdf)-2:]] In [53]: pieces Out[53]: [ open close high low volume month 2016-01-22 57.283819 54.016907 57.774347 53.114334 43783400.0 1 2016-01-25 53.428272 53.977664 54.713455 53.114334 18498300.0 1, open close high low volume month 2017-01-19 77.610001 76.690002 77.779999 76.610001 8111400.0 1 2017-01-20 75.989998 76.199997 76.910004 75.389999 8382000.0 1] In [54]: pd.concat(pieces) Out[54]: open close high low volume month 2016-01-22 57.283819 54.016907 57.774347 53.114334 43783400.0 1 2016-01-25 53.428272 53.977664 54.713455 53.114334 18498300.0 1 2017-01-19 77.610001 76.690002 77.779999 76.610001 8111400.0 1 2017-01-20 75.989998 76.199997 76.910004 75.389999 8382000.0 1
7.2.2 两个不同逻辑结构的对象连接
In [56]: piece1 = quotesdf[:3] In [57]: piece2 = tempdf[:3] In [58]: piece1 Out[58]: open close high low volume 2016-01-22 57.283819 54.016907 57.774347 53.114334 43783400.0 2016-01-25 53.428272 53.977664 54.713455 53.114334 18498300.0 2016-01-26 54.154255 54.046338 54.684026 53.918800 12844800.0 In [59]: piece2 Out[59]: open close high low volume month 2016-01-22 57.283819 54.016907 57.774347 53.114334 43783400.0 1 2016-01-25 53.428272 53.977664 54.713455 53.114334 18498300.0 1 2016-01-26 54.154255 54.046338 54.684026 53.918800 12844800.0 1 In [60]: pd.concat([piece1,piece2],ignore_index=True) Out[60]: close high low month open volume 0 54.016907 57.774347 53.114334 NaN 57.283819 43783400.0 1 53.977664 54.713455 53.114334 NaN 53.428272 18498300.0 2 54.046338 54.684026 53.918800 NaN 54.154255 12844800.0 3 54.016907 57.774347 53.114334 1.0 57.283819 43783400.0 4 53.977664 54.713455 53.114334 1.0 53.428272 18498300.0 5 54.046338 54.684026 53.918800 1.0 54.154255 12844800.0 In [61]: pd.concat([piece1,piece2],ignore_index=False) Out[61]: close high low month open volume 2016-01-22 54.016907 57.774347 53.114334 NaN 57.283819 43783400.0 2016-01-25 53.977664 54.713455 53.114334 NaN 53.428272 18498300.0 2016-01-26 54.046338 54.684026 53.918800 NaN 54.154255 12844800.0 2016-01-22 54.016907 57.774347 53.114334 1.0 57.283819 43783400.0 2016-01-25 53.977664 54.713455 53.114334 1.0 53.428272 18498300.0 2016-01-26 54.046338 54.684026 53.918800 1.0 54.154255 12844800.0
7.3 Join -- 将 [AXP和KO近一年中每个月的交易总量表(包含code)] 与 [30只道琼斯股票信息] 合并
In [190]: AKdf Out[190]: volume code month month 1 190434600.0 AXP 1 2 159711400.0 AXP 2 3 129864200.0 AXP 3 4 99855500.0 AXP 4 5 96741700.0 AXP 5 6 121788800.0 AXP 6 7 90064900.0 AXP 7 8 77514100.0 AXP 8 9 95572800.0 AXP 9 10 116243400.0 AXP 10 11 99527200.0 AXP 11 12 75948200.0 AXP 12 1 244121600.0 KO 1 2 304133400.0 KO 2 3 347238100.0 KO 3 4 317282000.0 KO 4 5 230074600.0 KO 5 6 265483400.0 KO 6 7 235959400.0 KO 7 8 235118300.0 KO 8 9 251007200.0 KO 9 10 264839100.0 KO 10 11 316271200.0 KO 11 12 280949400.0 KO 12 In [191]: djidf Out[191]: code name lasttrade 0 AAPL Apple Inc. 120.000 1 AXP American Express Company 76.200 2 BA The Boeing Company 159.530 3 CAT Caterpillar Inc. 94.580 4 CSCO Cisco Systems, Inc. 30.100 5 CVX Chevron Corporation 115.600 6 DD E. I. du Pont de Nemours and Company 73.030 7 DIS The Walt Disney Company 107.660 8 GE General Electric Company 30.530 9 GS The Goldman Sachs Group, Inc. 232.200 10 HD The Home Depot, Inc. 135.600 11 IBM International Business Machines Corporation 170.550 12 INTC Intel Corporation 36.940 13 JNJ Johnson & Johnson 114.150 14 JPM JPMorgan Chase & Co. 83.670 15 KO The Coca-Cola Company 41.320 16 MCD McDonald's Corporation 122.260 17 MMM 3M Company 178.490 18 MRK Merck & Co., Inc. 62.530 19 MSFT Microsoft Corporation 62.740 20 NKE NIKE, Inc. 53.200 21 PFE Pfizer Inc. 31.770 22 PG The Procter & Gamble Company 87.450 23 TRV The Travelers Companies, Inc. 118.020 24 UNH UnitedHealth Group Incorporated 158.660 25 UTX United Technologies Corporation 110.790 26 V Visa Inc. 81.840 27 VZ Verizon Communications Inc. 52.720 28 WMT Wal-Mart Stores, Inc. 67.180 29 XOM Exxon Mobil Corporation 85.890 In [192]: pd.merge(djidf,AKdf,on='code').drop(['lasttrade'],axis=1) Out[192]: code name volume month 0 AXP American Express Company 190434600.0 1 1 AXP American Express Company 159711400.0 2 2 AXP American Express Company 129864200.0 3 3 AXP American Express Company 99855500.0 4 4 AXP American Express Company 96741700.0 5 5 AXP American Express Company 121788800.0 6 6 AXP American Express Company 90064900.0 7 7 AXP American Express Company 77514100.0 8 8 AXP American Express Company 95572800.0 9 9 AXP American Express Company 116243400.0 10 10 AXP American Express Company 99527200.0 11 11 AXP American Express Company 75948200.0 12 12 KO The Coca-Cola Company 244121600.0 1 13 KO The Coca-Cola Company 304133400.0 2 14 KO The Coca-Cola Company 347238100.0 3 15 KO The Coca-Cola Company 317282000.0 4 16 KO The Coca-Cola Company 230074600.0 5 17 KO The Coca-Cola Company 265483400.0 6 18 KO The Coca-Cola Company 235959400.0 7 19 KO The Coca-Cola Company 235118300.0 8 20 KO The Coca-Cola Company 251007200.0 9 21 KO The Coca-Cola Company 264839100.0 10 22 KO The Coca-Cola Company 316271200.0 11 23 KO The Coca-Cola Company 280949400.0 12
+ =