• pandas-10: pivot tables with pd.pivot_table()


    Like Excel, pandas provides a pivot table feature. The demo below walks through it:

    import numpy as np
    import pandas as pd
    from pandas import Series, DataFrame
    
    
    # Show all columns
    pd.set_option('display.max_columns', None)
    # Show all rows
    pd.set_option('display.max_rows', None)
    # Set the maximum display width of a cell value to 100 characters (default is 50)
    pd.set_option('display.max_colwidth', 100)
    
    
    df = pd.read_excel('./sales-funnel.xlsx')
    print(df.head())
    '''
       Account                          Name            Rep       Manager  
    0   714466               Trantow-Barrows   Craig Booker  Debra Henley   
    1   714466               Trantow-Barrows   Craig Booker  Debra Henley   
    2   714466               Trantow-Barrows   Craig Booker  Debra Henley   
    3   737550  Fritsch, Russel and Anderson   Craig Booker  Debra Henley   
    4   146832                  Kiehn-Spinka  Daniel Hilton  Debra Henley   
    
           Product  Quantity  Price     Status  
    0          CPU         1  30000  presented  
    1     Software         1  10000  presented  
    2  Maintenance         2   5000    pending  
    3          CPU         1  35000   declined  
    4          CPU         2  65000        won
    '''
    
    print(pd.pivot_table(df, index=['Name']))
    '''
                                  Account   Price  Quantity
    Name                                                   
    Barton LLC                     740150   35000  1.000000
    Fritsch, Russel and Anderson   737550   35000  1.000000
    Herman LLC                     141962   65000  2.000000
    Jerde-Hilpert                  412290    5000  2.000000
    Kassulke, Ondricka and Metz    307599    7000  3.000000
    Keeling LLC                    688981  100000  5.000000
    Kiehn-Spinka                   146832   65000  2.000000
    Koepp Ltd                      729833   35000  2.000000
    Kulas Inc                      218895   25000  1.500000
    Purdy-Kunde                    163416   30000  1.000000
    Stokes LLC                     239344    7500  1.000000
    Trantow-Barrows                714466   15000  1.333333
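    Rows are grouped by Name, so duplicate names collapse into one row, and the
    sales records for each name are aggregated. The example above takes the mean.
    The aggregation is controlled by the aggfunc parameter, which defaults to 'mean'.
    Note: the output above comes from an older pandas release. Newer releases may no
    longer drop non-numeric columns silently; if so, pass
    values=['Account', 'Price', 'Quantity'] explicitly.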
    '''
    
    print(pd.pivot_table(df, index=['Name'], aggfunc='sum'))
    '''
                                  Account   Price  Quantity
    Name                                                   
    Barton LLC                     740150   35000         1
    Fritsch, Russel and Anderson   737550   35000         1
    Herman LLC                     141962   65000         2
    Jerde-Hilpert                  412290    5000         2
    Kassulke, Ondricka and Metz    307599    7000         3
    Keeling LLC                    688981  100000         5
    Kiehn-Spinka                   146832   65000         2
    Koepp Ltd                     1459666   70000         4
    Kulas Inc                      437790   50000         3
    Purdy-Kunde                    163416   30000         1
    Stokes LLC                     478688   15000         2
    Trantow-Barrows               2143398   45000         4
    '''
    
    print(pd.pivot_table(df, index=['Name', 'Rep', 'Manager']))
    '''
                                                              Account    ...     Quantity
    Name                         Rep           Manager                   ...             
    Barton LLC                   John Smith    Debra Henley    740150    ...     1.000000
    Fritsch, Russel and Anderson Craig Booker  Debra Henley    737550    ...     1.000000
    Herman LLC                   Cedric Moss   Fred Anderson   141962    ...     2.000000
    Jerde-Hilpert                John Smith    Debra Henley    412290    ...     2.000000
    Kassulke, Ondricka and Metz  Wendy Yule    Fred Anderson   307599    ...     3.000000
    Keeling LLC                  Wendy Yule    Fred Anderson   688981    ...     5.000000
    Kiehn-Spinka                 Daniel Hilton Debra Henley    146832    ...     2.000000
    Koepp Ltd                    Wendy Yule    Fred Anderson   729833    ...     2.000000
    Kulas Inc                    Daniel Hilton Debra Henley    218895    ...     1.500000
    Purdy-Kunde                  Cedric Moss   Fred Anderson   163416    ...     1.000000
    Stokes LLC                   Cedric Moss   Fred Anderson   239344    ...     1.000000
    Trantow-Barrows              Craig Booker  Debra Henley    714466    ...     1.333333
    '''
    
    print(pd.pivot_table(df, index=['Manager', 'Rep']))
    # Manager and Rep have a one-to-many relationship: each manager has several reps
    '''
                                  Account         Price  Quantity
    Manager       Rep                                            
    Debra Henley  Craig Booker   720237.0  20000.000000  1.250000
                  Daniel Hilton  194874.0  38333.333333  1.666667
                  John Smith     576220.0  20000.000000  1.500000
    Fred Anderson Cedric Moss    196016.5  27500.000000  1.250000
                  Wendy Yule     614061.5  44250.000000  3.000000
    '''
    
    print(pd.pivot_table(df, index=['Manager', 'Rep'], values=['Price', 'Quantity']))
    '''
                                        Price  Quantity
    Manager       Rep                                  
    Debra Henley  Craig Booker   20000.000000  1.250000
                  Daniel Hilton  38333.333333  1.666667
                  John Smith     20000.000000  1.500000
    Fred Anderson Cedric Moss    27500.000000  1.250000
                  Wendy Yule     44250.000000  3.000000
    '''
    
    print(pd.pivot_table(df, index=['Manager', 'Rep'], values=['Price', 'Quantity'], columns=['Product']))
    '''
                                   Price               ...    Quantity         
    Product                          CPU Maintenance   ...     Monitor Software
    Manager       Rep                                  ...                     
    Debra Henley  Craig Booker   32500.0      5000.0   ...         NaN      1.0
                  Daniel Hilton  52500.0         NaN   ...         NaN      1.0
                  John Smith     35000.0      5000.0   ...         NaN      NaN
    Fred Anderson Cedric Moss    47500.0      5000.0   ...         NaN      1.0
                  Wendy Yule     82500.0      7000.0   ...         2.0      NaN
                  
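    As the output above shows, once columns is set to Product, Price and Quantity
    are broken down further, with one sub-column per product.
    You can also pass the fill_value parameter to replace NaN with a given value.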
    '''
    
    '''
    Summary:
        Before building a pivot table, get a rough understanding of the raw data,
        so that the resulting table is actually meaningful.
    '''
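The demo mentions aggfunc and fill_value without showing them side by side, so here is a minimal sketch of both. It assumes a small stand-in DataFrame built from the five rows printed by df.head() above (Account omitted), not the full sales-funnel.xlsx file, so the numbers only cover those rows. The variable names (by_name_mean, by_name_sum, with_nan, filled) are made up for illustration.

```python
import pandas as pd

# Stand-in for sales-funnel.xlsx: only the five rows shown by df.head() above.
df = pd.DataFrame({
    "Name": ["Trantow-Barrows", "Trantow-Barrows", "Trantow-Barrows",
             "Fritsch, Russel and Anderson", "Kiehn-Spinka"],
    "Rep": ["Craig Booker", "Craig Booker", "Craig Booker",
            "Craig Booker", "Daniel Hilton"],
    "Manager": ["Debra Henley"] * 5,
    "Product": ["CPU", "Software", "Maintenance", "CPU", "CPU"],
    "Quantity": [1, 1, 2, 1, 2],
    "Price": [30000, 10000, 5000, 35000, 65000],
})

# aggfunc defaults to 'mean'; passing aggfunc='sum' totals each group instead.
by_name_mean = pd.pivot_table(df, index=["Name"], values=["Price", "Quantity"])
by_name_sum = pd.pivot_table(df, index=["Name"], values=["Price", "Quantity"],
                             aggfunc="sum")

# Without fill_value, Rep/Product combinations that have no rows appear as NaN.
with_nan = pd.pivot_table(df, index=["Manager", "Rep"], values=["Price"],
                          columns=["Product"], aggfunc="sum")

# fill_value=0 replaces those missing cells with 0.
filled = pd.pivot_table(df, index=["Manager", "Rep"], values=["Price"],
                        columns=["Product"], aggfunc="sum", fill_value=0)

print(by_name_mean)
print(by_name_sum)
print(with_nan)
print(filled)
```

Passing values explicitly, as done here, also keeps the text columns out of the aggregation, which avoids relying on pandas to drop them.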
    
  • Original article: https://www.cnblogs.com/wenqiangit/p/11252770.html