• Python pandas: groupby and agg()


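    • Building the data
      import pandas as pd

      df = pd.DataFrame({'Country': ['China', 'China', 'India', 'India', 'America', 'Japan', 'China', 'India'],
                         'Income': [10000, 10000, 5000, 5002, 40000, 50000, 8000, 5000],
                         'Age': [5000, 4321, 1234, 4010, 250, 250, 4500, 4321]})
      print(df)

        The result is as follows (columns appear in alphabetical order, as older pandas versions print them; recent versions keep the order Country, Income, Age):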

       Age  Country  Income
      0  5000    China   10000
      1  4321    China   10000
      2  1234    India    5000
      3  4010    India    5002
      4   250  America   40000
      5   250    Japan   50000
      6  4500    China    8000
      7  4321    India    5000
      
    • Grouping by a single column

    df_gb = df.groupby('Country')
    for index, data in df_gb:
        print(index)
        print(data)
    Output:
    America
       Age  Country  Income
    4  250  America   40000
    China
        Age Country  Income
    0  5000   China   10000
    1  4321   China   10000
    6  4500   China    8000
    India
        Age Country  Income
    2  1234   India    5000
    3  4010   India    5002
    7  4321   India    5000
    Japan
       Age Country  Income
    5  250   Japan   50000
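
    Looping over every group is not always necessary. The sketch below assumes the same data as above and shows two standard GroupBy helpers, size() and get_group(); the variable names group_sizes and china are illustrative only and do not come from the original post.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['China', 'China', 'India', 'India', 'America', 'Japan', 'China', 'India'],
                   'Income': [10000, 10000, 5000, 5002, 40000, 50000, 8000, 5000],
                   'Age': [5000, 4321, 1234, 4010, 250, 250, 4500, 4321]})

df_gb = df.groupby('Country')

# Number of rows in each group, returned as a Series indexed by Country
group_sizes = df_gb.size()
print(group_sizes)

# Fetch a single group by its key instead of iterating over all groups
china = df_gb.get_group('China')
print(china)
```

    get_group() returns an ordinary DataFrame that keeps the original row labels, so the China group still carries the index values 0, 1 and 6.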
    

      

    • Grouping by multiple columns
      df_gb = df.groupby(['Country', 'Income'])
      for (index1, index2), data in df_gb:
          print((index1, index2))
          print(data)
       
      Output:
       
      ('America', 40000)
         Age  Country  Income
      4  250  America   40000
      ('China', 8000)
          Age Country  Income
      6  4500   China    8000
      ('China', 10000)
          Age Country  Income
      0  5000   China   10000
      1  4321   China   10000
      ('India', 5000)
          Age Country  Income
      2  1234   India    5000
      7  4321   India    5000
      ('India', 5002)
          Age Country  Income
      3  4010   India    5002
      ('Japan', 50000)
         Age Country  Income
      5  250   Japan   50000
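
      With several grouping columns, each group key is a tuple. As a rough illustration (assuming the same data as above; sizes and china_10000 are hypothetical names), the key tuple can be passed to get_group(), and size() returns a Series with a two-level index:

```python
import pandas as pd

df = pd.DataFrame({'Country': ['China', 'China', 'India', 'India', 'America', 'Japan', 'China', 'India'],
                   'Income': [10000, 10000, 5000, 5002, 40000, 50000, 8000, 5000],
                   'Age': [5000, 4321, 1234, 4010, 250, 250, 4500, 4321]})

df_gb = df.groupby(['Country', 'Income'])

# Row count per (Country, Income) pair; the index is a two-level MultiIndex
sizes = df_gb.size()
print(sizes)

# A single group is addressed by the tuple of its key values
china_10000 = df_gb.get_group(('China', 10000))
print(china_10000)
```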
      

    • Aggregation functions: aggregating the grouped data

      df_agg = df.groupby('Country').agg(['min', 'mean', 'max'])
      print(df_agg)
      Output:
         Age                    Income                     
                min         mean   max    min          mean    max
      Country                                                     
      America   250   250.000000   250  40000  40000.000000  40000
      China    4321  4607.000000  5000   8000   9333.333333  10000
      India    1234  3188.333333  4321   5000   5000.666667   5002
      Japan     250   250.000000   250  50000  50000.000000  50000
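
      Passing a list of functions to agg() produces two-level column labels, as the output above shows. The following is a minimal sketch of how such a result might be read or flattened, assuming the same data; the names age_stats, income_mean and flat are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['China', 'China', 'India', 'India', 'America', 'Japan', 'China', 'India'],
                   'Income': [10000, 10000, 5000, 5002, 40000, 50000, 8000, 5000],
                   'Age': [5000, 4321, 1234, 4010, 250, 250, 4500, 4321]})

df_agg = df.groupby('Country').agg(['min', 'mean', 'max'])

# Selecting the top-level label returns a DataFrame with columns min/mean/max
age_stats = df_agg['Age']
print(age_stats)

# A (column, function) tuple selects a single Series
income_mean = df_agg[('Income', 'mean')]
print(income_mean)

# Optional: flatten the two-level labels into names such as 'Age_min'
flat = df_agg.copy()
flat.columns = ['_'.join(col) for col in df_agg.columns]
print(flat.columns.tolist())
```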
      

    • Aggregating only some of the columns after grouping

      num_agg = {'Age':['min', 'mean', 'max']}
      print(df.groupby('Country').agg(num_agg))
      Output:
        Age                   
                min         mean   max
      Country                         
      America   250   250.000000   250
      China    4321  4607.000000  5000
      India    1234  3188.333333  4321
      Japan     250   250.000000   250
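
      When only one column is aggregated, an alternative to the dict form is to select the column on the GroupBy object first. This is a sketch under the same data assumptions (age_stats and dict_stats are illustrative names); it yields the same numbers with single-level column labels.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['China', 'China', 'India', 'India', 'America', 'Japan', 'China', 'India'],
                   'Income': [10000, 10000, 5000, 5002, 40000, 50000, 8000, 5000],
                   'Age': [5000, 4321, 1234, 4010, 250, 250, 4500, 4321]})

# Select the column first, then aggregate: columns are simply min/mean/max
age_stats = df.groupby('Country')['Age'].agg(['min', 'mean', 'max'])
print(age_stats)

# The dict form from the text, for comparison (two-level column labels)
dict_stats = df.groupby('Country').agg({'Age': ['min', 'mean', 'max']})
print(dict_stats)
```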
      

        

      num_agg = {'Age':['min', 'mean', 'max'], 'Income':['min', 'max']}
      print(df.groupby('Country').agg(num_agg))
      Output:
            Age                    Income       
                min         mean   max    min    max
      Country                                       
      America   250   250.000000   250  40000  40000
      China    4321  4607.000000  5000   8000  10000
      India    1234  3188.333333  4321   5000   5002
      Japan     250   250.000000   250  50000  50000
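
      A related option, not used in the original post, is named aggregation (available in pandas 0.25 and later), which gives each output column a flat name of your choosing. The sketch below assumes the same data; the output names age_min, age_mean, income_max and income_range are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['China', 'China', 'India', 'India', 'America', 'Japan', 'China', 'India'],
                   'Income': [10000, 10000, 5000, 5002, 40000, 50000, 8000, 5000],
                   'Age': [5000, 4321, 1234, 4010, 250, 250, 4500, 4321]})

# Each keyword is an output column name; each value is (input column, function)
result = df.groupby('Country').agg(
    age_min=('Age', 'min'),
    age_mean=('Age', 'mean'),
    income_max=('Income', 'max'),
    # A custom function receives the group's values for that column as a Series
    income_range=('Income', lambda s: s.max() - s.min()),
)
print(result)
```

      Compared with the dict form, the result has ordinary single-level columns, which can be easier to work with in later steps.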
      

        

        

  • Original article: https://www.cnblogs.com/qijiujiu/p/13524553.html