• 数据处理——数据合并


    # 一样,数据处理就先给导入pandas先
    import pandas as pd
    
    # df1==df2
    df1 = pd.DataFrame({'一班':[90,80,66,75,99,55,76,78,98,None,90],
                       '二班':[75,98,100,None,77,45,None,66,56,80,57],
                       '三班':[45,89,77,67,65,100,None,75,64,88,99]})
    df2 = pd.DataFrame({'一班':[90,80,66,75,99,55,76,78,98,None,90],
                      '二班':[75,98,100,None,77,45,None,66,56,80,57],
                      '三班':[45,89,77,67,65,100,None,75,64,88,99]})

    1数据堆叠

      数据堆叠分为以下两种:

      • 行堆叠
      • 列堆叠

      pd.concat(objs, axis=0)

    • objs:参与合并的多个DataFrame。无默认
    • axis:表示轴向,axis=0表示行合并,axis=1表示列合并
    pd.concat([df1, df2, df3], axis=1)
     一班三班二班一班三班二班一班三班二班
    0 90.0 45.0 75.0 90.0 45.0 75.0 90.0 45.0 75.0
    1 80.0 89.0 98.0 80.0 89.0 98.0 80.0 89.0 98.0
    2 66.0 77.0 100.0 66.0 77.0 100.0 66.0 77.0 100.0
    3 75.0 67.0 NaN 75.0 67.0 NaN 75.0 67.0 NaN
    4 99.0 65.0 77.0 99.0 65.0 77.0 99.0 65.0 77.0
    5 55.0 100.0 45.0 55.0 100.0 45.0 55.0 100.0 45.0
    6 76.0 NaN NaN 76.0 NaN NaN 76.0 NaN NaN
    7 78.0 75.0 66.0 78.0 75.0 66.0 78.0 75.0 66.0
    8 98.0 64.0 56.0 98.0 64.0 56.0 98.0 64.0 56.0
    9 NaN 88.0 80.0 NaN 88.0 80.0 NaN 88.0 80.0
    10 90.0 99.0 57.0 90.0 99.0 57.0 90.0 99.0 57.0

      当然,如果axis=0(行堆叠)时,也可以使用append函数

    # append 直接在末尾追加,注意特征数目相同,并且数据类型相同
    df1.append(df2)
     一班三班二班
    0 90.0 45.0 75.0
    1 80.0 89.0 98.0
    2 66.0 77.0 100.0
    3 75.0 67.0 NaN
    4 99.0 65.0 77.0
    5 55.0 100.0 45.0
    6 76.0 NaN NaN
    7 78.0 75.0 66.0
    8 98.0 64.0 56.0
    9 NaN 88.0 80.0
    10 90.0 99.0 57.0
    0 90.0 45.0 75.0
    1 80.0 89.0 98.0
    2 66.0 77.0 100.0
    3 75.0 67.0 NaN
    4 99.0 65.0 77.0
    5 55.0 100.0 45.0
    6 76.0 NaN NaN
    7 78.0 75.0 66.0
    8 98.0 64.0 56.0
    9 NaN 88.0 80.0
    10 90.0 99.0 57.0

    2主键合并

      主键合并大概是应用最关的合并方式了,也是我最喜欢的方式。

    pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, suffixes=('_x', '_y'))

    • left:表示进行合并的左边的DataFrame。无默认。
    • right:表示进行合并的右边的DataFrame。无默认。
    • how:表示合并的方法。默认为'inner'。可取'left'(左连接),'right'(右连接),'inner'(内连接),'outer'(外连接)。
    • on:表示合并的主键。默认为空。
    • left_on:表示左边的合并主键。默认为空。
    • right_on:表示右边的合并主键。默认为空。
    • suffixes:表示列名相同的时候的后缀。默认为('_x', '_y')
    # 合并数据
    pd.merge(df1, df2, on='一班')
     一班三班_x二班_x三班_y二班_y
    0 90.0 45.0 75.0 45.0 75.0
    1 90.0 45.0 75.0 99.0 57.0
    2 90.0 99.0 57.0 45.0 75.0
    3 90.0 99.0 57.0 99.0 57.0
    4 80.0 89.0 98.0 89.0 98.0
    5 66.0 77.0 100.0 77.0 100.0
    6 75.0 67.0 NaN 67.0 NaN
    7 99.0 65.0 77.0 65.0 77.0
    8 55.0 100.0 45.0 100.0 45.0
    9 76.0 NaN NaN NaN NaN
    10 78.0 75.0 66.0 75.0 66.0
    11 98.0 64.0 56.0 64.0 56.0
    12 NaN 88.0 80.0 88.0 80.0
    pd.merge(df1, df2, left_on='一班', right_on='二班', suffixes=('_1', '_2))
     一班_1三班_1二班_1一班_2三班_2二班_2
    0 80.0 89.0 98.0 NaN 88.0 80.0
    1 66.0 77.0 100.0 78.0 75.0 66.0
    2 75.0 67.0 NaN 90.0 45.0 75.0
    3 98.0 64.0 56.0 80.0 89.0 98.0
    4 NaN 88.0 80.0 75.0 67.0 NaN
    5 NaN 88.0 80.0 76.0 NaN NaN

    3重叠合并

      不是特别建议,毕竟重叠合并没什么依据,而且浪费数据资源。

      DataFrame.combine_first(other) 重叠合并,当两者皆有以前者为准,为空时,则使用后者的补上。

    df1['一班'].combine_first(df1['二班'])
    0     90.0
    1     80.0
    2     66.0
    3     75.0
    4     99.0
    5     55.0
    6     76.0
    7     78.0
    8     98.0
    9     80.0
    10    90.0
    Name: 一班, dtype: float64

    一个佛系的博客更新者,随手写写,看心情吧 (っ•̀ω•́)っ✎⁾⁾
  • 相关阅读:
    CSS预编译:less入门
    JavaScript学习(五):函数表达式
    关于JavaScript new 的一些疑问
    JavaScript学习(四):面对对象的程序设计
    JavaScript学习(三):引用类型
    JavaScript学习(二):变量、作用域和内存问题
    JavaScript学习(一):基本概念
    匿名函数的this指向为什么是window?
    阿里云ECS在CentOS 6.8中使用Nginx提示:nginx: [emerg] socket() [::]:80 failed (97: Address family not supported by protocol)的解决方法
    Centos释放缓存
  • 原文地址:https://www.cnblogs.com/WoLykos/p/9382201.html
Copyright © 2020-2023  润新知