• Python data types: the pandas DataFrame


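    DataFrame definition:

    DataFrame is one of the two main data structures in pandas; the other is Series

    - a tabular data structure

    - contains an ordered set of columns

    - can roughly be viewed as a collection of Series sharing the same index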
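
    The "collection of Series sharing one index" view can be checked directly. The following is a minimal sketch with illustrative data; the variable names are chosen for this example only.

```python
import pandas as pd

# Two Series built on the same index labels (illustrative data).
name = pd.Series(['Wangdachui', 'Linling', 'Niuyun'], index=[1, 2, 3])
pay = pd.Series([4000, 5000, 6000], index=[1, 2, 3])

# Combining them column-wise gives a DataFrame with that shared index.
frame = pd.DataFrame({'name': name, 'pay': pay})

# Each column taken back out is a Series carrying the frame's index.
column = frame['pay']
print(type(column).__name__)             # Series
print(column.index.equals(frame.index))  # True
```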

    Ways to create a DataFrame:

    Creating with the default index:

    >>> import pandas as pd
    >>> data = {'name':['Wangdachui','Linling','Niuyun'],'pay':[4000,5000,6000]}
    >>> frame = pd.DataFrame(data)
    >>> frame
             name   pay
    0  Wangdachui  4000
    1     Linling  5000
    2      Niuyun  6000

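    Creating with an explicit index (np.array converts the mixed tuples to strings, so pay is stored as strings here):

    >>> import numpy as np
    >>> data = np.array([('Wangdachui',4000),('Linling',5000),('Niuyun',6000)])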
    >>> frame = pd.DataFrame(data,index = range(1,4),columns=['name','pay'])
    >>> frame
             name   pay
    1  Wangdachui  4000
    2     Linling  5000
    3      Niuyun  6000
    >>> frame.index
    RangeIndex(start=1, stop=4, step=1)
    >>> frame.columns
    Index(['name', 'pay'], dtype='object')
    >>> frame.values
    array([['Wangdachui', '4000'],
           ['Linling', '5000'],
           ['Niuyun', '6000']], dtype=object)
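
    The quoted '4000' in the output above comes from the detour through np.array. As a hedged sketch (the names records, as_strings and as_numbers are made up for this example), passing the list of tuples straight to pd.DataFrame should keep pay numeric:

```python
import numpy as np
import pandas as pd

records = [('Wangdachui', 4000), ('Linling', 5000), ('Niuyun', 6000)]

# Going through np.array first coerces every value to a string.
as_strings = pd.DataFrame(np.array(records), index=range(1, 4), columns=['name', 'pay'])

# Passing the list of tuples directly lets pandas infer a dtype per column.
as_numbers = pd.DataFrame(records, index=range(1, 4), columns=['name', 'pay'])

print(as_strings['pay'].iloc[0] == '4000')  # True: the value is a string
print(as_numbers['pay'].sum())              # 15000
```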

    Basic DataFrame operations:

    Selecting rows and columns of a DataFrame

    >>> frame
             name   pay
    1  Wangdachui  4000
    2     Linling  5000
    3      Niuyun  6000

    >>> frame['name']
    1    Wangdachui
    2       Linling
    3        Niuyun
    Name: name, dtype: object
    >>> frame.pay
    1    4000
    2    5000
    3    6000
    Name: pay, dtype: object

    Selecting specific rows or columns

    >>> frame.iloc[:2,1]  # rows at positions 0 and 1, column at position 1
    1    4000
    2    5000
    Name: pay, dtype: object
    >>> frame.iloc[:1,0]  # row at position 0, column at position 0
    1    Wangdachui
    Name: name, dtype: object
    >>> frame.iloc[2,1]  # row at position 2, column at position 1
    '6000'
    >>> frame.iloc[2]  # row at position 2
    name    Niuyun
    pay       6000
    Name: 3, dtype: object
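
    Since this frame's index labels start at 1 while iloc positions start at 0, it may help to compare iloc with the label-based loc. The sketch below rebuilds the frame with numeric pay for illustration; the variable names are assumptions of this example.

```python
import pandas as pd

frame = pd.DataFrame(
    {'name': ['Wangdachui', 'Linling', 'Niuyun'], 'pay': [4000, 5000, 6000]},
    index=range(1, 4),
)

# iloc counts positions from 0, so position 2 is the last row (label 3).
by_position = frame.iloc[2, 1]

# loc uses the labels themselves: row label 3, column label 'pay'.
by_label = frame.loc[3, 'pay']

# Slices differ too: iloc excludes the end position, loc includes the end label.
first_two_by_position = frame.iloc[:2]
first_two_by_label = frame.loc[:2]

print(by_position == by_label)                           # True
print(first_two_by_position.equals(first_two_by_label))  # True
```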

    Modifying and deleting DataFrame data

    >>> frame['name']= 'admin'
    >>> frame
        name   pay
    1  admin  4000
    2  admin  5000
    3  admin  6000
    >>> del frame['pay']
    >>> frame
        name
    1  admin
    2  admin
    3  admin
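
    Both the assignment and del above change frame in place. If the original data is still needed afterwards, one option is to work on a copy. This is a small sketch; the name scratch is hypothetical.

```python
import pandas as pd

frame = pd.DataFrame(
    {'name': ['Wangdachui', 'Linling', 'Niuyun'], 'pay': [4000, 5000, 6000]},
    index=range(1, 4),
)

# copy() returns an independent frame, so edits to it leave the original alone.
scratch = frame.copy()
scratch['name'] = 'admin'   # overwrite every value in the column
del scratch['pay']          # remove the column from the copy only

print(list(scratch.columns))  # ['name']
print(list(frame.columns))    # ['name', 'pay']
```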

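    Statistics with DataFrame

    Finding the lowest pay and the people whose pay is at least 5000 (starting again from the original frame; pay holds strings here, so min() and the comparison below work on strings and compare lexicographically, which gives the expected result only because every value has four digits)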

    >>> frame
             name   pay
    1  Wangdachui  4000
    2     Linling  5000
    3      Niuyun  6000
    >>> frame.pay.min()
    '4000'
    >>> frame[frame.pay >= '5000']
          name   pay
    2  Linling  5000
    3   Niuyun  6000
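
    For statistics it is usually safer to convert pay to numbers first. A minimal sketch, assuming the frame was built from np.array as earlier; pd.to_numeric does the conversion, and the names lowest_pay, lowest_paid and at_least_5000 are illustrative only.

```python
import numpy as np
import pandas as pd

data = np.array([('Wangdachui', 4000), ('Linling', 5000), ('Niuyun', 6000)])
frame = pd.DataFrame(data, index=range(1, 4), columns=['name', 'pay'])

# Convert the string column to numbers before doing statistics on it.
frame['pay'] = pd.to_numeric(frame['pay'])

lowest_pay = frame['pay'].min()
# idxmin() returns the index label of the smallest value.
lowest_paid = frame.loc[frame['pay'].idxmin(), 'name']
# A boolean mask keeps only the rows where the condition holds.
at_least_5000 = frame[frame['pay'] >= 5000]

print(lowest_pay, lowest_paid)
print(at_least_5000)
```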

    Example:

    Suppose a list holds a set of music data:

    music_data = [("the rolling stones","Satisfaction"),("Beatles","Let It Be"),("Guns N'Roses","Don't Cry"),("Metallica","Nothing Else Matters")]

    Create the following DataFrame from this data:

                   singer             song_name
    1  the rolling stones          Satisfaction
    2             Beatles             Let It Be
    3        Guns N'Roses             Don't Cry
    4           Metallica  Nothing Else Matters

    One way to do it:

    >>> import pandas as pd
    >>> music_data = [("the rolling stones","Satisfaction"),("Beatles","Let It Be"),("Guns N'Roses","Don't Cry"),("Metallica","Nothing Else Matters")]
    >>> music_table = pd.DataFrame(music_data)
    >>> music_table
                        0                     1
    0  the rolling stones          Satisfaction
    1             Beatles             Let It Be
    2        Guns N'Roses             Don't Cry
    3           Metallica  Nothing Else Matters
    >>> music_table.index = range(1,5)
    >>> music_table.columns = ['singer','song_name']
    >>> print(music_table)
                   singer             song_name
    1  the rolling stones          Satisfaction
    2             Beatles             Let It Be
    3        Guns N'Roses             Don't Cry
    4           Metallica  Nothing Else Matters
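
    The same table can probably be built in one step by passing index and columns to the constructor, as was done earlier with frame. A sketch of that variant:

```python
import pandas as pd

music_data = [("the rolling stones", "Satisfaction"), ("Beatles", "Let It Be"),
              ("Guns N'Roses", "Don't Cry"), ("Metallica", "Nothing Else Matters")]

# index and columns are set at construction time instead of being assigned afterwards.
music_table = pd.DataFrame(
    music_data,
    index=range(1, len(music_data) + 1),
    columns=['singer', 'song_name'],
)
print(music_table)
```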
    

    More basic DataFrame operations

    Given the following DataFrame:

    >>> frame
             name   pay
    1  Wangdachui  4000
    2     Linling  5000
    3      Niuyun  6000

    (1) Adding a column

    A column can be added by direct assignment, for example adding a tax column to frame:

    >>> frame['tax'] = [0.05,0.05,0.1]
    >>> frame
             name   pay   tax
    1  Wangdachui  4000  0.05
    2     Linling  5000  0.05
    3      Niuyun  6000  0.10
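
    A new column can also be computed from existing ones. The sketch below assumes pay is numeric; the tax_amount column is hypothetical and not part of the original example.

```python
import pandas as pd

frame = pd.DataFrame(
    {'name': ['Wangdachui', 'Linling', 'Niuyun'], 'pay': [4000, 5000, 6000]},
    index=range(1, 4),
)
frame['tax'] = [0.05, 0.05, 0.1]

# Arithmetic between columns is element-wise and aligned on the index.
frame['tax_amount'] = frame['pay'] * frame['tax']
print(frame)
```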

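    (2) Adding a row

    A row can be added by assigning to a new label through the loc indexer; pd.concat() also works. (iloc is position based and cannot enlarge a frame, and DataFrame.append() was removed in pandas 2.0.) Here loc is used: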

    >>> frame.loc[5] = {'name':'Liuxi','pay':5000,'tax':0.05}
    >>> frame
             name   pay   tax
    1  Wangdachui  4000  0.05
    2     Linling  5000  0.05
    3      Niuyun  6000  0.10
    5       Liuxi  5000  0.05
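
    For comparison, here is a sketch of the pd.concat() route mentioned above. The names new_row and extended are made up for this example, and the frame is rebuilt with numeric pay.

```python
import pandas as pd

frame = pd.DataFrame(
    {'name': ['Wangdachui', 'Linling', 'Niuyun'],
     'pay': [4000, 5000, 6000],
     'tax': [0.05, 0.05, 0.1]},
    index=range(1, 4),
)

# The new row is itself a one-row DataFrame whose index label is 5.
new_row = pd.DataFrame({'name': ['Liuxi'], 'pay': [5000], 'tax': [0.05]}, index=[5])

# concat returns a new frame; the original frame is left unchanged.
extended = pd.concat([frame, new_row])
print(extended)
```

    Unlike assignment through loc, this does not modify frame, which can be convenient when the original is still needed.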

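    (3) Deleting elements

    Data can be deleted with del, but that modifies the original object in place, which is risky. The drop() method instead returns a new object with the given labels removed from the specified axis: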

    >>> frame.drop(5)
             name   pay   tax
    1  Wangdachui  4000  0.05
    2     Linling  5000  0.05
    3      Niuyun  6000  0.10
    >>> frame.drop('tax',axis = 1)
             name   pay
    1  Wangdachui  4000
    2     Linling  5000
    3      Niuyun  6000
    5       Liuxi  5000

    frame itself is unaffected:

    >>> frame
             name   pay   tax
    1  Wangdachui  4000  0.05
    2     Linling  5000  0.05
    3      Niuyun  6000  0.10
    5       Liuxi  5000  0.05
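
    drop() also accepts the index= and columns= keywords, which some readers find clearer than axis. A brief sketch; the result names are illustrative, and the result has to be assigned to a name to be kept.

```python
import pandas as pd

frame = pd.DataFrame(
    {'name': ['Wangdachui', 'Linling', 'Niuyun', 'Liuxi'],
     'pay': [4000, 5000, 6000, 5000],
     'tax': [0.05, 0.05, 0.1, 0.05]},
    index=[1, 2, 3, 5],
)

without_row = frame.drop(index=5)        # same as frame.drop(5)
without_col = frame.drop(columns='tax')  # same as frame.drop('tax', axis=1)
several_rows = frame.drop(index=[1, 5])  # a list removes several labels at once

# frame still has all four rows and three columns at this point.
print(frame.shape)  # (4, 3)
```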

    (4) Modifying

    Continuing with the frame above, set every tax value to 0.03:

    >>> frame['tax'] = 0.03
    >>> frame
             name   pay   tax
    1  Wangdachui  4000  0.03
    2     Linling  5000  0.03
    3      Niuyun  6000  0.03
    5       Liuxi  5000  0.03

    A row can also be modified directly through loc:

    >>> frame.loc[5] = ['Liuxi',9800,0.05]
    >>> frame
             name   pay   tax
    1  Wangdachui  4000  0.03
    2     Linling  5000  0.03
    3      Niuyun  6000  0.03
    5       Liuxi  9800  0.05
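
    loc can also target a single cell, or only the rows that match a condition. The following sketch assumes numeric pay; the 5000 threshold is an arbitrary value chosen for illustration.

```python
import pandas as pd

frame = pd.DataFrame(
    {'name': ['Wangdachui', 'Linling', 'Niuyun', 'Liuxi'],
     'pay': [4000, 5000, 6000, 5000],
     'tax': [0.03, 0.03, 0.03, 0.03]},
    index=[1, 2, 3, 5],
)

# Single cell: row label 5, column label 'pay'.
frame.loc[5, 'pay'] = 9800

# Conditional update: only rows whose pay exceeds 5000 get the new tax.
frame.loc[frame['pay'] > 5000, 'tax'] = 0.05
print(frame)
```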
  • Original article: https://www.cnblogs.com/yqpy/p/8338032.html