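DataFrame definition:
DataFrame is one of the two main data structures in pandas; the other is Series. A DataFrame is:
- a tabular data structure
- made up of an ordered set of columns
- roughly a collection of Series that share the same index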
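The "collection of Series sharing one index" view can be sketched as follows. This is a minimal illustration, assuming the same sample names and pay values used later in this tutorial; the variable names are chosen only for this sketch.

```python
import pandas as pd

# Two Series that carry the same index labels (sample data for illustration).
names = pd.Series(['Wangdachui', 'Linling', 'Niuyun'], index=[1, 2, 3])
pays = pd.Series([4000, 5000, 6000], index=[1, 2, 3])

# A dict of Series becomes a DataFrame; the dict keys become column names.
frame = pd.DataFrame({'name': names, 'pay': pays})

# Each column pulled back out is a Series that carries the shared index.
pay_column = frame['pay']
print(type(pay_column).__name__)             # Series
print(pay_column.index.equals(frame.index))  # True
```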
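Creating a DataFrame:
Default creation (from a dict of lists; the row index defaults to 0, 1, 2, ...):
>>> import pandas as pd
>>> data = {'name': ['Wangdachui', 'Linling', 'Niuyun'], 'pay': [4000, 5000, 6000]}
>>> frame = pd.DataFrame(data)
>>> frame
         name   pay
0  Wangdachui  4000
1     Linling  5000
2      Niuyun  6000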
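Creation with an explicit index:
Note that a NumPy array holds a single dtype, so np.array() converts the pay values to strings here. That is why the outputs below show quoted values such as '4000' and dtype object.
>>> import numpy as np
>>> data = np.array([('Wangdachui', 4000), ('Linling', 5000), ('Niuyun', 6000)])
>>> frame = pd.DataFrame(data, index=range(1, 4), columns=['name', 'pay'])
>>> frame
         name   pay
1  Wangdachui  4000
2     Linling  5000
3      Niuyun  6000
>>> frame.index
RangeIndex(start=1, stop=4, step=1)
>>> frame.columns
Index(['name', 'pay'], dtype='object')
>>> frame.values
array([['Wangdachui', '4000'],
       ['Linling', '5000'],
       ['Niuyun', '6000']], dtype=object)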
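Routing mixed data through np.array() turns the numbers into text, which matters for any later arithmetic or comparison. The sketch below contrasts that with building the DataFrame directly from the list of tuples; the names from_array and from_list are used only for this illustration.

```python
import numpy as np
import pandas as pd

rows = [('Wangdachui', 4000), ('Linling', 5000), ('Niuyun', 6000)]

# Via NumPy: the array has one string dtype, so pay becomes text.
from_array = pd.DataFrame(np.array(rows), index=range(1, 4), columns=['name', 'pay'])
pay_is_text = isinstance(from_array.loc[1, 'pay'], str)
print(pay_is_text)  # True

# Directly from the list of tuples: pay stays an integer column.
from_list = pd.DataFrame(rows, index=range(1, 4), columns=['name', 'pay'])
print(from_list['pay'].sum())  # 15000

# Repairing the NumPy-built frame after the fact.
from_array['pay'] = from_array['pay'].astype(int)
print(from_array['pay'].sum())  # 15000
```

Passing the list of tuples straight to pd.DataFrame is usually the simpler choice when the columns have different types.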
Basic DataFrame operations:
Selecting rows and columns of a DataFrame
>>> frame
         name   pay
1  Wangdachui  4000
2     Linling  5000
3      Niuyun  6000
>>> frame['name']
1    Wangdachui
2       Linling
3        Niuyun
Name: name, dtype: object
>>> frame.pay
1    4000
2    5000
3    6000
Name: pay, dtype: object
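Selecting specific rows or columns (iloc uses 0-based positions, not the index labels 1 to 3)
>>> frame.iloc[:2, 1]   # column 1 of rows 0 and 1
1    4000
2    5000
Name: pay, dtype: object
>>> frame.iloc[:1, 0]   # column 0 of row 0
1    Wangdachui
Name: name, dtype: object
>>> frame.iloc[2, 1]    # column 1 of row 2
'6000'
>>> frame.iloc[2]       # row 2
name    Niuyun
pay       6000
Name: 3, dtype: object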
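Because this frame's index starts at 1, positions and labels differ by one, which is easy to trip over. The following sketch compares iloc (position) with loc (label), assuming a frame built with integer pay values.

```python
import pandas as pd

frame = pd.DataFrame(
    {'name': ['Wangdachui', 'Linling', 'Niuyun'], 'pay': [4000, 5000, 6000]},
    index=range(1, 4),
)

# iloc counts positions from 0 and ignores the labels 1..3.
by_position = frame.iloc[0]['name']
# loc looks up the index label, so label 1 is the first row here.
by_label = frame.loc[1, 'name']
print(by_position == by_label)  # True

# loc slices include the end label; iloc slices exclude the end position.
print(len(frame.loc[1:2]))   # 2
print(len(frame.iloc[1:2]))  # 1
```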
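Modifying and deleting DataFrame data
>>> frame['name'] = 'admin'   # a scalar overwrites every row of the column
>>> frame
    name   pay
1  admin  4000
2  admin  5000
3  admin  6000
>>> del frame['pay']          # removes the column from frame in place
>>> frame
    name
1  admin
2  admin
3  admin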
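DataFrame statistics
Finding the lowest pay and the people whose pay is at least 5000 (using the original frame again)
>>> frame
         name   pay
1  Wangdachui  4000
2     Linling  5000
3      Niuyun  6000
>>> frame['pay'] = frame['pay'].astype(int)   # pay was stored as strings; convert so comparisons are numeric
>>> print(frame.pay.min())
4000
>>> frame[frame.pay >= 5000]
      name   pay
2  Linling  5000
3   Niuyun  6000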
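Beyond the minimum value itself, it is often useful to know which row holds it. A minimal sketch, assuming pay is stored as integers; lowest_paid and high_earners are names introduced only for this illustration.

```python
import pandas as pd

frame = pd.DataFrame(
    {'name': ['Wangdachui', 'Linling', 'Niuyun'], 'pay': [4000, 5000, 6000]},
    index=range(1, 4),
)

lowest_pay = frame['pay'].min()
# idxmin() returns the index label of the row holding the minimum.
lowest_paid = frame.loc[frame['pay'].idxmin(), 'name']
print(lowest_pay, lowest_paid)  # 4000 Wangdachui

# A boolean mask keeps only the rows where the condition is True.
high_earners = frame[frame['pay'] >= 5000]
print(high_earners['name'].tolist())  # ['Linling', 'Niuyun']
```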
Example:
Suppose a list holds the following music data:
music_data = [("the rolling stones","Satisfaction"),("Beatles","Let It Be"),("Guns N'Roses","Don't Cry"),("Metallica","Nothing Else Matters")]
Create the following DataFrame from this data:
               singer             song_name
1  the rolling stones          Satisfaction
2             Beatles             Let It Be
3        Guns N'Roses             Don't Cry
4           Metallica  Nothing Else Matters
Solution:
>>> import pandas as pd
>>> music_data = [("the rolling stones","Satisfaction"),("Beatles","Let It Be"),("Guns N'Roses","Don't Cry"),("Metallica","Nothing Else Matters")]
>>> music_table = pd.DataFrame(music_data)
>>> music_table
                    0                     1
0  the rolling stones          Satisfaction
1             Beatles             Let It Be
2        Guns N'Roses             Don't Cry
3           Metallica  Nothing Else Matters
>>> music_table.index = range(1, 5)
>>> music_table.columns = ['singer', 'song_name']
>>> print(music_table)
               singer             song_name
1  the rolling stones          Satisfaction
2             Beatles             Let It Be
3        Guns N'Roses             Don't Cry
4           Metallica  Nothing Else Matters
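The same table can also be built in one step by passing index and columns to the constructor, as the earlier index-based example does. This is an alternative sketch, not a replacement for the two-step solution; deriving the index from len(music_data) is a choice made here so the code still works if the list grows.

```python
import pandas as pd

music_data = [("the rolling stones", "Satisfaction"), ("Beatles", "Let It Be"),
              ("Guns N'Roses", "Don't Cry"), ("Metallica", "Nothing Else Matters")]

# Supply the row labels and column names at construction time.
music_table = pd.DataFrame(
    music_data,
    index=range(1, len(music_data) + 1),
    columns=['singer', 'song_name'],
)
print(music_table.loc[2, 'song_name'])  # Let It Be
```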
More basic DataFrame operations
Given the following DataFrame:
>>> frame
         name   pay
1  Wangdachui  4000
2     Linling  5000
3      Niuyun  6000
(1) Adding a column
A column can be added by direct assignment, for example adding a tax column to frame:
>>> frame['tax'] = [0.05, 0.05, 0.1]
>>> frame
         name   pay   tax
1  Wangdachui  4000  0.05
2     Linling  5000  0.05
3      Niuyun  6000  0.10
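A new column can also be computed from existing ones. The sketch below contrasts direct assignment, which changes frame itself, with assign(), which returns a new DataFrame. The column name net_pay is hypothetical and used only for illustration.

```python
import pandas as pd

frame = pd.DataFrame(
    {'name': ['Wangdachui', 'Linling', 'Niuyun'], 'pay': [4000, 5000, 6000]},
    index=range(1, 4),
)

# Direct assignment adds the column to frame itself.
frame['tax'] = [0.05, 0.05, 0.1]

# assign() returns a new DataFrame and leaves frame unchanged.
# net_pay is a hypothetical column name used only for this sketch.
with_net = frame.assign(net_pay=frame['pay'] * (1 - frame['tax']))

print(frame.columns.tolist())     # ['name', 'pay', 'tax']
print(with_net.columns.tolist())  # ['name', 'pay', 'tax', 'net_pay']
```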
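(2) Adding a row
A row can be added by assigning to a new label through the loc indexer, or by combining DataFrames with pd.concat(). The iloc indexer cannot add rows, because it only addresses existing positions, and DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0. This example uses loc:
>>> frame.loc[5] = {'name': 'Liuxi', 'pay': 5000, 'tax': 0.05}
>>> frame
         name   pay   tax
1  Wangdachui  4000  0.05
2     Linling  5000  0.05
3      Niuyun  6000  0.10
5       Liuxi  5000  0.05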
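For comparison, here is a minimal sketch of adding the same row with pd.concat(). Unlike assignment through loc, concat returns a new DataFrame; the names new_row and extended are introduced only for this illustration.

```python
import pandas as pd

frame = pd.DataFrame(
    {'name': ['Wangdachui', 'Linling', 'Niuyun'],
     'pay': [4000, 5000, 6000],
     'tax': [0.05, 0.05, 0.1]},
    index=range(1, 4),
)

# The new row is itself a one-row DataFrame labelled 5.
new_row = pd.DataFrame({'name': ['Liuxi'], 'pay': [5000], 'tax': [0.05]}, index=[5])

# concat returns a new DataFrame; frame itself is not modified.
extended = pd.concat([frame, new_row])
print(extended.index.tolist())  # [1, 2, 3, 5]
print(len(frame))               # 3
```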
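(3) Deleting elements
Data can be deleted with del (for example del frame['pay']), but del modifies the original object in place, which is risky. The drop() method instead returns a new object with the specified labels removed along the given axis: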
>>> frame.drop(5)
         name   pay   tax
1  Wangdachui  4000  0.05
2     Linling  5000  0.05
3      Niuyun  6000  0.10
>>> frame.drop('tax', axis=1)
         name   pay
1  Wangdachui  4000
2     Linling  5000
3      Niuyun  6000
5       Liuxi  5000
frame itself is unaffected by these calls:
>>> frame
         name   pay   tax
1  Wangdachui  4000  0.05
2     Linling  5000  0.05
3      Niuyun  6000  0.10
5       Liuxi  5000  0.05
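Since drop() leaves the original untouched, the result has to be stored to keep it. A short sketch, assuming the four-row frame shown above; trimmed and first_only are names chosen for this example.

```python
import pandas as pd

frame = pd.DataFrame(
    {'name': ['Wangdachui', 'Linling', 'Niuyun', 'Liuxi'],
     'pay': [4000, 5000, 6000, 5000],
     'tax': [0.05, 0.05, 0.1, 0.05]},
    index=[1, 2, 3, 5],
)

# The index= and columns= keywords remove a row and a column in one call.
trimmed = frame.drop(index=5, columns='tax')
print(trimmed.shape)  # (3, 2)
print(frame.shape)    # (4, 3)

# A list of labels drops several rows at once.
first_only = frame.drop([2, 3, 5])
print(first_only.index.tolist())  # [1]
```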
(4) Modifying values
Continuing with the frame above, set every tax value to 0.03:
>>> frame['tax'] = 0.03
>>> frame
         name   pay   tax
1  Wangdachui  4000  0.03
2     Linling  5000  0.03
3      Niuyun  6000  0.03
5       Liuxi  5000  0.03
A whole row can also be modified directly through the loc indexer:
>>> frame.loc[5] = ['Liuxi', 9800, 0.05]
>>> frame
         name   pay   tax
1  Wangdachui  4000  0.03
2     Linling  5000  0.03
3      Niuyun  6000  0.03
5       Liuxi  9800  0.05
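The loc indexer also accepts a column label, so a single cell or a filtered set of rows can be updated without rewriting whole rows. A minimal sketch, assuming the frame as it stood after tax was set to 0.03; the threshold and the new tax value are made up for illustration.

```python
import pandas as pd

frame = pd.DataFrame(
    {'name': ['Wangdachui', 'Linling', 'Niuyun', 'Liuxi'],
     'pay': [4000, 5000, 6000, 5000],
     'tax': [0.03, 0.03, 0.03, 0.03]},
    index=[1, 2, 3, 5],
)

# loc[row_label, column_label] targets a single cell.
frame.loc[5, 'pay'] = 9800

# A boolean mask in the row position updates only the matching rows.
frame.loc[frame['pay'] > 5000, 'tax'] = 0.1

print(frame['pay'].tolist())  # [4000, 5000, 6000, 9800]
print(frame['tax'].tolist())  # [0.03, 0.03, 0.1, 0.1]
```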