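Pandas is a Python extension package for data processing.
1. Series: like a list, but with the added concept of an index
1.2 A list can be converted to a Series, as shown below: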
    import pandas as pd

    my_list = [1, 'two', 'three', 'l4', 'z5', 'v6']
    s = pd.Series(my_list)   # the default index is 0, 1, 2, ...
    print(s)

The output is:

    0        1
    1      two
    2    three
    3       l4
    4       z5
    5       v6
    dtype: object
1.3 When creating a Series, you can also supply your own index values:
    s1 = pd.Series([1, 'two', 'three', 'l4', 'z5', 'v6'],
                   index=['A', 'B', 'C', 'D', 'E', 'F'])
    print(s1)

The result is:

    A        1
    B      two
    C    three
    D       l4
    E       z5
    F       v6
    dtype: object
1.4 Creating a Series from a dictionary:
    import pandas as pd

    cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
              'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
    # The dict keys become the index; None is stored as NaN,
    # which is why the values are converted to float64.
    apts = pd.Series(cities, name='income')
    print(apts)

The result is:

    Beijing      55000.0
    Shanghai     60000.0
    shenzhen     50000.0
    Hangzhou     20000.0
    Guangzhou    45000.0
    Suzhou           NaN
    Name: income, dtype: float64
1.5 A Series can be treated much like a list: slicing works on it, and other list-style operations behave similarly.
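As a small sketch of the slicing mentioned above, the snippet below reuses the s1 Series from 1.3. The variable names (first_three, b_to_d, by_label, by_position) are made up for illustration. One difference from a list is worth noting: slicing by position excludes the end point, while slicing by label includes it.

```python
import pandas as pd

s1 = pd.Series([1, 'two', 'three', 'l4', 'z5', 'v6'],
               index=['A', 'B', 'C', 'D', 'E', 'F'])

# Positional slicing with .iloc excludes the end position, like a list.
first_three = s1.iloc[:3]     # positions 0, 1, 2

# Label slicing with .loc includes BOTH endpoints, unlike a list.
b_to_d = s1.loc['B':'D']      # labels B, C, D

# Single-element access by label and by position.
by_label = s1['B']
by_position = s1.iloc[-1]

print(first_three.tolist())   # [1, 'two', 'three']
print(b_to_d.tolist())        # ['two', 'three', 'l4']
print(by_label, by_position)  # two v6
```

Using .iloc and .loc explicitly avoids any ambiguity about whether a number refers to a position or a label.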
2. DataFrame:
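A DataFrame is a data structure similar to a database table, with both a row index and a column index. It can be thought of as a dict of Series that share the same index. Under the hood, the data is stored in two-dimensional and one-dimensional blocks.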
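To make the "dict of Series" picture concrete, here is a minimal sketch. The names idx, profits, year, col and row are chosen for illustration only. Two Series sharing one index are combined into a DataFrame, and selecting a column or a row gives back a Series.

```python
import pandas as pd

idx = ['one', 'two', 'three']
profits = pd.Series([1000, 2000, 3000], index=idx)
year = pd.Series([2001, 2002, 2003], index=idx)

# Each dict key becomes a column; the shared index becomes the row index.
df = pd.DataFrame({'profits': profits, 'year': year})

col = df['profits']    # one column, returned as a Series
row = df.loc['two']    # one row, returned as a Series indexed by column name

print(df)
print(type(col).__name__)   # Series
print(row['year'])          # 2002
```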
2.1 Creating a DataFrame directly, with the following code:
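    from pandas import DataFrame

    df = DataFrame([
        ['a', 'b', 'c', 'd'],
        [1, 2, 3, 4],
    ])
    # index sets the row labels, columns sets the column labels.
    # Note: because df is already a DataFrame, passing index/columns here
    # reindexes it rather than relabeling it. None of the new labels exist
    # in df (its labels are 0..1 and 0..3), so every cell comes out as NaN.
    df2 = DataFrame(df, index=['one', 'two'], columns=['aa', 'bb', 'cc', 'dd'])
    print(df2)
    print(df2.index)
    print(df2.columns)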
The result is:

          aa   bb   cc   dd
    one  NaN  NaN  NaN  NaN
    two  NaN  NaN  NaN  NaN
    Index(['one', 'two'], dtype='object')
    Index(['aa', 'bb', 'cc', 'dd'], dtype='object')
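The all-NaN result above comes from reindexing an existing DataFrame with labels it does not contain. If the goal is to keep the data and only attach labels, one possible approach (a sketch, not necessarily what the original notes intended) is to pass the raw lists together with index and columns, or to assign the labels afterwards. The names data, df_labeled and df_relabeled are illustrative.

```python
from pandas import DataFrame

data = [['a', 'b', 'c', 'd'],
        [1, 2, 3, 4]]

# Option 1: pass the raw data together with the labels.
df_labeled = DataFrame(data,
                       index=['one', 'two'],
                       columns=['aa', 'bb', 'cc', 'dd'])

# Option 2: build the DataFrame first, then overwrite its labels.
df_relabeled = DataFrame(data)
df_relabeled.index = ['one', 'two']
df_relabeled.columns = ['aa', 'bb', 'cc', 'dd']

print(df_labeled)
print(df_labeled.loc['two', 'dd'])   # 4
```

Both options keep the original values; only the row and column labels change.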
2.2 Creating a DataFrame from a dictionary
    from pandas import DataFrame

    # The dict keys become the DataFrame's column labels and the values
    # become the column data. The scalar month=8 is repeated for every row.
    dict1 = dict(aprt=['101', '102', '103'],
                 profits=[1000, 2000, 3000],
                 year=[2001, 2002, 2003],
                 month=8)
    df3 = DataFrame(dict1)
    df3.index = ['one', 'two', 'three']
    print(df3)

The result is:

           aprt  profits  year  month
    one     101     1000  2001      8
    two     102     2000  2002      8
    three   103     3000  2003      8
2.3 A CSV file can be read into a DataFrame with the following function:
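    import pandas as pd

    # sep=';'      : fields are separated by semicolons
    # header=None  : the file has no header row, so columns are numbered 0, 1, 2, ...
    df4 = pd.read_csv('data1.csv', sep=';', encoding='UTF-8', header=None)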
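Since the contents of data1.csv are not shown, the sketch below uses an in-memory string in its place. The sample rows and the column names assigned at the end are invented for illustration. It shows what sep=';' and header=None do in practice.

```python
import io
import pandas as pd

# Hypothetical semicolon-separated content with no header row.
csv_text = "101;1000;2001\n102;2000;2002\n103;3000;2003\n"

# read_csv accepts a file-like object as well as a path.
df = pd.read_csv(io.StringIO(csv_text), sep=';', header=None)

# With header=None the columns are numbered automatically.
default_columns = list(df.columns)
print(default_columns)   # [0, 1, 2]

# Meaningful names can be assigned afterwards.
df.columns = ['aprt', 'profits', 'year']
print(df)
```

The encoding argument matters only when reading bytes from a file on disk, so it is left out here.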