1.dataframe对象简述:
dataframe为pandas中一种有行列索引的二维数据结构,可以看成在普通二维结构上加上行列id标记
示例为创建一个2X3的dataframe:
1 import sys 2 import pandas as pd 3 import numpy as np 4 data = pd.DataFrame([[1, 2, 3],[4, 5, 6]], columns=['y0','y1','y2'], index=['x0','x1']) 5 print ("data: ",data) 6 7 ''' 8 data: 9 y0 y1 y2 10 x0 1 2 3 11 x1 4 5 6 12 '''
2.利用read函数读取数据到datafame:
pandas中的read函数可以从各种类型的文件中以及URL中读取数据到一个dataframe
示例为从一个txt文件中读取三个特征向量,表示长方体的长宽高:
1 import sys 2 import pandas as pd 3 import numpy as np 4 filepath = "D:\Code\PyCode" 5 filename = "in.txt" 6 column_names = ["length", "width", "high"] 7 #sep="..."规定了分隔符 8 data = pd.read_table(filepath +"\"+ filename,sep=" ", names = column_names ) 9 print (data," ","data.shape:",data.shape) 10 ''' 11 length width high 12 0 10 10 100 13 1 15 11 110 14 2 22 12 120 15 data.shape: (3, 3) 16 '''
注意:读取文件到dataframe时,若是指定列的标记,即在read函数中加入names=...,则读取到的data列索引为names指定的id,若是没有这个参数,列索引为源文件的第一行数据
示例:
1 import sys 2 import pandas as pd 3 import numpy as np 4 filepath = "D:\Code\PyCode" 5 filename = "in.txt" 6 column_names = ["length", "width", "high"] 7 #sep="..."规定了分隔符 8 data = pd.read_table(filepath +"\"+ filename,sep=",") 9 #data = pd.read_table(filepath +"\"+ filename,sep=",", names = column_names ) 10 print (data," ","data.shape:",data.shape) 11 ''' 12 10 10.1 100 13 0 15 11 110 14 1 22 12 120 15 data.shape: (2, 3) 16 '''
3.对dataframe进行列切片:
对上面读取到的三行三列的data选取其第二列到第三列:
1 data2 = data[column_names[1:3]] 2 print (data2) 3 print (data2.shape) 4 ''' 5 length width high 6 0 10 10 100 7 1 15 11 110 8 2 22 12 120 9 (3, 3) 10 width high 11 0 10 100 12 1 11 110 13 2 12 120 14 (3, 2) 15 ''' 16 data3 = data2[:n]#选取data2的前n行
4.pandas中读取文件的函数(截图来自《利用python进行数据分析》):