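Problem:
When a newly created DataFrame is printed, its columns appear in a different order from the one in which they were defined. This happens on older setups (pandas before 0.23, or Python before 3.6), where a DataFrame built from a dict sorts its columns alphabetically; newer versions keep the dict's insertion order.
For example: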
import pandas as pd
grades = [48,99,75,80,42,80,72,68,36,78]
df = pd.DataFrame({
    'ID': ["x%d" % r for r in range(10)],
    'Gender': ['F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'M', 'M'],
    'ExamYear': ['2007', '2007', '2007', '2008', '2008', '2008', '2008', '2009', '2009', '2009'],
    'Class': ['algebra', 'stats', 'bio', 'algebra', 'algebra', 'stats', 'stats', 'algebra', 'bio', 'bio'],
    'Participated': ['yes', 'yes', 'yes', 'yes', 'no', 'yes', 'yes', 'yes', 'yes', 'yes'],
    'Passed': ['yes' if x > 50 else 'no' for x in grades],
    'Employed': [True, True, True, False, False, False, False, True, True, False],
    'Grade': grades,
})
print(df)
Output (columns sorted alphabetically):
     Class  Employed  ExamYear  Gender  Grade  ID  Participated  Passed
0  algebra      True      2007       F     48  x0           yes      no
1    stats      True      2007       M     99  x1           yes     yes
2      bio      True      2007       F     75  x2           yes     yes
3  algebra     False      2008       M     80  x3           yes     yes
4  algebra     False      2008       F     42  x4            no      no
5    stats     False      2008       M     80  x5           yes     yes
6    stats     False      2008       F     72  x6           yes     yes
7  algebra      True      2009       M     68  x7           yes     yes
8      bio      True      2009       M     36  x8           yes      no
9      bio     False      2009       M     78  x9           yes     yes
Solution
Add the following lines to the code above:
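cols = ['ID', 'Gender', 'ExamYear', 'Class', 'Participated', 'Passed', 'Employed', 'Grade']
df = df.loc[:, cols]
The statement df = df.loc[:, cols] leaves the row index unchanged and selects the columns in the order given by cols. Older tutorials write this with df.ix, which was deprecated in pandas 0.20 and removed in pandas 1.0, so df.loc is the form that works on current versions.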
Output:
   ID  Gender  ExamYear    Class  Participated  Passed  Employed  Grade
0  x0       F      2007  algebra           yes      no      True     48
1  x1       M      2007    stats           yes     yes      True     99
2  x2       F      2007      bio           yes     yes      True     75
3  x3       M      2008  algebra           yes     yes     False     80
4  x4       F      2008  algebra            no      no     False     42
5  x5       M      2008    stats           yes     yes     False     80
6  x6       F      2008    stats           yes     yes     False     72
7  x7       M      2009  algebra           yes     yes      True     68
8  x8       M      2009      bio           yes      no      True     36
9  x9       M      2009      bio           yes     yes     False     78
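Selecting columns by a list of labels is only one way to fix the order. The sketch below shows three equivalent approaches on a shortened, hypothetical version of the data (three rows, four columns, with the dict keys deliberately out of order); the names data, df_ctor, df_select and df_reindex are made up for this illustration and are not part of the original example.

```python
import pandas as pd

# Hypothetical, shortened data; the keys are intentionally not in the desired order.
data = {
    'Grade': [48, 99, 75],
    'Class': ['algebra', 'stats', 'bio'],
    'ID': ['x0', 'x1', 'x2'],
    'Gender': ['F', 'M', 'F'],
}
cols = ['ID', 'Gender', 'Class', 'Grade']

# Option 1: fix the column order when the DataFrame is created.
df_ctor = pd.DataFrame(data, columns=cols)

# Option 2: select the columns in the desired order with a list of labels.
df_select = pd.DataFrame(data)[cols]

# Option 3: reindex along the column axis.
df_reindex = pd.DataFrame(data).reindex(columns=cols)

print(list(df_ctor.columns))
print(list(df_select.columns))
print(list(df_reindex.columns))
```

Passing columns=cols to the constructor is probably the simplest choice when the order is known up front, since no second statement is needed. Note that reindex(columns=...) adds an all-NaN column for any label in cols that is missing from the data, whereas df[cols] raises a KeyError in that case.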
Source: https://www.zhangshengrong.com/p/ArXGrLDBNj/