• pandas Index objects and reindexing


    1. Index

    The pandas Index object stores axis labels and other metadata. An Index is immutable: its elements cannot be modified in place.

    In [73]: obj = pd.Series(range(3),index = ['a','b','c'])
    In [74]: index = obj.index
    In [75]: index
    Out[75]: Index(['a', 'b', 'c'], dtype='object')
    In [76]: index[1:]
    Out[76]: Index(['b', 'c'], dtype='object')
    In [77]: index[1] = 'f'  # TypeError
    In [8]: index.size
    Out[8]: 3
    In [9]: index.shape
    Out[9]: (3,)
    In [10]: index.ndim
    Out[10]: 1
    In [11]: index.dtype
    Out[11]: dtype('O')
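
    As a minimal sketch of what immutability means in practice: item assignment on an Index raises TypeError, and the usual way to change labels is to build a new Index and assign it to the object. The names assignment_failed and new_index are illustrative only and do not appear in the original session.

```python
import pandas as pd

obj = pd.Series(range(3), index=['a', 'b', 'c'])
index = obj.index

# Item assignment on an Index is rejected with TypeError.
try:
    index[1] = 'f'
    assignment_failed = False
except TypeError:
    assignment_failed = True

# To change labels, build a new Index and assign it to the Series.
# The old Index object is left untouched.
new_index = pd.Index(['a', 'f', 'c'])
obj.index = new_index

print(assignment_failed)
print(list(obj.index))
print(list(index))
```

    The old Index still holds its original labels afterwards, which is what makes sharing one Index between several objects safe.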

    Because Index objects are immutable, they can be shared safely among multiple data structures:

    In [78]: labels = pd.Index(np.arange(3))
    In [79]: labels
    Out[79]: Int64Index([0, 1, 2], dtype='int64')
    In [80]: obj2 = pd.Series([2,3.5,0], index=labels)
    In [81]: obj2
    Out[81]:
    0    2.0
    1    3.5
    2    0.0
    dtype: float64
    In [82]: obj2.index is labels
    Out[82]: True

    An Index is essentially also a container object, so Python's in operator works on it:

    In [84]: f2
    Out[84]:
    key    year     state  pop  debt
    order
    a      2000   beijing  1.5   NaN
    b      2001   beijing  1.7   NaN
    c      2002   beijing  3.6   1.0
    d      2001  shanghai  2.4   2.0
    e      2002  shanghai  2.9   NaN
    f      2003  shanghai  3.2   3.0
    In [86]: 'c' in f2.index
    Out[86]: True
    In [88]: 'pop' in f2.columns
    Out[88]: True

    Most importantly, a pandas Index can contain duplicate labels:

    In [89]: dup_labels = pd.Index(['foo','foo','bar','bar'])
    In [90]: dup_labels
    Out[90]: Index(['foo', 'foo', 'bar', 'bar'], dtype='object')

    So think about it: can a DataFrame have duplicate columns or duplicate index labels?

    Yes, it can! But avoid doing this whenever possible:

    In [91]: f2.index = ['a']*6
    In [92]: f2
    Out[92]:
    key  year     state  pop  debt
    a    2000   beijing  1.5   NaN
    a    2001   beijing  1.7   NaN
    a    2002   beijing  3.6   1.0
    a    2001  shanghai  2.4   2.0
    a    2002  shanghai  2.9   NaN
    a    2003  shanghai  3.2   3.0
    In [93]: f2.loc['a']
    Out[93]:
    key  year     state  pop  debt
    a    2000   beijing  1.5   NaN
    a    2001   beijing  1.7   NaN
    a    2002   beijing  3.6   1.0
    a    2001  shanghai  2.4   2.0
    a    2002  shanghai  2.9   NaN
    a    2003  shanghai  3.2   3.0
    In [94]: f2.columns = ['year']*4
    In [95]: f2
    Out[95]:
       year      year  year  year
    a  2000   beijing   1.5   NaN
    a  2001   beijing   1.7   NaN
    a  2002   beijing   3.6   1.0
    a  2001  shanghai   2.4   2.0
    a  2002  shanghai   2.9   NaN
    a  2003  shanghai   3.2   3.0
    In [96]: f2.index.is_unique  # use this attribute to check whether the index is unique
    Out[96]: False
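
    One reason to avoid duplicate labels is that the result type of a label lookup then depends on the label. The sketch below uses a small hypothetical Series s (not the f2 frame above) to show this, together with Index.duplicated for finding and dropping the repeats.

```python
import pandas as pd

# Hypothetical data: label 'a' appears twice, 'b' once.
s = pd.Series([1, 2, 3], index=['a', 'a', 'b'])

print(s.index.is_unique)

# Boolean mask marking every repeat after the first occurrence.
dup_mask = s.index.duplicated(keep='first')

# A unique label yields a scalar; a repeated label yields a Series.
unique_label = s.loc['b']
repeated = s.loc['a']

# Keep only the first row for each label.
deduped = s[~dup_mask]
print(deduped)
```

    Code that expects s.loc[label] to be a scalar will silently receive a Series for repeated labels, so checking is_unique up front is a cheap safeguard.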

    Index objects also support the set operations intersection, union, difference, and symmetric difference (XOR), much like Python's built-in set.
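
    A short sketch of these set operations, using two hypothetical indexes named left and right (not taken from the original session):

```python
import pandas as pd

left = pd.Index(['a', 'b', 'c'])
right = pd.Index(['b', 'c', 'd'])

common = left.intersection(right)                    # labels in both
combined = left.union(right)                         # labels in either
only_left = left.difference(right)                   # labels in left but not right
either_not_both = left.symmetric_difference(right)   # labels in exactly one

print(common)
print(combined)
print(only_left)
print(either_not_both)
```

    Each method returns a new Index; neither left nor right is changed.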

    2. Reindexing

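    The reindex method conforms a pandas object to a new index. It does not modify the object in place; it returns a new object whose data is taken from the original and rearranged to match the new labels.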

    In [96]: obj=pd.Series([4.5,7.2,-5.3,3.6],index = ['d','b','a','c'])
    In [97]: obj
    Out[97]:
    d    4.5
    b    7.2
    a   -5.3
    c    3.6
    dtype: float64

    reindex arranges the data according to the new index; labels that are absent from the original introduce missing values:

    In [99]: obj2 = obj.reindex(list('abcde'))
    In [100]: obj2
    Out[100]:
    a   -5.3
    b    7.2
    c    3.6
    d    4.5
    e    NaN
    dtype: float64
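
    The sketch below checks that the original Series is left unchanged by reindex. It also shows the fill_value parameter, which the text above does not mention, as one way to substitute a constant for NaN; obj_filled is an illustrative name.

```python
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

# reindex returns a new Series; obj keeps its original order.
obj2 = obj.reindex(list('abcde'))
print(list(obj.index))

# Assumption beyond the original text: fill_value replaces the
# missing entries introduced by new labels with a constant.
obj_filled = obj.reindex(list('abcde'), fill_value=0.0)
print(obj_filled)
```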

    You can also choose how missing values are filled with the method parameter; for example, ffill fills forward and bfill fills backward:

    In [101]: obj3 = pd.Series(['blue','purple','yellow'],index = [0,2,4])
    In [102]: obj3
    Out[102]:
    0      blue
    2    purple
    4    yellow
    dtype: object
    
    In [103]: obj3.reindex(range(6),method='ffill')
    Out[103]:
    0      blue
    1      blue
    2    purple
    3    purple
    4    yellow
    5    yellow
    dtype: object
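
    For comparison, here is a sketch of both fill directions on the same data. Note that bfill leaves the last new label empty because no later value exists to copy from. According to the pandas documentation, method only applies when the original index is monotonically increasing or decreasing.

```python
import pandas as pd

obj3 = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])

# Forward fill: each gap takes the previous known value.
forward = obj3.reindex(range(6), method='ffill')

# Backward fill: each gap takes the next known value;
# label 5 has no later value and stays missing.
backward = obj3.reindex(range(6), method='bfill')

print(forward)
print(backward)
```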

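    For a two-dimensional object such as a DataFrame, calling reindex with a single list argument reindexes the rows by default. Use the columns keyword argument to reindex the columns instead: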

    In [104]: f = pd.DataFrame(np.arange(9).reshape((3,3)),index=list('acd'),columns=['beijing','shanghai','guangzhou'])
    In [105]: f
    Out[105]:
       beijing  shanghai  guangzhou
    a        0         1          2
    c        3         4          5
    d        6         7          8
    In [106]: f2 = f.reindex(list('abcd'))
    In [107]: f2
    Out[107]:
       beijing  shanghai  guangzhou
    a      0.0       1.0        2.0
    b      NaN       NaN        NaN
    c      3.0       4.0        5.0
    d      6.0       7.0        8.0
    In [112]: f3 = f.reindex(columns=['beijing','shanghai','xian','guangzhou'])
    In [113]: f3
    Out[113]:
       beijing  shanghai  xian  guangzhou
    a        0         1   NaN          2
    c        3         4   NaN          5
    d        6         7   NaN          8
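
    Rows and columns can also be reindexed in a single call by passing both the index and columns keywords. This is a sketch that combines the two examples above; the variable name both is illustrative.

```python
import numpy as np
import pandas as pd

f = pd.DataFrame(np.arange(9).reshape((3, 3)),
                 index=list('acd'),
                 columns=['beijing', 'shanghai', 'guangzhou'])

# Reindex both axes at once: row 'b' and column 'xian' are new,
# so their cells are filled with NaN.
both = f.reindex(index=list('abcd'),
                 columns=['beijing', 'shanghai', 'xian', 'guangzhou'])
print(both)
```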
  • Original article: https://www.cnblogs.com/lavender1221/p/12671173.html