• Basic functionality of pandas data structures


    Common methods for working with the data in a Series or DataFrame:

    Import the Python libraries:

    import numpy as np
    import pandas as pd

    Data structures used in the examples:

    Series:

    >>> obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
    >>> obj
    d    4.5
    b    7.2
    a   -5.3
    c    3.6
    dtype: float64

    DataFrame:

    >>> data = {
    ...     'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
    ...     'year': [2000, 2001, 2002, 2001, 2002],
    ...     'pop': [1.5, 1.7, 3.6, 2.4, 2.9]
    ... }
    >>> frame = pd.DataFrame(data)
    >>> frame
       pop   state  year
    0  1.5    Ohio  2000
    1  1.7    Ohio  2001
    2  3.6    Ohio  2002
    3  2.4  Nevada  2001
    4  2.9  Nevada  2002

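    Reindexing with reindex():

      reindex() creates a new object that conforms to the new index.

      A Series has a single index (its data labels):

      Calling reindex on the Series rearranges the data according to the new index. If an index value is not currently present, a missing value is introduced.

      Example: reindex ['d', 'b', 'a', 'c'] as ['a', 'b', 'c', 'd', 'e']. The label 'e' does not exist, so NaN is introduced automatically; pass fill_value to choose a different filler value.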

    >>> obj.reindex(['a', 'b', 'c', 'd', 'e'])
    a   -5.3
    b    7.2
    c    3.6
    d    4.5
    e    NaN
    dtype: float64
    >>> obj.reindex(['a', 'b', 'c', 'd', 'e'],fill_value=666)
    a     -5.3
    b      7.2
    c      3.6
    d      4.5
    e    666.0
    dtype: float64
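
    The Series behaviour above can be checked with a short script. This is a minimal sketch; the variable names reindexed and filled are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

# reindex returns a new Series; labels absent from the original become NaN.
reindexed = obj.reindex(['a', 'b', 'c', 'd', 'e'])

# fill_value replaces the NaN that would otherwise be introduced.
filled = obj.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=0.0)

# The original Series is left untouched.
print(list(obj.index))           # ['d', 'b', 'a', 'c']
print(np.isnan(reindexed['e']))  # True
print(filled['e'])               # 0.0
```

    Because reindex returns a new object, the result has to be assigned to a name if it is needed later.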

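      A DataFrame has both a row index and a column index. reindex() works on the rows by default, but both can be reindexed in the same call (see the examples and their output).

      Example: note the difference between int and str labels. The default index is integer-typed, so the string labels below match nothing and produce rows of NaN.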

    >>> frame
       pop   state  year
    0  1.5    Ohio  2000
    1  1.7    Ohio  2001
    2  3.6    Ohio  2002
    3  2.4  Nevada  2001
    4  2.9  Nevada  2002
    >>> frame.reindex([4,3,2,1,0])
       pop   state  year
    4  2.9  Nevada  2002
    3  2.4  Nevada  2001
    2  3.6    Ohio  2002
    1  1.7    Ohio  2001
    0  1.5    Ohio  2000
    >>> frame.reindex(['4','3','2','1','0'])
       pop state  year
    4  NaN   NaN   NaN
    3  NaN   NaN   NaN
    2  NaN   NaN   NaN
    1  NaN   NaN   NaN
    0  NaN   NaN   NaN
    >>> frame.reindex(['a', 'b', 'c', 'd', 'e'])
       pop state  year
    a  NaN   NaN   NaN
    b  NaN   NaN   NaN
    c  NaN   NaN   NaN
    d  NaN   NaN   NaN
    e  NaN   NaN   NaN
    >>> frame.reindex([4,3,2,1,0],columns=['year', 'state', 'pop'])
       year   state  pop
    4  2002  Nevada  2.9
    3  2001  Nevada  2.4
    2  2002    Ohio  3.6
    1  2001    Ohio  1.7
    0  2000    Ohio  1.5
    >>> frame.reindex(index=[4,3,2,1,0],columns=['year', 'state', 'pop'])
       year   state  pop
    4  2002  Nevada  2.9
    3  2001  Nevada  2.4
    2  2002    Ohio  3.6
    1  2001    Ohio  1.7
    0  2000    Ohio  1.5
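
    The int-versus-str pitfall shown above can be reproduced as follows. A minimal sketch; by_int, by_str and both are illustrative names.

```python
import pandas as pd

frame = pd.DataFrame({
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
    'year': [2000, 2001, 2002, 2001, 2002],
    'pop': [1.5, 1.7, 3.6, 2.4, 2.9],
})

# Integer labels match the default integer index, so rows are only reordered.
by_int = frame.reindex([4, 3, 2, 1, 0])

# String labels match nothing in an integer index: every cell becomes NaN.
by_str = frame.reindex(['4', '3', '2', '1', '0'])

# index= and columns= can be passed together to reorder both axes.
both = frame.reindex(index=[4, 3, 2, 1, 0], columns=['year', 'state', 'pop'])

print(by_int.index.tolist())      # [4, 3, 2, 1, 0]
print(by_str.isna().all().all())  # True
print(both.columns.tolist())      # ['year', 'state', 'pop']
```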

    Dropping entries from rows/columns with drop():

      A Series has only one axis, so drop() removes entries by index label:

    >>> obj
    d    4.5
    b    7.2
    a   -5.3
    c    3.6
    dtype: float64
    >>> obj.drop(['d','a'])
    b    7.2
    c    3.6
    dtype: float64

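      A DataFrame has both rows and columns. drop() removes rows by default; to drop a column, pass axis=1, otherwise an error is raised because the label is looked up in the row index (see the examples and their output). The traceback below shows a ValueError from the pandas version used here; newer versions raise a KeyError instead.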

    >>> frame
       pop   state  year
    0  1.5    Ohio  2000
    1  1.7    Ohio  2001
    2  3.6    Ohio  2002
    3  2.4  Nevada  2001
    4  2.9  Nevada  2002
    >>> frame.drop([0,1])
       pop   state  year
    2  3.6    Ohio  2002
    3  2.4  Nevada  2001
    4  2.9  Nevada  2002
    >>> frame.drop(['pop'])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.6/site-packages/pandas/core/generic.py", line 2530, in drop
        obj = obj._drop_axis(labels, axis, level=level, errors=errors)
      File "/usr/local/lib/python3.6/site-packages/pandas/core/generic.py", line 2562, in _drop_axis
        new_axis = axis.drop(labels, errors=errors)
      File "/usr/local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3744, in drop
        labels[mask])
    ValueError: labels ['pop'] not contained in axis
    >>> frame.drop(['pop'],axis=1)
        state  year
    0    Ohio  2000
    1    Ohio  2001
    2    Ohio  2002
    3  Nevada  2001
    4  Nevada  2002
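
    The three drop() cases can be put side by side in a script. This is a sketch under two assumptions: the columns= keyword is available (it is in reasonably recent pandas), and the exception type for a missing label depends on the pandas version, so both candidates are caught. The names rows_dropped, cols_dropped, same_cols and failed are illustrative.

```python
import pandas as pd

frame = pd.DataFrame({
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
    'year': [2000, 2001, 2002, 2001, 2002],
    'pop': [1.5, 1.7, 3.6, 2.4, 2.9],
})

rows_dropped = frame.drop([0, 1])           # axis=0 is the default: drop rows
cols_dropped = frame.drop(['pop'], axis=1)  # axis=1: drop a column
same_cols = frame.drop(columns=['pop'])     # keyword form of the same operation

# Looking up a column label along the row axis fails. Both exception types
# are caught because the one raised differs between pandas versions.
try:
    frame.drop(['pop'])
    failed = False
except (KeyError, ValueError):
    failed = True

print(rows_dropped.index.tolist())    # [2, 3, 4]
print(cols_dropped.columns.tolist())  # ['state', 'year']
print(failed)                         # True
```

    Like reindex, drop returns a new object and leaves frame unchanged.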

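    Indexing, selection, and filtering:

      Series:

        Selection:

          Selecting from a Series is similar to indexing a list. The difference is that a Series can be indexed by integer position as well as by its custom labels. (Recent pandas versions warn when an integer is used positionally on a non-integer index; obj.iloc[0] is the explicit positional form.)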

    >>> obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
    >>> obj
    d    4.5
    b    7.2
    a   -5.3
    c    3.6
    dtype: float64
    >>> obj['d']
    4.5
    >>> obj[0]
    4.5

        Assignment:

          Assignment works the same way as selection.

    >>> obj['d'] = 0
    >>> obj['d']
    0.0
    >>> obj
    d    0.0
    b    7.2
    a   -5.3
    c    3.6
    dtype: float64
    >>> obj[0] = 88
    >>> obj
    d    88.0
    b     7.2
    a    -5.3
    c     3.6
    dtype: float64
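
    The same selections and assignments can be written with the explicit accessors .loc (by label) and .iloc (by position), which avoids any ambiguity about what an integer means. A minimal sketch; the second assignment targets position 1 rather than 0 so that both changes stay visible.

```python
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

by_label = obj.loc['d']    # label-based selection
by_position = obj.iloc[0]  # position-based selection (first element)

obj.loc['d'] = 0           # assign by label
obj.iloc[1] = 88           # assign by position (second element, label 'b')

print(by_label, by_position)  # 4.5 4.5
print(obj.tolist())           # [0.0, 88.0, -5.3, 3.6]
```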

      DataFrame:

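        Selection:

          Indexing a DataFrame with [] selects a column by default and returns a Series. Only column labels are accepted; an integer that is not a column label raises a KeyError.

          A DataFrame can be sliced to select rows (returns a DataFrame).

          A DataFrame can also be indexed with DataFrame.ix[val] for more specific selection (returns a Series). Usage: frame.ix[0] returns the first row as a Series; frame.ix[1, ['year']] returns the 'year' entry of the second row as a Series. Note that .ix was deprecated in pandas 0.20 and removed in 1.0; .loc (label-based) and .iloc (position-based) replace it.

    Example: column indexing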

    >>> frame
       year   state  pop
    0  2000    Ohio  1.5
    1  2001    Ohio  1.7
    2  2002    Ohio  3.6
    3  2001  Nevada  2.4
    4  2002  Nevada  2.9
    >>> frame['year']
    0    2000
    1    2001
    2    2002
    3    2001
    4    2002
    Name: year, dtype: int64
    >>> frame[0]
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2525, in get_loc
        return self._engine.get_loc(key)
      File "pandas/_libs/index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
      File "pandas/_libs/index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
      File "pandas/_libs/hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
      File "pandas/_libs/hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
    KeyError: 0

    Example: row indexing

    >>> frame
       year   state  pop
    0  2000    Ohio  1.5
    1  2001    Ohio  1.7
    2  2002    Ohio  3.6
    3  2001  Nevada  2.4
    4  2002  Nevada  2.9
    >>> frame[0:2]
       year state  pop
    0  2000  Ohio  1.5
    1  2001  Ohio  1.7
    >>> frame[0:1]
       year state  pop
    0  2000  Ohio  1.5
    >>> frame.ix[0]
    year     2000
    state    Ohio
    pop       1.5
    Name: 0, dtype: object

    Example: ix indexing (requires pandas earlier than 1.0)

    >>> frame.ix[0]
    year     2000
    state    Ohio
    pop       1.5
    Name: 0, dtype: object
    >>> frame.ix[1,['year']]
    year    2001
    Name: 1, dtype: object
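
    On pandas versions where .ix is no longer available, the selections above can be sketched with .loc and .iloc instead. The column order is fixed with columns= to match the frame printed above; the variable names are illustrative.

```python
import pandas as pd

frame = pd.DataFrame(
    {
        'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9],
    },
    columns=['year', 'state', 'pop'],
)

column = frame['year']                # column by label -> Series
rows = frame[0:2]                     # slice selects rows -> DataFrame
first_row = frame.iloc[0]             # row by position -> Series
cell_series = frame.loc[1, ['year']]  # row label 1, list of columns -> Series

print(type(column).__name__)     # Series
print(type(rows).__name__)       # DataFrame
print(type(first_row).__name__)  # Series
print(cell_series['year'])       # 2001
```

    With the default integer index, .loc[1] and .iloc[1] pick the same row; they differ as soon as the index labels are not 0, 1, 2, and so on.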

    Example: return types

    >>> type(frame['year'])
    <class 'pandas.core.series.Series'>
    
    
    >>> type(frame[0:2])
    <class 'pandas.core.frame.DataFrame'>
    
    
    >>> type(frame.ix[0])
    <class 'pandas.core.series.Series'>
    
    >>> type(frame.ix[0,['year']])
    <class 'pandas.core.series.Series'>

        Assignment:

    Example: DataFrame assignment

    #frame
    >>> frame
       year   state  pop
    0  2000    Ohio  1.5
    1  2001    Ohio  1.7
    2  2002    Ohio  3.6
    3  2001  Nevada  2.4
    4  2002  Nevada  2.9
    # Assigning a scalar (non-list) to a column sets every row of that column
    >>> frame['year'] = 5
    >>> frame
       year   state  pop
    0     5    Ohio  1.5
    1     5    Ohio  1.7
    2     5    Ohio  3.6
    3     5  Nevada  2.4
    4     5  Nevada  2.9
    >>> frame['year'] = 'test'
    >>> frame
       year   state  pop
    0  test    Ohio  1.5
    1  test    Ohio  1.7
    2  test    Ohio  3.6
    3  test  Nevada  2.4
    4  test  Nevada  2.9
    
    # When assigning a list-like to a column, its length must equal the number of rows
    >>> frame['year'] = range(5)
    >>> frame
       year   state  pop
    0     0    Ohio  1.5
    1     1    Ohio  1.7
    2     2    Ohio  3.6
    3     3  Nevada  2.4
    4     4  Nevada  2.9
    >>> frame['year'] = range(4)
    Traceback (most recent call last):
    ValueError: Length of values does not match length of index
    
    
    
    # Row assignment (.ix requires pandas earlier than 1.0; use .loc or .iloc in newer versions)
    >>> frame.ix[0] = 5
    >>> frame
       year   state  pop
    0     5       5  5.0
    1     1    Ohio  1.7
    2     2    Ohio  3.6
    3     3  Nevada  2.4
    4     4  Nevada  2.9
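
    The assignment rules above can be collected into one script. This is a sketch, not the original author's code: the row assignment uses .loc on a numeric-only copy (a choice made here so that the scalar does not have to be coerced into the string column), and numeric, scalar_years and length_error are illustrative names.

```python
import pandas as pd

frame = pd.DataFrame({
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
    'year': [2000, 2001, 2002, 2001, 2002],
    'pop': [1.5, 1.7, 3.6, 2.4, 2.9],
})

frame['year'] = 5               # a scalar is broadcast to every row
scalar_years = frame['year'].tolist()

frame['year'] = list(range(5))  # a list must supply one value per row

try:
    frame['year'] = list(range(4))  # too short: 4 values for 5 rows
    length_error = False
except ValueError:
    length_error = True

# Row assignment by label on the numeric columns only.
numeric = frame[['year', 'pop']].copy()
numeric.loc[0] = 5

print(scalar_years)             # [5, 5, 5, 5, 5]
print(length_error)             # True
print(numeric.loc[0].tolist())  # [5.0, 5.0]
```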

    Arithmetic operations:

          

  • Original article: https://www.cnblogs.com/JansXin/p/8124130.html