• An introduction to using pandas.DataFrame.reindex


    Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reindex.html#pandas.DataFrame.reindex

    DataFrame.reindex(labels=None, index=None, columns=None, axis=None, method=None, copy=True, level=None, fill_value=nan, limit=None, tolerance=None)

    Conform Series/DataFrame to new index with optional filling logic.

    Places NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False.

    Parameters
    keywords for axes : array-like, optional

    New labels / index to conform to, should be specified using keywords. Preferably an Index object to avoid duplicating data.

    method : {None, ‘backfill’/’bfill’, ‘pad’/’ffill’, ‘nearest’}

    Method to use for filling holes in reindexed DataFrame. Please note: this is only applicable to DataFrames/Series with a monotonically increasing/decreasing index.

    • None (default): don’t fill gaps

    • pad / ffill: Propagate last valid observation forward to next valid.

    • backfill / bfill: Use next valid observation to fill gap.

    • nearest: Use nearest valid observations to fill gap.
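The four fill methods listed above can be compared side by side. This is a minimal sketch on made-up data: the Series `s`, the target `new_index`, and the result names are illustrative and not part of the pandas documentation.

```python
import pandas as pd

# Hypothetical data on a monotonically increasing integer index.
s = pd.Series([10.0, 20.0, 30.0], index=[0, 5, 10])
new_index = [0, 2, 5, 8, 10, 12]

no_fill = s.reindex(new_index)                     # labels 2, 8, 12 stay NaN
padded = s.reindex(new_index, method="ffill")      # take the value of the previous label
backfilled = s.reindex(new_index, method="bfill")  # take the value of the next label
nearest = s.reindex(new_index, method="nearest")   # take the value of the closest label

# Put the results next to each other for comparison.
print(pd.DataFrame({"none": no_fill, "ffill": padded,
                    "bfill": backfilled, "nearest": nearest}))
```

Label 12 lies after the last original label, so `bfill` has nothing to take from and leaves it as NaN, while `ffill` and `nearest` both fall back to the value at label 10.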

    copy : bool, default True

    Return a new object, even if the passed indexes are the same.

    level : int or name

    Broadcast across a level, matching Index values on the passed MultiIndex level.

    fill_value : scalar, default np.NaN

    Value to use for missing values. Defaults to NaN, but can be any “compatible” value.

    limit : int, default None

    Maximum number of consecutive elements to forward or backward fill.
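To make the effect of `limit` concrete, here is a small sketch on made-up data (the Series `s` and the names `target`, `unlimited`, and `limited` are illustrative). It assumes the usual behaviour that only inexact matches count toward the limit.

```python
import pandas as pd

# Hypothetical data: two known values, far apart on the index.
s = pd.Series([1.0, 2.0], index=[0, 10])
target = [0, 2, 4, 6, 8, 10]

# Without a limit, every gap after label 0 is forward-filled.
unlimited = s.reindex(target, method="ffill")

# With limit=2, at most two consecutive new labels are filled;
# labels 6 and 8 are left as NaN.
limited = s.reindex(target, method="ffill", limit=2)

print(pd.DataFrame({"unlimited": unlimited, "limit=2": limited}))
```

Labels 0 and 10 are exact matches in the original index, so they keep their own values in both results.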

    tolerance : optional

    Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations must satisfy the equation abs(index[indexer] - target) <= tolerance.

    Tolerance may be a scalar value, which applies the same tolerance to all values, or list-like, which applies variable tolerance per element. List-like includes list, tuple, array, Series, and must be the same size as the index and its dtype must exactly match the index’s type.
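The scalar and list-like forms of `tolerance` can be sketched as follows. The data and all variable names are made up for illustration; the list-like tolerance here has one entry per target label.

```python
import pandas as pd

# Hypothetical data on an integer index.
s = pd.Series([10.0, 20.0, 30.0], index=[0, 5, 10])
target = [1, 3, 5, 9]

# No tolerance: every target label takes its nearest neighbour's value.
loose = s.reindex(target, method="nearest")

# Scalar tolerance: label 3 is 2 away from its nearest neighbour (5),
# which exceeds 1, so it becomes NaN.
strict = s.reindex(target, method="nearest", tolerance=1)

# List-like tolerance: one maximum distance per target label.
# Label 9 is 1 away from label 10 but is allowed 0, so it becomes NaN.
per_label = s.reindex(target, method="nearest", tolerance=[1, 2, 0, 0])

print(pd.DataFrame({"loose": loose, "tolerance=1": strict,
                    "per-label": per_label}))
```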

    DataFrame.reindex supports two calling conventions

    • (index=index_labels, columns=column_labels, ...)

    • (labels, axis={'index', 'columns'}, ...)

    We highly recommend using keyword arguments to clarify your intent.
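As a quick check that the two calling conventions give the same result, here is a minimal sketch on a made-up frame (the frame `df_small` and the names `by_keyword` and `by_axis` are illustrative):

```python
import pandas as pd

# Hypothetical frame with float data, so that adding NaN does not change dtypes.
df_small = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]}, index=["x", "y"])

# Convention 1: name the axes with keywords.
by_keyword = df_small.reindex(index=["y", "z"], columns=["b", "c"])

# Convention 2: pass the labels positionally and choose the axis, one axis per call.
by_axis = (
    df_small.reindex(["y", "z"], axis="index")
            .reindex(["b", "c"], axis="columns")
)

print(by_keyword)
print(by_keyword.equals(by_axis))
```

The keyword form handles both axes in one call and makes it clear which labels go where, which is presumably why the documentation recommends it.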

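    From reading the documentation: you define a new index yourself, and reindex returns a new DataFrame conformed to it. For labels in the new index that are missing from the original, you can choose the values to fill in.

    Arguments can be passed in either of the two ways above; the first (keyword) form is recommended.

    The parameter col_level has been renamed to level in the version I tested.

    The example code below comes from the book. The method is mainly used to reset the index and to fill in default values for labels in the new index. The sessions assume that `import pandas as pd` and `import numpy as np` have already been run.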

    In [123]: index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror'] 
         ...: df = pd.DataFrame({'http_status': [200, 200, 404, 404, 301], 
         ...:                   'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]}, 
         ...:                   index=index)                                                                    
    
    In [124]: df                                                                                                
    Out[124]: 
               http_status  response_time
    Firefox            200           0.04
    Chrome             200           0.02
    Safari             404           0.07
    IE10               404           0.08
    Konqueror          301           1.00
    
    

    This defines a DataFrame df together with its index.

    Next we define a new index and call reindex on it with the default parameters.

    In [130]: new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10', 
         ...:              'Chrome']                                                                            
    
    In [131]: df                                                                                                
    Out[131]: 
               http_status  response_time
    Firefox            200           0.04
    Chrome             200           0.02
    Safari             404           0.07
    IE10               404           0.08
    Konqueror          301           1.00
    
    In [132]: df.reindex(index=new_index)                                                                       
    Out[132]: 
                   http_status  response_time
    Safari               404.0           0.07
    Iceweasel              NaN            NaN
    Comodo Dragon          NaN            NaN
    IE10                 404.0           0.08
    Chrome               200.0           0.02
    

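    This returns a new DataFrame. Labels that exist only in the new index (Iceweasel, Comodo Dragon) are filled with NaN, and http_status is upcast to float to hold them. The original df is unchanged.

    We can also set the fill value ourselves with the fill_value option.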

    In [133]: df.reindex(index=new_index, fill_value='missing')                                                 
    Out[133]: 
                  http_status response_time
    Safari                404          0.07
    Iceweasel         missing       missing
    Comodo Dragon     missing       missing
    IE10                  404          0.08
    Chrome                200          0.02
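
The choice of fill_value also affects the column dtypes, which the printed tables only hint at. The following sketch rebuilds the df from the session above and compares three fills; the names `default_fill`, `text_fill`, and `zero_fill` are introduced here for illustration only.

```python
import pandas as pd

index = ["Firefox", "Chrome", "Safari", "IE10", "Konqueror"]
df = pd.DataFrame(
    {"http_status": [200, 200, 404, 404, 301],
     "response_time": [0.04, 0.02, 0.07, 0.08, 1.0]},
    index=index,
)
new_index = ["Safari", "Iceweasel", "Comodo Dragon", "IE10", "Chrome"]

default_fill = df.reindex(index=new_index)                     # NaN forces http_status to float
text_fill = df.reindex(index=new_index, fill_value="missing")  # a string forces object columns
zero_fill = df.reindex(index=new_index, fill_value=0)          # an int keeps http_status integer

# Compare the resulting dtypes column by column.
print(pd.DataFrame({"default": default_fill.dtypes,
                    "missing": text_fill.dtypes,
                    "zero": zero_fill.dtypes}))
```

A string fill such as 'missing' is convenient for display, but the columns are no longer numeric afterwards, so a numeric fill is usually the safer choice when the data will be used in calculations.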
    

    The columns can be reindexed as well, in either of the following two ways.

    In [134]: df.reindex(columns=['http_status', 'user_agent'])                                                 
    Out[134]: 
               http_status  user_agent
    Firefox            200         NaN
    Chrome             200         NaN
    Safari             404         NaN
    IE10               404         NaN
    Konqueror          301         NaN
    
    In [135]: df.reindex(['http_status', 'user_agent'], axis="columns")                                         
    Out[135]: 
               http_status  user_agent
    Firefox            200         NaN
    Chrome             200         NaN
    Safari             404         NaN
    IE10               404         NaN
    Konqueror          301         NaN
    

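    To illustrate reindex further, the next example uses the method parameter, which fills in values on an ordered (monotonic) index.

    First, create a DataFrame with a datetime index.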

    In [137]: date_index = pd.date_range('1/1/2010', periods=6, freq='D') 
         ...: df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]}, 
         ...:                    index=date_index) 
         ...:                                                                                                   
    
    In [138]: df2                                                                                               
    Out[138]: 
                prices
    2010-01-01   100.0
    2010-01-02   101.0
    2010-01-03     NaN
    2010-01-04   100.0
    2010-01-05    89.0
    2010-01-06    88.0
    

    Then reindex it onto a longer date range, first with the defaults and then with the method parameter.

    In [139]: date_index2 = pd.date_range('12/29/2009', periods=10, freq='D')                                   
    
    In [140]: df2.reindex(index=date_index2)                                                                    
    Out[140]: 
                prices
    2009-12-29     NaN
    2009-12-30     NaN
    2009-12-31     NaN
    2010-01-01   100.0
    2010-01-02   101.0
    2010-01-03     NaN
    2010-01-04   100.0
    2010-01-05    89.0
    2010-01-06    88.0
    2010-01-07     NaN
    
    In [141]: df2.reindex(index=date_index2, method='bfill')                                                    
    Out[141]: 
                prices
    2009-12-29   100.0
    2009-12-30   100.0
    2009-12-31   100.0
    2010-01-01   100.0
    2010-01-02   101.0
    2010-01-03     NaN
    2010-01-04   100.0
    2010-01-05    89.0
    2010-01-06    88.0
    2010-01-07     NaN
    
    

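    The output shows that the default fill is still NaN. With method='bfill', each new label takes the next valid value: the dates added before the original range are filled with 100.0, while 2010-01-07 stays NaN because no later value exists. The NaN that was already present in the original data (2010-01-03) is not modified.

    To fill those values as well, use the fillna method.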
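As a rough sketch of that last point, the snippet below rebuilds df2 from the session above and fills the remaining NaN values after reindexing. The names `reindexed`, `filled`, and `constant` are introduced here for illustration. It uses `DataFrame.bfill()` instead of `fillna(method='bfill')`, since the `method` argument of `fillna` is deprecated in recent pandas versions.

```python
import numpy as np
import pandas as pd

date_index = pd.date_range("1/1/2010", periods=6, freq="D")
df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]}, index=date_index)
date_index2 = pd.date_range("12/29/2009", periods=10, freq="D")

# Filling during reindex only touches the newly added labels;
# the NaN already stored at 2010-01-03 stays.
reindexed = df2.reindex(index=date_index2, method="bfill")

# Reindexing first and back-filling afterwards also fills 2010-01-03.
# 2010-01-07 is still NaN because there is no later value to take.
filled = df2.reindex(index=date_index2).bfill()

# fillna with a constant replaces every remaining NaN.
constant = reindexed.fillna(0)

print(pd.concat({"reindex(bfill)": reindexed["prices"],
                 "reindex + bfill()": filled["prices"],
                 "fillna(0)": constant["prices"]}, axis=1))
```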

  • Original article: https://www.cnblogs.com/sidianok/p/14367344.html