• pandas.DataFrame.stack: notes copied from the documentation


    We start with stack.

    Source: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.stack.html#pandas.DataFrame.stack

    pandas.DataFrame.stack

    DataFrame.stack(level=-1, dropna=True)

    Stack the prescribed level(s) from columns to index.

    Return a reshaped DataFrame or Series having a multi-level index with one or more new inner-most levels compared to the current DataFrame. The new inner-most levels are created by pivoting the columns of the current dataframe:

    • if the columns have a single level, the output is a Series;

    • if the columns have multiple levels, the new index level(s) is (are) taken from the prescribed level(s) and the output is a DataFrame.

    Parameters
    level : int, str, list, default -1

    Level(s) to stack from the column axis onto the index axis, defined as one index or label, or a list of indices or labels.

    dropna : bool, default True

    Whether to drop rows in the resulting Frame/Series with missing values. Stacking a column level onto the index axis can create combinations of index and column values that are missing from the original dataframe. See Examples section.

    Returns
    DataFrame or Series

    Stacked dataframe or series.

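    Put simply, stack takes one level of the column labels and moves it into the row index. If the columns have a single level, the result is a Series; if they have multiple levels, the result is still a DataFrame.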
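    The rule above can be checked with a minimal sketch. The variable names `single` and `multi` are made up for this illustration, and the sketch only inspects the type of each result.

```python
import pandas as pd

# Single-level columns: no column level is left after stacking, so the result is a Series.
single = pd.DataFrame([[0, 1], [2, 3]],
                      index=["cat", "dog"],
                      columns=["weight", "height"])
stacked_single = single.stack()

# Two-level columns: one level moves to the index, the other stays as columns.
multi = pd.DataFrame([[1, 2], [2, 4]],
                     index=["cat", "dog"],
                     columns=pd.MultiIndex.from_tuples([("weight", "kg"),
                                                        ("weight", "pounds")]))
stacked_multi = multi.stack()

print(type(stacked_single).__name__)
print(type(stacked_multi).__name__)
```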

    Notes

    The function is named by analogy with a collection of books being reorganized from being side by side on a horizontal position (the columns of the dataframe) to being stacked vertically on top of each other (in the index of the dataframe).

    Examples

    Single level columns

    import pandas as pd

    df_single_level_cols = pd.DataFrame([[0, 1], [2, 3]],
                                        index=['cat', 'dog'],
                                        columns=['weight', 'height'])
    

      Stacking a dataframe with a single level column axis returns a Series:

    In [27]: df_single_level_cols                                                                                                            
    Out[27]: 
         weight  height
    cat       0       1
    dog       2       3
    
    In [28]: r = df_single_level_cols.stack()                                                                                                
    
    In [29]: r                                                                                                                               
    Out[29]: 
    cat  weight    0
         height    1
    dog  weight    2
         height    3
    dtype: int64
    
    In [30]: r.index                                                                                                                         
    Out[30]: 
    MultiIndex([('cat', 'weight'),
                ('cat', 'height'),
                ('dog', 'weight'),
                ('dog', 'height')],
               )
    
    

      The output shows that the column labels have moved into the row index, which is now a MultiIndex.
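    Since the stacked Series is indexed by (row label, column label) pairs, values can be looked up with those pairs. A small sketch, assuming the same data as the single-level example (the names `cat_weight` and `dog_values` are illustrative only):

```python
import pandas as pd

df = pd.DataFrame([[0, 1], [2, 3]],
                  index=["cat", "dog"],
                  columns=["weight", "height"])
r = df.stack()

# A full (row label, column label) tuple selects a single scalar.
cat_weight = r.loc[("cat", "weight")]

# The outer label alone selects a sub-Series indexed by the old column labels.
dog_values = r.loc["dog"]

print(cat_weight)
print(dog_values.to_dict())
```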

    Multi level columns: simple case

    multicol1 = pd.MultiIndex.from_tuples([('weight', 'kg'),
                                           ('weight', 'pounds')])
    df_multi_level_cols1 = pd.DataFrame([[1, 2], [2, 4]],
                                        index=['cat', 'dog'],
                                        columns=multicol1)
    

      Output:

    In [38]: df_multi_level_cols1                                                                                                            
    Out[38]: 
        weight       
            kg pounds
    cat      1      2
    dog      2      4
    
    In [39]: df_multi_level_cols1.stack()                                                                                                    
    Out[39]: 
                weight
    cat kg           1
        pounds       2
    dog kg           2
        pounds       4
    

      The output shows that stack took the innermost (lowest) column level and turned it into row labels.
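    One way to convince yourself that only the labels moved is to address the same cell before and after stacking. This is a sketch on the same data as above; `before` and `after` are illustrative names.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples([("weight", "kg"), ("weight", "pounds")])
df = pd.DataFrame([[1, 2], [2, 4]], index=["cat", "dog"], columns=columns)
stacked = df.stack()

# Before stacking, the unit is part of the column key.
before = df.loc["cat", ("weight", "pounds")]

# After stacking, the unit is part of the row key and only "weight" remains a column.
after = stacked.loc[("cat", "pounds"), "weight"]

print(before, after)
print(list(stacked.columns))
```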

    Missing values

    multicol2 = pd.MultiIndex.from_tuples([('weight', 'kg'),
                                           ('height', 'm')])
    df_multi_level_cols2 = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]],
                                        index=['cat', 'dog'],
                                        columns=multicol2)
    

      It is common to have missing values when stacking a dataframe with multi-level columns, as the stacked dataframe typically has more values than the original dataframe. Missing values are filled with NaNs:

    In [41]: df_multi_level_cols2                                                                                                            
    Out[41]: 
        weight height
            kg      m
    cat    1.0    2.0
    dog    3.0    4.0
    
    In [42]: df_multi_level_cols2.stack()                                                                                                    
    Out[42]: 
            height  weight
    cat kg     NaN     1.0
        m      2.0     NaN
    dog kg     NaN     3.0
        m      4.0     NaN
    

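      The innermost column level was moved into the row labels to form a MultiIndex. Combinations that did not exist in the original frame are filled with NaN by default.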
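    To see where the NaNs come from, it can help to count cells. The sketch below uses the same data as above and compares the number of cells before and after stacking; the variable names are illustrative.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples([("weight", "kg"), ("height", "m")])
df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], index=["cat", "dog"], columns=columns)
stacked = df.stack()

# The original frame has 4 cells; the stacked frame has 4 rows x 2 columns = 8.
original_cells = df.size
stacked_cells = stacked.size

# Only the (measure, unit) pairs that existed before stacking carry a value.
filled = int(stacked.notna().sum().sum())
missing = int(stacked.isna().sum().sum())

print(original_cells, stacked_cells, filled, missing)
```

    The four extra cells are pairs such as (height, kg) that never appeared in the original columns.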

    Prescribing the level(s) to be stacked

    The first parameter controls which level or levels are stacked:

    In [48]: df_multi_level_cols2.stack(level=0)                                                                                             
    Out[48]: 
                 kg    m
    cat height  NaN  2.0
        weight  1.0  NaN
    dog height  NaN  4.0
        weight  3.0  NaN
    
    In [49]: df_multi_level_cols2                                                                                                            
    Out[49]: 
        weight height
            kg      m
    cat    1.0    2.0
    dog    3.0    4.0
    
    In [50]: df_multi_level_cols2.stack(level=[0,1])                                                                                         
    Out[50]: 
    cat  height  m     2.0
         weight  kg    1.0
    dog  height  m     4.0
         weight  kg    3.0
    dtype: float64
    
    In [51]: df_multi_level_cols2.stack(level=[1,0])                                                                                         
    Out[51]: 
    cat  kg  weight    1.0
         m   height    2.0
    dog  kg  weight    3.0
         m   height    4.0
    dtype: float64
    

      You can choose which column level(s) to stack, or stack all of the column levels at once.
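    The level parameter also accepts a label instead of an integer position. A sketch, assuming the column levels are given names (the names "measure" and "unit" are hypothetical; the examples above leave the levels unnamed):

```python
import pandas as pd

# Hypothetical level names, added only for this illustration.
columns = pd.MultiIndex.from_tuples([("weight", "kg"), ("weight", "pounds")],
                                    names=["measure", "unit"])
df = pd.DataFrame([[1, 2], [2, 4]], index=["cat", "dog"], columns=columns)

by_position = df.stack(level=-1)   # innermost level, by integer position
by_name = df.stack(level="unit")   # the same level, by label

print(by_name.equals(by_position))
print(list(by_name.index.names))
```

    Naming the levels tends to make reshaping code easier to read than bare positions such as 0 or -1.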

    Dropping missing values

    In [52]: df_multi_level_cols3 = pd.DataFrame([[None, 1.0], [2.0, 3.0]], 
        ...:                                     index=['cat', 'dog'], 
        ...:                                     columns=multicol2)                                                                          
    
    In [53]: df_multi_level_cols3                                                                                                            
    Out[53]: 
        weight height
            kg      m
    cat    NaN    1.0
    dog    2.0    3.0
    

      Note that rows where all values are missing are dropped by default but this behaviour can be controlled via the dropna keyword parameter:

    When every value in a row of the result is NaN, the dropna option controls whether that row is dropped.

    In [54]: df_multi_level_cols3.stack()                                                                                                    
    Out[54]: 
            height  weight
    cat m      1.0     NaN
    dog kg     NaN     2.0
        m      3.0     NaN
    
    In [55]: df_multi_level_cols3.stack(dropna=False)                                                                                        
    Out[55]: 
            height  weight
    cat kg     NaN     NaN
        m      1.0     NaN
    dog kg     NaN     2.0
        m      3.0     NaN
    

      The default is True, which means rows whose values are all missing are dropped from the result.
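    The handling of all-NaN rows may differ between pandas versions, so a version-independent alternative is to drop them explicitly after stacking. This is a sketch of that workaround, not the documented example; it uses `float("nan")` in place of `None` and the illustrative name `cleaned`.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples([("weight", "kg"), ("height", "m")])
df = pd.DataFrame([[float("nan"), 1.0], [2.0, 3.0]],
                  index=["cat", "dog"],
                  columns=columns)

# Drop all-NaN rows explicitly instead of relying on stack's own default.
cleaned = df.stack().dropna(how="all")

print(cleaned)
```

    Rows that still hold at least one value, such as (dog, kg), are kept; only the fully empty (cat, kg) row is removed.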

  • Original article: https://www.cnblogs.com/sidianok/p/14475624.html