First, let's learn stack.
Source: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.stack.html#pandas.DataFrame.stack
pandas.DataFrame.stack
DataFrame.stack(level=-1, dropna=True)
Stack the prescribed level(s) from columns to index.
Return a reshaped DataFrame or Series having a multi-level index with one or more new inner-most levels compared to the current DataFrame. The new inner-most levels are created by pivoting the columns of the current dataframe:
- if the columns have a single level, the output is a Series;
- if the columns have multiple levels, the new index level(s) is (are) taken from the prescribed level(s) and the output is a DataFrame.
Parameters
- level : int, str, list, default -1
  Level(s) to stack from the column axis onto the index axis, defined as one index or label, or a list of indices or labels.
- dropna : bool, default True
  Whether to drop rows in the resulting Frame/Series with missing values. Stacking a column level onto the index axis can create combinations of index and column values that are missing from the original dataframe. See Examples section.
Returns
- DataFrame or Series
  Stacked dataframe or series.
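Put simply, stack takes a level of the column labels and moves it into the row index. If the columns have a single level, the result is a Series; if they have multiple levels, the result is still a DataFrame.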
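To check the Series-versus-DataFrame rule directly, here is a minimal sketch. The variable names (single, multi, and so on) are made up for illustration; the data mirrors the examples further down.

```python
import pandas as pd

# Single-level columns: stacking leaves no column level behind,
# so the result is a Series.
single = pd.DataFrame([[0, 1], [2, 3]],
                      index=['cat', 'dog'],
                      columns=['weight', 'height'])
single_result = single.stack()

# Two-level columns: stacking the innermost level still leaves
# one column level, so the result is a DataFrame.
cols = pd.MultiIndex.from_tuples([('weight', 'kg'), ('weight', 'pounds')])
multi = pd.DataFrame([[1, 2], [2, 4]], index=['cat', 'dog'], columns=cols)
multi_result = multi.stack()

print(type(single_result).__name__)  # Series
print(type(multi_result).__name__)   # DataFrame
```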
Notes
The function is named by analogy with a collection of books being reorganized from being side by side on a horizontal position (the columns of the dataframe) to being stacked vertically on top of each other (in the index of the dataframe).
Examples
Single level columns
df_single_level_cols = pd.DataFrame([[0, 1], [2, 3]], index=['cat', 'dog'], columns=['weight', 'height'])
Stacking a dataframe with a single level column axis returns a Series:
In [27]: df_single_level_cols
Out[27]:
     weight  height
cat       0       1
dog       2       3

In [28]: r = df_single_level_cols.stack()

In [29]: r
Out[29]:
cat  weight    0
     height    1
dog  weight    2
     height    3
dtype: int64

In [30]: r.index
Out[30]:
MultiIndex([('cat', 'weight'),
            ('cat', 'height'),
            ('dog', 'weight'),
            ('dog', 'height')],
           )

The output shows that the column labels have moved onto the row index, which is now a MultiIndex.
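Once the column labels sit in the row index, values can be looked up by (row label, former column label) pairs. The sketch below assumes the same small frame as above and only uses standard .loc indexing.

```python
import pandas as pd

df = pd.DataFrame([[0, 1], [2, 3]],
                  index=['cat', 'dog'],
                  columns=['weight', 'height'])
r = df.stack()

# The outer level holds the original row labels,
# the inner level holds the former column labels.
print(r.index.nlevels)            # 2

# Look up a single value with a (row, column) tuple.
dog_height = r.loc[('dog', 'height')]
print(dog_height)                 # 3

# Selecting on the outer level alone returns a Series
# keyed by the former column labels.
cat_values = r.loc['cat'].to_dict()
print(cat_values)
```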
Multi level columns: simple case
multicol1 = pd.MultiIndex.from_tuples([('weight', 'kg'), ('weight', 'pounds')])
df_multi_level_cols1 = pd.DataFrame([[1, 2], [2, 4]], index=['cat', 'dog'], columns=multicol1)
Output:
In [38]: df_multi_level_cols1
Out[38]:
    weight
        kg pounds
cat      1      2
dog      2      4

In [39]: df_multi_level_cols1.stack()
Out[39]:
            weight
cat kg           1
    pounds       2
dog kg           2
    pounds       4

The output shows that stack took the bottom (innermost) level of the column labels and turned it into a row index level.
Missing values
multicol2 = pd.MultiIndex.from_tuples([('weight', 'kg'), ('height', 'm')])
df_multi_level_cols2 = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], index=['cat', 'dog'], columns=multicol2)
It is common to have missing values when stacking a dataframe with multi-level columns, as the stacked dataframe typically has more values than the original dataframe. Missing values are filled with NaNs:
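In [41]: df_multi_level_cols2
Out[41]:
    weight height
        kg      m
cat    1.0    2.0
dog    3.0    4.0

In [42]: df_multi_level_cols2.stack()
Out[42]:
        height  weight
cat kg     NaN     1.0
    m      2.0     NaN
dog kg     NaN     3.0
    m      4.0     NaN

The bottom column level was moved to the row labels to form a combined (multi-level) index. Combinations that did not exist in the original frame, such as height in kg, are filled with NaN by default.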
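To see why the stacked frame "has more values than the original", it can help to count cells. This is a small sketch on the same data; the variable names are illustrative.

```python
import pandas as pd

multicol2 = pd.MultiIndex.from_tuples([('weight', 'kg'), ('height', 'm')])
df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]],
                  index=['cat', 'dog'],
                  columns=multicol2)
stacked = df.stack()

original_cells = df.size        # 2 rows x 2 columns = 4 values
stacked_cells = stacked.size    # 4 rows x 2 columns = 8 cells

# Cells that have no counterpart in the original frame end up as NaN.
missing = int(stacked.isna().sum().sum())

print(original_cells, stacked_cells, missing)
```

The four original values are all still present; the extra four cells are the (unit, measure) pairs that never existed, which is where the NaNs come from.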
Prescribing the level(s) to be stacked
The first parameter controls which level or levels are stacked:
In [48]: df_multi_level_cols2.stack(level=0)
Out[48]:
             kg    m
cat height  NaN  2.0
    weight  1.0  NaN
dog height  NaN  4.0
    weight  3.0  NaN

In [49]: df_multi_level_cols2
Out[49]:
    weight height
        kg      m
cat    1.0    2.0
dog    3.0    4.0

In [50]: df_multi_level_cols2.stack(level=[0,1])
Out[50]:
cat  height  m     2.0
     weight  kg    1.0
dog  height  m     4.0
     weight  kg    3.0
dtype: float64

In [51]: df_multi_level_cols2.stack(level=[1,0])
Out[51]:
cat  kg  weight    1.0
     m   height    2.0
dog  kg  weight    3.0
     m   height    4.0
dtype: float64

You can choose which column level to stack, or stack all column levels at once. When every column level is stacked, the result is a Series, and the order of the list determines the order of the new index levels.
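The level parameter also accepts a label (str), which is easier to read than a position when the column levels are named. In this sketch the level names 'measure' and 'unit' are hypothetical; the original example leaves the levels unnamed.

```python
import pandas as pd

# 'measure' and 'unit' are made-up level names for illustration.
cols = pd.MultiIndex.from_tuples([('weight', 'kg'), ('height', 'm')],
                                 names=['measure', 'unit'])
df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]],
                  index=['cat', 'dog'],
                  columns=cols)

# Stacking by level name and by position should give the same result.
by_name = df.stack(level='measure')
by_position = df.stack(level=0)
print(by_name.equals(by_position))  # True

# The stacked level keeps its name in the row index.
index_names = list(by_name.index.names)
print(index_names)
```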
Dropping missing values
In [52]: df_multi_level_cols3 = pd.DataFrame([[None, 1.0], [2.0, 3.0]],
    ...:                                     index=['cat', 'dog'],
    ...:                                     columns=multicol2)

In [53]: df_multi_level_cols3
Out[53]:
    weight height
        kg      m
cat    NaN    1.0
dog    2.0    3.0
Note that rows where all values are missing are dropped by default but this behaviour can be controlled via the dropna keyword parameter:
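In other words, when every value in a row is NaN, the dropna option controls whether that row is removed.
In [54]: df_multi_level_cols3.stack()
Out[54]:
        height  weight
cat m      1.0     NaN
dog kg     NaN     2.0
    m      3.0     NaN

In [55]: df_multi_level_cols3.stack(dropna=False)
Out[55]:
        height  weight
cat kg     NaN     NaN
    m      1.0     NaN
dog kg     NaN     2.0
    m      3.0     NaN

dropna defaults to True, which means rows whose values are all missing are left out of the result.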
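The handling of the dropna keyword appears to differ between pandas releases, so the sketch below avoids passing it (an assumption about your installed version, not something stated above). It calls stack() with no keyword and then removes all-NaN rows explicitly with DataFrame.dropna(how='all'), which should leave the same three rows either way.

```python
import pandas as pd

multicol2 = pd.MultiIndex.from_tuples([('weight', 'kg'), ('height', 'm')])
df3 = pd.DataFrame([[None, 1.0], [2.0, 3.0]],
                   index=['cat', 'dog'],
                   columns=multicol2)

# Stack without the dropna keyword, then drop rows in which
# every value is NaN as an explicit second step.
stacked = df3.stack()
cleaned = stacked.dropna(how='all')

# The ('cat', 'kg') row has no data at all, so it is gone;
# rows with at least one real value are kept.
print(len(cleaned))                     # 3
print(('cat', 'kg') in cleaned.index)   # False
print(('dog', 'kg') in cleaned.index)   # True
```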