• [Daily Learning] pandas: pivot table function & cross-tabulation function


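    Daily reflection

    [Separate the 8 hours at work from the hours outside it]
    For the past month I kept carrying work problems with me: on the road, on the subway, before going to sleep, even on weekends.
    I soon found that I had achieved almost nothing outside of work, and that progress at work was not satisfactory either.
    Thinking it over, time outside of work is for learning new things and finding new ideas. That both supports the work itself and opens up new directions. It should not be spent inefficiently turning over work solutions in my head.
    So I think it is necessary to set clear goals and actions for both sides. Outside of work: one book a week, plus daily reading of technical books in their original language. At work: prioritize tasks properly, and think the approach through before starting on anything.
    Being efficient and productive is the goal.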
    

    pandas.pivot_table

    pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All')

    Overview:
    pivot_table is available as a method of a pandas.core.frame.DataFrame instance. It creates a spreadsheet-style pivot table as a DataFrame. The levels in the pivot table are stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame.

    Usage:
    pandas.pivot_table(dataframe, other parameters)
    is equivalent to
    dataframe.pivot_table(other parameters)
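The equivalence of the two call forms can be checked directly. This is a minimal sketch; the DataFrame and its column names (Name, Product, Price) are invented for illustration and do not come from the original post.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    "Name": ["A", "A", "B", "B"],
    "Product": ["x", "y", "x", "y"],
    "Price": [10.0, 20.0, 30.0, 40.0],
})

# Function form: the DataFrame is passed as the first argument.
by_function = pd.pivot_table(df, values="Price", index="Name", aggfunc="mean")

# Method form: the DataFrame is the object the method is called on.
by_method = df.pivot_table(values="Price", index="Name", aggfunc="mean")

# Both calls should build the same table.
same = by_function.equals(by_method)
print(by_function)
print("identical:", same)
```

Which form to use is a matter of taste; the method form chains more naturally after other DataFrame operations.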

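    Parameters:
    Before going through the parameters, look at how a pivot table is laid out in Excel: it has Filters, Columns, Rows and Values. Apart from Filters, the Columns, Rows and Values areas work the same way as the pandas.pivot_table parameters described below.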

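    data : the DataFrame to pivot;
    values : optional, the column(s) to aggregate, corresponding to "Values", e.g. values=["Price"];
    index : the keys to group by on the rows, corresponding to "Rows"; for several levels pass a list, e.g. index=["Name","Rep","Manager"];
    columns : the keys to group by on the columns, corresponding to "Columns";
    aggfunc : the aggregation function(s) to apply; pass a list to apply several functions, e.g. aggfunc=[np.mean,len], or a dict to use different functions for different value columns, e.g. aggfunc={"Quantity":len,"Price":[np.sum,np.mean]};
    fill_value : the value that replaces NaN in the aggregated result, e.g. fill_value=0 to show 0 instead;
    margins : whether to add subtotal / grand total rows and columns, e.g. margins=True;
    margins_name : the name of the totals row / column when margins is True; the default is "All".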
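To see how the parameters above combine, here is a small sketch. The sales data is made up for illustration (only the column names Manager, Rep, Quantity and Price echo the examples in the parameter list; Product is an assumption), and string function names such as "sum" are used instead of np.sum.

```python
import pandas as pd

# Hypothetical sales records, invented for illustration only.
sales = pd.DataFrame({
    "Manager": ["M1", "M1", "M1", "M2", "M2"],
    "Rep": ["R1", "R1", "R2", "R3", "R3"],
    "Product": ["CPU", "Monitor", "CPU", "CPU", "CPU"],
    "Quantity": [1, 2, 1, 3, 1],
    "Price": [100, 50, 100, 100, 100],
})

table = sales.pivot_table(
    values="Price",              # "Values": the column to aggregate
    index=["Manager", "Rep"],    # "Rows": two levels of grouping
    columns="Product",           # "Columns": one column per product
    aggfunc="sum",               # aggregation applied to each cell
    fill_value=0,                # show 0 where a rep sold no such product
    margins=True,                # add totals for rows and columns
    margins_name="Total",        # label used instead of the default "All"
)
print(table)
```

Reps R2 and R3 have no Monitor sales, so those cells would be NaN without fill_value=0.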

    Examples:
    See help(pandas.pivot_table)
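As a complement to the built-in help, the sketch below illustrates the dict and list forms of aggfunc mentioned in the parameter list. The data is hypothetical, and string function names ("sum", "mean") stand in for the np.sum / np.mean callables shown earlier.

```python
import pandas as pd

# Hypothetical order data, invented for illustration only.
orders = pd.DataFrame({
    "Manager": ["M1", "M1", "M2", "M2"],
    "Quantity": [1, 2, 3, 1],
    "Price": [100.0, 50.0, 100.0, 100.0],
})

# Dict form: a different aggregation for each value column.
per_column = orders.pivot_table(
    index="Manager",
    values=["Quantity", "Price"],
    aggfunc={"Quantity": "sum", "Price": "mean"},
)

# List form: every function is applied to the value column, and the
# result gets a two-level column index of (function, column).
several = orders.pivot_table(
    index="Manager",
    values=["Price"],
    aggfunc=["sum", "mean"],
)

print(per_column)
print(several)
```

In the list form, a single cell is addressed with a tuple, for example several.loc["M1", ("sum", "Price")].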

    pandas.crosstab

    crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name='All', dropna=True, normalize=False)

    
        Compute a simple cross-tabulation of two (or more) factors. By default
        computes a frequency table of the factors unless an array of values and an
        aggregation function are passed.
        
        Parameters
        ----------
        index : array-like, Series, or list of arrays/Series
            Values to group by in the rows
        columns : array-like, Series, or list of arrays/Series
            Values to group by in the columns
        values : array-like, optional
            Array of values to aggregate according to the factors.
            Requires `aggfunc` be specified.
        aggfunc : function, optional
            If specified, requires `values` be specified as well
        rownames : sequence, default None
            If passed, must match number of row arrays passed
        colnames : sequence, default None
            If passed, must match number of column arrays passed
        margins : boolean, default False
            Add row/column margins (subtotals)
        margins_name : string, default 'All'
            Name of the row / column that will contain the totals
            when margins is True.
        
            .. versionadded:: 0.21.0
        
        dropna : boolean, default True
            Do not include columns whose entries are all NaN
        normalize : boolean, {'all', 'index', 'columns'}, or {0,1}, default False
            Normalize by dividing all values by the sum of values.
        
            - If passed 'all' or `True`, will normalize over all values.
            - If passed 'index' will normalize over each row.
            - If passed 'columns' will normalize over each column.
            - If margins is `True`, will also normalize margin values.
        
            .. versionadded:: 0.18.1
        
        
        Notes
        -----
        Any Series passed will have their name attributes used unless row or column
        names for the cross-tabulation are specified.
        
        Any input passed containing Categorical data will have **all** of its
        categories included in the cross-tabulation, even if the actual data does
        not contain any instances of a particular category.
        
        In the event that there aren't overlapping indexes an empty DataFrame will
        be returned.
        
        Examples
        --------
    import numpy as np
    import pandas as pd

    a = np.array(["foo", "foo", "foo", "foo", "bar", "bar",
                  "bar", "bar", "foo", "foo", "foo"], dtype=object)
    b = np.array(["one", "one", "one", "two", "one", "one",
                  "one", "two", "two", "two", "one"], dtype=object)
    c = np.array(["dull", "dull", "shiny", "dull", "dull", "shiny",
                  "shiny", "dull", "shiny", "shiny", "shiny"],
                  dtype=object)
        
    pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])
        b   one        two
        c   dull shiny dull shiny
        a
        bar    1     2    1     0
        foo    2     2    1     2
        
    foo = pd.Categorical(['a', 'b'], categories=['a', 'b', 'c'])
    bar = pd.Categorical(['d', 'e'], categories=['d', 'e', 'f'])
    # 'c' and 'f' are not represented in the data. Recent pandas versions
    # drop such empty categories by default, so dropna=False is passed
    # here to keep them in the output.
    pd.crosstab(foo, bar, dropna=False)
        col_0  d  e  f
        row_0
        a      1  0  0
        b      0  1  0
        c      0  0  0
        
        Returns
        -------
        crosstab : DataFrame
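The docstring examples only show the default frequency table. The sketch below, on made-up data (the region, product and amount columns are assumptions for illustration), also covers the values / aggfunc pair and the normalize option described in the parameter list.

```python
import pandas as pd

# Hypothetical transactions, invented for illustration only.
df = pd.DataFrame({
    "region": ["north", "north", "south", "south", "south"],
    "product": ["x", "y", "x", "x", "y"],
    "amount": [10, 20, 30, 40, 50],
})

# Default: count how often each (region, product) pair occurs.
counts = pd.crosstab(df["region"], df["product"])

# values and aggfunc must be given together: sum the amounts per cell.
totals = pd.crosstab(df["region"], df["product"],
                     values=df["amount"], aggfunc="sum")

# normalize="index": each row is divided by its own row total.
row_share = pd.crosstab(df["region"], df["product"], normalize="index")

print(counts)
print(totals)
print(row_share)
```

Because the inputs are named Series, their names (region, product) label the result, as the Notes section above describes.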
    
    Without summary, you can't master it.
  • Original post: https://www.cnblogs.com/everda/p/9253273.html