• Python Data Analysis and Data Mining for Beginners, Chapter 2: pandas, Section 5: Getting Started with pandas


    Getting Started with pandas

    In [1]:
     
     
     
     
     
    import pandas as pd
     
     
    In [2]:
     
     
     
     
     
    from pandas import Series, DataFrame
     
     
    In [3]:
     
     
     
     
     
    import numpy as np
    np.random.seed(12345)
    import matplotlib.pyplot as plt
    plt.rc('figure', figsize=(10, 6))
    PREVIOUS_MAX_ROWS = pd.options.display.max_rows
    pd.options.display.max_rows = 20
    np.set_printoptions(precision=4, suppress=True)
     
     
     

    Introduction to pandas Data Structures

     

    Series

    In [4]:
     
     
     
     
     
    obj = pd.Series([4, 7, -5, 3])
    obj
     
     
    Out[4]:
    0    4
    1    7
    2   -5
    3    3
    dtype: int64
    In [5]:
     
     
     
     
     
    obj.values
    obj.index  # like range(4)
     
     
    Out[5]:
    RangeIndex(start=0, stop=4, step=1)
    In [6]:
     
     
     
     
     
    obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])  # set a custom index
    obj2
    obj2.index
     
     
    Out[6]:
    Index(['d', 'b', 'a', 'c'], dtype='object')
     
     
     
     
     
     
    Init signature: pd.Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)
    Docstring:    
    One-dimensional ndarray with axis labels (including time series).
    Labels need not be unique but must be a hashable type. The object
    supports both integer- and label-based indexing and provides a host of
    methods for performing operations involving the index. Statistical
    methods from ndarray have been overridden to automatically exclude
    missing data (currently represented as NaN).
    Operations between Series (+, -, /, *, **) align values based on their
    associated index values-- they need not be the same length. The result
    index will be the sorted union of the two indexes.
    Parameters
    ----------
    data : array-like, dict, or scalar value
        Contains data stored in Series
    index : array-like or Index (1d)
        Values must be hashable and have the same length as `data`.
        Non-unique index values are allowed. Will default to
        RangeIndex(len(data)) if not provided. If both a dict and index
        sequence are used, the index will override the keys found in the
        dict.
    dtype : numpy.dtype or None
        If None, dtype will be inferred
    copy : boolean, default False
        Copy input data
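
    The index-alignment behavior described in this docstring is the key idea behind Series arithmetic. A minimal sketch (the variable names here are illustrative, not from the text):

```python
import pandas as pd

# Arithmetic between Series matches values by index label, not position.
# The result's index is the union of both indexes; labels present in only
# one operand produce NaN.
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20], index=['b', 'c'])
total = s1 + s2
print(total)  # 'a' -> NaN, 'b' -> 12.0, 'c' -> 23.0
```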
     
    In [7]:
     
     
     
     
     
    obj2['a']
    obj2['d'] = 6
    obj2[['c', 'a', 'd']]
     
     
    Out[7]:
    c    3
    a   -5
    d    6
    dtype: int64
    In [10]:
     
     
     
     
     
    obj2[obj2 > 0]
     
     
    Out[10]:
    d    6
    b    7
    c    3
    dtype: int64
    In [11]:
     
     
     
     
     
    obj2 * 2
    np.exp(obj2)
     
     
    Out[11]:
    d     403.428793
    b    1096.633158
    a       0.006738
    c      20.085537
    dtype: float64
    In [12]:
     
     
     
     
     
    'b' in obj2
    'e' in obj2
     
     
    Out[12]:
    False
    In [13]:
     
     
     
     
     
    sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
    obj3 = pd.Series(sdata)
    obj3
     
     
    Out[13]:
    Ohio      35000
    Oregon    16000
    Texas     71000
    Utah       5000
    dtype: int64
    In [14]:
     
     
     
     
     
    states = ['California', 'Ohio', 'Oregon', 'Texas']
    obj4 = pd.Series(sdata, index=states)
    obj4
     
     
    Out[14]:
    California        NaN
    Ohio          35000.0
    Oregon        16000.0
    Texas         71000.0
    dtype: float64
    In [17]:
     
     
     
     
     
    pd.isnull(obj4)
     
     
    Out[17]:
    California     True
    Ohio          False
    Oregon        False
    Texas         False
    dtype: bool
     
     
     
     
     
     
    Signature: pd.isnull(obj)
    Docstring:
    Detect missing values (NaN in numeric arrays, None/NaN in object arrays)
    Parameters
    ----------
    arr : ndarray or object value
        Object to check for null-ness
    Returns
    -------
    isna : array-like of bool or bool
        Array or bool indicating whether an object is null or if an array is
        given which of the element is null.
    See also
    --------
    pandas.notna: boolean inverse of pandas.isna
    pandas.isnull: alias of isna
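
    A quick sketch of the behavior this docstring describes, on both scalars and a Series (variable names are illustrative):

```python
import numpy as np
import pandas as pd

# pd.isnull / pd.notnull treat both None and np.nan as missing.
print(pd.isnull(np.nan))   # True
print(pd.isnull(None))     # True
print(pd.isnull('text'))   # False

# On a Series they return an elementwise boolean Series, usable as a mask.
s = pd.Series([1.0, np.nan, 3.0])
mask = pd.notnull(s)
print(s[mask])  # keeps only the non-missing values 1.0 and 3.0
```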
     
    In [18]:
     
     
     
     
     
    pd.notnull(obj4)
     
     
    Out[18]:
    California    False
    Ohio           True
    Oregon         True
    Texas          True
    dtype: bool
     
     
     
     
     
     
    Signature: pd.notnull(obj)
    Docstring:
    Replacement for numpy.isfinite / -numpy.isnan which is suitable for use
    on object arrays.
    Parameters
    ----------
    arr : ndarray or object value
        Object to check for *not*-null-ness
    Returns
    -------
    notisna : array-like of bool or bool
        Array or bool indicating whether an object is *not* null or if an array
        is given which of the element is *not* null.
    See also
    --------
    pandas.isna : boolean inverse of pandas.notna
    pandas.notnull : alias of notna
     
    In [ ]:
     
     
     
     
     
    obj4.isnull()
     
     
    In [19]:
     
     
     
     
     
    obj3
     
     
    Out[19]:
    Ohio      35000
    Oregon    16000
    Texas     71000
    Utah       5000
    dtype: int64
    In [20]:
     
     
     
     
     
    obj4
     
     
    Out[20]:
    California        NaN
    Ohio          35000.0
    Oregon        16000.0
    Texas         71000.0
    dtype: float64
    In [21]:
     
     
     
     
     
    obj3 + obj4  # values with matching index labels are added; others become NaN
     
     
    Out[21]:
    California         NaN
    Ohio           70000.0
    Oregon         32000.0
    Texas         142000.0
    Utah               NaN
    dtype: float64
    In [23]:
     
     
     
     
     
    obj4.name = 'population'
    obj4.index.name = 'state'  # name the index
    obj4
     
     
    Out[23]:
    state
    California        NaN
    Ohio          35000.0
    Oregon        16000.0
    Texas         71000.0
    Name: population, dtype: float64
    In [24]:
     
     
     
     
     
    obj
     
     
    Out[24]:
    0    4
    1    7
    2   -5
    3    3
    dtype: int64
    In [25]:
     
     
     
     
     
    obj.index = ['Bob', 'Steve', 'Jeff', 'Ryan']  # assign a new index in place
    obj
     
     
    Out[25]:
    Bob      4
    Steve    7
    Jeff    -5
    Ryan     3
    dtype: int64
     

    DataFrame

    In [26]:
     
     
     
     
     
    data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
            'year': [2000, 2001, 2002, 2001, 2002, 2003],
            'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
    frame = pd.DataFrame(data)
     
     
    In [27]:
     
     
     
     
     
    frame
     
     
    Out[27]:
       pop   state  year
    0  1.5    Ohio  2000
    1  1.7    Ohio  2001
    2  3.6    Ohio  2002
    3  2.4  Nevada  2001
    4  2.9  Nevada  2002
    5  3.2  Nevada  2003
    In [29]:
     
     
     
     
     
    frame.head()
     
     
    Out[29]:
       pop   state  year
    0  1.5    Ohio  2000
    1  1.7    Ohio  2001
    2  3.6    Ohio  2002
    3  2.4  Nevada  2001
    4  2.9  Nevada  2002
     
     
     
     
     
     
    Signature: frame.head(n=5)
    Docstring:
    Return the first n rows.
    Parameters
    ----------
    n : int, default 5
        Number of rows to select.
    Returns
    -------
    obj_head : type of caller
        The first n rows of the caller object.
     
    In [30]:
     
     
     
     
     
    pd.DataFrame(data, columns=['year', 'state', 'pop'])  # specify the column order
     
     
    Out[30]:
       year   state  pop
    0  2000    Ohio  1.5
    1  2001    Ohio  1.7
    2  2002    Ohio  3.6
    3  2001  Nevada  2.4
    4  2002  Nevada  2.9
    5  2003  Nevada  3.2
     
     
     
     
     
     
    Init signature: pd.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
    Docstring:    
    Two-dimensional size-mutable, potentially heterogeneous tabular data
    structure with labeled axes (rows and columns). Arithmetic operations
    align on both row and column labels. Can be thought of as a dict-like
    container for Series objects. The primary pandas data structure
    Parameters
    ----------
    data : numpy ndarray (structured or homogeneous), dict, or DataFrame
        Dict can contain Series, arrays, constants, or list-like objects
    index : Index or array-like
        Index to use for resulting frame. Will default to np.arange(n) if
        no indexing information part of input data and no index provided
    columns : Index or array-like
        Column labels to use for resulting frame. Will default to
        np.arange(n) if no column labels are provided
    dtype : dtype, default None
        Data type to force. Only a single dtype is allowed. If None, infer
    copy : boolean, default False
        Copy data from inputs. Only affects DataFrame / 2d ndarray input
    Examples
    --------
    Constructing DataFrame from a dictionary.
    >>> d = {'col1': [1, 2], 'col2': [3, 4]}
    >>> df = pd.DataFrame(data=d)
    >>> df
       col1  col2
    0     1     3
    1     2     4
    Notice that the inferred dtype is int64.
    >>> df.dtypes
    col1    int64
    col2    int64
    dtype: object
    To enforce a single dtype:
    >>> df = pd.DataFrame(data=d, dtype=np.int8)
    >>> df.dtypes
    col1    int8
    col2    int8
    dtype: object
    Constructing DataFrame from numpy ndarray:
    >>> df2 = pd.DataFrame(np.random.randint(low=0, high=10, size=(5, 5)),
    ...                    columns=['a', 'b', 'c', 'd', 'e'])
    >>> df2
        a   b   c   d   e
    0   2   8   8   3   4
    1   4   2   9   0   9
    2   1   0   7   8   0
    3   5   1   7   1   3
    4   6   0   2   4   2
     
    In [31]:
     
     
     
     
     
    frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'],  # missing columns show as NaN
                          index=['one', 'two', 'three', 'four',
                                 'five', 'six'])
    frame2
     
     
    Out[31]:
           year   state  pop  debt
    one    2000    Ohio  1.5   NaN
    two    2001    Ohio  1.7   NaN
    three  2002    Ohio  3.6   NaN
    four   2001  Nevada  2.4   NaN
    five   2002  Nevada  2.9   NaN
    six    2003  Nevada  3.2   NaN
    In [32]:
     
     
     
     
     
    frame2.columns
     
     
    Out[32]:
    Index(['year', 'state', 'pop', 'debt'], dtype='object')
    In [34]:
     
     
     
     
     
    frame2['state']
     
     
    Out[34]:
    one        Ohio
    two        Ohio
    three      Ohio
    four     Nevada
    five     Nevada
    six      Nevada
    Name: state, dtype: object
    In [35]:
     
     
     
     
     
    frame2.year  # attribute-style access; not available when a column name contains spaces
     
     
    Out[35]:
    one      2000
    two      2001
    three    2002
    four     2001
    five     2002
    six      2003
    Name: year, dtype: int64
    In [36]:
     
     
     
     
     
    frame2.loc['three']
     
     
    Out[36]:
    year     2002
    state    Ohio
    pop       3.6
    debt      NaN
    Name: three, dtype: object
    In [37]:
     
     
     
     
     
    frame2['debt'] = 16.5
    frame2
     
     
    Out[37]:
           year   state  pop  debt
    one    2000    Ohio  1.5  16.5
    two    2001    Ohio  1.7  16.5
    three  2002    Ohio  3.6  16.5
    four   2001  Nevada  2.4  16.5
    five   2002  Nevada  2.9  16.5
    six    2003  Nevada  3.2  16.5
    In [39]:
     
     
     
     
     
    frame2['debt'] = np.arange(6.)
    frame2
     
     
    Out[39]:
           year   state  pop  debt
    one    2000    Ohio  1.5   0.0
    two    2001    Ohio  1.7   1.0
    three  2002    Ohio  3.6   2.0
    four   2001  Nevada  2.4   3.0
    five   2002  Nevada  2.9   4.0
    six    2003  Nevada  3.2   5.0
     
     
     
     
     
     
    Docstring:
    arange([start,] stop[, step,], dtype=None)
    Return evenly spaced values within a given interval.
    Values are generated within the half-open interval ``[start, stop)``
    (in other words, the interval including `start` but excluding `stop`).
    For integer arguments the function is equivalent to the Python built-in
    `range <http://docs.python.org/lib/built-in-funcs.html>`_ function,
    but returns an ndarray rather than a list.
    When using a non-integer step, such as 0.1, the results will often not
    be consistent.  It is better to use ``linspace`` for these cases.
    Parameters
    ----------
    start : number, optional
        Start of interval.  The interval includes this value.  The default
        start value is 0.
    stop : number
        End of interval.  The interval does not include this value, except
        in some cases where `step` is not an integer and floating point
        round-off affects the length of `out`.
    step : number, optional
        Spacing between values.  For any output `out`, this is the distance
        between two adjacent values, ``out[i+1] - out[i]``.  The default
        step size is 1.  If `step` is specified as a position argument,
        `start` must also be given.
    dtype : dtype
        The type of the output array.  If `dtype` is not given, infer the data
        type from the other input arguments.
    Returns
    -------
    arange : ndarray
        Array of evenly spaced values.
        For floating point arguments, the length of the result is
        ``ceil((stop - start)/step)``.  Because of floating point overflow,
        this rule may result in the last element of `out` being greater
        than `stop`.
    See Also
    --------
    linspace : Evenly spaced numbers with careful handling of endpoints.
    ogrid: Arrays of evenly spaced numbers in N-dimensions.
    mgrid: Grid-shaped arrays of evenly spaced numbers in N-dimensions.
    Examples
    --------
    >>> np.arange(3)
    array([0, 1, 2])
    >>> np.arange(3.0)
    array([ 0.,  1.,  2.])
    >>> np.arange(3,7)
    array([3, 4, 5, 6])
    >>> np.arange(3,7,2)
    array([3, 5])
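
    The caveat in this docstring about non-integer steps is worth seeing concretely; this sketch contrasts np.arange with np.linspace:

```python
import numpy as np

# With a non-integer step, floating-point round-off can affect the length
# of np.arange's output, so np.linspace (which fixes the count, not the
# step) is usually the safer choice for fractional spacing.
a = np.arange(0, 1, 0.1)    # step-based, half-open interval [0, 1)
b = np.linspace(0, 1, 11)   # exactly 11 points, endpoint included
print(len(a), len(b))
```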
     
    In [41]:
     
     
     
     
     
    val = pd.Series([-1.2, -1.5, -1.7], index=['two', 'four', 'five'])
    frame2['debt'] = val
    frame2  # values are filled by matching index labels; unmatched rows get NaN
     
     
    Out[41]:
           year   state  pop  debt
    one    2000    Ohio  1.5   NaN
    two    2001    Ohio  1.7  -1.2
    three  2002    Ohio  3.6   NaN
    four   2001  Nevada  2.4  -1.5
    five   2002  Nevada  2.9  -1.7
    six    2003  Nevada  3.2   NaN
    In [42]:
     
     
     
     
     
    frame2['eastern'] = frame2.state == 'Ohio'
    frame2
     
     
    Out[42]:
           year   state  pop  debt  eastern
    one    2000    Ohio  1.5   NaN     True
    two    2001    Ohio  1.7  -1.2     True
    three  2002    Ohio  3.6   NaN     True
    four   2001  Nevada  2.4  -1.5    False
    five   2002  Nevada  2.9  -1.7    False
    six    2003  Nevada  3.2   NaN    False
    In [43]:
     
     
     
     
     
    del frame2['eastern']  # delete a column
    frame2.columns
     
     
    Out[43]:
    Index(['year', 'state', 'pop', 'debt'], dtype='object')
    In [45]:
     
     
     
     
     
    pop = {'Nevada': {2001: 2.4, 2002: 2.9},
           'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
     
     
    In [46]:
     
     
     
     
     
    frame3 = pd.DataFrame(pop)
    frame3
     
     
    Out[46]:
          Nevada  Ohio
    2000     NaN   1.5
    2001     2.4   1.7
    2002     2.9   3.6
    In [47]:
     
     
     
     
     
    frame3.T  # transpose
     
     
    Out[47]:
            2000  2001  2002
    Nevada   NaN   2.4   2.9
    Ohio     1.5   1.7   3.6
    In [48]:
     
     
     
     
     
    pd.DataFrame(pop, index=[2001, 2002, 2003])
     
     
    Out[48]:
          Nevada  Ohio
    2001     2.4   1.7
    2002     2.9   3.6
    2003     NaN   NaN
    In [49]:
     
     
     
     
     
    pdata = {'Ohio': frame3['Ohio'][:-1],
             'Nevada': frame3['Nevada'][:2]}
    pd.DataFrame(pdata)
     
     
    Out[49]:
          Nevada  Ohio
    2000     NaN   1.5
    2001     2.4   1.7
    In [50]:
     
     
     
     
     
    frame3.index.name = 'year'; frame3.columns.name = 'state'  # name the index and the columns axes
    frame3
     
     
    Out[50]:
    state  Nevada  Ohio
    year
    2000      NaN   1.5
    2001      2.4   1.7
    2002      2.9   3.6
    In [51]:
     
     
     
     
     
    frame3.values
     
     
    Out[51]:
    array([[nan, 1.5],
           [2.4, 1.7],
           [2.9, 3.6]])
    In [52]:
     
     
     
     
     
    frame2.values
     
     
    Out[52]:
    array([[2000, 'Ohio', 1.5, nan],
           [2001, 'Ohio', 1.7, -1.2],
           [2002, 'Ohio', 3.6, nan],
           [2001, 'Nevada', 2.4, -1.5],
           [2002, 'Nevada', 2.9, -1.7],
           [2003, 'Nevada', 3.2, nan]], dtype=object)
     

    Index Objects

    In [53]:
     
     
     
     
     
    obj = pd.Series(range(3), index=['a', 'b', 'c'])
    index = obj.index
    index
    index[1:]
     
     
    Out[53]:
    Index(['b', 'c'], dtype='object')
     

    index[1] = 'd' # TypeError
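
    The TypeError noted above can be demonstrated directly; this small sketch shows that Index objects are immutable:

```python
import pandas as pd

# Index objects cannot be modified in place; item assignment raises a
# TypeError, which is part of what makes them safe to share between objects.
index = pd.Index(['a', 'b', 'c'])
try:
    index[1] = 'd'
except TypeError as exc:
    print('Index is immutable:', exc)
```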

    In [54]:
     
     
     
     
     
    labels = pd.Index(np.arange(3))
    labels
     
     
    Out[54]:
    Int64Index([0, 1, 2], dtype='int64')
    In [55]:
     
     
     
     
     
    obj2 = pd.Series([1.5, -2.5, 0], index=labels)
    obj2
    obj2.index is labels
     
     
    Out[55]:
    True
    In [56]:
     
     
     
     
     
    frame3
    frame3.columns
     
     
    Out[56]:
    Index(['Nevada', 'Ohio'], dtype='object', name='state')
    In [57]:
     
     
     
     
     
    'Ohio' in frame3.columns
     
     
    Out[57]:
    True
    In [58]:
     
     
     
     
     
    2003 in frame3.index
     
     
    Out[58]:
    False
    In [59]:
     
     
     
     
     
    dup_labels = pd.Index(['foo', 'foo', 'bar', 'bar'])
    dup_labels
     
     
    Out[59]:
    Index(['foo', 'foo', 'bar', 'bar'], dtype='object')
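
    Unlike Python dict keys, index labels need not be unique. A short sketch of what selection does when labels repeat (names are illustrative):

```python
import pandas as pd

# Selecting a repeated label returns every matching entry as a Series,
# while a unique label returns a scalar; is_unique reports which case applies.
s = pd.Series([1, 2, 3, 4], index=['foo', 'foo', 'bar', 'baz'])
print(s['foo'])           # both 'foo' rows
print(s['bar'])           # a single scalar, 3
print(s.index.is_unique)  # False
```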
     

    Essential Functionality

     

    Reindexing

    In [60]:
     
     
     
     
     
    obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
    obj
     
     
    Out[60]:
    d    4.5
    b    7.2
    a   -5.3
    c    3.6
    dtype: float64
    In [62]:
     
     
     
     
     
    obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])
    obj2
     
     
    Out[62]:
    a   -5.3
    b    7.2
    c    3.6
    d    4.5
    e    NaN
    dtype: float64
     
     
     
     
     
     
    Signature: obj.reindex(index=None, **kwargs)
    Docstring:
    Conform Series to new index with optional filling logic, placing
    NA/NaN in locations having no value in the previous index. A new object
    is produced unless the new index is equivalent to the current one and
    copy=False
    Parameters
    ----------
    index : array-like, optional (should be specified using keywords)
        New labels / index to conform to. Preferably an Index object to
        avoid duplicating data
    method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}, optional
        method to use for filling holes in reindexed DataFrame.
        Please note: this is only  applicable to DataFrames/Series with a
        monotonically increasing/decreasing index.
        * default: don't fill gaps
        * pad / ffill: propagate last valid observation forward to next
          valid
        * backfill / bfill: use next valid observation to fill gap
        * nearest: use nearest valid observations to fill gap
    copy : boolean, default True
        Return a new object, even if the passed indexes are the same
    level : int or name
        Broadcast across a level, matching Index values on the
        passed MultiIndex level
    fill_value : scalar, default np.NaN
        Value to use for missing values. Defaults to NaN, but can be any
        "compatible" value
    limit : int, default None
        Maximum number of consecutive elements to forward or backward fill
    tolerance : optional
        Maximum distance between original and new labels for inexact
        matches. The values of the index at the matching locations most
        satisfy the equation ``abs(index[indexer] - target) <= tolerance``.
        Tolerance may be a scalar value, which applies the same tolerance
        to all values, or list-like, which applies variable tolerance per
        element. List-like includes list, tuple, array, Series, and must be
        the same size as the index and its dtype must exactly match the
        index's type.
        .. versionadded:: 0.17.0
        .. versionadded:: 0.21.0 (list-like tolerance)
    Examples
    --------
    ``DataFrame.reindex`` supports two calling conventions
    * ``(index=index_labels, columns=column_labels, ...)``
    * ``(labels, axis={'index', 'columns'}, ...)``
    We *highly* recommend using keyword arguments to clarify your
    intent.
    Create a dataframe with some fictional data.
    >>> index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
    >>> df = pd.DataFrame({
    ...      'http_status': [200,200,404,404,301],
    ...      'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
    ...       index=index)
    >>> df
               http_status  response_time
    Firefox            200           0.04
    Chrome             200           0.02
    Safari             404           0.07
    IE10               404           0.08
    Konqueror          301           1.00
    Create a new index and reindex the dataframe. By default
    values in the new index that do not have corresponding
    records in the dataframe are assigned ``NaN``.
    >>> new_index= ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10',
    ...             'Chrome']
    >>> df.reindex(new_index)
                   http_status  response_time
    Safari               404.0           0.07
    Iceweasel              NaN            NaN
    Comodo Dragon          NaN            NaN
    IE10                 404.0           0.08
    Chrome               200.0           0.02
    We can fill in the missing values by passing a value to
    the keyword ``fill_value``. Because the index is not monotonically
    increasing or decreasing, we cannot use arguments to the keyword
    ``method`` to fill the ``NaN`` values.
    >>> df.reindex(new_index, fill_value=0)
                   http_status  response_time
    Safari                 404           0.07
    Iceweasel                0           0.00
    Comodo Dragon            0           0.00
    IE10                   404           0.08
    Chrome                 200           0.02
    >>> df.reindex(new_index, fill_value='missing')
                  http_status response_time
    Safari                404          0.07
    Iceweasel         missing       missing
    Comodo Dragon     missing       missing
    IE10                  404          0.08
    Chrome                200          0.02
    We can also reindex the columns.
    >>> df.reindex(columns=['http_status', 'user_agent'])
               http_status  user_agent
    Firefox            200         NaN
    Chrome             200         NaN
    Safari             404         NaN
    IE10               404         NaN
    Konqueror          301         NaN
    Or we can use "axis-style" keyword arguments
    >>> df.reindex(['http_status', 'user_agent'], axis="columns")
               http_status  user_agent
    Firefox            200         NaN
    Chrome             200         NaN
    Safari             404         NaN
    IE10               404         NaN
    Konqueror          301         NaN
    To further illustrate the filling functionality in
    ``reindex``, we will create a dataframe with a
    monotonically increasing index (for example, a sequence
    of dates).
    >>> date_index = pd.date_range('1/1/2010', periods=6, freq='D')
    >>> df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]},
    ...                    index=date_index)
    >>> df2
                prices
    2010-01-01     100
    2010-01-02     101
    2010-01-03     NaN
    2010-01-04     100
    2010-01-05      89
    2010-01-06      88
    Suppose we decide to expand the dataframe to cover a wider
    date range.
    >>> date_index2 = pd.date_range('12/29/2009', periods=10, freq='D')
    >>> df2.reindex(date_index2)
                prices
    2009-12-29     NaN
    2009-12-30     NaN
    2009-12-31     NaN
    2010-01-01     100
    2010-01-02     101
    2010-01-03     NaN
    2010-01-04     100
    2010-01-05      89
    2010-01-06      88
    2010-01-07     NaN
    The index entries that did not have a value in the original data frame
    (for example, '2009-12-29') are by default filled with ``NaN``.
    If desired, we can fill in the missing values using one of several
    options.
    For example, to backpropagate the last valid value to fill the ``NaN``
    values, pass ``bfill`` as an argument to the ``method`` keyword.
    >>> df2.reindex(date_index2, method='bfill')
                prices
    2009-12-29     100
    2009-12-30     100
    2009-12-31     100
    2010-01-01     100
    2010-01-02     101
    2010-01-03     NaN
    2010-01-04     100
    2010-01-05      89
    2010-01-06      88
    2010-01-07     NaN
    Please note that the ``NaN`` value present in the original dataframe
    (at index value 2010-01-03) will not be filled by any of the
    value propagation schemes. This is because filling while reindexing
    does not look at dataframe values, but only compares the original and
    desired indexes. If you do want to fill in the ``NaN`` values present
    in the original dataframe, use the ``fillna()`` method.
    See the :ref:`user guide <basics.reindexing>` for more.
    Returns
    -------
     
    In [63]:
     
     
     
     
     
    obj3 = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])
    obj3
     
     
    Out[63]:
    0      blue
    2    purple
    4    yellow
    dtype: object
    In [64]:
     
     
     
     
     
    obj3.reindex(range(6), method='ffill')  # forward-fill gaps in the new index
     
     
    Out[64]:
    0      blue
    1      blue
    2    purple
    3    purple
    4    yellow
    5    yellow
    dtype: object
    In [72]:
     
     
     
     
     
    frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                         index=['a', 'c', 'd'],
                         columns=['Ohio', 'Texas', 'California'])
    frame
     
     
    Out[72]:
       Ohio  Texas  California
    a     0      1           2
    c     3      4           5
    d     6      7           8
    In [66]:
     
     
     
     
     
    frame2 = frame.reindex(['a', 'b', 'c', 'd'])
    frame2
     
     
    Out[66]:
       Ohio  Texas  California
    a   0.0    1.0         2.0
    b   NaN    NaN         NaN
    c   3.0    4.0         5.0
    d   6.0    7.0         8.0
    In [73]:
     
     
     
     
     
    states = ['Texas', 'Utah', 'California']
    frame.reindex(columns=states)
     
     
    Out[73]:
       Texas  Utah  California
    a      1   NaN           2
    c      4   NaN           5
    d      7   NaN           8
    In [74]:
     
     
     
     
     
    frame.loc[['a', 'b', 'c', 'd'], states]  # note: passing a list that contains missing labels to .loc will raise an error in future pandas versions; use .reindex instead
     
     
     
    c:\users\qq123\anaconda3\lib\site-packages\ipykernel_launcher.py:1: FutureWarning: 
    Passing list-likes to .loc or [] with any missing label will raise
    KeyError in the future, you can use .reindex() as an alternative.
    
    See the documentation here:
    http://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
      """Entry point for launching an IPython kernel.
    
    Out[74]:
       Texas  Utah  California
    a    1.0   NaN         2.0
    b    NaN   NaN         NaN
    c    4.0   NaN         5.0
    d    7.0   NaN         8.0
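
    As the FutureWarning suggests, reindex is the supported replacement when a label list may contain missing entries. A minimal sketch:

```python
import numpy as np
import pandas as pd

# .reindex never raises on absent labels; it inserts NaN rows instead,
# which is what .loc with a partially missing label list used to do.
frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['Ohio', 'Texas', 'California'])
result = frame.reindex(['a', 'b', 'c', 'd'])
print(result)  # row 'b' is all NaN; the int columns become float
```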
     

    Dropping Entries from an Axis

    In [91]:
     
     
     
     
     
    obj = pd.Series(np.arange(5.), index=['a', 'b', 'c', 'd', 'e'])
    obj
     
     
    Out[91]:
    a    0.0
    b    1.0
    c    2.0
    d    3.0
    e    4.0
    dtype: float64
    In [80]:
     
     
     
     
     
    new_obj = obj.drop('c')
    new_obj
     
     
    Out[80]:
    a    0.0
    b    1.0
    d    3.0
    e    4.0
    dtype: float64
     
     
     
     
     
     
    Signature: obj.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
    Docstring:
    Return new object with labels in requested axis removed.
    Parameters
    ----------
    labels : single label or list-like
        Index or column labels to drop.
    axis : int or axis name
        Whether to drop labels from the index (0 / 'index') or
        columns (1 / 'columns').
    index, columns : single label or list-like
        Alternative to specifying `axis` (``labels, axis=1`` is
        equivalent to ``columns=labels``).
        .. versionadded:: 0.21.0
    level : int or level name, default None
        For MultiIndex
    inplace : bool, default False
        If True, do operation inplace and return None.
    errors : {'ignore', 'raise'}, default 'raise'
        If 'ignore', suppress error and existing labels are dropped.
    Returns
    -------
    dropped : type of caller
    Examples
    --------
    >>> df = pd.DataFrame(np.arange(12).reshape(3,4),
                          columns=['A', 'B', 'C', 'D'])
    >>> df
       A  B   C   D
    0  0  1   2   3
    1  4  5   6   7
    2  8  9  10  11
    Drop columns
    >>> df.drop(['B', 'C'], axis=1)
       A   D
    0  0   3
    1  4   7
    2  8  11
    >>> df.drop(columns=['B', 'C'])
       A   D
    0  0   3
    1  4   7
    2  8  11
    Drop a row by index
    >>> df.drop([0, 1])
       A  B   C   D
    2  8  9  10  11
    Notes
     
    In [79]:
     
     
     
     
     
    obj.drop(['d', 'c'])
     
     
    Out[79]:
    a    0.0
    b    1.0
    e    4.0
    dtype: float64
    In [87]:
     
     
     
     
     
    data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                        index=['Ohio', 'Colorado', 'Utah', 'New York'],
                        columns=['one', 'two', 'three', 'four'])
    data
     
     
    Out[87]:
              one  two  three  four
    Ohio        0    1      2     3
    Colorado    4    5      6     7
    Utah        8    9     10    11
    New York   12   13     14    15
    In [88]:
     
     
     
     
     
    data.drop(['Colorado', 'Ohio'])
     
     
    Out[88]:
              one  two  three  four
    Utah        8    9     10    11
    New York   12   13     14    15
    In [89]:
     
     
     
     
     
    data.drop('two', axis=1)
    data.drop(['two', 'four'], axis='columns')
     
     
    Out[89]:
              one  three
    Ohio        0      2
    Colorado    4      6
    Utah        8     10
    New York   12     14
    In [92]:
     
     
     
     
     
    obj.drop('c', inplace=True)
     
     
    In [93]:
     
     
     
     
     
    obj
     
     
    Out[93]:
    a    0.0
    b    1.0
    d    3.0
    e    4.0
    dtype: float64
     

    Indexing, Selection, and Filtering

    In [94]:
     
     
     
     
     
    obj = pd.Series(np.arange(4.), index=['a', 'b', 'c', 'd'])
    obj
     
     
    Out[94]:
    a    0.0
    b    1.0
    c    2.0
    d    3.0
    dtype: float64
    In [95]:
     
     
     
     
     
    obj['b']
     
     
    Out[95]:
    1.0
    In [96]:
     
     
     
     
     
    obj[1]
     
     
    Out[96]:
    1.0
    In [97]:
     
     
     
     
     
    obj[2:4]
     
     
    Out[97]:
    c    2.0
    d    3.0
    dtype: float64
    In [98]:
     
     
     
     
     
    obj[['b', 'a', 'd']]
     
     
    Out[98]:
    b    1.0
    a    0.0
    d    3.0
    dtype: float64
    In [99]:
     
     
     
     
     
    obj[[1, 3]]
     
     
    Out[99]:
    b    1.0
    d    3.0
    dtype: float64
    In [100]:
     
     
     
     
     
    obj[obj < 2]
     
     
    Out[100]:
    a    0.0
    b    1.0
    dtype: float64
    In [101]:
     
     
     
     
     
    obj['b':'c']
     
     
    Out[101]:
    b    1.0
    c    2.0
    dtype: float64
    In [105]:
     
     
     
     
     
    obj['b':'c'] = 5
     
     
    In [104]:
     
     
     
     
     
    obj
     
     
    Out[104]:
    a    0.0
    b    5.0
    c    5.0
    d    3.0
    dtype: float64
    In [106]:
     
     
     
     
     
    data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                        index=['Ohio', 'Colorado', 'Utah', 'New York'],
                        columns=['one', 'two', 'three', 'four'])
    data
     
     
    Out[106]:
              one  two  three  four
    Ohio        0    1      2     3
    Colorado    4    5      6     7
    Utah        8    9     10    11
    New York   12   13     14    15
    In [107]:
     
     
     
     
     
    data['two']
     
     
    Out[107]:
    Ohio         1
    Colorado     5
    Utah         9
    New York    13
    Name: two, dtype: int32
    In [108]:
     
     
     
     
     
    data[['three', 'one']]
     
     
    Out[108]:
              three  one
    Ohio          2    0
    Colorado      6    4
    Utah         10    8
    New York     14   12
    In [109]:
     
     
     
     
     
    data[:2]
     
     
    Out[109]:
              one  two  three  four
    Ohio        0    1      2     3
    Colorado    4    5      6     7
    In [110]:
     
     
     
     
     
    data[data['three'] > 5]
     
     
    Out[110]:
              one  two  three  four
    Colorado    4    5      6     7
    Utah        8    9     10    11
    New York   12   13     14    15
    In [111]:
     
     
     
     
     
    data < 5
    data[data < 5] = 0
    data
     
     
    Out[111]:
              one  two  three  four
    Ohio        0    0      0     0
    Colorado    0    5      6     7
    Utah        8    9     10    11
    New York   12   13     14    15
     

    Selection with loc and iloc

    In [112]:
     
     
     
     
     
    data.loc['Colorado', ['two', 'three']]
     
     
    Out[112]:
    two      5
    three    6
    Name: Colorado, dtype: int32
    In [113]:
     
     
     
     
     
    data.iloc[2, [3, 0, 1]]
    data.iloc[2]
    data.iloc[[1, 2], [3, 0, 1]]
     
     
    Out[113]:
              four  one  two
    Colorado     7    0    5
    Utah        11    8    9
    In [114]:
     
     
     
     
     
    data.loc[:'Utah', 'two']  # label-based selection
    data.iloc[:, :3][data.three > 5]  # position-based selection
     
     
    Out[114]:
              one  two  three
    Colorado    0    5      6
    Utah        8    9     10
    New York   12   13     14
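To summarize the section: loc selects by label and iloc by integer position. A sketch with the same data DataFrame:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# loc selects by label; iloc selects by integer position
assert data.loc['Utah', 'three'] == 10
assert data.iloc[2, 2] == 10

# both accept lists and slices; loc slices include the endpoint
assert list(data.loc['Ohio':'Utah', 'two']) == [1, 5, 9]
assert list(data.iloc[0:3, 1]) == [1, 5, 9]
```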
     

    Integer Indexes

     

    ser = pd.Series(np.arange(3.))
    ser
    ser[-1]  # raises KeyError: with an integer index, -1 is ambiguous

    In [115]:
     
     
     
     
     
    ser = pd.Series(np.arange(3.))
     
     
    In [116]:
     
     
     
     
     
    ser
     
     
    Out[116]:
    0    0.0
    1    1.0
    2    2.0
    dtype: float64
    In [117]:
     
     
     
     
     
    ser2 = pd.Series(np.arange(3.), index=['a', 'b', 'c'])
    ser2[-1]
     
     
    Out[117]:
    2.0
    In [118]:
     
     
     
     
     
    ser[:1]
     
     
    Out[118]:
    0    0.0
    dtype: float64
    In [119]:
     
     
     
     
     
    ser.loc[:1]
     
     
    Out[119]:
    0    0.0
    1    1.0
    dtype: float64
    In [120]:
     
     
     
     
     
    ser.iloc[:1]
     
     
    Out[120]:
    0    0.0
    dtype: float64
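The point of this section is that with an integer index, plain [] cannot tell a label from a position. Preferring loc/iloc removes the ambiguity; a sketch:

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.arange(3.))  # default integer index 0, 1, 2

# plain ser[-1] is ambiguous here (the book shows it raising);
# loc (labels) and iloc (positions) are always unambiguous
assert ser.iloc[-1] == 2.0   # last element by position
assert ser.loc[1] == 1.0     # element labeled 1

# loc slices treat the bounds as labels (inclusive),
# iloc slices treat them as positions (exclusive)
assert list(ser.loc[:1]) == [0.0, 1.0]
assert list(ser.iloc[:1]) == [0.0]
```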
     

    Arithmetic and Data Alignment

    In [121]:
     
     
     
     
     
    s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])
    s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1],
                   index=['a', 'c', 'e', 'f', 'g'])
    s1
     
     
    Out[121]:
    a    7.3
    c   -2.5
    d    3.4
    e    1.5
    dtype: float64
    In [122]:
     
     
     
     
     
    s2
     
     
    Out[122]:
    a   -2.1
    c    3.6
    e   -1.5
    f    4.0
    g    3.1
    dtype: float64
    In [123]:
     
     
     
     
     
    s1 + s2  # labels present in only one Series produce NaN, not a one-sided value
     
     
    Out[123]:
    a    5.2
    c    1.1
    d    NaN
    e    0.0
    f    NaN
    g    NaN
    dtype: float64
    In [124]:
     
     
     
     
     
    df1 = pd.DataFrame(np.arange(9.).reshape((3, 3)), columns=list('bcd'),
                       index=['Ohio', 'Texas', 'Colorado'])
    df2 = pd.DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'),
                       index=['Utah', 'Ohio', 'Texas', 'Oregon'])
    df1
     
     
    Out[124]:
                b    c    d
    Ohio      0.0  1.0  2.0
    Texas     3.0  4.0  5.0
    Colorado  6.0  7.0  8.0
    In [125]:
     
     
     
     
     
    df2
     
     
    Out[125]:
              b     d     e
    Utah    0.0   1.0   2.0
    Ohio    3.0   4.0   5.0
    Texas   6.0   7.0   8.0
    Oregon  9.0  10.0  11.0
    In [126]:
     
     
     
     
     
    df1 + df2
     
     
    Out[126]:
                b   c     d   e
    Colorado  NaN NaN   NaN NaN
    Ohio      3.0 NaN   6.0 NaN
    Oregon    NaN NaN   NaN NaN
    Texas     9.0 NaN  12.0 NaN
    Utah      NaN NaN   NaN NaN
    In [127]:
     
     
     
     
     
    df1 = pd.DataFrame({'A': [1, 2]})
    df2 = pd.DataFrame({'B': [3, 4]})
    df1
     
     
    Out[127]:
       A
    0  1
    1  2
    In [128]:
     
     
     
     
     
    df2
     
     
    Out[128]:
       B
    0  3
    1  4
    In [129]:
     
     
     
     
     
    df1 - df2  # both row and column labels must align; nothing matches, so all NaN
     
     
    Out[129]:
        A   B
    0 NaN NaN
    1 NaN NaN
     

    Arithmetic methods with fill values

    In [130]:
     
     
     
     
     
    df1 = pd.DataFrame(np.arange(12.).reshape((3, 4)),
                       columns=list('abcd'))
    df2 = pd.DataFrame(np.arange(20.).reshape((4, 5)),
                       columns=list('abcde'))
     
     
    In [131]:
     
     
     
     
     
    df2
     
     
    Out[131]:
          a     b     c     d     e
    0   0.0   1.0   2.0   3.0   4.0
    1   5.0   6.0   7.0   8.0   9.0
    2  10.0  11.0  12.0  13.0  14.0
    3  15.0  16.0  17.0  18.0  19.0
    In [132]:
     
     
     
     
     
    df2.loc[1, 'b'] = np.nan
     
     
    In [133]:
     
     
     
     
     
    df1
     
     
    Out[133]:
         a    b     c     d
    0  0.0  1.0   2.0   3.0
    1  4.0  5.0   6.0   7.0
    2  8.0  9.0  10.0  11.0
    In [134]:
     
     
     
     
     
    df1 + df2
     
     
    Out[134]:
          a     b     c     d   e
    0   0.0   2.0   4.0   6.0 NaN
    1   9.0   NaN  13.0  15.0 NaN
    2  18.0  20.0  22.0  24.0 NaN
    3   NaN   NaN   NaN   NaN NaN
    In [135]:
     
     
     
     
     
    df1.add(df2, fill_value=0)  # a label missing on one side is treated as 0
     
     
    Out[135]:
          a     b     c     d     e
    0   0.0   2.0   4.0   6.0   4.0
    1   9.0   5.0  13.0  15.0   9.0
    2  18.0  20.0  22.0  24.0  14.0
    3  15.0  16.0  17.0  18.0  19.0
     
     
     
     
     
     
    Signature: df1.add(other, axis='columns', level=None, fill_value=None)
    Docstring:
    Addition of dataframe and other, element-wise (binary operator `add`).
    Equivalent to ``dataframe + other``, but with support to substitute a fill_value for
    missing data in one of the inputs.
    Parameters
    ----------
    other : Series, DataFrame, or constant
    axis : {0, 1, 'index', 'columns'}
        For Series input, axis to match Series index on
    fill_value : None or float value, default None
        Fill missing (NaN) values with this value. If both DataFrame
        locations are missing, the result will be missing
    level : int or name
        Broadcast across a level, matching Index values on the
        passed MultiIndex level
    Notes
    -----
    Mismatched indices will be unioned together
    Returns
    -------
    result : DataFrame
    See also
    --------
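The effect of fill_value can be checked directly; a small sketch with the same df1/df2:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(12.).reshape((3, 4)), columns=list('abcd'))
df2 = pd.DataFrame(np.arange(20.).reshape((4, 5)), columns=list('abcde'))

# plain + leaves non-overlapping cells as NaN
plain = df1 + df2
assert plain['e'].isna().all()

# add(..., fill_value=0) treats a label missing on one side as 0,
# so every cell present in either frame survives
filled = df1.add(df2, fill_value=0)
assert not filled.isna().any().any()
assert filled.loc[0, 'e'] == 4.0   # only df2 has column 'e'
```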
     
    In [136]:
     
     
     
     
     
    1 / df1
     
     
    Out[136]:
              a         b         c         d
    0       inf  1.000000  0.500000  0.333333
    1  0.250000  0.200000  0.166667  0.142857
    2  0.125000  0.111111  0.100000  0.090909
    In [138]:
     
     
     
     
     
    df1.rdiv(1)
     
     
    Out[138]:
              a         b         c         d
    0       inf  1.000000  0.500000  0.333333
    1  0.250000  0.200000  0.166667  0.142857
    2  0.125000  0.111111  0.100000  0.090909
     
     
     
     
     
     
    Signature: df1.rdiv(other, axis='columns', level=None, fill_value=None)
    Docstring:
    Floating division of dataframe and other, element-wise (binary operator `rtruediv`).
    Equivalent to ``other / dataframe``, but with support to substitute a fill_value for
    missing data in one of the inputs.
    Parameters
    ----------
    other : Series, DataFrame, or constant
    axis : {0, 1, 'index', 'columns'}
        For Series input, axis to match Series index on
    fill_value : None or float value, default None
        Fill missing (NaN) values with this value. If both DataFrame
        locations are missing, the result will be missing
    level : int or name
        Broadcast across a level, matching Index values on the
        passed MultiIndex level
    Notes
    -----
    Mismatched indices will be unioned together
    Returns
    -------
    result : DataFrame
    See also
    --------
     
    In [140]:
     
     
     
     
     
    df1.reindex(columns=df2.columns, fill_value=0)
     
     
    Out[140]:
         a    b     c     d  e
    0  0.0  1.0   2.0   3.0  0
    1  4.0  5.0   6.0   7.0  0
    2  8.0  9.0  10.0  11.0  0
    In [143]:
     
     
     
     
     
    df1.reindex(index=df2.index, columns=df2.columns, fill_value=np.pi)
     
     
    Out[143]:
              a         b          c          d         e
    0  0.000000  1.000000   2.000000   3.000000  3.141593
    1  4.000000  5.000000   6.000000   7.000000  3.141593
    2  8.000000  9.000000  10.000000  11.000000  3.141593
    3  3.141593  3.141593   3.141593   3.141593  3.141593
     

    Operations between DataFrame and Series

    In [144]:
     
     
     
     
     
    arr = np.arange(12.).reshape((3, 4))
    arr
     
     
    Out[144]:
    array([[ 0.,  1.,  2.,  3.],
           [ 4.,  5.,  6.,  7.],
           [ 8.,  9., 10., 11.]])
    In [145]:
     
     
     
     
     
    arr[0]
     
     
    Out[145]:
    array([0., 1., 2., 3.])
    In [146]:
     
     
     
     
     
    arr - arr[0]
     
     
    Out[146]:
    array([[0., 0., 0., 0.],
           [4., 4., 4., 4.],
           [8., 8., 8., 8.]])
    In [147]:
     
     
     
     
     
    frame = pd.DataFrame(np.arange(12.).reshape((4, 3)),
                         columns=list('bde'),
                         index=['Utah', 'Ohio', 'Texas', 'Oregon'])
     
     
    In [149]:
     
     
     
     
     
    series = frame.iloc[0]
    frame
     
     
    Out[149]:
              b     d     e
    Utah    0.0   1.0   2.0
    Ohio    3.0   4.0   5.0
    Texas   6.0   7.0   8.0
    Oregon  9.0  10.0  11.0
    In [150]:
     
     
     
     
     
    series
     
     
    Out[150]:
    b    0.0
    d    1.0
    e    2.0
    Name: Utah, dtype: float64
    In [151]:
     
     
     
     
     
    frame - series
     
     
    Out[151]:
              b    d    e
    Utah    0.0  0.0  0.0
    Ohio    3.0  3.0  3.0
    Texas   6.0  6.0  6.0
    Oregon  9.0  9.0  9.0
    In [152]:
     
     
     
     
     
    series2 = pd.Series(range(3), index=['b', 'e', 'f'])
    frame + series2
     
     
    Out[152]:
              b   d     e   f
    Utah    0.0 NaN   3.0 NaN
    Ohio    3.0 NaN   6.0 NaN
    Texas   6.0 NaN   9.0 NaN
    Oregon  9.0 NaN  12.0 NaN
    In [153]:
     
     
     
     
     
    series3 = frame['d']
    frame
     
     
    Out[153]:
              b     d     e
    Utah    0.0   1.0   2.0
    Ohio    3.0   4.0   5.0
    Texas   6.0   7.0   8.0
    Oregon  9.0  10.0  11.0
    In [154]:
     
     
     
     
     
    series3
     
     
    Out[154]:
    Utah       1.0
    Ohio       4.0
    Texas      7.0
    Oregon    10.0
    Name: d, dtype: float64
    In [155]:
     
     
     
     
     
    frame.sub(series3, axis='index')  # match on the row index instead of the columns
     
     
    Out[155]:
              b    d    e
    Utah   -1.0  0.0  1.0
    Ohio   -1.0  0.0  1.0
    Texas  -1.0  0.0  1.0
    Oregon -1.0  0.0  1.0
     
     
     
     
     
     
    Signature: frame.sub(other, axis='columns', level=None, fill_value=None)
    Docstring:
    Subtraction of dataframe and other, element-wise (binary operator `sub`).
    Equivalent to ``dataframe - other``, but with support to substitute a fill_value for
    missing data in one of the inputs.
    Parameters
    ----------
    other : Series, DataFrame, or constant
    axis : {0, 1, 'index', 'columns'}
        For Series input, axis to match Series index on
    fill_value : None or float value, default None
        Fill missing (NaN) values with this value. If both DataFrame
        locations are missing, the result will be missing
    level : int or name
        Broadcast across a level, matching Index values on the
        passed MultiIndex level
    Notes
    -----
    Mismatched indices will be unioned together
    Returns
    -------
    result : DataFrame
    See also
    --------
    DataFrame.rsub
     
     

    Function Application and Mapping

    In [156]:
     
     
     
     
     
    frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'),
                         index=['Utah', 'Ohio', 'Texas', 'Oregon'])
    frame
     
     
    Out[156]:
                   b         d         e
    Utah   -0.204708  0.478943 -0.519439
    Ohio   -0.555730  1.965781  1.393406
    Texas   0.092908  0.281746  0.769023
    Oregon  1.246435  1.007189 -1.296221
    In [158]:
     
     
     
     
     
    np.abs(frame)  # element-wise absolute value
     
     
    Out[158]:
                  b         d         e
    Utah   0.204708  0.478943  0.519439
    Ohio   0.555730  1.965781  1.393406
    Texas  0.092908  0.281746  0.769023
    Oregon 1.246435  1.007189  1.296221
     
     
     
     
     
     
    Call signature:  np.abs(*args, **kwargs)
    Type:            ufunc
    String form:     <ufunc 'absolute'>
    File:            c:\users\qq123\anaconda3\lib\site-packages\numpy\__init__.py
    Docstring:      
    absolute(x, /, out=None, *, where=True, casting='same_kind', order='K', dtype=None, subok=True[, signature, extobj])
    Calculate the absolute value element-wise.
    Parameters
    ----------
    x : array_like
        Input array.
    out : ndarray, None, or tuple of ndarray and None, optional
        A location into which the result is stored. If provided, it must have
        a shape that the inputs broadcast to. If not provided or `None`,
        a freshly-allocated array is returned. A tuple (possible only as a
        keyword argument) must have length equal to the number of outputs.
    where : array_like, optional
        Values of True indicate to calculate the ufunc at that position, values
        of False indicate to leave the value in the output alone.
    **kwargs
        For other keyword-only arguments, see the
        :ref:`ufunc docs <ufuncs.kwargs>`.
    Returns
    -------
    absolute : ndarray
        An ndarray containing the absolute value of
        each element in `x`.  For complex input, ``a + ib``, the
        absolute value is :math:`\sqrt{a^2 + b^2}`.
    Examples
    --------
    >>> x = np.array([-1.2, 1.2])
    >>> np.absolute(x)
    array([ 1.2,  1.2])
    >>> np.absolute(1.2 + 1j)
    1.5620499351813308
    Plot the function over ``[-10, 10]``:
    >>> import matplotlib.pyplot as plt
    >>> x = np.linspace(start=-10, stop=10, num=101)
    >>> plt.plot(x, np.absolute(x))
    >>> plt.show()
    Plot the function over the complex plane:
    >>> xx = x + 1j * x[:, np.newaxis]
    >>> plt.imshow(np.abs(xx), extent=[-10, 10, -10, 10], cmap='gray')
    >>> plt.show()
    Class docstring:
    Functions that operate element by element on whole arrays.
    To see the documentation for a specific ufunc, use `info`.  For
    example, ``np.info(np.sin)``.  Because ufuncs are written in C
    (for speed) and linked into Python with NumPy's ufunc facility,
    Python's help() function finds this page whenever help() is called
    on a ufunc.
    A detailed explanation of ufuncs can be found in the docs for :ref:`ufuncs`.
    Calling ufuncs:
    ===============
    op(*x[, out], where=True, **kwargs)
    Apply `op` to the arguments `*x` elementwise, broadcasting the arguments.
    The broadcasting rules are:
    * Dimensions of length 1 may be prepended to either array.
    * Arrays may be repeated along dimensions of length 1.
    Parameters
    ----------
    *x : array_like
        Input arrays.
    out : ndarray, None, or tuple of ndarray and None, optional
        Alternate array object(s) in which to put the result; if provided, it
        must have a shape that the inputs broadcast to. A tuple of arrays
        (possible only as a keyword argument) must have length equal to the
        number of outputs; use `None` for outputs to be allocated by the ufunc.
    where : array_like, optional
        Values of True indicate to calculate the ufunc at that position, values
        of False indicate to leave the value in the output alone.
    **kwargs
        For other keyword-only arguments, see the :ref:`ufunc docs <ufuncs.kwargs>`.
    Returns
    -------
    r : ndarray or tuple of ndarray
        `r` will have the shape that the arrays in `x` broadcast to; if `out` is
        provided, `r` will be equal to `out`. If the function has more than one
        output, then the result will be a tuple of arrays.
     
    In [160]:
     
     
     
     
     
    f = lambda x: x.max() - x.min()
    frame.apply(f)  # max minus min of each column
     
     
    Out[160]:
    b    1.802165
    d    1.684034
    e    2.689627
    dtype: float64
    In [161]:
     
     
     
     
     
    frame.apply(f, axis=1)  # max minus min of each row
     
     
    Out[161]:
    Utah      0.998382
    Ohio      2.521511
    Texas     0.676115
    Oregon    2.542656
    dtype: float64
    In [162]:
     
     
     
     
     
    frame.apply(f, axis='columns')
     
     
    Out[162]:
    Utah      0.998382
    Ohio      2.521511
    Texas     0.676115
    Oregon    2.542656
    dtype: float64
    In [164]:
     
     
     
     
     
    def f(x):
        return pd.Series([x.min(), x.max()], index=['min', 'max'])
    frame.apply(f)
     
     
    Out[164]:
                b         d         e
    min -0.555730  0.281746 -1.296221
    max  1.246435  1.965781  1.393406
    In [165]:
     
     
     
     
     
    def f(x):
        return pd.Series([x.min(), x.max()], index=['min', 'max'])
    frame.apply(f, axis=1)
     
     
    Out[165]:
                 min       max
    Utah   -0.519439  0.478943
    Ohio   -0.555730  1.965781
    Texas   0.092908  0.769023
    Oregon -1.296221  1.246435
    In [166]:
     
     
     
     
     
    format = lambda x: '%.2f' % x  # format each value as a two-decimal string
    frame.applymap(format)
     
     
    Out[166]:
                b     d      e
    Utah    -0.20  0.48  -0.52
    Ohio    -0.56  1.97   1.39
    Texas    0.09  0.28   0.77
    Oregon   1.25  1.01  -1.30
    In [167]:
     
     
     
     
     
    frame['e'].map(format)
     
     
    Out[167]:
    Utah      -0.52
    Ohio       1.39
    Texas      0.77
    Oregon    -1.30
    Name: e, dtype: object
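The three mapping tools above differ only in granularity: apply works per column or row, applymap per element of a frame, and Series.map per element of a single column. A sketch of my own (with a compatibility shim, since applymap was renamed DataFrame.map in pandas 2.1+):

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(6.).reshape((3, 2)), columns=['a', 'b'])

# apply: the function sees one whole column (or row with axis=1) at a time
spans = frame.apply(lambda x: x.max() - x.min())
assert spans.tolist() == [4.0, 4.0]

# element-wise over the whole frame: applymap, renamed DataFrame.map in
# pandas 2.1+, so pick whichever this pandas version provides
elementwise = frame.map if hasattr(frame, 'map') else frame.applymap
doubled = elementwise(lambda x: x * 2)
assert doubled.loc[2, 'b'] == 10.0

# Series.map is the element-wise counterpart for a single column
assert frame['a'].map(lambda x: x + 1).tolist() == [1.0, 3.0, 5.0]
```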
     

    Sorting and Ranking

    In [168]:
     
     
     
     
     
    obj = pd.Series(range(4), index=['d', 'a', 'b', 'c'])
    obj.sort_index()
     
     
    Out[168]:
    a    1
    b    2
    c    3
    d    0
    dtype: int64
    In [171]:
     
     
     
     
     
    frame = pd.DataFrame(np.arange(8).reshape((2, 4)),
                         index=['three', 'one'],
                         columns=['d', 'a', 'b', 'c'])
    frame.sort_index()
     
     
    Out[171]:
           d  a  b  c
    one    4  5  6  7
    three  0  1  2  3
     
     
     
     
     
     
    Signature: frame.sort_index(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, by=None)
    Docstring:
    Sort object by labels (along an axis)
    Parameters
    ----------
    axis : index, columns to direct sorting
    level : int or level name or list of ints or list of level names
        if not None, sort on values in specified index level(s)
    ascending : boolean, default True
        Sort ascending vs. descending
    inplace : bool, default False
        if True, perform operation in-place
    kind : {'quicksort', 'mergesort', 'heapsort'}, default 'quicksort'
         Choice of sorting algorithm. See also ndarray.np.sort for more
         information.  `mergesort` is the only stable algorithm. For
         DataFrames, this option is only applied when sorting on a single
         column or label.
    na_position : {'first', 'last'}, default 'last'
         `first` puts NaNs at the beginning, `last` puts NaNs at the end.
         Not implemented for MultiIndex.
    sort_remaining : bool, default True
        if true and sorting by level and index is multilevel, sort by other
        levels too (in order) after sorting by specified level
    Returns
    -------
    sorted_obj : DataFrame
    File:      c:\users\qq123\anaconda3\lib\site-packages\pandas\core\frame.py
    Type:      method
     
    In [170]:
     
     
     
     
     
    frame.sort_index(axis=1)  # sort by column labels
     
     
    Out[170]:
           a  b  c  d
    three  1  2  3  0
    one    5  6  7  4
    In [172]:
     
     
     
     
     
    frame.sort_index(axis=1, ascending=False)  # sort columns in descending order
     
     
    Out[172]:
           d  c  b  a
    three  0  3  2  1
    one    4  7  6  5
    In [ ]:
     
     
     
     
     
    obj = pd.Series([4, 7, -3, 2])
    obj.sort_values()
     
     
    In [ ]:
     
     
     
     
     
    obj = pd.Series([4, np.nan, 7, np.nan, -3, 2])
    obj.sort_values()
     
     
    In [173]:
     
     
     
     
     
    frame = pd.DataFrame({'b': [4, 7, -3, 2], 'a': [0, 1, 0, 1]})
    frame
     
     
    Out[173]:
       a  b
    0  0  4
    1  1  7
    2  0 -3
    3  1  2
    In [174]:
     
     
     
     
     
    frame.sort_values(by='b')
     
     
    Out[174]:
       a  b
    2  0 -3
    3  1  2
    0  0  4
    1  1  7
     
     
     
     
     
     
    Signature: frame.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last')
    Docstring:
    Sort by the values along either axis
    .. versionadded:: 0.17.0
    Parameters
    ----------
    by : str or list of str
        Name or list of names which refer to the axis items.
    axis : {0 or 'index', 1 or 'columns'}, default 0
        Axis to direct sorting
    ascending : bool or list of bool, default True
         Sort ascending vs. descending. Specify list for multiple sort
         orders.  If this is a list of bools, must match the length of
         the by.
    inplace : bool, default False
         if True, perform operation in-place
    kind : {'quicksort', 'mergesort', 'heapsort'}, default 'quicksort'
         Choice of sorting algorithm. See also ndarray.np.sort for more
         information.  `mergesort` is the only stable algorithm. For
         DataFrames, this option is only applied when sorting on a single
         column or label.
    na_position : {'first', 'last'}, default 'last'
         `first` puts NaNs at the beginning, `last` puts NaNs at the end
    Returns
    -------
    sorted_obj : DataFrame
    Examples
    --------
    >>> df = pd.DataFrame({
    ...     'col1' : ['A', 'A', 'B', np.nan, 'D', 'C'],
    ...     'col2' : [2, 1, 9, 8, 7, 4],
    ...     'col3': [0, 1, 9, 4, 2, 3],
    ... })
    >>> df
        col1 col2 col3
    0   A    2    0
    1   A    1    1
    2   B    9    9
    3   NaN  8    4
    4   D    7    2
    5   C    4    3
    Sort by col1
    >>> df.sort_values(by=['col1'])
        col1 col2 col3
    0   A    2    0
    1   A    1    1
    2   B    9    9
    5   C    4    3
    4   D    7    2
    3   NaN  8    4
    Sort by multiple columns
    >>> df.sort_values(by=['col1', 'col2'])
        col1 col2 col3
    1   A    1    1
    0   A    2    0
    2   B    9    9
    5   C    4    3
    4   D    7    2
    3   NaN  8    4
    Sort Descending
    >>> df.sort_values(by='col1', ascending=False)
        col1 col2 col3
    4   D    7    2
    5   C    4    3
    2   B    9    9
    0   A    2    0
    1   A    1    1
    3   NaN  8    4
    Putting NAs first
    >>> df.sort_values(by='col1', ascending=False, na_position='first')
        col1 col2 col3
    3   NaN  8    4
    4   D    7    2
    5   C    4    3
    2   B    9    9
    0   A    2    0
    1   A    1    1
     
    In [176]:
     
     
     
     
     
    frame.sort_values(by=['a', 'b'])
     
     
    Out[176]:
       a  b
    2  0 -3
    0  0  4
    3  1  2
    1  1  7
    In [175]:
     
     
     
     
     
    obj = pd.Series([7, -5, 7, 4, 2, 0, 4])
     
     
    In [179]:
     
     
     
     
     
    obj.rank()
     
     
    Out[179]:
    0    6.5
    1    1.0
    2    6.5
    3    4.5
    4    3.0
    5    2.0
    6    4.5
    dtype: float64
    In [180]:
     
     
     
     
     
    obj.rank(pct=True)
     
     
    Out[180]:
    0    0.928571
    1    0.142857
    2    0.928571
    3    0.642857
    4    0.428571
    5    0.285714
    6    0.642857
    dtype: float64
     

    Signature: obj.rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)

    Docstring: Compute numerical data ranks (1 through n) along axis. Equal values are assigned a rank that is the average of the ranks of those values.

    Parameters

    axis : {0 or 'index', 1 or 'columns'}, default 0
        Index to direct ranking
    method : {'average', 'min', 'max', 'first', 'dense'}

    * average: average rank of group
    * min: lowest rank in group
    * max: highest rank in group
    * first: ranks assigned in order they appear in the array
    * dense: like 'min', but rank always increases by 1 between groups

    numeric_only : boolean, default None
        Include only float, int, boolean data. Valid only for DataFrame or Panel objects
    na_option : {'keep', 'top', 'bottom'}

    * keep: leave NA values where they are
    * top: smallest rank if ascending
    * bottom: smallest rank if descending

    ascending : boolean, default True
        False for ranks by high (1) to low (N)
    pct : boolean, default False
        Computes percentage rank of data

    Returns

    In [181]:
     
     
     
     
     
    obj.rank(method='first')
     
     
    Out[181]:
    0    6.0
    1    1.0
    2    7.0
    3    4.0
    4    3.0
    5    2.0
    6    5.0
    dtype: float64
    In [182]:
     
     
     
     
     
    # tied values receive the maximum rank of their group
    obj.rank(ascending=False, method='max')
     
     
    Out[182]:
    0    2.0
    1    7.0
    2    2.0
    3    4.0
    4    5.0
    5    6.0
    6    4.0
    dtype: float64
    In [183]:
     
     
     
     
     
    frame = pd.DataFrame({'b': [4.3, 7, -3, 2], 'a': [0, 1, 0, 1],
                          'c': [-2, 5, 8, -2.5]})
    frame
    frame.rank(axis='columns')  # rank the values within each row
     
     
    Out[183]:
         a    b    c
    0  2.0  3.0  1.0
    1  1.0  3.0  2.0
    2  2.0  1.0  3.0
    3  2.0  3.0  1.0
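The tie-breaking methods can be compared side by side on the same Series used above:

```python
import pandas as pd

obj = pd.Series([7, -5, 7, 4, 2, 0, 4])

# 'average' (the default) splits the rank among ties:
# the two 7s occupy ranks 6 and 7, so each gets 6.5
assert obj.rank().tolist() == [6.5, 1.0, 6.5, 4.5, 3.0, 2.0, 4.5]

# 'first' breaks ties by order of appearance
assert obj.rank(method='first').tolist() == [6.0, 1.0, 7.0, 4.0, 3.0, 2.0, 5.0]

# 'dense' is like 'min' but ranks increase by exactly 1 between groups
assert obj.rank(method='dense').tolist() == [5.0, 1.0, 5.0, 4.0, 3.0, 2.0, 4.0]
```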
     

    Axis Indexes with Duplicate Labels

    In [184]:
     
     
     
     
     
    obj = pd.Series(range(5), index=['a', 'a', 'b', 'b', 'c'])
    obj
     
     
    Out[184]:
    a    0
    a    1
    b    2
    b    3
    c    4
    dtype: int64
    In [185]:
     
     
     
     
     
    obj.index.is_unique  # check whether the index labels are unique
     
     
    Out[185]:
    False
    In [187]:
     
     
     
     
     
    obj['a']
     
     
    Out[187]:
    a    0
    a    1
    dtype: int64
    In [188]:
     
     
     
     
     
    obj['c']
     
     
    Out[188]:
    4
    In [189]:
     
     
     
     
     
    df = pd.DataFrame(np.random.randn(4, 3), index=['a', 'a', 'b', 'b'])
    df
    df.loc['b']
     
     
    Out[189]:
              0         1         2
    b  1.669025 -0.438570 -0.539741
    b  0.476985  3.248944 -1.021228
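Duplicate labels change the return type of indexing, which is easy to trip over; a sketch:

```python
import pandas as pd

obj = pd.Series(range(5), index=['a', 'a', 'b', 'b', 'c'])

assert not obj.index.is_unique

# selecting a duplicated label returns a Series...
assert isinstance(obj['a'], pd.Series)
assert obj['a'].tolist() == [0, 1]

# ...while a unique label returns a scalar
assert obj['c'] == 4
```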
     

    Summarizing and Computing Descriptive Statistics

    In [190]:
     
     
     
     
     
    df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5],
                       [np.nan, np.nan], [0.75, -1.3]],
                      index=['a', 'b', 'c', 'd'],
                      columns=['one', 'two'])
    df
     
     
    Out[190]:
        one  two
    a  1.40  NaN
    b  7.10 -4.5
    c   NaN  NaN
    d  0.75 -1.3
    In [198]:
     
     
     
     
     
    df.sum()
     
     
    Out[198]:
    one    9.25
    two   -5.80
    dtype: float64
     
     
     
     
     
     
    Signature: df.sum(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)
    Docstring:
     
    In [192]:
     
     
     
     
     
    df.sum(axis='columns')
     
     
    Out[192]:
    a    1.40
    b    2.60
    c    0.00
    d   -0.55
    dtype: float64
    In [193]:
     
     
     
     
     
    df.mean(axis='columns', skipna=False)  # do not skip NaN values
     
     
    Out[193]:
    a      NaN
    b    1.300
    c      NaN
    d   -0.275
    dtype: float64
    In [196]:
     
     
     
     
     
    df.idxmax()
     
     
    Out[196]:
    one    b
    two    d
    dtype: object
    In [197]:
     
     
     
     
     
    df.idxmax(axis=1)
     
     
    Out[197]:
    a    one
    b    one
    c    NaN
    d    one
    dtype: object
     
     
     
     
     
     
    Signature: df.idxmax(axis=0, skipna=True)
    Docstring:
    Return index of first occurrence of maximum over requested axis.
    NA/null values are excluded.
    Parameters
    ----------
    axis : {0 or 'index', 1 or 'columns'}, default 0
        0 or 'index' for row-wise, 1 or 'columns' for column-wise
    skipna : boolean, default True
        Exclude NA/null values. If an entire row/column is NA, the result
        will be NA.
    Raises
    ------
    ValueError
        * If the row/column is empty
    Returns
    -------
    idxmax : Series
    Notes
    -----
    This method is the DataFrame version of ``ndarray.argmax``.
    See Also
    --------
    Series.idxmax
     
    In [200]:
     
     
     
     
     
    df.cumsum()
     
     
    Out[200]:
        one  two
    a  1.40  NaN
    b  8.50 -4.5
    c   NaN  NaN
    d  9.25 -5.8
     
     
     
     
     
     
    Signature: df.cumsum(axis=None, skipna=True, *args, **kwargs)
    Docstring:
    Return cumulative sum over requested axis.
    Parameters
    ----------
    axis : {index (0), columns (1)}
    skipna : boolean, default True
        Exclude NA/null values. If an entire row/column is NA, the result
        will be NA
    Returns
    -------
    cumsum : Series
     
    In [202]:
     
     
     
     
     
    df.describe()
     
     
    Out[202]:
                one       two
    count  3.000000  2.000000
    mean   3.083333 -2.900000
    std    3.493685  2.262742
    min    0.750000 -4.500000
    25%    1.075000 -3.700000
    50%    1.400000 -2.900000
    75%    4.250000 -2.100000
    max    7.100000 -1.300000
     

    Signature: df.describe(percentiles=None, include=None, exclude=None)

    Docstring: Generates descriptive statistics that summarize the central tendency, dispersion and shape of a dataset's distribution, excluding NaN values.

    Analyzes both numeric and object series, as well as DataFrame column sets of mixed data types. The output will vary depending on what is provided. Refer to the notes below for more detail.

    Parameters

    percentiles : list-like of numbers, optional
        The percentiles to include in the output. All should fall between 0 and 1. The default is [.25, .5, .75], which returns the 25th, 50th, and 75th percentiles.
    include : 'all', list-like of dtypes or None (default), optional
        A white list of data types to include in the result. Ignored for Series. Here are the options:

    - 'all' : All columns of the input will be included in the output.
    - A list-like of dtypes : Limits the results to the
      provided data types.
      To limit the result to numeric types submit
      ``numpy.number``. To limit it instead to object columns submit
      the ``numpy.object`` data type. Strings
      can also be used in the style of
      ``select_dtypes`` (e.g. ``df.describe(include=['O'])``). To
      select pandas categorical columns, use ``'category'``
    - None (default) : The result will include all numeric columns.

    exclude : list-like of dtypes or None (default), optional
        A black list of data types to omit from the result. Ignored for Series. Here are the options:

    - A list-like of dtypes : Excludes the provided data types
      from the result. To exclude numeric types submit
      ``numpy.number``. To exclude object columns submit the data
      type ``numpy.object``. Strings can also be used in the style of
      ``select_dtypes`` (e.g. ``df.describe(include=['O'])``). To
      exclude pandas categorical columns, use ``'category'``
    - None (default) : The result will exclude nothing.

    Returns

    summary: Series/DataFrame of summary statistics

    Notes

    For numeric data, the result's index will include count, mean, std, min, and max, as well as the lower, 50th, and upper percentiles. By default the lower percentile is 25 and the upper percentile is 75. The 50th percentile is the same as the median.

    For object data (e.g. strings or timestamps), the result's index will include count, unique, top, and freq. The top is the most common value. The freq is the most common value's frequency. Timestamps also include the first and last items.

    If multiple object values have the highest count, then the count and top results will be arbitrarily chosen from among those with the highest count.

    For mixed data types provided via a DataFrame, the default is to return only an analysis of numeric columns. If the dataframe consists only of object and categorical data without any numeric columns, the default is to return an analysis of both the object and categorical columns. If include='all' is provided as an option, the result will include a union of attributes of each type.

    The include and exclude parameters can be used to limit which columns in a DataFrame are analyzed for the output. The parameters are ignored when analyzing a Series.

    Examples
    --------

    Describing a numeric Series.

    >>> s = pd.Series([1, 2, 3])
    >>> s.describe()
    count    3.0
    mean     2.0
    std      1.0
    min      1.0
    25%      1.5
    50%      2.0
    75%      2.5
    max      3.0
    dtype: float64

    Describing a categorical Series.

    >>> s = pd.Series(['a', 'a', 'b', 'c'])
    >>> s.describe()
    count     4
    unique    3
    top       a
    freq      2
    dtype: object

    Describing a timestamp Series.

    >>> s = pd.Series([
    ...   np.datetime64("2000-01-01"),
    ...   np.datetime64("2010-01-01"),
    ...   np.datetime64("2010-01-01")
    ... ])
    >>> s.describe()
    count                       3
    unique                      2
    top       2010-01-01 00:00:00
    freq                        2
    first     2000-01-01 00:00:00
    last      2010-01-01 00:00:00
    dtype: object

    Describing a DataFrame. By default only numeric fields are returned.

    >>> df = pd.DataFrame({'object': ['a', 'b', 'c'],
    ...                    'numeric': [1, 2, 3],
    ...                    'categorical': pd.Categorical(['d', 'e', 'f'])
    ...                   })
    >>> df.describe()
           numeric
    count      3.0
    mean       2.0
    std        1.0
    min        1.0
    25%        1.5
    50%        2.0
    75%        2.5
    max        3.0

    Describing all columns of a DataFrame regardless of data type.

    >>> df.describe(include='all')
           categorical  numeric object
    count            3      3.0      3
    unique           3      NaN      3
    top              f      NaN      c
    freq             1      NaN      1
    mean           NaN      2.0    NaN
    std            NaN      1.0    NaN
    min            NaN      1.0    NaN
    25%            NaN      1.5    NaN
    50%            NaN      2.0    NaN
    75%            NaN      2.5    NaN
    max            NaN      3.0    NaN

    Describing a column from a DataFrame by accessing it as an attribute.

    >>> df.numeric.describe()
    count    3.0
    mean     2.0
    std      1.0
    min      1.0
    25%      1.5
    50%      2.0
    75%      2.5
    max      3.0
    Name: numeric, dtype: float64

    Including only numeric columns in a DataFrame description.

    >>> df.describe(include=[np.number])
           numeric
    count      3.0
    mean       2.0
    std        1.0
    min        1.0
    25%        1.5
    50%        2.0
    75%        2.5
    max        3.0

    Including only string columns in a DataFrame description.

    >>> df.describe(include=[np.object])
           object
    count       3
    unique      3
    top         c
    freq        1

    Including only categorical columns from a DataFrame description.

    >>> df.describe(include=['category'])
           categorical
    count            3
    unique           3
    top              f
    freq             1

    Excluding numeric columns from a DataFrame description.

    >>> df.describe(exclude=[np.number])
           categorical object
    count            3      3
    unique           3      3
    top              f      c
    freq             1      1

    Excluding object columns from a DataFrame description.

    >>> df.describe(exclude=[np.object])
           categorical  numeric
    count            3      3.0
    unique           3      NaN
    top              f      NaN
    freq             1      NaN
    mean           NaN      2.0
    std            NaN      1.0
    min            NaN      1.0
    25%            NaN      1.5
    50%            NaN      2.0
    75%            NaN      2.5
    max            NaN      3.0

    See Also
    --------
    DataFrame.count, DataFrame.max, DataFrame.min, DataFrame.mean,
    DataFrame.std, DataFrame.select_dtypes

    In [203]:
    obj = pd.Series(['a', 'a', 'b', 'c'] * 4)
    obj.describe()
     
     
    Out[203]:
    count     16
    unique     3
    top        a
    freq       8
    dtype: object
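The percentiles argument is easy to miss in the docstring above. A minimal sketch (synthetic data, not from the book) of requesting non-default percentiles:

```python
import pandas as pd

obj = pd.Series(range(1, 11))
# request the 10th and 90th percentiles; pandas always adds the median (50%)
desc = obj.describe(percentiles=[.1, .9])
print(desc)
```

The result's index becomes count, mean, std, min, 10%, 50%, 90%, max.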
     

    Correlation and Covariance

     

    conda install pandas-datareader

    In [204]:
    price = pd.read_pickle('examples/yahoo_price.pkl')
    volume = pd.read_pickle('examples/yahoo_volume.pkl')
     
     
     

    import pandas_datareader.data as web
    all_data = {ticker: web.get_data_yahoo(ticker)
                for ticker in ['AAPL', 'IBM', 'MSFT', 'GOOG']}

    price = pd.DataFrame({ticker: data['Adj Close']
                          for ticker, data in all_data.items()})
    volume = pd.DataFrame({ticker: data['Volume']
                           for ticker, data in all_data.items()})

    In [205]:
    returns = price.pct_change()
    returns.tail()
     
     
    Out[205]:
                    AAPL      GOOG       IBM      MSFT
    Date    
    2016-10-17 -0.000680 0.001837 0.002072 -0.003483
    2016-10-18 -0.000681 0.019616 -0.026168 0.007690
    2016-10-19 -0.002979 0.007846 0.003583 -0.002255
    2016-10-20 -0.000512 -0.005652 0.001719 -0.004867
    2016-10-21 -0.003930 0.003011 -0.012474 0.042096
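pct_change computes (p_t - p_{t-1}) / p_{t-1} for each column. A tiny sketch of the arithmetic with made-up prices (not the Yahoo data):

```python
import math

import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0])
rets = prices.pct_change()  # (p_t - p_{t-1}) / p_{t-1}
# the first element has no prior price, so it is NaN
assert math.isnan(rets.iloc[0])
print(rets.iloc[1], rets.iloc[2])  # 0.1 and -0.1
```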
    In [207]:
    returns['MSFT'].corr(returns['IBM'])  # correlation
     
     
    Out[207]:
    0.4997636114415116
    In [208]:
    returns['MSFT'].cov(returns['IBM'])  # covariance
     
     
    Out[208]:
    8.870655479703549e-05
    In [209]:
    returns.MSFT.corr(returns.IBM)
     
     
    Out[209]:
    0.4997636114415116
    In [214]:
    returns.corr()
     
     
    Out[214]:
              AAPL      GOOG       IBM      MSFT
    AAPL 1.000000 0.407919 0.386817 0.389695
    GOOG 0.407919 1.000000 0.405099 0.465919
    IBM 0.386817 0.405099 1.000000 0.499764
    MSFT 0.389695 0.465919 0.499764 1.000000
    In [212]:
    returns.cov()
     
     
    Out[212]:
              AAPL      GOOG       IBM      MSFT
    AAPL 0.000277 0.000107 0.000078 0.000095
    GOOG 0.000107 0.000251 0.000078 0.000108
    IBM 0.000078 0.000078 0.000146 0.000089
    MSFT 0.000095 0.000108 0.000089 0.000215
    In [ ]:
    returns.corrwith(returns.IBM)
     
     
    In [ ]:
    returns.corrwith(volume)
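The two corrwith cells above have no saved output, and the Yahoo pickle files may not be available, so here is a sketch on synthetic returns (random data; the ticker names are only for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
returns = pd.DataFrame(rng.standard_normal((250, 3)),
                       columns=['AAPL', 'IBM', 'MSFT'])

# correlation of every column with a single Series, aligned on index
col_corr = returns.corrwith(returns['IBM'])
print(col_corr)  # IBM's correlation with itself is exactly 1.0
```

Passing a DataFrame instead of a Series computes correlations column-by-column on matching column names.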
     
     
     

    Unique Values, Value Counts, and Membership

    In [215]:
    obj = pd.Series(['c', 'a', 'd', 'a', 'a', 'b', 'b', 'c', 'c'])
     
     
    In [216]:
    uniques = obj.unique()
    uniques
     
     
    Out[216]:
    array(['c', 'a', 'd', 'b'], dtype=object)
    In [217]:
    obj.value_counts()
     
     
    Out[217]:
    c    3
    a    3
    b    2
    d    1
    dtype: int64
    In [218]:
    pd.value_counts(obj.values, sort=False)
     
     
    Out[218]:
    b    2
    a    3
    c    3
    d    1
    dtype: int64
    In [ ]:
    obj
    mask = obj.isin(['b', 'c'])
    mask
    obj[mask]
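The isin cell above has no stored output; run on the same Series, it produces:

```python
import pandas as pd

obj = pd.Series(['c', 'a', 'd', 'a', 'a', 'b', 'b', 'c', 'c'])
mask = obj.isin(['b', 'c'])    # boolean Series: True where the value is in the set
print(obj[mask].tolist())      # ['c', 'b', 'b', 'c', 'c']
```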
     
     
    In [220]:
    to_match = pd.Series(['c', 'a', 'b', 'b', 'c', 'a'])
    unique_vals = pd.Series(['c', 'b', 'a'])
    pd.Index(unique_vals).get_indexer(to_match)
     
     
    Out[220]:
    array([0, 2, 1, 1, 0, 2], dtype=int64)
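get_indexer maps each element of to_match to its position in the index, returning -1 for values that are absent. A small sketch with a deliberately missing value (chosen for illustration):

```python
import pandas as pd

to_match = pd.Series(['c', 'a', 'x'])
unique_vals = pd.Series(['c', 'b', 'a'])
idx = pd.Index(unique_vals).get_indexer(to_match)
print(idx.tolist())  # [0, 2, -1]: 'x' is not in unique_vals
```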
    In [221]:
    unique_vals
     
     
    Out[221]:
    0    c
    1    b
    2    a
    dtype: object
    In [222]:
    data = pd.DataFrame({'Qu1': [1, 3, 4, 3, 4],
                         'Qu2': [2, 3, 1, 2, 3],
                         'Qu3': [1, 5, 2, 4, 4]})
    data
     
     
    Out[222]:
       Qu1  Qu2  Qu3
    0 1 2 1
    1 3 3 5
    2 4 1 2
    3 3 2 4
    4 4 3 4
    In [225]:
    result = data.apply(pd.value_counts).fillna(0)
    result
     
     
    Out[225]:
       Qu1  Qu2  Qu3
    1 1.0 1.0 1.0
    2 0.0 2.0 1.0
    3 2.0 2.0 0.0
    4 2.0 0.0 2.0
    5 0.0 0.0 1.0
     

    Conclusion

    In [226]:
    pd.options.display.max_rows = PREVIOUS_MAX_ROWS
     
     
  • Original post: https://www.cnblogs.com/romannista/p/10689353.html