• cut qcut


    import numpy as np
    import pandas as pd

    factors = np.random.randn(30)
    
    In [11]:
    pd.cut(factors, 5)
    Out[11]:
    [(-0.411, 0.575], (-0.411, 0.575], (-0.411, 0.575], (-0.411, 0.575], (0.575, 1.561], ..., (-0.411, 0.575], (-1.397, -0.411], (0.575, 1.561], (-2.388, -1.397], (-0.411, 0.575]]
    Length: 30
    Categories (5, object): [(-2.388, -1.397] < (-1.397, -0.411] < (-0.411, 0.575] < (0.575, 1.561] < (1.561, 2.547]]
    
    In [14]:
    pd.qcut(factors, 5)
    Out[14]:
    [(-0.348, 0.0899], (-0.348, 0.0899], (0.0899, 1.19], (0.0899, 1.19], (0.0899, 1.19], ..., (0.0899, 1.19], (-1.137, -0.348], (1.19, 2.547], [-2.383, -1.137], (-0.348, 0.0899]]
    Length: 30
    Categories (5, object): [[-2.383, -1.137] < (-1.137, -0.348] < (-0.348, 0.0899] < (0.0899, 1.19] < (1.19, 2.547]]

    cut splits the values into bins of equal width; qcut splits them into bins of equal frequency (quantiles).
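To make the difference concrete, the sketch below counts how many values land in each bin for both functions. The seed and the variable names are illustrative choices, not part of the original session, so the exact edges will differ from the output shown above.

```python
import numpy as np
import pandas as pd

# Fixed seed, chosen only so the example is reproducible.
rng = np.random.default_rng(0)
factors = rng.standard_normal(30)

width_bins = pd.cut(factors, 5)   # 5 intervals of equal width
freq_bins = pd.qcut(factors, 5)   # 5 intervals holding (roughly) equal counts

width_counts = width_bins.value_counts()
freq_counts = freq_bins.value_counts()

print(width_counts)  # counts vary from bin to bin
print(freq_counts)   # 30 distinct values / 5 bins = 6 per bin
```

With cut the intervals have the same length but uneven counts; with qcut the counts are even but the interval lengths vary.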

    The qcut function; reference: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.qcut.html

      1). Signature: pandas.qcut(x, q, labels=None, retbins=False, precision=3, duplicates='raise')

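        >>>x The data to bin; a one-dimensional array or a Series

        >>>q The number of groups, i.e. how many bins to split the data into; see the demos further down

        >>>labels The group labels; note that the number of labels must equal the number of groups

        >>>retbins Defaults to False. When False, the return value is a Categorical (which has a value_counts() method), or a Series when x is a Series; when True, the return value is a tuple of that result and the bin edges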
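As a minimal sketch of the labels and retbins parameters described above, the following uses made-up data (`values`) and made-up label names (`names`); neither comes from the original post.

```python
import pandas as pd

values = pd.Series(range(1, 13))   # hypothetical data: 1..12
names = ["low", "mid", "high"]     # one label per group, so q=3 needs 3 labels

# retbins=True returns a tuple: (binned result, bin edges).
groups, edges = pd.qcut(values, q=3, labels=names, retbins=True)
counts = groups.value_counts()

print(groups.tolist())
print(edges)    # 4 edges delimit 3 groups
print(counts)   # 4 values in each group

# A label count that does not match the number of groups is rejected.
mismatch_error = None
try:
    pd.qcut(values, q=3, labels=["low", "high"])
except ValueError as exc:
    mismatch_error = exc
    print("label count mismatch:", exc)
```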

    pandas.cut:

    pandas.cut(x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False)

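    Parameters:

    1. x: array-like, must be one-dimensional; the input data to bin.
    2. bins: an integer, a sequence of scalars, or an IntervalIndex. If bins is an integer, it defines the number of equal-width bins over the range of x; in that case the range of x is extended by 0.1% so that the minimum and maximum of x are included. If bins is a sequence, it defines the bin edges, which allows non-uniform bin widths; in that case the range of x is not extended.
    3. right: boolean. Whether the intervals are open on the left and closed on the right.
    4. labels: the labels for the resulting bins; must be the same length as the number of bins. If False, only the integer bin indicators are returned.
    5. retbins: boolean. Whether to also return the bin edges.
    6. precision: integer. The number of decimal places used for the bin labels.
    7. include_lowest: boolean. Whether the first interval includes its left endpoint.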
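The effect of right and include_lowest is easiest to see on values that sit exactly on a bin edge. The sketch below uses hypothetical `scores` and `edges` and passes labels=False so that each result is a plain array of bin indices, with NaN for values that fall in no bin.

```python
import pandas as pd

scores = [0, 59, 60, 100]   # hypothetical values, some exactly on an edge
edges = [0, 60, 100]

# Default (right=True): intervals (0, 60] and (60, 100]; 0 falls outside.
default = pd.cut(scores, edges, labels=False)

# include_lowest=True closes the first interval on the left, so 0 is kept.
with_lowest = pd.cut(scores, edges, include_lowest=True, labels=False)

# right=False: intervals [0, 60) and [60, 100); now 100 falls outside.
left_closed = pd.cut(scores, edges, right=False, labels=False)

print(default)      # first entry is NaN
print(with_lowest)  # no NaN
print(left_closed)  # last entry is NaN
```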

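    Returns:

    A Categorical, Series, or array giving the bin of each value; if labels is False, the bins are returned as integer codes.
    If retbins is True, the bin edges are also returned as an ndarray (floating-point when pandas computes the edges itself).

    For example:

        ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]
        bins = [18, 25, 35, 60, 100]
        cats = pd.cut(ages, bins)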
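The ages example can be extended to inspect what cut returns. This is only a sketch: the names `edges` and `codes` are introduced here for illustration.

```python
import pandas as pd

ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]
bins = [18, 25, 35, 60, 100]

# retbins=True returns the Categorical together with the bin edges.
cats, edges = pd.cut(ages, bins, retbins=True)

# labels=False returns only the integer bin index of each age.
codes = pd.cut(ages, bins, labels=False)

print(cats.categories)      # the four intervals
print(cats.value_counts())  # how many ages fall in each interval
print(codes)                # same information as cats.codes
print(edges)                # the edges that were passed in
```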

    demo:

    >>> pd.cut(np.array([.2, 1.4, 2.5, 6.2, 9.7, 2.1]), 3, retbins=True)
    ... 
    ([(0.19, 3.367], (0.19, 3.367], (0.19, 3.367], (3.367, 6.533], ...
    Categories (3, interval[float64]): [(0.19, 3.367] < (3.367, 6.533] ...
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    >>> pd.cut(np.array([.2, 1.4, 2.5, 6.2, 9.7, 2.1]),
    ...        3, labels=["good", "medium", "bad"])
    ... 
    [good, good, good, medium, bad, good]
    Categories (3, object): [good < medium < bad] 
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    >>> pd.cut(np.ones(5), 4, labels=False)
    array([1, 1, 1, 1, 1])
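The lower edge 0.19 in the first demo comes from the 0.1% range extension that cut applies when bins is an integer. The sketch below recomputes it by hand. That the extension shows up on the lower edge here (with the default right=True) is an observation about the pandas versions I am aware of, not a documented guarantee.

```python
import numpy as np
import pandas as pd

x = np.array([.2, 1.4, 2.5, 6.2, 9.7, 2.1])
cats, edges = pd.cut(x, 3, retbins=True)

span = x.max() - x.min()                 # 9.5
expected_low = x.min() - 0.001 * span    # 0.2 - 0.0095 = 0.1905

print(edges)         # unrounded bin edges
print(expected_low)  # matches the first edge
```

The interval labels show 0.19 rather than 0.1905 because of precision=3.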

    pandas.qcut

    pandas.qcut(x, q, labels=None, retbins=False, precision=3, duplicates='raise')

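    Parameters:

    1. x
    2. q: an integer number of quantiles, or an array of quantiles.
    3. labels
    4. retbins
    5. precision
    6. duplicates: 'raise' or 'drop'; what to do when the bin edges are not unique.
    Values that fall outside the bin edges become NA in the result.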
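One way to see out-of-bounds values turn into NA is to pass q as a list of quantiles that does not span 0 to 1. The quantiles chosen below are an illustrative assumption.

```python
import pandas as pd

# Edges at the 25% and 75% quantiles of 0..4 are 1.0 and 3.0,
# so 0 and 4 lie outside every bin.
codes = pd.qcut(range(5), [0.25, 0.75], labels=False)

print(codes)           # NaN for the first and last value, 0 for the rest
print(pd.isna(codes))
```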

    demo:

    >>> pd.qcut(range(5), 4)
    ... 
    [(-0.001, 1.0], (-0.001, 1.0], (1.0, 2.0], (2.0, 3.0], (3.0, 4.0]]
    Categories (4, interval[float64]): [(-0.001, 1.0] < (1.0, 2.0] ...
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    >>> pd.qcut(range(5), 3, labels=["good", "medium", "bad"])
    ... 
    [good, good, medium, bad, bad]
    Categories (3, object): [good < medium < bad]
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    >>> pd.qcut(range(5), 4, labels=False)
    array([0, 0, 1, 2, 3])
     
  • Original article: https://www.cnblogs.com/fujian-code/p/9263449.html