• 03. Pandas 2 | Time Series


     Time Series

    1. The time module: datetime

    From the datetime module, the main things to know are datetime.date(), datetime.datetime(), and datetime.timedelta().

    Date-string parsing: parser.parse (from dateutil)

    datetime.date: the date object

    import datetime  # alternatively: from datetime import date
    today = datetime.date.today()
    print(today, type(today))            # 2018-08-21 <class 'datetime.date'>
    print(str(today), type(str(today)))  # 2018-08-21 <class 'str'>
    t = datetime.date(2018, 12, 8)
    print(t)                             # 2018-12-08

     datetime.date.today() returns today's date.
     The result is a date object.

    datetime.datetime: the datetime object

    now = datetime.datetime.now()
    print(now, type(now)) #2018-08-21 19:22:47.296548 <class 'datetime.datetime'>
    print(str(now), type(str(now))) #2018-08-21 19:23:26.139769 <class 'str'>
    t1 = datetime.datetime(2018, 8, 1)
    t2 = datetime.datetime(2014, 9, 1, 12, 12, 12)
    print(t1, t2) #2018-08-01 00:00:00  2014-09-01 12:12:12
    print(t1 - t2) #1429 days, 11:47:48

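    datetime.datetime.now() returns the current date and time.
    The result is a datetime object.
    It can be converted to a string with str().

    datetime.timedelta: time differences

    today = datetime.datetime.today()
    yesterday = today - datetime.timedelta(1)  # the first positional argument is a number of days
    print(today, yesterday)                    # 2018-08-21 19:32:25.068595    2018-08-20 19:32:25.068595
    print(today - datetime.timedelta(7))       # 2018-08-14 19:32:25.068595

    datetime.timedelta() represents a duration. It is mainly used to add to or subtract from dates and times, acting as a time "difference" that datetime arithmetic understands.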
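    The two directions of timedelta arithmetic can be sketched with fixed dates so that the results are reproducible. The variable names below are illustrative only; the second subtraction reuses the t1 and t2 values from the datetime example.

```python
from datetime import datetime, timedelta

start = datetime(2018, 8, 21, 19, 30)

# datetime - timedelta gives another datetime.
one_week_earlier = start - timedelta(days=7)

# datetime - datetime gives a timedelta.
gap = datetime(2018, 8, 1) - datetime(2014, 9, 1, 12, 12, 12)

print(one_week_earlier)       # 2018-08-14 19:30:00
print(gap)                    # 1429 days, 11:47:48
print(gap.days, gap.seconds)  # 1429 42468
```

    A timedelta stores days and seconds separately, which is why gap.seconds covers only the 11:47:48 remainder rather than the whole interval.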

    parser.parse: converting date strings (parse() returns a datetime object)

    from dateutil.parser import parse
    date = '12-15-2018'
    t = parse(date)
    print(t, type(t))                            # 2018-12-15 00:00:00  <class 'datetime.datetime'>
    print(parse('2009-1-2'), '\n',               # 2009-01-02 00:00:00
          parse('5/3/2009'), '\n',               # 2009-05-03 00:00:00
          # Many international formats put the day before the month; set dayfirst=True for those.
          # With the default dayfirst=False the result would be 2009-05-03 00:00:00.
          parse('5/3/2009', dayfirst=True), '\n',  # 2009-03-05 00:00:00
          parse('22/1/2014'), '\n',              # 2014-01-22 00:00:00
          parse('Jan 31, 1997 10:45 PM')         # 1997-01-31 22:45:00
          )

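    2. Pandas moments in time (time points)

    A moment is a single point in time (at the resolution of a year, a month, a day, a minute, a second, and so on). It is a pandas data type and the most basic kind of time series data, associating a value with a point in time.

    A timestamp, in the general sense, is a complete, verifiable piece of data showing that some data already existed before a particular time. It is usually a character sequence that uniquely identifies a moment in time.

    pandas.Timestamp()

     pd.Timestamp()  ---> creates a single timestamp

    Accepted input: a datetime object such as datetime.datetime(2016, 12, 2, 22, 15, 59), or a string such as '2018-12-7 12:07:47'. It must be a single time value.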

    import numpy as np
    import pandas as pd
    date1 = datetime.datetime(2016,12,1,12,45,30)  # a datetime object
    date2 = '2018-11-18'                # strings such as '20181118', '2/3/2018', and '2018-11-18 12:08:13' are all recognized
    t1 = pd.Timestamp(date1)
    t2 = pd.Timestamp(date2)
    print(t1, type(t1))           #2016-12-01 12:45:30 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
    print(t2)                                      #2018-11-18 00:00:00
    print(pd.Timestamp('2017-12-09 15:09:21'))     #2017-12-09 15:09:21

     >>> print(date1, type(date1))
     2016-12-01 12:45:30 <class 'datetime.datetime'>

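      This creates a pandas moment directly → a timestamp, whose type is the pandas Timestamp.

     pd.to_datetime → converts multiple time values into a timestamp index

    pd.to_datetime(): a single time value is converted to a pandas moment of type Timestamp; multiple time values are converted to a pandas DatetimeIndex.

    Keep in mind the difference between the datetime type and the Timestamp type;

    and the difference between Timestamp and DatetimeIndex.

    There are two ways to convert to a pandas moment: Timestamp directly, or to_datetime.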

    from datetime import datetime
    import pandas as pd
    date1 = datetime(2018, 12, 2, 12, 24, 30)
    date2 = '2017-07-21'
    t1 = pd.to_datetime(date1)
    t2 = pd.to_datetime(date2)
    print(t1, type(t1)) #2018-12-02 12:24:30 <class 'pandas._libs.tslibs.timestamps.Timestamp'>  for a single value, the result is the same as with Timestamp
    print(t2, type(t2)) #2017-07-21 00:00:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
    
    lst_date = ['2017-12-9', '2017-10-19', '2018-9-9']                         # with a sequence of several values, the result is different
    t3 = pd.to_datetime(lst_date)
    print(t3, type(t3)) 
    #DatetimeIndex(['2017-12-09', '2017-10-19', '2018-09-09'], dtype='datetime64[ns]', freq=None) <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
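    The type relationships mentioned above can be checked directly. This is a small sketch with illustrative variable names; it relies on the fact that pandas Timestamp is a subclass of datetime.datetime.

```python
from datetime import datetime
import pandas as pd

single = pd.to_datetime('2017-07-21')
many = pd.to_datetime(['2017-12-09', '2017-10-19', '2018-09-09'])

print(isinstance(single, pd.Timestamp))     # True
print(isinstance(single, datetime))         # True: Timestamp subclasses datetime.datetime
print(isinstance(many, pd.DatetimeIndex))   # True
print(isinstance(many[0], pd.Timestamp))    # True: each element of the index is a Timestamp
print(single == pd.Timestamp(datetime(2017, 7, 21)))  # True: both construction routes agree
```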

    pd.to_datetime(data, errors='ignore') or pd.to_datetime(data, errors='coerce')

    >>> import numpy as np
    >>> import pandas as pd
    >>> from datetime import datetime   # without this import you would have to write datetime.datetime
    >>> date1 = [datetime(2018, 6, 1), datetime(2018, 7,1), datetime(2018,8,1)]  # a list of datetime objects
    >>> date2 = ['2017-2-1','2017-2-2','2017-2-3','2017-2-4','2017-2-5','2017-2-6']  # a list of strings
    >>> print(date1)
    [datetime.datetime(2018, 6, 1, 0, 0), datetime.datetime(2018, 7, 1, 0, 0), datetime.datetime(2018, 8, 1, 0, 0)]
    >>> print(date2)
    ['2017-2-1', '2017-2-2', '2017-2-3', '2017-2-4', '2017-2-5', '2017-2-6']
    >>> t1 = pd.to_datetime(date1)
    >>> t2 = pd.to_datetime(date2)
    >>> print(t1)
    DatetimeIndex(['2018-06-01', '2018-07-01', '2018-08-01'], dtype='datetime64[ns]', freq=None)
    >>> print(t2)
    DatetimeIndex(['2017-02-01', '2017-02-02', '2017-02-03', '2017-02-04',
                   '2017-02-05', '2017-02-06'],
                  dtype='datetime64[ns]', freq=None)
    >>> date3 = ['2017-9-1', '2018-11-10','Hello world!','2018-10-9', '2017-7-1']
    >>> # When a sequence of times is mixed with other kinds of data, the errors parameter controls the result.
    >>> # errors='ignore': if the input cannot be parsed, it is returned unchanged; here that gives a plain array.
    >>> t3 = pd.to_datetime(date3, errors='ignore')
    >>> print(t3, type(t3))
    ['2017-9-1' '2018-11-10' 'Hello world!' '2018-10-9' '2017-7-1'] <class 'numpy.ndarray'>
    >>>
    >>> # errors='coerce': values that cannot be parsed become the missing value NaT (Not a Time);
    >>> # the result is still a DatetimeIndex.
    >>> t4 = pd.to_datetime(date3, errors='coerce')
    >>> print(t4, type(t4))
    DatetimeIndex(['2017-09-01', '2018-11-10', 'NaT', '2018-10-09', '2017-07-01'], dtype='datetime64[ns]', freq=None) <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
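    A common follow-up to errors='coerce' is to locate the entries that failed to parse and drop them. The sketch below assumes zero-padded date strings and uses illustrative names (raw, parsed, bad_values). It sticks to 'coerce' because recent pandas releases deprecate errors='ignore'.

```python
import pandas as pd

raw = ['2017-09-01', '2018-11-10', 'Hello world!', '2018-10-09', '2017-07-01']

# Unparseable entries become NaT instead of raising an error.
parsed = pd.to_datetime(raw, errors='coerce')

# isna() marks the NaT positions, so the failed inputs can be reported and removed.
bad_mask = parsed.isna()
bad_values = [value for value, is_bad in zip(raw, bad_mask) if is_bad]
clean = parsed[~bad_mask]

print(bad_values)   # ['Hello world!']
print(len(clean))   # 4
```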

    3. Pandas timestamp index

    DatetimeIndex

    Core function: pd.date_range()

    3.1 pd.DatetimeIndex() (timestamp index) and TimeSeries

     pd.DatetimeIndex([several time values])

    
    
    rng = pd.DatetimeIndex(['12/1/2018', '12/2/2018', '12/3/2018', '12/4/2018'])
    pd.Series(np.random.rand(len(rng)),index = rng)  # a Series indexed by a DatetimeIndex is a TimeSeries (time series)
    >>> rng = pd.DatetimeIndex(['12/1/2018', '12/2/2018', '12/3/2018', '12/4/2018'])  # pd.DatetimeIndex() builds a DatetimeIndex directly
    >>> print(rng, type(rng))
    DatetimeIndex(['2018-12-01', '2018-12-02', '2018-12-03', '2018-12-04'], dtype='datetime64[ns]', freq=None) <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
    >>> print(rng[0], type(rng[0]))
    2018-12-01 00:00:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
    
    >>> # Creates a timestamp index directly; accepts str and datetime.datetime
    ... # rng[0], a single element, is a Timestamp; rng[0:3], several elements, is a DatetimeIndex
    
    >>> st = pd.Series(np.random.rand(len(rng)),index = rng)  # a Series indexed by a DatetimeIndex is a TimeSeries (time series)
    >>> print(st, type(st))
    2018-12-01    0.063915
    2018-12-02    0.726902
    2018-12-03    0.135305
    2018-12-04    0.237609
    dtype: float64 <class 'pandas.core.series.Series'>
    >>> print(st.index)
    DatetimeIndex(['2018-12-01', '2018-12-02', '2018-12-03', '2018-12-04'], dtype='datetime64[ns]', freq=None)
    >>>

    3.2 pd.date_range(): generating a date range

    date_range() can be called in two ways: (1) start + end; (2) start or end + periods

    pd.date_range('6/10/2018','10/5/2018'), pd.date_range('6/10/2018',periods=10), pd.date_range(end='6/10/2018',periods=10)
    Default frequency: day

     Creates a DatetimeIndex directly
    # pd.date_range(start=None, end=None, periods=None, freq='D', tz=None, normalize=False, name=None, closed=None, **kwargs)
    # start: start time
    # end: end time
    # periods: number of timestamps to generate
    # freq: frequency, daily by default; pd.date_range() defaults to calendar days, pd.bdate_range() defaults to business days
    # tz: time zone
    # normalize: False by default; when True, the time of day is reset to 00:00:00, which is then omitted from the display
    #rng1 = pd.date_range('12/1/2018', '4/10/2017', normalize=True)  # a start later than the end gives an empty index: DatetimeIndex([], dtype='datetime64[ns]', freq='D')   <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
    rng1 = pd.date_range('1/1/2017','1/10/2017', normalize=True)  # normalize=True resets the time to 00:00:00, which is not displayed
    rng2 = pd.date_range(start='1/1/2018', periods=10)            # the start= keyword may be omitted
    rng3 = pd.date_range(end='1/30/2017 14:20:00', periods=10)
    
    >>> print(rng1, type(rng1))
    DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04',
                   '2017-01-05', '2017-01-06', '2017-01-07', '2017-01-08',
                   '2017-01-09', '2017-01-10'],
                  dtype='datetime64[ns]', freq='D') <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
    >>> print(rng2)
    DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
                   '2018-01-05', '2018-01-06', '2018-01-07', '2018-01-08',
                   '2018-01-09', '2018-01-10'],
                  dtype='datetime64[ns]', freq='D')
    >>> print(rng3)
    DatetimeIndex(['2017-01-21 14:20:00', '2017-01-22 14:20:00',
                   '2017-01-23 14:20:00', '2017-01-24 14:20:00',
                   '2017-01-25 14:20:00', '2017-01-26 14:20:00',
                   '2017-01-27 14:20:00', '2017-01-28 14:20:00',
                   '2017-01-29 14:20:00', '2017-01-30 14:20:00'],
                  dtype='datetime64[ns]', freq='D')
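    As a quick check of the calling conventions above, the following sketch builds the same ten-day range three ways and confirms that the results match. The variable names are illustrative.

```python
import pandas as pd

by_start_end = pd.date_range('2017-01-01', '2017-01-10')
by_start_periods = pd.date_range(start='2017-01-01', periods=10)
by_end_periods = pd.date_range(end='2017-01-10', periods=10)

print(by_start_end.equals(by_start_periods))  # True
print(by_start_end.equals(by_end_periods))    # True

# normalize=True resets the time of day to midnight before generating the range.
normalized = pd.date_range('2017-01-01 15:30', periods=3, normalize=True)
print(normalized[0])  # 2017-01-01 00:00:00
```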
    
    rng4 = pd.date_range(start='1/1/2017 15:30', periods=10, name='Hello world!', normalize=True)  # normalize=True resets 15:30 to 00:00, which is not displayed; name sets the name of the index
    >>> print(rng4)
    DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04',
                   '2017-01-05', '2017-01-06', '2017-01-07', '2017-01-08',
                   '2017-01-09', '2017-01-10'],
                  dtype='datetime64[ns]', name='Hello world!', freq='D')
    >>>
    
    >>> print(pd.date_range('20170101','20170104'))
    DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04'], dtype='datetime64[ns]', freq='D')
    >>> print(pd.date_range('20170101','20170104',closed='right'))
    DatetimeIndex(['2017-01-02', '2017-01-03', '2017-01-04'], dtype='datetime64[ns]', freq='D')
    >>> print(pd.date_range('20170101','20170104',closed='left'))
    DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03'], dtype='datetime64[ns]', freq='D')
    >>>
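    The closed argument shown above belongs to the pandas version these notes were written with; newer releases replaced it with inclusive. A hedged sketch of a helper that works with either is given below. half_open_range is a hypothetical name, and the fallback assumes that an unknown keyword raises TypeError.

```python
import pandas as pd

def half_open_range(start, end, side):
    """Daily range that keeps only one end point; side is 'left' or 'right'."""
    try:
        # Newer pandas: the keyword is `inclusive`.
        return pd.date_range(start, end, inclusive=side)
    except TypeError:
        # Older pandas: the keyword is `closed`.
        return pd.date_range(start, end, closed=side)

left = half_open_range('2017-01-01', '2017-01-04', 'left')
right = half_open_range('2017-01-01', '2017-01-04', 'right')

print(left[0], left[-1])    # 2017-01-01 00:00:00 2017-01-03 00:00:00
print(right[0], right[-1])  # 2017-01-02 00:00:00 2017-01-04 00:00:00
```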
    
    
    >>> print(pd.bdate_range('20170101','20170107'))  # pd.bdate_range() defaults to business days (freq='B')
    DatetimeIndex(['2017-01-02', '2017-01-03', '2017-01-04', '2017-01-05',
                   '2017-01-06'],
                  dtype='datetime64[ns]', freq='B')
    >>> print(list(pd.date_range(start='1/1/2017',periods=10)))  # a list made up of individual Timestamps
    [Timestamp('2017-01-01 00:00:00', freq='D'), Timestamp('2017-01-02 00:00:00', freq='D'),
     Timestamp('2017-01-03 00:00:00', freq='D'), Timestamp('2017-01-04 00:00:00', freq='D'),
     Timestamp('2017-01-05 00:00:00', freq='D'), Timestamp('2017-01-06 00:00:00', freq='D'),
     Timestamp('2017-01-07 00:00:00', freq='D'), Timestamp('2017-01-08 00:00:00', freq='D'),
     Timestamp('2017-01-09 00:00:00', freq='D'), Timestamp('2017-01-10 00:00:00', freq='D')]
    >>>

    pd.date_range() date ranges: the freq parameter (1)

    freq = 'B', 'H', 'T', 'S', 'L', 'U', 'W-MON', 'WOM-2MON'

    >>> print(pd.date_range('2017/1/1','2017/1/4'))                     # default freq = 'D': every calendar day
    DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04'], dtype='datetime64[ns]', freq='D')
    >>> print(pd.date_range('2017/1/1','2017/1/4',freq='B'))                # B: every business day
    DatetimeIndex(['2017-01-02', '2017-01-03', '2017-01-04'], dtype='datetime64[ns]', freq='B')
    >>> print(pd.date_range('2017/1/1','2017/1/4',freq='H')) # H: hourly
    DatetimeIndex(['2017-01-01 00:00:00', '2017-01-01 01:00:00',
                   '2017-01-01 02:00:00', '2017-01-01 03:00:00',
                   '2017-01-01 04:00:00', '2017-01-01 05:00:00',
                   '2017-01-01 06:00:00', '2017-01-01 07:00:00',
                   '2017-01-01 08:00:00', '2017-01-01 09:00:00',
                   '2017-01-01 10:00:00', '2017-01-01 11:00:00',
                   '2017-01-01 12:00:00', '2017-01-01 13:00:00',
                   '2017-01-01 14:00:00', '2017-01-01 15:00:00',
                   '2017-01-01 16:00:00', '2017-01-01 17:00:00',
                   '2017-01-01 18:00:00', '2017-01-01 19:00:00',
                   '2017-01-01 20:00:00', '2017-01-01 21:00:00',
                   '2017-01-01 22:00:00', '2017-01-01 23:00:00',
                   '2017-01-02 00:00:00', '2017-01-02 01:00:00',
                   '2017-01-02 02:00:00', '2017-01-02 03:00:00',
                   '2017-01-02 04:00:00', '2017-01-02 05:00:00',
                   '2017-01-02 06:00:00', '2017-01-02 07:00:00',
                   '2017-01-02 08:00:00', '2017-01-02 09:00:00',
                   '2017-01-02 10:00:00', '2017-01-02 11:00:00',
                   '2017-01-02 12:00:00', '2017-01-02 13:00:00',
                   '2017-01-02 14:00:00', '2017-01-02 15:00:00',
                   '2017-01-02 16:00:00', '2017-01-02 17:00:00',
                   '2017-01-02 18:00:00', '2017-01-02 19:00:00',
                   '2017-01-02 20:00:00', '2017-01-02 21:00:00',
                   '2017-01-02 22:00:00', '2017-01-02 23:00:00',
                   '2017-01-03 00:00:00', '2017-01-03 01:00:00',
                   '2017-01-03 02:00:00', '2017-01-03 03:00:00',
                   '2017-01-03 04:00:00', '2017-01-03 05:00:00',
                   '2017-01-03 06:00:00', '2017-01-03 07:00:00',
                   '2017-01-03 08:00:00', '2017-01-03 09:00:00',
                   '2017-01-03 10:00:00', '2017-01-03 11:00:00',
                   '2017-01-03 12:00:00', '2017-01-03 13:00:00',
                   '2017-01-03 14:00:00', '2017-01-03 15:00:00',
                   '2017-01-03 16:00:00', '2017-01-03 17:00:00',
                   '2017-01-03 18:00:00', '2017-01-03 19:00:00',
                   '2017-01-03 20:00:00', '2017-01-03 21:00:00',
                   '2017-01-03 22:00:00', '2017-01-03 23:00:00',
                   '2017-01-04 00:00:00'],
                  dtype='datetime64[ns]', freq='H')
    >>> print(pd.date_range('2017/1/1 12:00','2017/1/1 12:10',freq='T')) # T/MIN: every minute
    DatetimeIndex(['2017-01-01 12:00:00', '2017-01-01 12:01:00',
                   '2017-01-01 12:02:00', '2017-01-01 12:03:00',
                   '2017-01-01 12:04:00', '2017-01-01 12:05:00',
                   '2017-01-01 12:06:00', '2017-01-01 12:07:00',
                   '2017-01-01 12:08:00', '2017-01-01 12:09:00',
                   '2017-01-01 12:10:00'],
                  dtype='datetime64[ns]', freq='T')
    >>> print(pd.date_range('2017/1/1 12:00:00','2017/1/1 12:00:10',freq='S'))  # S: every second
    DatetimeIndex(['2017-01-01 12:00:00', '2017-01-01 12:00:01',
                   '2017-01-01 12:00:02', '2017-01-01 12:00:03',
                   '2017-01-01 12:00:04', '2017-01-01 12:00:05',
                   '2017-01-01 12:00:06', '2017-01-01 12:00:07',
                   '2017-01-01 12:00:08', '2017-01-01 12:00:09',
                   '2017-01-01 12:00:10'],
                  dtype='datetime64[ns]', freq='S')
    >>> print(pd.date_range('2017/1/1 12:00:00','2017/1/1 12:00:10',freq='L')) # L: every millisecond (one thousandth of a second)
    DatetimeIndex([       '2017-01-01 12:00:00', '2017-01-01 12:00:00.001000',
                   '2017-01-01 12:00:00.002000', '2017-01-01 12:00:00.003000',
                   '2017-01-01 12:00:00.004000', '2017-01-01 12:00:00.005000',
                   '2017-01-01 12:00:00.006000', '2017-01-01 12:00:00.007000',
                   '2017-01-01 12:00:00.008000', '2017-01-01 12:00:00.009000',
                   ...
                   '2017-01-01 12:00:09.991000', '2017-01-01 12:00:09.992000',
                   '2017-01-01 12:00:09.993000', '2017-01-01 12:00:09.994000',
                   '2017-01-01 12:00:09.995000', '2017-01-01 12:00:09.996000',
                   '2017-01-01 12:00:09.997000', '2017-01-01 12:00:09.998000',
                   '2017-01-01 12:00:09.999000',        '2017-01-01 12:00:10'],
                  dtype='datetime64[ns]', length=10001, freq='L')
    >>> print(pd.date_range('2017/1/1 12:00:00','2017/1/1 12:00:10',freq='U')) # U: every microsecond (one millionth of a second)
    DatetimeIndex([       '2017-01-01 12:00:00', '2017-01-01 12:00:00.000001',
                   '2017-01-01 12:00:00.000002', '2017-01-01 12:00:00.000003',
                   '2017-01-01 12:00:00.000004', '2017-01-01 12:00:00.000005',
                   '2017-01-01 12:00:00.000006', '2017-01-01 12:00:00.000007',
                   '2017-01-01 12:00:00.000008', '2017-01-01 12:00:00.000009',
                   ...
                   '2017-01-01 12:00:09.999991', '2017-01-01 12:00:09.999992',
                   '2017-01-01 12:00:09.999993', '2017-01-01 12:00:09.999994',
                   '2017-01-01 12:00:09.999995', '2017-01-01 12:00:09.999996',
                   '2017-01-01 12:00:09.999997', '2017-01-01 12:00:09.999998',
                   '2017-01-01 12:00:09.999999',        '2017-01-01 12:00:10'],
                  dtype='datetime64[ns]', length=10000001, freq='U')
    >>> print(pd.date_range('2017/1/1','2017/2/1',freq='W-MON'))  # W-MON: weekly, anchored on the given weekday. Weekday abbreviations: MON/TUE/WED/THU/FRI/SAT/SUN
    DatetimeIndex(['2017-01-02', '2017-01-09', '2017-01-16', '2017-01-23',
                   '2017-01-30'],
                  dtype='datetime64[ns]', freq='W-MON')
    >>> print(pd.date_range('2017/1/1','2017/5/1',freq='WOM-2MON')) # WOM-2MON: the n-th given weekday of each month; here, the second Monday of each month
    DatetimeIndex(['2017-01-09', '2017-02-13', '2017-03-13', '2017-04-10'], dtype='datetime64[ns]', freq='WOM-2MON')
    >>>
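    One way to confirm what an anchored alias produces is to inspect the weekday and day-of-month of the generated timestamps. The sketch below does this for 'B', 'W-MON', and 'WOM-2MON'; the variable names are illustrative.

```python
import pandas as pd

business_days = pd.date_range('2017-01-01', '2017-01-07', freq='B')
mondays = pd.date_range('2017-01-01', '2017-02-01', freq='W-MON')
second_mondays = pd.date_range('2017-01-01', '2017-05-01', freq='WOM-2MON')

# dayofweek counts from Monday=0 to Sunday=6.
print(business_days.dayofweek.tolist())  # [0, 1, 2, 3, 4]
print(mondays.dayofweek.tolist())        # [0, 0, 0, 0, 0]

# The second Monday of a month always falls on day 8 through 14.
print(second_mondays.day.tolist())       # [9, 13, 13, 10]
```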

    pd.date_range() date ranges: the freq parameter (2)

    freq = 'M', 'Q-DEC', 'A-DEC', 'BM', 'BQ-DEC', 'BA-DEC', 'MS', 'QS-DEC', 'AS-DEC', 'BMS', 'BQS-DEC', 'BAS-DEC'

    ########## Last calendar day of each interval
    >>> print(pd.date_range('2017','2018',freq='M'))  # M: last calendar day of each month
    DatetimeIndex(['2017-01-31', '2017-02-28', '2017-03-31', '2017-04-30',
                   '2017-05-31', '2017-06-30', '2017-07-31', '2017-08-31',
                   '2017-09-30', '2017-10-31', '2017-11-30', '2017-12-31'],
                  dtype='datetime64[ns]', freq='M')
    >>> # Q-<month>: the given month ends a quarter; gives the last calendar day of the last month of each quarter.
    >>> # There are therefore only three distinct Q-<month> patterns: months 1-4-7-10, 2-5-8-11, and 3-6-9-12.
    >>> print(pd.date_range('2017','2020',freq='Q-DEC'))
    DatetimeIndex(['2017-03-31', '2017-06-30', '2017-09-30', '2017-12-31',
                   '2018-03-31', '2018-06-30', '2018-09-30', '2018-12-31',
                   '2019-03-31', '2019-06-30', '2019-09-30', '2019-12-31'],
                  dtype='datetime64[ns]', freq='Q-DEC')
    >>> # A-<month>: last calendar day of the given month, every year.
    >>> # Month abbreviations: JAN/FEB/MAR/APR/MAY/JUN/JUL/AUG/SEP/OCT/NOV/DEC
    >>> print(pd.date_range('2017','2020',freq='A-DEC'))
    DatetimeIndex(['2017-12-31', '2018-12-31', '2019-12-31'], dtype='datetime64[ns]', freq='A-DEC')
    >>>
    ########## Last business day of each interval
    >>> print(pd.date_range('2017','2020',freq='BM'))  # BM: last business day of each month
    DatetimeIndex(['2017-01-31', '2017-02-28', '2017-03-31', '2017-04-28',
                   '2017-05-31', '2017-06-30', '2017-07-31', '2017-08-31',
                   '2017-09-29', '2017-10-31', '2017-11-30', '2017-12-29',
                   '2018-01-31', '2018-02-28', '2018-03-30', '2018-04-30',
                   '2018-05-31', '2018-06-29', '2018-07-31', '2018-08-31',
                   '2018-09-28', '2018-10-31', '2018-11-30', '2018-12-31',
                   '2019-01-31', '2019-02-28', '2019-03-29', '2019-04-30',
                   '2019-05-31', '2019-06-28', '2019-07-31', '2019-08-30',
                   '2019-09-30', '2019-10-31', '2019-11-29', '2019-12-31'],
                  dtype='datetime64[ns]', freq='BM')
    >>> # BQ-<month>: the given month ends a quarter; gives the last business day of the last month of each quarter.
    >>> print(pd.date_range('2017','2020',freq='BQ-DEC'))
    DatetimeIndex(['2017-03-31', '2017-06-30', '2017-09-29', '2017-12-29',
                   '2018-03-30', '2018-06-29', '2018-09-28', '2018-12-31',
                   '2019-03-29', '2019-06-28', '2019-09-30', '2019-12-31'],
                  dtype='datetime64[ns]', freq='BQ-DEC')
    >>> print(pd.date_range('2017','2020',freq='BA-DEC'))  # BA-<month>: last business day of the given month, every year
    DatetimeIndex(['2017-12-29', '2018-12-31', '2019-12-31'], dtype='datetime64[ns]', freq='BA-DEC')
    >>>
    ########## First calendar day of each interval
    >>> print(pd.date_range('2017','2018',freq='MS'))  # MS: first calendar day of each month
    DatetimeIndex(['2017-01-01', '2017-02-01', '2017-03-01', '2017-04-01',
                   '2017-05-01', '2017-06-01', '2017-07-01', '2017-08-01',
                   '2017-09-01', '2017-10-01', '2017-11-01', '2017-12-01',
                   '2018-01-01'],
                  dtype='datetime64[ns]', freq='MS')
    >>> # QS-<month>: the given month ends a quarter; gives the first calendar day of the last month of each quarter.
    >>> print(pd.date_range('2017','2020',freq='QS-DEC'))
    DatetimeIndex(['2017-03-01', '2017-06-01', '2017-09-01', '2017-12-01',
                   '2018-03-01', '2018-06-01', '2018-09-01', '2018-12-01',
                   '2019-03-01', '2019-06-01', '2019-09-01', '2019-12-01'],
                  dtype='datetime64[ns]', freq='QS-DEC')
    >>> print(pd.date_range('2017','2020',freq='AS-DEC'))  # AS-<month>: first calendar day of the given month, every year
    DatetimeIndex(['2017-12-01', '2018-12-01', '2019-12-01'], dtype='datetime64[ns]', freq='AS-DEC')
    >>>
    ########## First business day of each interval
    >>> print(pd.date_range('2017','2018',freq='BMS'))  # BMS: first business day of each month
    DatetimeIndex(['2017-01-02', '2017-02-01', '2017-03-01', '2017-04-03',
                   '2017-05-01', '2017-06-01', '2017-07-03', '2017-08-01',
                   '2017-09-01', '2017-10-02', '2017-11-01', '2017-12-01',
                   '2018-01-01'],
                  dtype='datetime64[ns]', freq='BMS')
    >>> # BQS-<month>: the given month ends a quarter; gives the first business day of the last month of each quarter.
    >>> print(pd.date_range('2017','2020',freq='BQS-DEC'))
    DatetimeIndex(['2017-03-01', '2017-06-01', '2017-09-01', '2017-12-01',
                   '2018-03-01', '2018-06-01', '2018-09-03', '2018-12-03',
                   '2019-03-01', '2019-06-03', '2019-09-02', '2019-12-02'],
                  dtype='datetime64[ns]', freq='BQS-DEC')
    >>> print(pd.date_range('2017','2020',freq='BAS-DEC'))  # BAS-<month>: first business day of the given month, every year
    DatetimeIndex(['2017-12-01', '2018-12-03', '2019-12-02'], dtype='datetime64[ns]', freq='BAS-DEC')
    >>>
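    The difference between the calendar-based and business-based aliases shows up only in months whose first day falls on a weekend. The sketch below compares 'MS' with 'BMS' for 2017, using illustrative names. It sticks to the start-of-month aliases because recent pandas releases rename some of the end-of-period aliases used above.

```python
import pandas as pd

month_starts = pd.date_range('2017-01-01', '2017-12-31', freq='MS')
business_month_starts = pd.date_range('2017-01-01', '2017-12-31', freq='BMS')

# Element-wise comparison: True where the first business day is not the 1st.
differs = month_starts != business_month_starts
shifted_months = month_starts[differs].month.tolist()

print(shifted_months)                          # [1, 4, 7, 10]
print(business_month_starts[differs].day.tolist())  # [2, 3, 3, 2]
```

    In 2017 the first of January and October fell on a Sunday and the first of April and July on a Saturday, which matches the four shifted months.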

    pd.date_range() date ranges: compound frequencies

    freq = '7D', '2M', '2h30min'

    >>> print(pd.date_range('2017/1/1','2017/2/1',freq='7D'))  # every 7 days
    DatetimeIndex(['2017-01-01', '2017-01-08', '2017-01-15', '2017-01-22',
                   '2017-01-29'],
                  dtype='datetime64[ns]', freq='7D')
    >>> print(pd.date_range('2017/1/1','2017/1/2',freq='2h30min')) # every 2 hours 30 minutes
    DatetimeIndex(['2017-01-01 00:00:00', '2017-01-01 02:30:00',
                   '2017-01-01 05:00:00', '2017-01-01 07:30:00',
                   '2017-01-01 10:00:00', '2017-01-01 12:30:00',
                   '2017-01-01 15:00:00', '2017-01-01 17:30:00',
                   '2017-01-01 20:00:00', '2017-01-01 22:30:00'],
                  dtype='datetime64[ns]', freq='150T')
    >>> print(pd.date_range('2017','2018',freq='2M'))  # every 2 months, on the last calendar day of the month
    DatetimeIndex(['2017-01-31', '2017-03-31', '2017-05-31', '2017-07-31',
                   '2017-09-30', '2017-11-30'],
                  dtype='datetime64[ns]', freq='2M')
    >>>
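    A compound frequency is just a fixed step, which can be verified by differencing neighbouring timestamps. A minimal sketch with illustrative names:

```python
import pandas as pd

rng = pd.date_range('2017-01-01', '2017-01-02', freq='2h30min')

# Differences between consecutive timestamps form a TimedeltaIndex.
steps = rng[1:] - rng[:-1]

print(len(rng))    # 10
print(rng[-1])     # 2017-01-01 22:30:00
print((steps == pd.Timedelta(hours=2, minutes=30)).all())  # True
```

    The end point 2017-01-02 00:00 is not included because 24 hours is not a whole multiple of 2.5 hours.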

    asfreq: converting the frequency of a time series

    ts.asfreq('4H', method='ffill') 

    >>> ts = pd.Series(np.random.rand(4),index=pd.date_range('20170101','20170104'))
    >>> print(ts)
    2017-01-01    0.516999
    2017-01-02    0.882315
    2017-01-03    0.775276
    2017-01-04    0.440545
    Freq: D, dtype: float64
    >>>
    >>> print(ts.asfreq('4H'))
    2017-01-01 00:00:00    0.516999
    2017-01-01 04:00:00         NaN
    2017-01-01 08:00:00         NaN
    2017-01-01 12:00:00         NaN
    2017-01-01 16:00:00         NaN
    2017-01-01 20:00:00         NaN
    2017-01-02 00:00:00    0.882315
    2017-01-02 04:00:00         NaN
    2017-01-02 08:00:00         NaN
    2017-01-02 12:00:00         NaN
    2017-01-02 16:00:00         NaN
    2017-01-02 20:00:00         NaN
    2017-01-03 00:00:00    0.775276
    2017-01-03 04:00:00         NaN
    2017-01-03 08:00:00         NaN
    2017-01-03 12:00:00         NaN
    2017-01-03 16:00:00         NaN
    2017-01-03 20:00:00         NaN
    2017-01-04 00:00:00    0.440545
    Freq: 4H, dtype: float64
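    >>> # Change the frequency, here from D to 4H. method is the fill mode:
    >>> # None leaves gaps as NaN, ffill fills with the previous value, bfill fills with the next value.
    >>> print(ts.asfreq('4H',method='ffill'))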
    2017-01-01 00:00:00    0.516999
    2017-01-01 04:00:00    0.516999
    2017-01-01 08:00:00    0.516999
    2017-01-01 12:00:00    0.516999
    2017-01-01 16:00:00    0.516999
    2017-01-01 20:00:00    0.516999
    2017-01-02 00:00:00    0.882315
    2017-01-02 04:00:00    0.882315
    2017-01-02 08:00:00    0.882315
    2017-01-02 12:00:00    0.882315
    2017-01-02 16:00:00    0.882315
    2017-01-02 20:00:00    0.882315
    2017-01-03 00:00:00    0.775276
    2017-01-03 04:00:00    0.775276
    2017-01-03 08:00:00    0.775276
    2017-01-03 12:00:00    0.775276
    2017-01-03 16:00:00    0.775276
    2017-01-03 20:00:00    0.775276
    2017-01-04 00:00:00    0.440545
    Freq: 4H, dtype: float64
    
    >>> print(ts.asfreq('4H',method='bfill'))
    2017-01-01 00:00:00    0.516999
    2017-01-01 04:00:00    0.882315
    2017-01-01 08:00:00    0.882315
    2017-01-01 12:00:00    0.882315
    2017-01-01 16:00:00    0.882315
    2017-01-01 20:00:00    0.882315
    2017-01-02 00:00:00    0.882315
    2017-01-02 04:00:00    0.775276
    2017-01-02 08:00:00    0.775276
    2017-01-02 12:00:00    0.775276
    2017-01-02 16:00:00    0.775276
    2017-01-02 20:00:00    0.775276
    2017-01-03 00:00:00    0.775276
    2017-01-03 04:00:00    0.440545
    2017-01-03 08:00:00    0.440545
    2017-01-03 12:00:00    0.440545
    2017-01-03 16:00:00    0.440545
    2017-01-03 20:00:00    0.440545
    2017-01-04 00:00:00    0.440545
    Freq: 4H, dtype: float64
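    The three fill behaviours are easier to compare with fixed values instead of random ones. The sketch below upsamples a four-day series to 12-hour steps; the names are illustrative, and the lowercase '12h' alias is used on the assumption that it is accepted by both older and newer pandas.

```python
import pandas as pd

ts = pd.Series([1.0, 2.0, 3.0, 4.0],
               index=pd.date_range('2017-01-01', periods=4, freq='D'))

upsampled = ts.asfreq('12h')                  # new slots are NaN
forward = ts.asfreq('12h', method='ffill')    # new slots copy the previous value
backward = ts.asfreq('12h', method='bfill')   # new slots copy the next value

print(int(upsampled.isna().sum()))  # 3
print(forward.tolist())             # [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0]
print(backward.tolist())            # [1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0]
```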

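    pd.date_range() date ranges: leading and lagging data with .shift()

     ts.shift(1) moves the values by one position, so each date receives the previous day's value; ts.shift(1, freq = 'D') moves the timestamps instead of the values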

    >>> ts = pd.Series(np.random.rand(4),index=pd.date_range('20170101','20170104'))
    >>> print(ts)
    2017-01-01    0.421724
    2017-01-02    0.102916
    2017-01-03    0.411452
    2017-01-04    0.626978
    Freq: D, dtype: float64
    >>> print(ts.shift(2)) # positive: values move later (lag); negative: values move earlier (lead)
    2017-01-01         NaN
    2017-01-02         NaN
    2017-01-03    0.421724
    2017-01-04    0.102916
    Freq: D, dtype: float64
    >>> print(ts.shift(-2))
    2017-01-01    0.411452
    2017-01-02    0.626978
    2017-01-03         NaN
    2017-01-04         NaN
    Freq: D, dtype: float64
    >>>
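    >>> # Relative change of each timestamp versus the previous one: ts is today's data and ts.shift(1) is
    >>> # yesterday's, so ts/ts.shift(1) is the ratio between them, and subtracting 1 gives the change.
    >>> per = ts/ts.shift(1) - 1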
    >>> print(per)
    2017-01-01         NaN
    2017-01-02   -0.755963
    2017-01-03    2.997923
    2017-01-04    0.523818
    Freq: D, dtype: float64
    >>>
    >>> print(ts.shift(2,freq='D')) # with freq, the timestamps are shifted rather than the values
    2017-01-03    0.421724
    2017-01-04    0.102916
    2017-01-05    0.411452
    2017-01-06    0.626978
    Freq: D, dtype: float64
    >>> print(ts.shift(2,freq='T'))
    2017-01-01 00:02:00    0.421724
    2017-01-02 00:02:00    0.102916
    2017-01-03 00:02:00    0.411452
    2017-01-04 00:02:00    0.626978
    Freq: D, dtype: float64
    >>>
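    With fixed values the shift-based change calculation can be checked by hand, and compared with the built-in Series.pct_change(). This is a small sketch with illustrative names.

```python
import pandas as pd

ts = pd.Series([10.0, 12.0, 9.0, 18.0],
               index=pd.date_range('2017-01-01', periods=4, freq='D'))

# Manual relative change: today's value over yesterday's, minus 1.
change = ts / ts.shift(1) - 1
builtin = ts.pct_change()

print(change.round(2).tolist()[1:])   # [0.2, -0.25, 1.0]

# With freq, the index moves and the values stay attached to their rows.
moved = ts.shift(2, freq='D')
print(moved.index[0])                 # 2017-01-03 00:00:00
print(moved.tolist())                 # [10.0, 12.0, 9.0, 18.0]
```

    The first element of change is NaN because there is no previous day to compare against.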

    4. Pandas periods: Period

    pd.Period()

    Core function: pd.Period()  ----> constructs a period, that is, a span of time. By contrast, a timestamp is a cross-section of time (a single moment).

    pd.Period() arguments: a timestamp + a freq argument → freq gives the length of the period, and the timestamp gives the position of the period on the time axis.

    pd.Period('2017',freq = 'M') + 1
    ## pd.Period(): creating a period
    >>> p = pd.Period('2017',freq = 'M')  # a period starting at 2017-01, with monthly frequency
    >>> t = pd.DatetimeIndex(['2017-1-1'])
    >>> print(p, type(p))
    2017-01 <class 'pandas._libs.tslibs.period.Period'>
    >>> print(t, type(t))
    DatetimeIndex(['2017-01-01'], dtype='datetime64[ns]', freq=None) <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
    >>>
    >>> print(p + 1)  # adding or subtracting an integer shifts the whole period
    2017-02
    >>> print(p - 2)
    2016-11
    >>> print(pd.Period('2012',freq = 'A-DEC') - 1)  # the shift is in units of freq: months above, years here
    2011
    >>>
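    That a Period is a span rather than a point can be seen from its start_time and end_time attributes. A minimal sketch, with illustrative names:

```python
import pandas as pd

p = pd.Period('2017-01', freq='M')

print(p.start_time)       # 2017-01-01 00:00:00
print(p.end_time.date())  # 2017-01-31

# Any timestamp inside the month lies between the two bounds.
stamp = pd.Timestamp('2017-01-15')
print(p.start_time <= stamp <= p.end_time)  # True

# Integer arithmetic moves the period in units of its frequency.
print(p + 1, p - 2)       # 2017-02 2016-11
```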

     pd.period_range(): creating a range of periods

      Period('2011', freq = 'A-DEC') can be thought of as a cursor within a time span made up of many periods.

    pd.Period('2017',freq = 'M') + 1; date_range() and period_range() produce two different kinds of index: one holds timestamps, the other holds periods.
    pd.period_range('1/1/2011', '1/1/2012', freq='M'), pd.date_range('1/1/2011', '1/1/2012',freq='M')
    period_range returns a PeriodIndex; with monthly frequency its entries carry the year and month but no day. date_range returns a DatetimeIndex, whose entries carry year, month, and day.
    Timestamp and DatetimeIndex represent timestamps, which are cross-sections of time; a Period is a span of time. As an index, though, the two behave much the same.
    ## period_range(): creating a range of periods
    >>> prng = pd.period_range('1/1/2011', '1/1/2012', freq='M')  # year and month only
    >>> rng = pd.date_range('1/1/2011', '1/1/2012',freq='M')      # year, month, and day
    >>> print(prng, type(prng))  # a PeriodIndex, where the earlier examples gave a DatetimeIndex
    PeriodIndex(['2011-01', '2011-02', '2011-03', '2011-04', '2011-05', '2011-06',
                 '2011-07', '2011-08', '2011-09', '2011-10', '2011-11', '2011-12',
                 '2012-01'],
                dtype='period[M]', freq='M') <class 'pandas.core.indexes.period.PeriodIndex'>
    >>> print(rng, type(rng))
    DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31', '2011-04-30',
                   '2011-05-31', '2011-06-30', '2011-07-31', '2011-08-31',
                   '2011-09-30', '2011-10-31', '2011-11-30', '2011-12-31'],
                  dtype='datetime64[ns]', freq='M') <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
    >>>
    >>> print(prng[0], type(prng[0]))  # the index is a PeriodIndex; a single element is a Period
    2011-01 <class 'pandas._libs.tslibs.period.Period'>
    >>>
    >>> ts = pd.Series(np.random.rand(len(prng)),index=prng)  # as an index, the two behave much the same
    >>> ts2 = pd.Series(np.random.rand(len(rng)),index=rng)
    >>> print(ts, type(ts))
    2011-01    0.889509
    2011-02    0.967148
    2011-03    0.579234
    2011-04    0.409504
    2011-05    0.180216
    2011-06    0.004549
    2011-07    0.606768
    2011-08    0.599321
    2011-09    0.281182
    2011-10    0.383243
    2011-11    0.437894
    2011-12    0.099335
    2012-01    0.125945
    Freq: M, dtype: float64 <class 'pandas.core.series.Series'>
    >>> print(ts2, type(ts2))
    2011-01-31    0.058635
    2011-02-28    0.899287
    2011-03-31    0.806039
    2011-04-30    0.520745
    2011-05-31    0.855713
    2011-06-30    0.057417
    2011-07-31    0.508203
    2011-08-31    0.846018
    2011-09-30    0.465259
    2011-10-31    0.535451
    2011-11-30    0.630897
    2011-12-31    0.031109
    Freq: M, dtype: float64 <class 'pandas.core.series.Series'>
    >>> print(ts.index)  # the index of the period-based series
    PeriodIndex(['2011-01', '2011-02', '2011-03', '2011-04', '2011-05', '2011-06',
                 '2011-07', '2011-08', '2011-09', '2011-10', '2011-11', '2011-12',
                 '2012-01'],
                dtype='period[M]', freq='M')
    >>> print(ts2.index)
    DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31', '2011-04-30',
                   '2011-05-31', '2011-06-30', '2011-07-31', '2011-08-31',
                   '2011-09-30', '2011-10-31', '2011-11-30', '2011-12-31'],
                  dtype='datetime64[ns]', freq='M')
    >>>

    asfreq: frequency conversion

     Use p.asfreq(freq, how=...) to convert a period to another frequency; how selects the start or the end of the original period.

    >>> p = pd.Period('2017','A-DEC')
    >>> print(p)
    2017
    >>> print(p.asfreq('M',how = 'start')) # how = 's' also works
    2017-01
    >>> print(p.asfreq('D',how = 'end')) # how = 'e' also works
    2017-12-31
    >>>
    >>> prng = pd.period_range('2017', '2018', freq='M')
    >>> ts1 = pd.Series(np.random.rand(len(prng)),index=prng)
    >>> print(ts1.head(), len(ts1))
    2017-01    0.061827
    2017-02    0.138509
    2017-03    0.862916
    2017-04    0.226967
    2017-05    0.910585
    Freq: M, dtype: float64 13
    >>> ts2 = pd.Series(np.random.rand(len(prng)),index=prng.asfreq('D',how = 'start'))  # asfreq can also convert the index of a time series
    >>> print(ts2.head(), len(ts2))
    2017-01-01    0.476774
    2017-02-01    0.625230
    2017-03-01    0.281017
    2017-04-01    0.165561
    2017-05-01    0.429782
    Freq: D, dtype: float64 13
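    The effect of how is clearest when the start and end of the same period are converted side by side. A short sketch with illustrative names, using monthly and daily frequencies only:

```python
import pandas as pd

feb = pd.Period('2017-02', freq='M')

first_day = feb.asfreq('D', how='start')
last_day = feb.asfreq('D', how='end')
print(first_day, last_day)  # 2017-02-01 2017-02-28

# The same conversion applies element-wise to a PeriodIndex.
prng = pd.period_range('2017-01', '2017-03', freq='M')
daily_ends = prng.asfreq('D', how='end')
print([str(p) for p in daily_ends])  # ['2017-01-31', '2017-02-28', '2017-03-31']
```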

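    Converting between timestamps and periods: .to_period() and .to_timestamp()

    ts.to_period() converts month-end timestamps to monthly periods; ts.to_timestamp() converts periods to timestamps at the first day of each month

    rng.to_period() converts a DatetimeIndex to a PeriodIndex; prng.to_timestamp() converts a PeriodIndex to a DatetimeIndex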

    >>> rng = pd.date_range('2017/1/1',periods = 10, freq = 'M')
    >>> prng = pd.period_range('2017','2018',freq = 'M')
    >>> ts1 = pd.Series(np.random.rand(len(rng)),index=rng)
    >>> print(ts1.head())
    2017-01-31    0.735182
    2017-02-28    0.791190
    2017-03-31    0.366768
    2017-04-30    0.316335
    2017-05-31    0.909333
    Freq: M, dtype: float64
    >>> print(ts1.to_period().head()) # last day of each month, converted to monthly periods
    2017-01    0.735182
    2017-02    0.791190
    2017-03    0.366768
    2017-04    0.316335
    2017-05    0.909333
    Freq: M, dtype: float64
    >>>
    >>> ts1 = pd.Series(np.random.rand(len(prng)),index=prng)
    >>> print(ts2.head())  # ts2 is the Series from the asfreq example above, indexed by daily periods at each month start
    2017-01-01    0.476774
    2017-02-01    0.625230
    2017-03-01    0.281017
    2017-04-01    0.165561
    2017-05-01    0.429782
    Freq: D, dtype: float64
    >>> print(ts2.to_timestamp().head()) # periods converted to timestamps on the first day of each month
    2017-01-01    0.476774
    2017-02-01    0.625230
    2017-03-01    0.281017
    2017-04-01    0.165561
    2017-05-01    0.429782
    Freq: MS, dtype: float64
    >>>
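    A round trip between the two index types can be sketched as follows. The names are illustrative; the frequency is passed explicitly to to_period so that the example does not depend on frequency inference.

```python
import pandas as pd

rng = pd.date_range('2017-01-01', periods=3, freq='MS')
ts = pd.Series([1, 2, 3], index=rng)

# DatetimeIndex -> PeriodIndex: each timestamp becomes the month that contains it.
as_periods = ts.to_period(freq='M')
print([str(p) for p in as_periods.index])  # ['2017-01', '2017-02', '2017-03']

# PeriodIndex -> DatetimeIndex: how='start' picks the first instant of each period.
back = as_periods.to_timestamp(how='start')
print(back.index.equals(rng))              # True
```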

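     5. Time series (TimeSeries): indexing and slicing

    A TimeSeries is simply a Series whose index is a DatetimeIndex, so indexing and data selection work essentially as they do for any Series.

    In addition, the time-based index offers more convenient ways to index and slice.

    pd.Series(np.random.rand(len(pd.period_range('1/1/2011', '1/1/2012'))),index=(pd.period_range('1/1/2011', '1/1/2012')))
    pd.Series(np.random.rand(len(pd.date_range('2017/1','2017/3'))),index=(pd.date_range('2017/1','2017/3')))

    Indexing: ts[0] and ts[:2] index by position; ts['2017/1/2'] indexes by time label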

    >>> rng = pd.date_range('2017/1','2017/3')
    >>> ts = pd.Series(np.random.rand(len(rng)),index=rng)
    >>> print(ts.head())
    2017-01-01    0.407246
    2017-01-02    0.104561
    2017-01-03    0.140087
    2017-01-04    0.988668
    2017-01-05    0.733602
    Freq: D, dtype: float64
    >>> print(ts[0])
    0.40724601715639686
    >>> print(ts[:2])   # basic positional slice; the end position is excluded
    2017-01-01    0.407246
    2017-01-02    0.104561
    Freq: D, dtype: float64
    >>>
    >>> print(ts['2017/1/2'])
    0.10456068527347884
    >>> print(ts['20170103'])
    0.14008702206007018
    >>> print(ts['1/10/2017'])
    0.7621543091477885
    >>> print(ts[datetime(2017,1,20)]) # label indexing accepts many date-string formats as well as datetime.datetime (this call needs: from datetime import datetime)
    0.8743928943800818
    >>>

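    Because a time series is ordered chronologically, the order of the labels needs no special handling when indexing.
    The same indexing methods also apply to DataFrame.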
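
    The difference between positional and label-based access can be made explicit with .iloc and .loc. This is a minimal sketch; the names rng_demo, ts_demo, by_position, by_string and by_datetime are made up for the example.

```python
from datetime import datetime

import numpy as np
import pandas as pd

rng_demo = pd.date_range("2017-01-01", periods=5, freq="D")
ts_demo = pd.Series(np.arange(5), index=rng_demo)

by_position = ts_demo.iloc[0]                    # first row, selected by position
by_string = ts_demo.loc["2017-01-03"]            # label given as a date string
by_datetime = ts_demo.loc[datetime(2017, 1, 3)]  # same label as a datetime object

print(by_position, by_string, by_datetime)  # 0 2 2
```

    Spelling out .iloc or .loc avoids any ambiguity about whether an integer is meant as a position or as a label.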

    Slicing: ts['2017/1/5':'2017/1/10'] follows label-based index slicing, so the end label is included.

    >>> rng = pd.date_range('2017/1','2017/3',freq = '12H')
    >>> ts = pd.Series(np.random.rand(len(rng)), index = rng)
    >>> print(ts['2017/1/5':'2017/1/10'])  # label slicing as with any Series index: the end label is included; ts.loc['2017/1/5':'2017/1/10'] is equivalent
    2017-01-05 00:00:00    0.864954
    2017-01-05 12:00:00    0.270408
    2017-01-06 00:00:00    0.979987
    2017-01-06 12:00:00    0.426279
    2017-01-07 00:00:00    0.403995
    2017-01-07 12:00:00    0.731792
    2017-01-08 00:00:00    0.018432
    2017-01-08 12:00:00    0.728155
    2017-01-09 00:00:00    0.190817
    2017-01-09 12:00:00    0.501240
    2017-01-10 00:00:00    0.893398
    2017-01-10 12:00:00    0.977586
    Freq: 12H, dtype: float64
    >>>
    >>> print(ts['2017/2'].head()) # passing a month selects that whole month as a slice; ts['1/2017'] returns all of January, and the result can be sliced further, e.g. with [::2]
    2017-02-01 00:00:00    0.635405
    2017-02-01 12:00:00    0.282502
    2017-02-02 00:00:00    0.774583
    2017-02-02 12:00:00    0.306548
    2017-02-03 00:00:00    0.817818
    Freq: 12H, dtype: float64
    >>>
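
    The inclusive end point and the month-level selection can be checked on a small deterministic series. This is a minimal sketch; the 12-hour spacing is written as '720min', and the names rng_demo, ts_demo, window and january are made up for the example.

```python
import numpy as np
import pandas as pd

# Two observations per day, 12 hours apart, covering 2017-01-01 to 2017-01-10
rng_demo = pd.date_range("2017-01-01", periods=20, freq="720min")
ts_demo = pd.Series(np.arange(20), index=rng_demo)

# Label slice: both end labels are included, and a date without a time
# covers that whole day
window = ts_demo.loc["2017-01-05":"2017-01-06"]

# Partial-string selection: every observation that falls in January 2017
january = ts_demo.loc["2017-01"]

print(len(window), len(january))  # 4 20
```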

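    Time series with duplicate index values

    ts.is_unique checks only the values: if the values are unique but the index is not, it still returns True. Use ts.index.is_unique to check the index.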

    >>> dates = pd.DatetimeIndex(['1/1/2015','1/2/2015','1/3/2015','1/4/2015','1/1/2015','1/2/2015'])
    >>> ts = pd.Series(np.random.rand(6), index = dates)
    >>> print(ts)
    2015-01-01    0.943037
    2015-01-02    0.426762
    2015-01-03    0.838297
    2015-01-04    0.963703
    2015-01-01    0.080439
    2015-01-02    0.997752
    dtype: float64
    >>> print(ts.is_unique,ts.index.is_unique)  # the index has duplicates but the values do not; ts.is_unique checks only the values, so it returns True
    True False
    >>> print(ts['20150101'],type(ts['20150101'])) # a duplicated index label returns several values
    2015-01-01    0.943037
    2015-01-01    0.080439
    dtype: float64 <class 'pandas.core.series.Series'>
    >>> print(ts['20150104'],type(ts['20150104']))
    2015-01-04    0.963703
    dtype: float64 <class 'pandas.core.series.Series'>
    >>> print(ts.groupby(level = 0).mean())  # group by the index level; duplicated timestamps are combined here with the mean
    2015-01-01    0.511738
    2015-01-02    0.712257
    2015-01-03    0.838297
    2015-01-04    0.963703
    dtype: float64
    >>>
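
    Averaging is only one way to deal with repeated timestamps. The sketch below compares it with keeping the first observation per timestamp, using Index.duplicated. The data and the names dates_demo, ts_demo, averaged and first_only are made up for the example.

```python
import pandas as pd

dates_demo = pd.DatetimeIndex(["2015-01-01", "2015-01-02", "2015-01-01"])
ts_demo = pd.Series([1.0, 2.0, 3.0], index=dates_demo)

print(ts_demo.is_unique)        # True: the values 1.0, 2.0, 3.0 are all different
print(ts_demo.index.is_unique)  # False: 2015-01-01 appears twice

# Option 1: aggregate duplicated timestamps (here with the mean)
averaged = ts_demo.groupby(level=0).mean()

# Option 2: keep only the first observation for each timestamp
first_only = ts_demo[~ts_demo.index.duplicated(keep="first")]

print(averaged.tolist(), first_only.tolist())  # [2.0, 2.0] [1.0, 2.0]
```

    Which option fits depends on whether the duplicates are repeated measurements (aggregate them) or accidental re-entries (drop them).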

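    6. Time series: resampling

    Resampling converts a time series from one frequency to another, usually aggregating or filling data along the way.

    Downsampling: high-frequency data → low-frequency data, e.g. daily data converted to monthly data
    Upsampling: low-frequency data → high-frequency data, e.g. yearly data converted to monthly data

    Resampling: .resample()

    Create a TimeSeries with daily frequency and resample it to a 5-day frequency.

    ts.resample('5D').sum()   / .mean()  /.max() / .min() / .median() / .first() / .last() / .ohlc() 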

    >>> rng = pd.date_range('20170101', periods = 12)
    >>> ts = pd.Series(np.arange(12), index = rng)
    >>> print(ts)
    2017-01-01     0
    2017-01-02     1
    2017-01-03     2
    2017-01-04     3
    2017-01-05     4
    2017-01-06     5
    2017-01-07     6
    2017-01-08     7
    2017-01-09     8
    2017-01-10     9
    2017-01-11    10
    2017-01-12    11
    Freq: D, dtype: int32
    >>> ts_re = ts.resample('5D')  # resample into 5-day bins; this returns a resampler object, and '5D' is the target frequency
    >>> ts_re2 = ts.resample('5D').sum() # aggregate with .sum(): returns a new Series with one summed value per bin
    >>> print(ts_re, type(ts_re))  # a resampler object, not a computed value
    DatetimeIndexResampler [freq=<5 * Days>, axis=0, closed=left, label=left, convention=start, base=0] <class 'pandas.core.resample.DatetimeIndexResampler'>
    >>> print(ts_re2, type(ts_re2))
    2017-01-01    10
    2017-01-06    35
    2017-01-11    21
    dtype: int32 <class 'pandas.core.series.Series'>
    >>> print(ts.resample('5D').mean(),'→ mean\n')
    2017-01-01     2.0
    2017-01-06     7.0
    2017-01-11    10.5
    dtype: float64 → mean
    
    >>> print(ts.resample('5D').max(),'→ max\n')
    2017-01-01     4
    2017-01-06     9
    2017-01-11    11
    dtype: int32 → max
    
    >>> print(ts.resample('5D').min(),'→ min\n')
    2017-01-01     0
    2017-01-06     5
    2017-01-11    10
    dtype: int32 → min
    
    >>> print(ts.resample('5D').median(),'→ median\n')
    2017-01-01     2.0
    2017-01-06     7.0
    2017-01-11    10.5
    dtype: float64 → median
    
    >>> print(ts.resample('5D').first(),'→ first value\n')
    2017-01-01     0
    2017-01-06     5
    2017-01-11    10
    dtype: int32 → first value
    
    >>> print(ts.resample('5D').last(),'→ last value\n')
    2017-01-01     4
    2017-01-06     9
    2017-01-11    11
    dtype: int32 → last value
    
    >>> print(ts.resample('5D').ohlc(),'→ OHLC resampling\n')  # OHLC: an aggregation used for financial time series → open, high (maximum), low (minimum), close
                open  high  low  close
    2017-01-01     0     4    0      4
    2017-01-06     5     9    5      9
    2017-01-11    10    11   10     11 → OHLC resampling
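
    Several of the aggregations above can also be computed in one call with .agg on the resampler. This is a minimal sketch on the same 0 to 11 data; the names rng_demo, ts_demo, resampler and summary are made up for the example.

```python
import numpy as np
import pandas as pd

rng_demo = pd.date_range("2017-01-01", periods=12, freq="D")
ts_demo = pd.Series(np.arange(12), index=rng_demo)

# The resampler only describes the 5-day grouping; nothing is computed yet
resampler = ts_demo.resample("5D")

# Several aggregations at once: the result is a DataFrame with one column each
summary = resampler.agg(["sum", "mean", "max"])
print(summary)

# Manual check of the first 5-day bin, which holds the values 0..4
print(ts_demo.iloc[:5].sum())  # 10
```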

    Downsampling

    ts.resample('5D', closed = 'left').sum()   # closed='left' is the default for '5D' and can be omitted; each bin includes its left edge → [1,2,3,4,5],[6,7,8,9,10],[11,12]
    ts.resample('5D', closed = 'right').sum()  # closed='right': each bin includes its right edge → [1],[2,3,4,5,6],[7,8,9,10,11],[12]
     
    >>> rng = pd.date_range('20170101', periods = 12)
    >>> ts = pd.Series(np.arange(1,13), index = rng)
    >>> print(ts)
    2017-01-01     1
    2017-01-02     2
    2017-01-03     3
    2017-01-04     4
    2017-01-05     5
    2017-01-06     6
    2017-01-07     7
    2017-01-08     8
    2017-01-09     9
    2017-01-10    10
    2017-01-11    11
    2017-01-12    12
    Freq: D, dtype: int32
    >>> print(ts.resample('5D').sum(),'→ default\n') # the values here are 1-12; resampling by 5D gives [1,2,3,4,5],[6,7,8,9,10],[11,12]
    2017-01-01    15
    2017-01-06    40
    2017-01-11    23
    dtype: int32 → default
    
    # closed: which end of each interval is closed (inclusive); for '5D' the default is left-closed, right-open
    >>> print(ts.resample('5D', closed = 'left').sum(),'→ left\n') # closed='left': each bin includes its left edge → [1,2,3,4,5],[6,7,8,9,10],[11,12]
    2017-01-01    15
    2017-01-06    40
    2017-01-11    23
    dtype: int32 → left
    
    >>> print(ts.resample('5D', closed = 'right').sum(),'→ right\n') # closed='right': each bin includes its right edge → [1],[2,3,4,5,6],[7,8,9,10,11],[12]
    2016-12-27     1
    2017-01-01    20
    2017-01-06    45
    2017-01-11    12
    dtype: int32 → right
    
    >>> print(ts.resample('5D', label = 'left').sum(),'→ leftlabel\n')  # label: which bin edge is used as the index of the aggregated value; the default is the left edge. The binning itself uses the default closed here
    2017-01-01    15
    2017-01-06    40
    2017-01-11    23
    dtype: int32 → leftlabel
    
    >>> print(ts.resample('5D', label = 'right').sum(),'→ rightlabel\n')  # the index label is the right edge of each bin, e.g. 2017-01-06; the default left gives 2017-01-01
    2017-01-06    15
    2017-01-11    40
    2017-01-16    23
    dtype: int32 → rightlabel
    
    >>>
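
    To see which values land in which bin, the resampler can be iterated like a groupby object. The sketch below lists the bin contents for both settings of closed and checks the effect of label. The helper bins and the names rng_demo, ts_demo, left_bins, right_bins and right_labels are made up for the example.

```python
import numpy as np
import pandas as pd

rng_demo = pd.date_range("2017-01-01", periods=12, freq="D")
ts_demo = pd.Series(np.arange(1, 13), index=rng_demo)

def bins(closed):
    """Return the values of each non-empty 5-day bin as plain lists."""
    grouped = ts_demo.resample("5D", closed=closed)
    return [group.tolist() for _, group in grouped if len(group)]

left_bins = bins("left")
right_bins = bins("right")
print(left_bins)
print(right_bins)

# label only changes the index of the result, not the bin contents
right_labels = ts_demo.resample("5D", label="right").sum()
print(right_labels.index[0].date())  # 2017-01-06
```

    closed decides which observations are grouped together, while label decides which bin edge names the result, so the two can be set independently.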

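    Upsampling and interpolation

    ts.resample('15T').asfreq() converts low frequency to high frequency. .asfreq() does no filling and returns NaN for the new rows; .ffill() fills forward (each gap takes the previous value); .bfill() fills backward (each gap takes the next value).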
    >>> rng = pd.date_range('2017/1/1 0:0:0', periods = 5, freq = 'H')
    >>> ts = pd.DataFrame(np.arange(15).reshape(5,3),
    ...                   index = rng,
    ...                   columns = ['a','b','c'])
    >>> print(ts)
                          a   b   c
    2017-01-01 00:00:00   0   1   2
    2017-01-01 01:00:00   3   4   5
    2017-01-01 02:00:00   6   7   8
    2017-01-01 03:00:00   9  10  11
    2017-01-01 04:00:00  12  13  14
    >>> print(ts.resample('15T').asfreq())  # low to high frequency; the question is how to fill the new rows. .asfreq() does no filling and returns NaN
                            a     b     c
    2017-01-01 00:00:00   0.0   1.0   2.0
    2017-01-01 00:15:00   NaN   NaN   NaN
    2017-01-01 00:30:00   NaN   NaN   NaN
    2017-01-01 00:45:00   NaN   NaN   NaN
    2017-01-01 01:00:00   3.0   4.0   5.0
    2017-01-01 01:15:00   NaN   NaN   NaN
    2017-01-01 01:30:00   NaN   NaN   NaN
    2017-01-01 01:45:00   NaN   NaN   NaN
    2017-01-01 02:00:00   6.0   7.0   8.0
    2017-01-01 02:15:00   NaN   NaN   NaN
    2017-01-01 02:30:00   NaN   NaN   NaN
    2017-01-01 02:45:00   NaN   NaN   NaN
    2017-01-01 03:00:00   9.0  10.0  11.0
    2017-01-01 03:15:00   NaN   NaN   NaN
    2017-01-01 03:30:00   NaN   NaN   NaN
    2017-01-01 03:45:00   NaN   NaN   NaN
    2017-01-01 04:00:00  12.0  13.0  14.0
    >>> print(ts.resample('15T').ffill())  # .ffill(): forward fill, each gap takes the previous value
                          a   b   c
    2017-01-01 00:00:00   0   1   2
    2017-01-01 00:15:00   0   1   2
    2017-01-01 00:30:00   0   1   2
    2017-01-01 00:45:00   0   1   2
    2017-01-01 01:00:00   3   4   5
    2017-01-01 01:15:00   3   4   5
    2017-01-01 01:30:00   3   4   5
    2017-01-01 01:45:00   3   4   5
    2017-01-01 02:00:00   6   7   8
    2017-01-01 02:15:00   6   7   8
    2017-01-01 02:30:00   6   7   8
    2017-01-01 02:45:00   6   7   8
    2017-01-01 03:00:00   9  10  11
    2017-01-01 03:15:00   9  10  11
    2017-01-01 03:30:00   9  10  11
    2017-01-01 03:45:00   9  10  11
    2017-01-01 04:00:00  12  13  14
    >>> print(ts.resample('15T').bfill()) # .bfill(): backward fill, each gap takes the next value
                          a   b   c
    2017-01-01 00:00:00   0   1   2
    2017-01-01 00:15:00   3   4   5
    2017-01-01 00:30:00   3   4   5
    2017-01-01 00:45:00   3   4   5
    2017-01-01 01:00:00   3   4   5
    2017-01-01 01:15:00   6   7   8
    2017-01-01 01:30:00   6   7   8
    2017-01-01 01:45:00   6   7   8
    2017-01-01 02:00:00   6   7   8
    2017-01-01 02:15:00   9  10  11
    2017-01-01 02:30:00   9  10  11
    2017-01-01 02:45:00   9  10  11
    2017-01-01 03:00:00   9  10  11
    2017-01-01 03:15:00  12  13  14
    2017-01-01 03:30:00  12  13  14
    2017-01-01 03:45:00  12  13  14
    2017-01-01 04:00:00  12  13  14
    >>>
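
    Besides leaving gaps or copying neighbouring values, the new rows can be filled by interpolation with the resampler's .interpolate() method (linear by default). The sketch below puts the four options side by side on two hourly observations. The frequency is written as '15min', the minute alias that recent pandas releases prefer over '15T', and the names idx, s_demo and up are made up for the example.

```python
import pandas as pd

# Two observations one hour apart, listed explicitly
idx = pd.to_datetime(["2017-01-01 00:00", "2017-01-01 01:00"])
s_demo = pd.Series([0.0, 4.0], index=idx)

up = s_demo.resample("15min")

print(up.asfreq().tolist())       # [0.0, nan, nan, nan, 4.0]
print(up.ffill().tolist())        # [0.0, 0.0, 0.0, 0.0, 4.0]
print(up.bfill().tolist())        # [0.0, 4.0, 4.0, 4.0, 4.0]
print(up.interpolate().tolist())  # [0.0, 1.0, 2.0, 3.0, 4.0]
```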

    Resampling periods (Period)

    >>> prng = pd.period_range('2016','2017',freq = 'M')
    >>> ts = pd.Series(np.arange(len(prng)), index = prng)
    >>> print(ts)
    2016-01     0
    2016-02     1
    2016-03     2
    2016-04     3
    2016-05     4
    2016-06     5
    2016-07     6
    2016-08     7
    2016-09     8
    2016-10     9
    2016-11    10
    2016-12    11
    2017-01    12
    Freq: M, dtype: int32 
    >>> print(ts.resample('3M').sum()) # downsampling
    2016-01-31     0
    2016-04-30     6
    2016-07-31    15
    2016-10-31    24
    2017-01-31    33
    Freq: 3M, dtype: int32
    >>> print(ts.resample('15D').ffill())  # upsampling
    2016-01-01     0
    2016-01-16     0
    2016-01-31     0
    2016-02-15     1
    2016-03-01     2
    2016-03-16     2
    2016-03-31     2
    2016-04-15     3
    2016-04-30     3
    2016-05-15     4
    2016-05-30     4
    2016-06-14     5
    2016-06-29     5
    2016-07-14     6
    2016-07-29     6
    2016-08-13     7
    2016-08-28     7
    2016-09-12     8
    2016-09-27     8
    2016-10-12     9
    2016-10-27     9
    2016-11-11    10
    2016-11-26    10
    2016-12-11    11
    2016-12-26    11
    2017-01-10    12
    2017-01-25    12
    Freq: 15D, dtype: int32
    >>>
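
    An alternative that does not rely on resampling a PeriodIndex directly is to convert the periods to timestamps first and resample those. Recent pandas releases appear to recommend this route, but treat that as an assumption and check the release notes for your version. This is a minimal sketch with twelve monthly values; the quarterly 'QS' bins and the names prng_demo, ts_demo, stamped, quarterly and half_monthly are choices made for the example.

```python
import numpy as np
import pandas as pd

prng_demo = pd.period_range("2016-01", periods=12, freq="M")
ts_demo = pd.Series(np.arange(12), index=prng_demo)

# Convert the monthly periods to month-start timestamps first
stamped = ts_demo.to_timestamp()

# Downsample: one bin per quarter, labelled by the quarter's first day
quarterly = stamped.resample("QS").sum()
print(quarterly.tolist())  # [3, 12, 21, 30]

# Upsample onto a fixed 15-day grid, carrying the last value forward
half_monthly = stamped.resample("15D").ffill()
print(half_monthly.head(3).tolist())  # [0, 0, 0]
```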
  • Original article: https://www.cnblogs.com/shengyang17/p/9514557.html