3.11 Time-Handling Objects

Handling time objects in pandas

Kinds of time series data

Timestamp: a specific instant
Fixed period: e.g., July 2017
Time interval: a start time and an end time
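The three kinds of time data listed above map onto pandas scalar types. The following is a minimal sketch, assuming pd.Timestamp, pd.Period, and pd.Interval as the corresponding types; the dates are illustrative only.

```python
import pandas as pd

# Timestamp: a specific instant
ts = pd.Timestamp("2017-07-15 10:30")

# Fixed period: the whole month of July 2017
period = pd.Period("2017-07", freq="M")

# Time interval: an explicit start and end, closed on both sides
interval = pd.Interval(
    pd.Timestamp("2017-07-01"), pd.Timestamp("2017-07-31"), closed="both"
)

print(period.start_time, period.end_time)  # first and last instant of the period
print(ts in interval)                      # membership test against the interval
```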


Handling time objects with the Python standard library: datetime
Flexible parsing of time strings: dateutil
dateutil.parser.parse()
Handling time objects in bulk: pandas
pd.to_datetime()

In [11]:

import datetime
import pandas as pd
import numpy as np

In [2]:

datetime.datetime.strptime('2010-01-01','%Y-%m-%d')

Out[2]:

datetime.datetime(2010, 1, 1, 0, 0)

In [3]:

datetime.datetime.strptime('2010/01/01','%Y/%m/%d')

Out[3]:

datetime.datetime(2010, 1, 1, 0, 0)

In [4]:

import dateutil.parser    # a bare "import dateutil" does not reliably load the parser submodule

In [6]:

dateutil.parser.parse('03/08/2020 14:35')

Out[6]:

datetime.datetime(2020, 3, 8, 14, 35)

In [7]:

dateutil.parser.parse('2020-Mar-8')

Out[7]:

datetime.datetime(2020, 3, 8, 0, 0)
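A string such as '03/08/2020' is ambiguous: dateutil reads it month-first by default, as in Out[6] above. The sketch below shows how the dayfirst argument of dateutil.parser.parse changes the interpretation; the input string is reused from the earlier cell.

```python
import datetime
from dateutil import parser

# Default: month first, so this is 8 March 2020
month_first = parser.parse("03/08/2020 14:35")

# dayfirst=True: day first, so this is 3 August 2020
day_first = parser.parse("03/08/2020 14:35", dayfirst=True)

print(month_first)
print(day_first)
```

When the format is known in advance, datetime.datetime.strptime with an explicit format string avoids the ambiguity entirely.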

In [13]:

pd.to_datetime(['2001-01-01','2020/Mar/08'])

Out[13]:

DatetimeIndex(['2001-01-01', '2020-03-08'], dtype='datetime64[ns]', freq=None)
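Whether a single call accepts strings in different styles depends on the pandas version; recent releases (2.0 and later) infer one format from the first element and may raise an error on a mixed list like the one above. A minimal sketch of a more explicit approach, assuming all inputs share one known format and that unparseable entries should become NaT:

```python
import pandas as pd

# Explicit format: every string must match it
strict = pd.to_datetime(["2001-01-01", "2020-03-08"], format="%Y-%m-%d")

# errors="coerce": strings that cannot be parsed become NaT instead of raising
lenient = pd.to_datetime(
    ["2001-01-01", "not a date"], format="%Y-%m-%d", errors="coerce"
)

print(strict)
print(lenient)
```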

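pandas: generating time objects

Generate an array of time objects	pd.date_range
start	start time
end	end time
periods	number of timestamps to generate
freq	frequency; defaults to 'D' (calendar day). Other aliases include H (hour), W (week), B (business day), SM (semi-month), M (month end), T or min (minute), S (second), A (year end)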
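With the default daily frequency, any two of start, end, and periods are enough to pin down a range. The sketch below builds the same five-day range three ways; the dates are illustrative.

```python
import pandas as pd

# start + end
by_end = pd.date_range(start="2019-07-23", end="2019-07-27")

# start + number of periods
by_count = pd.date_range(start="2019-07-23", periods=5)

# end + number of periods (counts backwards from the end date)
back_from_end = pd.date_range(end="2019-07-27", periods=5)

print(by_end.equals(by_count), by_end.equals(back_from_end))
```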

In [15]:

pd.date_range('2019/7/23','2021/7/23')

Out[15]:

DatetimeIndex(['2019-07-23', '2019-07-24', '2019-07-25', '2019-07-26',
               '2019-07-27', '2019-07-28', '2019-07-29', '2019-07-30',
               '2019-07-31', '2019-08-01',
               ...
               '2021-07-14', '2021-07-15', '2021-07-16', '2021-07-17',
               '2021-07-18', '2021-07-19', '2021-07-20', '2021-07-21',
               '2021-07-22', '2021-07-23'],
              dtype='datetime64[ns]', length=732, freq='D')

In [16]:

pd.date_range('2019-7-23',periods=720)

Out[16]:

DatetimeIndex(['2019-07-23', '2019-07-24', '2019-07-25', '2019-07-26',
               '2019-07-27', '2019-07-28', '2019-07-29', '2019-07-30',
               '2019-07-31', '2019-08-01',
               ...
               '2021-07-02', '2021-07-03', '2021-07-04', '2021-07-05',
               '2021-07-06', '2021-07-07', '2021-07-08', '2021-07-09',
               '2021-07-10', '2021-07-11'],
              dtype='datetime64[ns]', length=720, freq='D')

In [17]:

pd.date_range('2019/7/23',periods=30,freq='M')

Out[17]:

DatetimeIndex(['2019-07-31', '2019-08-31', '2019-09-30', '2019-10-31',
               '2019-11-30', '2019-12-31', '2020-01-31', '2020-02-29',
               '2020-03-31', '2020-04-30', '2020-05-31', '2020-06-30',
               '2020-07-31', '2020-08-31', '2020-09-30', '2020-10-31',
               '2020-11-30', '2020-12-31', '2021-01-31', '2021-02-28',
               '2021-03-31', '2021-04-30', '2021-05-31', '2021-06-30',
               '2021-07-31', '2021-08-31', '2021-09-30', '2021-10-31',
               '2021-11-30', '2021-12-31'],
              dtype='datetime64[ns]', freq='M')

In [18]:

pd.date_range('2019-7-23',periods=30,freq='W-MON')

Out[18]:

DatetimeIndex(['2019-07-29', '2019-08-05', '2019-08-12', '2019-08-19',
               '2019-08-26', '2019-09-02', '2019-09-09', '2019-09-16',
               '2019-09-23', '2019-09-30', '2019-10-07', '2019-10-14',
               '2019-10-21', '2019-10-28', '2019-11-04', '2019-11-11',
               '2019-11-18', '2019-11-25', '2019-12-02', '2019-12-09',
               '2019-12-16', '2019-12-23', '2019-12-30', '2020-01-06',
               '2020-01-13', '2020-01-20', '2020-01-27', '2020-02-03',
               '2020-02-10', '2020-02-17'],
              dtype='datetime64[ns]', freq='W-MON')

B   business day frequency
C   custom business day frequency (experimental)
D   calendar day frequency
W   weekly frequency
M   month end frequency
SM  semi-month end frequency (15th and end of month)
BM  business month end frequency
CBM custom business month end frequency
MS  month start frequency
SMS semi-month start frequency (1st and 15th)
BMS business month start frequency
CBMS    custom business month start frequency
Q   quarter end frequency
BQ  business quarter end frequency
QS  quarter start frequency
BQS business quarter start frequency
A   year end frequency
BA  business year end frequency
AS  year start frequency
BAS business year start frequency
BH  business hour frequency
H   hourly frequency
T, min  minutely frequency
S   secondly frequency
L, ms   milliseconds
U, us   microseconds
N   nanoseconds
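A few of the aliases above in action. Note that pandas 2.2 and later deprecate some of the spellings in this table (for example, M is being replaced by ME and H by h), so this sketch sticks to aliases that are, to my knowledge, unchanged across versions: a multiple of D, B, W-MON, and MS.

```python
import pandas as pd

def days(index):
    """Format a DatetimeIndex as a list of YYYY-MM-DD strings."""
    return index.strftime("%Y-%m-%d").tolist()

# Every second calendar day
every_2_days = days(pd.date_range("2019-07-23", periods=3, freq="2D"))

# Business days: the weekend after Friday 2019-07-26 is skipped
business = days(pd.date_range("2019-07-26", periods=3, freq="B"))

# Weekly, anchored on Mondays
mondays = days(pd.date_range("2019-07-23", periods=2, freq="W-MON"))

# Month start: the first of each month on or after the start date
month_starts = days(pd.date_range("2019-07-23", periods=3, freq="MS"))

print(every_2_days, business, mondays, month_starts, sep="\n")
```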

In [23]:

pd.date_range('2019-7-23',periods=60,freq='B')    #B Business Day

Out[23]:

DatetimeIndex(['2019-07-23', '2019-07-24', '2019-07-25', '2019-07-26',
               '2019-07-29', '2019-07-30', '2019-07-31', '2019-08-01',
               '2019-08-02', '2019-08-05', '2019-08-06', '2019-08-07',
               '2019-08-08', '2019-08-09', '2019-08-12', '2019-08-13',
               '2019-08-14', '2019-08-15', '2019-08-16', '2019-08-19',
               '2019-08-20', '2019-08-21', '2019-08-22', '2019-08-23',
               '2019-08-26', '2019-08-27', '2019-08-28', '2019-08-29',
               '2019-08-30', '2019-09-02', '2019-09-03', '2019-09-04',
               '2019-09-05', '2019-09-06', '2019-09-09', '2019-09-10',
               '2019-09-11', '2019-09-12', '2019-09-13', '2019-09-16',
               '2019-09-17', '2019-09-18', '2019-09-19', '2019-09-20',
               '2019-09-23', '2019-09-24', '2019-09-25', '2019-09-26',
               '2019-09-27', '2019-09-30', '2019-10-01', '2019-10-02',
               '2019-10-03', '2019-10-04', '2019-10-07', '2019-10-08',
               '2019-10-09', '2019-10-10', '2019-10-11', '2019-10-14'],
              dtype='datetime64[ns]', freq='B')

In [24]:

dt = _    # in IPython/Jupyter, _ holds the output of the previous cell
dt[0]

Out[24]:

Timestamp('2019-07-23 00:00:00', freq='B')

In [27]:

dt[0].to_pydatetime()

Out[27]:

datetime.datetime(2019, 7, 23, 0, 0)

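pandas time series

A time series is a Series or DataFrame indexed by time objects.
When datetime objects are used as an index, they are stored in a DatetimeIndex.
Special features of time series:
    Slice by passing a year or a year-month string
    Slice by passing a date range
    Rich function support: resample, truncate, and more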
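The slicing features listed above also apply to a DataFrame when used through .loc. A minimal sketch, assuming a hypothetical single-column frame named df with a daily DatetimeIndex:

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2020-03-08", periods=100)
df = pd.DataFrame({"value": np.arange(100)}, index=idx)

# Year-month string: every row in March 2020
march = df.loc["2020-03"]

# Year string: every row in 2020
year = df.loc["2020"]

# Date range: both endpoints are included
window = df.loc["2020-03-10":"2020-03-12"]

print(len(march), len(year))
print(window)
```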

In [29]:

sr = pd.Series(np.arange(100),index=pd.date_range('2020-3-8',periods=100))

In [30]:

sr

Out[30]:

2020-03-08     0
2020-03-09     1
2020-03-10     2
2020-03-11     3
2020-03-12     4
              ..
2020-06-11    95
2020-06-12    96
2020-06-13    97
2020-06-14    98
2020-06-15    99
Freq: D, Length: 100, dtype: int32

In [31]:

sr.index

Out[31]:

DatetimeIndex(['2020-03-08', '2020-03-09', '2020-03-10', '2020-03-11',
               '2020-03-12', '2020-03-13', '2020-03-14', '2020-03-15',
               '2020-03-16', '2020-03-17', '2020-03-18', '2020-03-19',
               '2020-03-20', '2020-03-21', '2020-03-22', '2020-03-23',
               '2020-03-24', '2020-03-25', '2020-03-26', '2020-03-27',
               '2020-03-28', '2020-03-29', '2020-03-30', '2020-03-31',
               '2020-04-01', '2020-04-02', '2020-04-03', '2020-04-04',
               '2020-04-05', '2020-04-06', '2020-04-07', '2020-04-08',
               '2020-04-09', '2020-04-10', '2020-04-11', '2020-04-12',
               '2020-04-13', '2020-04-14', '2020-04-15', '2020-04-16',
               '2020-04-17', '2020-04-18', '2020-04-19', '2020-04-20',
               '2020-04-21', '2020-04-22', '2020-04-23', '2020-04-24',
               '2020-04-25', '2020-04-26', '2020-04-27', '2020-04-28',
               '2020-04-29', '2020-04-30', '2020-05-01', '2020-05-02',
               '2020-05-03', '2020-05-04', '2020-05-05', '2020-05-06',
               '2020-05-07', '2020-05-08', '2020-05-09', '2020-05-10',
               '2020-05-11', '2020-05-12', '2020-05-13', '2020-05-14',
               '2020-05-15', '2020-05-16', '2020-05-17', '2020-05-18',
               '2020-05-19', '2020-05-20', '2020-05-21', '2020-05-22',
               '2020-05-23', '2020-05-24', '2020-05-25', '2020-05-26',
               '2020-05-27', '2020-05-28', '2020-05-29', '2020-05-30',
               '2020-05-31', '2020-06-01', '2020-06-02', '2020-06-03',
               '2020-06-04', '2020-06-05', '2020-06-06', '2020-06-07',
               '2020-06-08', '2020-06-09', '2020-06-10', '2020-06-11',
               '2020-06-12', '2020-06-13', '2020-06-14', '2020-06-15'],
              dtype='datetime64[ns]', freq='D')

In [32]:

sr['2020-3']

Out[32]:

2020-03-08     0
2020-03-09     1
2020-03-10     2
2020-03-11     3
2020-03-12     4
2020-03-13     5
2020-03-14     6
2020-03-15     7
2020-03-16     8
2020-03-17     9
2020-03-18    10
2020-03-19    11
2020-03-20    12
2020-03-21    13
2020-03-22    14
2020-03-23    15
2020-03-24    16
2020-03-25    17
2020-03-26    18
2020-03-27    19
2020-03-28    20
2020-03-29    21
2020-03-30    22
2020-03-31    23
Freq: D, dtype: int32

In [33]:

sr['2020-3':'2020-4']

Out[33]:

2020-03-08     0
2020-03-09     1
2020-03-10     2
2020-03-11     3
2020-03-12     4
2020-03-13     5
2020-03-14     6
2020-03-15     7
2020-03-16     8
2020-03-17     9
2020-03-18    10
2020-03-19    11
2020-03-20    12
2020-03-21    13
2020-03-22    14
2020-03-23    15
2020-03-24    16
2020-03-25    17
2020-03-26    18
2020-03-27    19
2020-03-28    20
2020-03-29    21
2020-03-30    22
2020-03-31    23
2020-04-01    24
2020-04-02    25
2020-04-03    26
2020-04-04    27
2020-04-05    28
2020-04-06    29
2020-04-07    30
2020-04-08    31
2020-04-09    32
2020-04-10    33
2020-04-11    34
2020-04-12    35
2020-04-13    36
2020-04-14    37
2020-04-15    38
2020-04-16    39
2020-04-17    40
2020-04-18    41
2020-04-19    42
2020-04-20    43
2020-04-21    44
2020-04-22    45
2020-04-23    46
2020-04-24    47
2020-04-25    48
2020-04-26    49
2020-04-27    50
2020-04-28    51
2020-04-29    52
2020-04-30    53
Freq: D, dtype: int32

In [34]:

sr.resample('W').sum()

Out[34]:

2020-03-08      0
2020-03-15     28
2020-03-22     77
2020-03-29    126
2020-04-05    175
2020-04-12    224
2020-04-19    273
2020-04-26    322
2020-05-03    371
2020-05-10    420
2020-05-17    469
2020-05-24    518
2020-05-31    567
2020-06-07    616
2020-06-14    665
2020-06-21     99
Freq: W-SUN, dtype: int32
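The first and last weekly sums look small because the 'W' bins end on Sunday: 2020-03-08 is itself a Sunday, so the first bin holds a single day, and the last bin holds only Monday 2020-06-15. The sketch below makes this visible by aggregating a count alongside the sum; it rebuilds the same sr as in In [29].

```python
import numpy as np
import pandas as pd

sr = pd.Series(np.arange(100), index=pd.date_range("2020-03-08", periods=100))

# One row per week (ending Sunday), with the sum and the number of days in each bin
weekly = sr.resample("W").agg(["sum", "count"])

print(weekly.head(3))
print(weekly.tail(2))
```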

In [36]:

sr.resample('M').sum()

Out[36]:

2020-03-31     276
2020-04-30    1155
2020-05-31    2139
2020-06-30    1380
Freq: M, dtype: int32

In [37]:

sr.resample('M').mean()

Out[37]:

2020-03-31    11.5
2020-04-30    38.5
2020-05-31    69.0
2020-06-30    92.0
Freq: M, dtype: float64

In [38]:

sr.truncate(before='2020-4-1')

Out[38]:

2020-04-01    24
2020-04-02    25
2020-04-03    26
2020-04-04    27
2020-04-05    28
              ..
2020-06-11    95
2020-06-12    96
2020-06-13    97
2020-06-14    98
2020-06-15    99
Freq: D, Length: 76, dtype: int32
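truncate also takes an after argument, so both ends can be trimmed in one call. A small sketch on the same sr as above; both boundary dates are kept in the result.

```python
import numpy as np
import pandas as pd

sr = pd.Series(np.arange(100), index=pd.date_range("2020-03-08", periods=100))

# Keep only the first week of April; before and after are both inclusive
first_week = sr.truncate(before="2020-04-01", after="2020-04-07")

# Label slicing on the sorted index gives the same rows
same = sr["2020-04-01":"2020-04-07"]

print(first_week)
```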

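File handling in pandas

Common data file format: CSV (values separated by a delimiter)
Reading files with pandas: load data from a file name, URL, or file object
read_csv        default delimiter is a comma
read_table    default delimiter is a tab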

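Main parameters of read_csv and read_table:
sep	delimiter; may be a regular expression such as r'\s+'
header=None	the file has no header row
names	column names to use
index_col	column(s) to use as the index
skiprows	rows to skip
na_values	strings to treat as missing values
parse_dates	which columns to parse as dates; a boolean or a list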
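Several of these parameters can be tried without a file on disk by reading from an in-memory buffer. This is a minimal sketch; the CSV text and the "--" missing-value marker are made up for illustration and are not taken from 600519.csv.

```python
import io
import pandas as pd

raw = (
    "date,open,close,volume\n"
    "2001-08-27,5.468,5.633,406318.00\n"
    "2001-08-28,5.544,--,129647.79\n"
    "2001-08-29,5.859,5.764,53252.75\n"
)

df = pd.read_csv(
    io.StringIO(raw),
    index_col="date",    # use the date column as the index
    parse_dates=True,    # parse the index as dates
    na_values=["--"],    # treat "--" as a missing value
)

# A regular expression as separator: one or more whitespace characters
spaced = pd.read_csv(io.StringIO("a  b   c\n1 2  3\n"), sep=r"\s+")

print(df)
print(spaced)
```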

In [39]:

pd.read_csv('600519.csv')

Out[39]:

	Unnamed: 0	date	open	close	high	low	volume	code
0	0	2001-08-27	5.468	5.633	5.986	5.205	406318.00	600519
1	1	2001-08-28	5.544	5.840	5.863	5.484	129647.79	600519
2	2	2001-08-29	5.859	5.764	5.863	5.720	53252.75	600519
3	3	2001-08-30	5.749	5.878	5.943	5.704	48013.06	600519
4	4	2001-08-31	5.886	5.864	5.961	5.831	23231.48	600519
...	...	...	...	...	...	...	...	...
3876	3876	2017-12-11	631.000	650.990	651.950	631.000	72849.00	600519
3877	3877	2017-12-12	658.700	651.320	658.770	651.020	47889.00	600519
3878	3878	2017-12-13	654.990	668.210	670.000	650.720	48502.00	600519
3879	3879	2017-12-14	669.980	664.550	671.300	660.500	31967.00	600519
3880	3880	2017-12-15	664.000	653.790	667.950	650.780	32255.00	600519

3881 rows × 8 columns

In [40]:

pd.read_csv('600519.csv',index_col=0)

Out[40]:

	date	open	close	high	low	volume	code
0	2001-08-27	5.468	5.633	5.986	5.205	406318.00	600519
1	2001-08-28	5.544	5.840	5.863	5.484	129647.79	600519
2	2001-08-29	5.859	5.764	5.863	5.720	53252.75	600519
3	2001-08-30	5.749	5.878	5.943	5.704	48013.06	600519
4	2001-08-31	5.886	5.864	5.961	5.831	23231.48	600519
...	...	...	...	...	...	...	...
3876	2017-12-11	631.000	650.990	651.950	631.000	72849.00	600519
3877	2017-12-12	658.700	651.320	658.770	651.020	47889.00	600519
3878	2017-12-13	654.990	668.210	670.000	650.720	48502.00	600519
3879	2017-12-14	669.980	664.550	671.300	660.500	31967.00	600519
3880	2017-12-15	664.000	653.790	667.950	650.780	32255.00	600519

3881 rows × 7 columns

In [41]:

pd.read_csv('600519.csv',index_col='date')

Out[41]:

	Unnamed: 0	open	close	high	low	volume	code
date
2001-08-27	0	5.468	5.633	5.986	5.205	406318.00	600519
2001-08-28	1	5.544	5.840	5.863	5.484	129647.79	600519
2001-08-29	2	5.859	5.764	5.863	5.720	53252.75	600519
2001-08-30	3	5.749	5.878	5.943	5.704	48013.06	600519
2001-08-31	4	5.886	5.864	5.961	5.831	23231.48	600519
...	...	...	...	...	...	...	...
2017-12-11	3876	631.000	650.990	651.950	631.000	72849.00	600519
2017-12-12	3877	658.700	651.320	658.770	651.020	47889.00	600519
2017-12-13	3878	654.990	668.210	670.000	650.720	48502.00	600519
2017-12-14	3879	669.980	664.550	671.300	660.500	31967.00	600519
2017-12-15	3880	664.000	653.790	667.950	650.780	32255.00	600519

3881 rows × 7 columns

In [42]:

df = pd.read_csv('600519.csv',index_col='date')

In [43]:

df.index[0]

Out[43]:

'2001-08-27'

In [44]:

df.index

Out[44]:

Index(['2001-08-27', '2001-08-28', '2001-08-29', '2001-08-30', '2001-08-31',
       '2001-09-03', '2001-09-04', '2001-09-05', '2001-09-06', '2001-09-07',
       ...
       '2017-12-04', '2017-12-05', '2017-12-06', '2017-12-07', '2017-12-08',
       '2017-12-11', '2017-12-12', '2017-12-13', '2017-12-14', '2017-12-15'],
      dtype='object', name='date', length=3881)
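The index above holds plain strings, not time objects. One way to fix this after loading is to pass the index through pd.to_datetime. A minimal sketch on made-up in-memory data (the two rows below are illustrative stand-ins for the file):

```python
import io
import pandas as pd

raw = "date,close\n2001-08-27,5.633\n2001-08-28,5.840\n"
df = pd.read_csv(io.StringIO(raw), index_col="date")

# At this point the index elements are strings
print(type(df.index[0]))

# Convert the whole index to a DatetimeIndex in one step
df.index = pd.to_datetime(df.index)
print(df.index)
```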

In [46]:

pd.read_csv('600519.csv',index_col='date',parse_dates=True).index

Out[46]:

DatetimeIndex(['2001-08-27', '2001-08-28', '2001-08-29', '2001-08-30',
               '2001-08-31', '2001-09-03', '2001-09-04', '2001-09-05',
               '2001-09-06', '2001-09-07',
               ...
               '2017-12-04', '2017-12-05', '2017-12-06', '2017-12-07',
               '2017-12-08', '2017-12-11', '2017-12-12', '2017-12-13',
               '2017-12-14', '2017-12-15'],
              dtype='datetime64[ns]', name='date', length=3881, freq=None)

In [52]:

pd.read_csv('600519.csv',index_col='date',parse_dates=['date']).index

Out[52]:

DatetimeIndex(['2001-08-27', '2001-08-28', '2001-08-29', '2001-08-30',
               '2001-08-31', '2001-09-03', '2001-09-04', '2001-09-05',
               '2001-09-06', '2001-09-07',
               ...
               '2017-12-04', '2017-12-05', '2017-12-06', '2017-12-07',
               '2017-12-08', '2017-12-11', '2017-12-12', '2017-12-13',
               '2017-12-14', '2017-12-15'],
              dtype='datetime64[ns]', name='date', length=3881, freq=None)

In [55]:

pd.read_csv('600519.csv',header=None,names=list('abcdefgh'))

Out[55]:

	a	b	c	d	e	f	g	h
0	NaN	date	open	close	high	low	volume	code
1	0.0	2001-08-27	5.468	5.633	5.986	5.205	406318.0	600519
2	1.0	2001-08-28	5.544	5.84	5.863	5.484	129647.79	600519
3	2.0	2001-08-29	5.859	5.764	5.863	5.72	53252.75	600519
4	3.0	2001-08-30	5.749	5.878	5.943	5.704	48013.06	600519
...	...	...	...	...	...	...	...	...
3877	3876.0	2017-12-11	631.0	650.99	651.95	631.0	72849.0	600519
3878	3877.0	2017-12-12	658.7	651.32	658.77	651.02	47889.0	600519
3879	3878.0	2017-12-13	654.99	668.21	670.0	650.72	48502.0	600519
3880	3879.0	2017-12-14	669.98	664.55	671.3	660.5	31967.0	600519
3881	3880.0	2017-12-15	664.0	653.79	667.95	650.78	32255.0	600519

3882 rows × 8 columns
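In the output above, header=None makes pandas treat the file's own header line as a data row, which is why row 0 contains the original column names. If the goal is to replace an existing header with new names, header=0 can be combined with names. A sketch on made-up data with hypothetical short column names:

```python
import io
import pandas as pd

raw = "date,open,close\n2001-08-27,5.468,5.633\n2001-08-28,5.544,5.840\n"
new_names = ["d", "o", "c"]

# header=None: the original header line is kept as the first data row
as_data = pd.read_csv(io.StringIO(raw), header=None, names=new_names)

# header=0: the original header line is consumed and replaced by new_names
replaced = pd.read_csv(io.StringIO(raw), header=0, names=new_names)

print(as_data)
print(replaced)
```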

In [56]:

pd.read_csv('600519.csv',header=None,skiprows=[1,2,3])

Out[56]:

	0	1	2	3	4	5	6	7
0	NaN	date	open	close	high	low	volume	code
1	3.0	2001-08-30	5.749	5.878	5.943	5.704	48013.06	600519
2	4.0	2001-08-31	5.886	5.864	5.961	5.831	23231.48	600519
3	5.0	2001-09-03	5.894	5.861	5.953	5.839	22112.09	600519
4	6.0	2001-09-04	5.864	5.936	6.034	5.844	37006.77	600519
...	...	...	...	...	...	...	...	...
3874	3876.0	2017-12-11	631.0	650.99	651.95	631.0	72849.0	600519
3875	3877.0	2017-12-12	658.7	651.32	658.77	651.02	47889.0	600519
3876	3878.0	2017-12-13	654.99	668.21	670.0	650.72	48502.0	600519
3877	3879.0	2017-12-14	669.98	664.55	671.3	660.5	31967.0	600519
3878	3880.0	2017-12-15	664.0	653.79	667.95	650.78	32255.0	600519

3879 rows × 8 columns

In [58]:

pd.read_csv('600519.csv',header=None,skiprows=[1,2,3],na_values=['None'])

Out[58]:

	0	1	2	3	4	5	6	7
0	NaN	date	open	close	high	low	volume	code
1	3.0	2001-08-30	5.749	5.878	5.943	5.704	48013.06	600519
2	4.0	2001-08-31	5.886	5.864	5.961	5.831	23231.48	600519
3	5.0	2001-09-03	5.894	5.861	5.953	5.839	22112.09	600519
4	6.0	2001-09-04	5.864	5.936	6.034	5.844	37006.77	600519
...	...	...	...	...	...	...	...	...
3874	3876.0	2017-12-11	631.0	650.99	651.95	631.0	72849.0	600519
3875	3877.0	2017-12-12	658.7	651.32	658.77	651.02	47889.0	600519
3876	3878.0	2017-12-13	654.99	668.21	670.0	650.72	48502.0	600519
3877	3879.0	2017-12-14	669.98	664.55	671.3	660.5	31967.0	600519
3878	3880.0	2017-12-15	664.0	653.79	667.95	650.78	32255.0	600519

3879 rows × 8 columns

Other file types supported by pandas

JSON, XML, HTML, databases, pickle, Excel.

An Excel (.xlsx) file is a ZIP archive of XML files.

Original article: https://www.cnblogs.com/wenyule/p/12442713.html