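A timestamp string such as 2015-03-08 10:30:00.360000+00:00 can be split into several features: convert it to a datetime type, then extract the year, month, day, and other components from it.
Components such as the hour or month can also be discretized. For example, pd.cut can bin the hour by time of day into categories such as morning and afternoon.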
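Code:
Step 1: load the data
Step 2: convert the data to a DataFrame
Step 3: use pd.Timestamp to convert the strings to timestamps
Step 4: use .apply to extract the date attributes from each timestamp
Step 5: extract the time-of-day attributes
Step 6: use pd.cut to bin the hour feature into discrete categories
Step 7: use LabelEncoder to encode the discrete categories as numbers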
import pandas as pd

# Step 1: build the raw data
time_stamps = ['2015-03-08 10:30:00.360000+00:00', '2017-07-13 15:45:05.755000-07:00',
               '2012-01-20 22:30:00.254000+05:30', '2016-12-25 00:30:00.000000+10:00']

# Step 2: convert time_stamps to a DataFrame
time_pd = pd.DataFrame(time_stamps, columns=['Times'])

# Step 3: use pd.Timestamp to convert each string to a timestamp
time_pd['stamp'] = [pd.Timestamp(time) for time in time_pd['Times'].values]
print(time_pd[['stamp', 'Times']])
Strings converted to timestamps. On the surface the two columns look identical.
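Although the two columns print the same, they hold different types. The following is a minimal sketch of that difference, using one of the strings from the data above. The variable names raw and stamp are only illustrative.

```python
import pandas as pd

raw = '2015-03-08 10:30:00.360000+00:00'
stamp = pd.Timestamp(raw)

# The printed forms look alike, but the underlying types differ.
print(type(raw).__name__)    # str
print(type(stamp).__name__)  # Timestamp

# Only the Timestamp exposes calendar and clock attributes.
print(stamp.year, stamp.month, stamp.day, stamp.hour)  # 2015 3 8 10
print(stamp.microsecond)                               # 360000
```

This is why step 3 is needed before any attribute can be extracted in steps 4 and 5.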
# Step 4: use .apply to extract an attribute from each timestamp; each lambda takes one timestamp and returns the attribute
time_pd['year'] = time_pd['stamp'].apply(lambda x: x.year)
time_pd['month'] = time_pd['stamp'].apply(lambda x: x.month)
time_pd['day'] = time_pd['stamp'].apply(lambda x: x.day)
time_pd['DayOfWeek'] = time_pd['stamp'].apply(lambda d: d.dayofweek)
# weekday_name was removed in pandas 1.0; day_name() replaces it
time_pd['DayName'] = time_pd['stamp'].apply(lambda d: d.day_name())
time_pd['DayOfYear'] = time_pd['stamp'].apply(lambda d: d.dayofyear)
# weekofyear was removed from Timestamp in pandas 2.0; isocalendar()[1] is the ISO week number
time_pd['WeekOfYear'] = time_pd['stamp'].apply(lambda d: d.isocalendar()[1])
time_pd['Quarter'] = time_pd['stamp'].apply(lambda d: d.quarter)

# Step 5: extract the time-of-day features
time_pd['Hour'] = time_pd['stamp'].apply(lambda d: d.hour)
time_pd['Minute'] = time_pd['stamp'].apply(lambda d: d.minute)
time_pd['Second'] = time_pd['stamp'].apply(lambda d: d.second)
time_pd['MUsecond'] = time_pd['stamp'].apply(lambda d: d.microsecond)  # microseconds
time_pd['UTC_offset'] = time_pd['stamp'].apply(lambda d: d.utcoffset())

# Step 6: use pd.cut to bin the hour values into periods of the day
cut_hour = [-1, 5, 11, 16, 21, 23]
cut_labels = ['last night', 'morning', 'afternoon', 'evening', 'Night']
time_pd['Hour_cut'] = pd.cut(time_pd['Hour'], bins=cut_hour, labels=cut_labels)
print(time_pd['Hour_cut'].head())
Hour values converted to discrete categories.
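To see which hours each label covers, it helps to run the same bins over every hour of the day. The sketch below reuses cut_hour and cut_labels from step 6. The names hours and periods are only illustrative. By default pd.cut treats each bin as open on the left and closed on the right, which is why the first edge is -1 rather than 0.

```python
import pandas as pd

cut_hour = [-1, 5, 11, 16, 21, 23]
cut_labels = ['last night', 'morning', 'afternoon', 'evening', 'Night']

# Bin every hour of the day with the same edges as step 6.
hours = pd.Series(range(24))
periods = pd.cut(hours, bins=cut_hour, labels=cut_labels)

# Intervals are right-closed: (-1, 5], (5, 11], (11, 16], (16, 21], (21, 23]
for label in cut_labels:
    covered = hours[periods == label]
    print(label, covered.min(), covered.max())
# last night 0 5
# morning 6 11
# afternoon 12 16
# evening 17 21
# Night 22 23
```

If the first edge were 0, hour 0 would fall outside every bin and come back as NaN.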
# Step 7: use LabelEncoder to convert the labels to numbers
from sklearn.preprocessing import LabelEncoder

La = LabelEncoder()
time_pd['Hour_number'] = La.fit_transform(time_pd['Hour_cut'])
# La.classes_ holds the labels in the order of their codes
label_dict = {classes: number for number, classes in enumerate(La.classes_)}
print(time_pd[['Hour_cut', 'Hour_number']])
print(label_dict)
Discrete categories mapped to numbers.
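One thing to be aware of: LabelEncoder stores its classes in sorted order, so the numbers follow string order, not the order of the day. If chronological codes are wanted, one possible alternative (not what the code above does) is to read the codes straight from the categorical that pd.cut returns. The sketch below uses only pandas and imitates the sorted ordering with sorted(). The names hour_cut, ordered_codes, and sorted_mapping are illustrative.

```python
import pandas as pd

cut_hour = [-1, 5, 11, 16, 21, 23]
cut_labels = ['last night', 'morning', 'afternoon', 'evening', 'Night']

# The four hours from the example data: 10, 15, 22, 0
hours = pd.Series([10, 15, 22, 0])
hour_cut = pd.cut(hours, bins=cut_hour, labels=cut_labels)

# Category order follows cut_labels, so the codes are chronological.
ordered_codes = hour_cut.cat.codes.tolist()
print(ordered_codes)  # [1, 2, 4, 0]

# Sorted string order, as a stand-in for how LabelEncoder numbers its classes
# (uppercase 'Night' sorts before the lowercase labels).
sorted_mapping = {label: i for i, label in enumerate(sorted(set(hour_cut.astype(str))))}
print(sorted_mapping)  # {'Night': 0, 'afternoon': 1, 'last night': 2, 'morning': 3}
```

Which encoding is preferable depends on the model. For tree-based models the ordering often matters little, while for models that treat the number as a magnitude the chronological codes may be easier to interpret.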