A data-processing example from Head First Python
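Four under-10 (U10) athletes have each run 600 m nine times. The task is to pick out each athlete's three fastest times. Below are the nine recorded times for each of the four athletes: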
James
2-34,3:21,2.34,2.45,3.01,2:01,2:01,3:10,2-22
Julie
2.59,2.11,2:11,2:23,3-10,2-23,3:10,3.21,3-21
Mikey
2:22,3.01,3:01,3.02,3:02,3.02,3:22,2.49,2:38
Sarah
2:58,2.58,2:39,2-25,2-55,2:54,2.18,2:55,2:55
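The times are written inconsistently: minutes and seconds are separated by '-', ':' or '.', so they have to be cleaned into a single format before they can be sorted. The code: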
def sanitize(time_string):
    # Normalise "m-ss" and "m:ss" to "m.ss"; "m.ss" is returned unchanged
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'
    else:
        return time_string
    (mins, secs) = time_string.split(splitter)
    return mins + '.' + secs

def get_coach_data(filename):
    # Read the file's single line and split it into a list of time strings
    try:
        with open(filename) as f:
            data = f.readline()
        return data.strip().split(',')
    except IOError as ioerr:
        print('File error: ' + str(ioerr))
        return None

james = get_coach_data('james.txt')
julie = get_coach_data('julie.txt')
mikey = get_coach_data('mikey.txt')
sarah = get_coach_data('sarah.txt')

# Clean each time, drop duplicates (set), sort, and keep the first three
print(sorted(set([sanitize(t) for t in james]))[0:3])
print(sorted(set([sanitize(t) for t in julie]))[0:3])
print(sorted(set([sanitize(t) for t in mikey]))[0:3])
print(sorted(set([sanitize(t) for t in sarah]))[0:3])
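get_coach_data() expects one file per athlete (james.txt, julie.txt, mikey.txt, sarah.txt), each holding that athlete's times on a single comma-separated line. The snippet below is a convenience sketch, assuming the files live in the current working directory, that writes the data listed above into those files so the program can be run.

```python
# Write each athlete's raw times (copied from the data above) into the
# file name that get_coach_data() opens, one comma-separated line per file.
raw_data = {
    'james.txt': '2-34,3:21,2.34,2.45,3.01,2:01,2:01,3:10,2-22',
    'julie.txt': '2.59,2.11,2:11,2:23,3-10,2-23,3:10,3.21,3-21',
    'mikey.txt': '2:22,3.01,3:01,3.02,3:02,3.02,3:22,2.49,2:38',
    'sarah.txt': '2:58,2.58,2:39,2-25,2-55,2:54,2.18,2:55,2:55',
}

for filename, line in raw_data.items():
    with open(filename, 'w') as f:
        f.write(line + '\n')
```

Running it once before the program above creates the input files that get_coach_data() reads.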
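The code first defines a function, sanitize, that cleans each time string. Note that a set cannot contain duplicate values, so repeated times are dropped (the result is the three fastest distinct times), and that sorted() returns a new sorted list without modifying the original one.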
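A small demonstration of both points, using James's times as they look after sanitize() (hard-coded here so the snippet needs no files):

```python
# James's nine times after cleaning with sanitize()
times = ['2.34', '3.21', '2.34', '2.45', '3.01', '2.01', '2.01', '3.10', '2.22']

unique = set(times)             # the repeated '2.34' and '2.01' collapse to one entry each
ordered = sorted(unique)        # a new list in ascending order
print(len(times), len(unique))  # 9 7
print(ordered[0:3])             # ['2.01', '2.22', '2.34']

# sorted() leaves its argument untouched; list.sort() sorts in place
original = list(times)
sorted(times)
print(times == original)        # True
times.sort()
print(times == original)        # False
```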
Output:
['2.01', '2.22', '2.34']
['2.11', '2.23', '2.59']
['2.22', '2.38', '2.49']
['2.18', '2.25', '2.39']
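These times are compared as strings. That gives the right order here only because every minute value is a single digit; as strings, '10.05' would sort before '9.59'. A hedged sketch, assuming cleaned times are always "m.ss" with two-digit seconds, that sorts on a numeric key instead while still returning the strings (to_seconds and top3 are hypothetical helper names):

```python
def sanitize(time_string):
    # Same cleaning rule as above: "m-ss" / "m:ss" become "m.ss"
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'
    else:
        return time_string
    (mins, secs) = time_string.split(splitter)
    return mins + '.' + secs

def to_seconds(clean_time):
    # "m.ss" -> total seconds, e.g. "2.05" -> 125 (assumes two-digit seconds)
    mins, secs = clean_time.split('.')
    return int(mins) * 60 + int(secs)

def top3(raw_times):
    cleaned = {sanitize(t) for t in raw_times}   # set comprehension drops duplicates
    return sorted(cleaned, key=to_seconds)[0:3]  # order by time value, not by text

print(sorted(['9.59', '10.05']))                   # ['10.05', '9.59'] (text order)
print(sorted(['9.59', '10.05'], key=to_seconds))   # ['9.59', '10.05']
print(top3('2-34,3:21,2.34,2.45,3.01,2:01,2:01,3:10,2-22'.split(',')))
# ['2.01', '2.22', '2.34']
```

For the data in this example both approaches give the same result; the numeric key only matters once times of ten minutes or more appear.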
Example 2:
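Now take another set of data in which each file also contains the athlete's name and date of birth ahead of the times.
The task is to print each athlete's name together with their three fastest times.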
def sanitize(time_string):
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'
    else:
        return time_string
    (mins, secs) = time_string.split(splitter)
    return mins + '.' + secs

def get_coach_data(filename):
    try:
        with open(filename) as f:
            data = f.readline()
        templ = data.strip().split(',')
        # The first two fields are the name and date of birth; the rest are times
        return {'Name': templ.pop(0),
                'DOB': templ.pop(0),
                'Times': str(sorted(set([sanitize(t) for t in templ]))[0:3])}
    except IOError as ioerr:
        print('File error: ' + str(ioerr))
        return None

james = get_coach_data('james2.txt')
julie = get_coach_data('julie2.txt')
mikey = get_coach_data('mikey2.txt')
sarah = get_coach_data('sarah2.txt')

print(james['Name'] + "'s fastest times are: " + james['Times'])
print(julie['Name'] + "'s fastest times are: " + julie['Times'])
print(mikey['Name'] + "'s fastest times are: " + mikey['Times'])
print(sarah['Name'] + "'s fastest times are: " + sarah['Times'])
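In the code above, {} creates a dictionary (dict, Python's built-in map type) whose keys are 'Name', 'DOB' and 'Times'. Each templ.pop(0) removes and returns the first element of the list, so the two calls take the name and the date of birth off the front and leave only the times.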
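The file contents for this example are not shown, so the sketch below uses a made-up line in the same "name,DOB,times..." layout (the name and date are hypothetical) to show how the two pop(0) calls build the dictionary:

```python
def sanitize(time_string):
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'
    else:
        return time_string
    (mins, secs) = time_string.split(splitter)
    return mins + '.' + secs

# Hypothetical record: name, date of birth, then the times
line = 'Example Athlete,2000-01-01,2:58,2.58,2:39,2-25,2-55'

templ = line.strip().split(',')
record = {'Name': templ.pop(0),   # removes and returns the first field
          'DOB': templ.pop(0),    # the date of birth is now at index 0
          'Times': sorted(set([sanitize(t) for t in templ]))[0:3]}

print(record['Name'])   # Example Athlete
print(record['DOB'])    # 2000-01-01
print(record['Times'])  # ['2.25', '2.39', '2.55']
print(templ)            # only the five raw times remain
```

Here 'Times' is kept as a list rather than passed through str(); the original converts it to a string so that it can be joined with + inside print().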
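The same result can also be achieved with a class, somewhat like a JavaBean in Java. Here AthleteList inherits from the built-in list, so each object is itself the list of times and additionally carries name and dob attributes: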
def sanitize(time_string):
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'
    else:
        return time_string
    (mins, secs) = time_string.split(splitter)
    return mins + '.' + secs

class AthleteList(list):
    def __init__(self, a_name, a_dob=None, a_times=()):
        list.__init__(self)      # initialise the list part of this object
        self.name = a_name
        self.dob = a_dob
        self.extend(a_times)     # the object itself holds the times

    def top3(self):
        # self is the list of raw times, so it can be iterated directly
        return sorted(set([sanitize(t) for t in self]))[0:3]

def get_coach_data(filename):
    try:
        with open(filename) as f:
            data = f.readline()
        templ = data.strip().split(',')
        return AthleteList(templ.pop(0), templ.pop(0), templ)
    except IOError as ioerr:
        print('File error: ' + str(ioerr))
        return None

james = get_coach_data('james2.txt')
julie = get_coach_data('julie2.txt')
mikey = get_coach_data('mikey2.txt')
sarah = get_coach_data('sarah2.txt')

print(james.name + "'s fastest times are: " + str(james.top3()))
print(julie.name + "'s fastest times are: " + str(julie.top3()))
print(mikey.name + "'s fastest times are: " + str(mikey.top3()))
print(sarah.name + "'s fastest times are: " + str(sarah.top3()))
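Because AthleteList subclasses list, every list method (append, extend, len and so on) works on it, and top3() always reflects the current contents. A short sketch with a hypothetical athlete whose first four times are borrowed from Mikey's data above:

```python
def sanitize(time_string):
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'
    else:
        return time_string
    (mins, secs) = time_string.split(splitter)
    return mins + '.' + secs

class AthleteList(list):
    def __init__(self, a_name, a_dob=None, a_times=()):
        list.__init__(self)      # initialise the list part of this object
        self.name = a_name
        self.dob = a_dob
        self.extend(a_times)

    def top3(self):
        return sorted(set([sanitize(t) for t in self]))[0:3]

# Hypothetical athlete; name and date of birth are made up
a = AthleteList('Example Athlete', '2000-01-01', ['2:22', '3.01', '3:01', '3.02'])
print(len(a), a.top3())    # 4 ['2.22', '3.01', '3.02']

a.append('2.15')           # inherited list methods add new times
a.extend(['1-59'])
print(a.top3())            # ['1.59', '2.15', '2.22']
print(isinstance(a, list)) # True
```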
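Note that the first parameter of every instance method in a class receives the object the method is called on. Python passes it automatically, and by convention it is named self.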
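A minimal sketch of how that first parameter is filled in, using a hypothetical class Athlete (not part of the book's code): calling a method on an object is shorthand for calling the function on the class with the object as the first argument, and the name self is a convention rather than a keyword.

```python
class Athlete:
    # Hypothetical minimal class, used only to show how self is passed
    def __init__(self, name):
        self.name = name          # self is the new object being initialised

    def label(self, suffix):
        return self.name + suffix

a = Athlete('James')
print(a.label('!'))               # James!  (Python passes a as self)
print(Athlete.label(a, '!'))      # James!  (the same call, self passed explicitly)


class Renamed:
    def __init__(this, name):     # any parameter name works; self is only a convention
        this.name = name

print(Renamed('Julie').name)      # Julie
```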