• [6] Comprehending Data


    I. Writing the program

    There are four sets of stopwatch times, recorded as follows:

    1. james.txt:2-34,3:21,2.34,2.45,3.01,2:01,2:01,3:10,2-22
    2. julie.txt:2.59,2.11,2:11,2:23,3-10,2-23,3:10,3.21,3-21
    3. mikey.txt:2:22,3.01,3:01,3.02,3:02,3.02,3:22,2.49,2:38
    4. sarah.txt:2:58,2.58,2:39,2-25,2-55,2:54,2.18,2:55,2:55

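    1. Read the data from each file into its own list. Write a small program that processes each file, creates a list for each athlete's data, and displays the lists on screen.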

    james.txt

    In [1]: with open('james.txt',"r") as jam:
       ...:     data=jam.readline()
       ...: james=data.strip().split(",")
       ...: 
    # strip(): removes leading and trailing whitespace, including the newline
    # split(","): splits the string on commas and returns a list
    In [2]: james    # the james variable
    Out[2]: ['2-34', '3:21', '2.34', '2.45', '3.01', '2:01', '2:01', '3:10', '2-22']
    In [3]: cat james.txt        # contents of james.txt
    2-34,3:21,2.34,2.45,3.01,2:01,2:01,3:10,2-22

    julie.txt

    In [4]: with open("julie.txt","r") as ju:
       ...:     data=ju.readline()
       ...: julie=data.strip().split(",")
       ...: 
    In [5]: julie
    Out[5]: ['2.59', '2.11', '2:11', '2:23', '3-10', '2-23', '3:10', '3.21', '3-21']
    In [6]: cat julie.txt
    2.59,2.11,2:11,2:23,3-10,2-23,3:10,3.21,3-21 

    mikey.txt

    In [7]: with open("mikey.txt","r") as mi:
       ...:     data=mi.readline()
       ...: mikey=data.strip().split(",")
       ...: 
    In [8]: mikey
    Out[8]: ['2:22', '3.01', '3:01', '3.02', '3:02', '3.02', '3:22', '2.49', '2:38']
    In [9]: cat mikey.txt
    2:22,3.01,3:01,3.02,3:02,3.02,3:22,2.49,2:38

    sarah.txt

    In [12]: with open("sarah.txt","r") as sa:
        ...:     data=sa.readline()
        ...: sarah=data.strip().split(",")
        ...: 
    In [13]: sarah
    Out[13]: ['2:58', '2.58', '2:39', '2-25', '2-55', '2:54', '2.18', '2:55', '2:55']
    In [14]: cat sarah.txt
    2:58,2.58,2:39,2-25,2-55,2:54,2.18,2:55,2:55
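
The four snippets above differ only in the file name, so the read, strip, and split steps can be wrapped in one function. A minimal sketch: `load_times` is a hypothetical helper name, not from the original, and the sample files are written to a temporary directory only so the example runs on its own.

```python
import os
import tempfile


def load_times(path):
    """Read the first line of a file and split it on commas."""
    with open(path, "r") as f:
        line = f.readline()
    return line.strip().split(",")


# Write the sample data to a temporary directory so the sketch is
# self-contained; with the real files in place, only the final loop is needed.
samples = {
    "james.txt": "2-34,3:21,2.34,2.45,3.01,2:01,2:01,3:10,2-22",
    "julie.txt": "2.59,2.11,2:11,2:23,3-10,2-23,3:10,3.21,3-21",
    "mikey.txt": "2:22,3.01,3:01,3.02,3:02,3.02,3:22,2.49,2:38",
    "sarah.txt": "2:58,2.58,2:39,2-25,2-55,2:54,2.18,2:55,2:55",
}
workdir = tempfile.mkdtemp()
for name, content in samples.items():
    with open(os.path.join(workdir, name), "w") as f:
        f.write(content + "\n")

# One list per file, keyed by file name.
times = {}
for name in samples:
    times[name] = load_times(os.path.join(workdir, name))
    print(name, times[name])
```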

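    II. Two ways to sort

      • In-place sort: the sort() method, ascending by default
      • Descending: data.sort(reverse=True)
      • Copied sort: the sorted() BIF (built-in function), ascending by default
      • Descending: sorted(data, reverse=True)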
    In [15]: data=[2,3,4,543221,333,1,2,3,2]
    In [16]: data    # original data
    Out[16]: [2, 3, 4, 543221, 333, 1, 2, 3, 2]
    In [17]: data.sort()    # in-place sort (ascending)
    In [18]: data
    Out[18]: [1, 2, 2, 2, 3, 3, 4, 333, 543221]
    In [19]: data=[2,3,4,543221,333,1,2,3,2]
    In [20]: data2=sorted(data)   # copied sort
    In [21]: data
    Out[21]: [2, 3, 4, 543221, 333, 1, 2, 3, 2]
    In [22]: data2
    Out[22]: [1, 2, 2, 2, 3, 3, 4, 333, 543221]
    In [24]: data.sort(reverse=True)    # in-place sort (descending)
    In [25]: data
    Out[25]: [543221, 333, 4, 3, 3, 2, 2, 2, 1]
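
One detail is easier to see in code: `sort()` changes the list and returns `None`, while `sorted()` leaves the original alone and returns a new list. Both accept `reverse=True`. A short sketch with the same numbers as above:

```python
data = [2, 3, 4, 543221, 333, 1, 2, 3, 2]

# sorted() returns a new list; the original is untouched.
descending = sorted(data, reverse=True)
print(descending)
print(data)

# sort() works in place and returns None, so do not assign its result.
result = data.sort()
print(result)
print(data)
```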

    1. Sorting julie

    In [32]: julie
    Out[32]: ['2.59', '2.11', '2:11', '2:23', '3-10', '2-23', '3:10', '3.21', '3-21']
    In [33]: julie2=sorted(julie)
    In [34]: julie2
    Out[34]: ['2-23', '2.11', '2.59', '2:11', '2:23', '3-10', '3-21', '3.21', '3:10']
    In [35]: julie
    Out[35]: ['2.59', '2.11', '2:11', '2:23', '3-10', '2-23', '3:10', '3.21', '3-21']
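    # Note: the result above is still wrong and needs fixing

    The output shows the problem: the data formats are inconsistent, so the sort order is wrong ('2-23' ends up ahead of '2.11').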
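
Why does `'2-23'` land before `'2.11'`? Strings are compared character by character by code point, and `-` (45) sorts before `.` (46), which sorts before `:` (58). A quick check:

```python
# Code points decide the relative order of the three separators.
for ch in "-.:":
    print(repr(ch), ord(ch))

# The first characters are equal, so the second character decides.
print(sorted(["2.11", "2-23", "2:11"]))
```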

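    Approach: 1. Create a function that takes one string from a list of stopwatch times, replaces any hyphen or colon it finds with a period, and returns the cleaned string.

       2. Create an empty list, put the cleaned data into it, and then sort that list.

    Note: if the string already contains a period, no cleaning is needed.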

    2. Revised version (sorting james correctly):

    In [55]: james    # original data
    Out[55]: ['2-34', '3:21', '2.34', '2.45', '3.01', '2:01', '2:01', '3:10', '2-22']
    In [56]: clean_james=[]    # create an empty list
    In [57]: clean_james
    Out[57]: []
    # Define a function that normalizes the format (turns ':' and '-' into '.')
    In [58]: def sanitize(time_string):
        ...:     if '-' in time_string:
        ...:         splitter="-"
        ...:     elif ":" in time_string:
        ...:         splitter=":"
        ...:     else:
        ...:         return time_string
        ...:     (mins1,secs1)=time_string.split(splitter)
        ...:     return(mins1+"."+secs1)
        ...: 
    # Loop over the james list, convert each item to mins.secs form, and append it to clean_james
    In [59]: for i in james:
        ...:     clean_james.append(sanitize(i))
        ...: print(clean_james)
        ...: print(sorted(clean_james))    # sort the cleaned list
        ...: 
    ['2.34', '3.21', '2.34', '2.45', '3.01', '2.01', '2.01', '3.10', '2.22']
    ['2.01', '2.01', '2.22', '2.34', '2.34', '2.45', '3.01', '3.10', '3.21']

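    3. Another way to replace the colons and hyphens:

    james=['2-34', '3:21', '2.34', '2.45', '3.01', '2:01', '2:01', '3:10', '2-22']
    print(type(james))
    tihuan=str(james)    # str() turns the whole list into one string
    tihuan1=tihuan.replace("-",".")
    tihuan2=tihuan1.replace(":",".")
    print(tihuan2)
    print(type(tihuan2))    # <class 'str'>: the result is a string, not a list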
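
The snippet above converts the whole list to one string before replacing, so the result is a string rather than a list. A minimal sketch that applies the same two `replace()` calls item by item and keeps a list (the name `cleaned` is an illustrative choice, not from the original):

```python
james = ['2-34', '3:21', '2.34', '2.45', '3.01', '2:01', '2:01', '3:10', '2-22']

# Replace the separators one item at a time so the result is still a list.
cleaned = []
for t in james:
    cleaned.append(t.replace("-", ".").replace(":", "."))

print(cleaned)
print(type(cleaned))
```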

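    List comprehensions

    Transforming one list into another by hand takes four steps, which a list comprehension collapses into a single line:

    1. Create a new list to hold the transformed data
    2. Iterate over each item in the original list
    3. Perform the transformation on each iteration
    4. Append the transformed data to the new list
    # Convert minutes to seconds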
    In [60]: mins=[1,2,3]
    In [61]: secs=[m*60 for m in mins]
    In [62]: secs
    Out[62]: [60, 120, 180]
    # Convert the lowercase strings in name to uppercase
    In [63]: name=["my","name","is","huahua"]
    In [66]: upper=[s.upper() for s in name]
    In [67]: upper
    Out[67]: ['MY', 'NAME', 'IS', 'HUAHUA']
    # Convert the strings in data to float
    In [68]: data=['2.01','2.22','9.66']
    In [69]: data1=[float(q) for q in data]
    In [70]: data1
    Out[70]: [2.01, 2.22, 9.66]

    4. Simplifying the code from step 2:

    In [71]: james
    Out[71]: ['2-34', '3:21', '2.34', '2.45', '3.01', '2:01', '2:01', '3:10', '2-22']
    In [72]: def sanitize(time_string):
        ...:     if '-' in time_string:
        ...:         splitter="-"
        ...:     elif ":" in time_string:
        ...:         splitter=":"
        ...:     else:
        ...:         return time_string
        ...:     (mins1,secs1)=time_string.split(splitter)
        ...:     return(mins1+"."+secs1)
        ...: 
    In [73]: print(sorted([sanitize(i) for i in james]))    # list comprehension
    ['2.01', '2.01', '2.22', '2.34', '2.34', '2.45', '3.01', '3.10', '3.21']
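
A caveat: sorting the cleaned strings works here because every time has a single-digit minute. With a time of ten minutes or more, string comparison would put it first. A minimal sketch of a numeric sort key, assuming the cleaned format is always mins.secs; `to_seconds` and the `'10.05'` value are hypothetical and not part of the original data.

```python
def to_seconds(time_string):
    """Convert a cleaned 'mins.secs' string to a number of seconds."""
    mins, secs = time_string.split(".")
    return int(mins) * 60 + int(secs)


cleaned = ['2.34', '10.05', '3.01', '2.01']

# Plain string sorting compares character by character, and '1' < '2'.
by_string = sorted(cleaned)
print(by_string)

# Sorting by the numeric value puts the 10-minute time last.
by_value = sorted(cleaned, key=to_seconds)
print(by_value)
```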

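    List slicing

    5. Remove duplicates by iteration and print the three fastest times

    Approach:

    • Create a new empty list
    • Fill it with the unique items found in james (using not in)
    In [76]: james    # original james data
    Out[76]: ['2-34', '3:21', '2.34', '2.45', '3.01', '2:01', '2:01', '3:10', '2-22']
    # Function that normalizes the data format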
    In [77]: def sanitize(time_string):
        ...:     if '-' in time_string:
        ...:         splitter="-"
        ...:     elif ":" in time_string:
        ...:         splitter=":"
        ...:     else:
        ...:         return time_string
        ...:     (mins1,secs1)=time_string.split(splitter)
        ...:     return(mins1+"."+secs1)
        ...: 
    # Convert james to the normalized format and sort it
    In [78]: james1=(sorted([sanitize(i)for i in james]))
    # Sorted data
    In [79]: james1
    Out[79]: ['2.01', '2.01', '2.22', '2.34', '2.34', '2.45', '3.01', '3.10', '3.21']
    # Empty list
    In [80]: unique_james=[]
    # Loop over james1; append each item only if it is not already in unique_james
    In [81]: for s in james1:
        ...:     if s not in unique_james:
        ...:         unique_james.append(s)
        ...: print(unique_james[0:3])    # print the three fastest times
        ...: 
    ['2.01', '2.22', '2.34']
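
The `not in` loop can also be written with `dict.fromkeys`, which is a different technique from the one used in the original. A minimal sketch, assuming Python 3.7 or later, where dicts keep insertion order:

```python
james1 = ['2.01', '2.01', '2.22', '2.34', '2.34', '2.45', '3.01', '3.10', '3.21']

# Dict keys are unique and keep insertion order (Python 3.7+),
# so this drops duplicates without disturbing the sorted order.
unique_james = list(dict.fromkeys(james1))
print(unique_james[0:3])
```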

     Removing duplicates with a set

    Note: a set cannot contain duplicate elements

    In [82]: s={1,2,3,4,5,5,5,5,5,5,}
    In [83]: s
    Out[83]: {1, 2, 3, 4, 5}

     6. Rewriting the code above with set and list slicing (print the three fastest times)

    In [86]: james
    Out[86]: ['2-34', '3:21', '2.34', '2.45', '3.01', '2:01', '2:01', '3:10', '2-22']
    In [87]: def sanitize(time_string):
        ...:     if '-' in time_string:
        ...:         splitter="-"
        ...:     elif ":" in time_string:
        ...:         splitter=":"
        ...:     else:
        ...:         return time_string
        ...:     (mins1,secs1)=time_string.split(splitter)
        ...:     return(mins1+"."+secs1)
        ...: 
    # 1. Turn the converted list into a set
    # 2. Then sort the set
    In [88]: james1=(sorted(set([sanitize(i)for i in james])))
    In [89]: james1
    Out[89]: ['2.01', '2.22', '2.34', '2.45', '3.01', '3.10', '3.21']
    # Take the three fastest times
    In [90]: james1=(sorted(set([sanitize(i)for i in james]))[0:3])
    In [91]: james1
    Out[91]: ['2.01', '2.22', '2.34']

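     Summary of key points:

     sort()

    1. In-place sort (ascending)
    2. a.sort()
    3. Descending: a.sort(reverse=True)

    sorted()

    1. Copied sort (ascending)
    2. data1=sorted(data)
    3. Descending: sorted(data, reverse=True)

    List comprehensions

    1. [expression for variable in list]    or    [expression for variable in list if condition]
    2. [m*60 for m in f]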
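
The second form, with an `if` condition, does not appear in the examples earlier. A minimal sketch using James's cleaned times; the filter on the `"2."` prefix is an illustrative choice:

```python
times = ['2.01', '2.22', '2.34', '2.45', '3.01', '3.10', '3.21']

# Keep only the times under three minutes, converted to float.
under_three = [float(t) for t in times if t.startswith("2.")]
print(under_three)
```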

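    List slicing

    1. For slicing purposes, list, tuple, and str (string) are all sequences, and all of them can be sliced by the same rules
    2. Index 0 is the first element counting from the front and -1 is the first element counting from the back; a slice excludes its right boundary, so [0:3] covers elements 0, 1, and 2 but not 3.
    3. james[0:3]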
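
The slicing rules can be checked with a short sketch on a list, a string, and a tuple:

```python
james = ['2.01', '2.22', '2.34', '2.45', '3.01', '3.10', '3.21']

print(james[0:3])      # first three items (indexes 0, 1, 2)
print(james[:3])       # same thing; a missing start means 0
print(james[-1])       # last item
print(james[-3:])      # last three items
print("2:01"[0:1])     # slicing works on strings too
print((1, 2, 3)[1:])   # and on tuples
```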

    set

    1. A set is unordered
    2. It contains no duplicate elements (so set can be used to remove duplicates)
    #coding=utf-8
    """
    Overall goal: take the three fastest times from the four sets of stopwatch records
    """
    # Read the contents of the file
    def get_filecontent(filename):
        try:
            with open(filename) as f:
                data=f.readline().strip().split(",")
            return data
        except IOError as e:
            raise e
    # Clean the data
    def sanitize(time_string):
        if "-" in time_string:
            splitter='-'
        elif ":" in time_string:
            splitter=':'
        else:
            return time_string
        (mins,secs)=time_string.strip().split(splitter)
        return(mins+"."+secs)
    # Sort and take the three fastest times
    yssj=get_filecontent("D:pydjjames.txt")
    print(yssj)
    # Step-by-step breakdown of the expression below
    # 1. Iterate over every element of yssj and clean it into mins.secs format
    # 2. set: turn the cleaned data into a set, which removes duplicates
    # 3. sorted: make a sorted copy of the set
    # 4. [0:3]: take the first three items
    print(sorted(set([sanitize(i)for i in yssj]))[0:3])
    # Equivalent version with an explicit loop:
    #clean_sj=[]
    #for i in yssj:
    #    clean_sj.append(sanitize(i))
    #print(clean_sj)
    #print(sorted(set(clean_sj))[0:3])
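
The script above handles only james, while the stated goal covers all four athletes. A minimal sketch that applies the same clean, de-duplicate, sort, and slice steps to each of them: `top3` and `records` are hypothetical names, and the data is copied inline from the four files so the example runs without them.

```python
def sanitize(time_string):
    """Normalize '2-34' and '2:34' to '2.34'."""
    if "-" in time_string:
        splitter = "-"
    elif ":" in time_string:
        splitter = ":"
    else:
        return time_string
    mins, secs = time_string.strip().split(splitter)
    return mins + "." + secs


def top3(times):
    """Clean, de-duplicate, sort, and keep the three fastest times."""
    return sorted(set(sanitize(t) for t in times))[0:3]


# Data copied from the four files listed at the start of the article.
records = {
    "james": "2-34,3:21,2.34,2.45,3.01,2:01,2:01,3:10,2-22",
    "julie": "2.59,2.11,2:11,2:23,3-10,2-23,3:10,3.21,3-21",
    "mikey": "2:22,3.01,3:01,3.02,3:02,3.02,3:22,2.49,2:38",
    "sarah": "2:58,2.58,2:39,2-25,2-55,2:54,2.18,2:55,2:55",
}

fastest = {}
for name, line in records.items():
    fastest[name] = top3(line.split(","))
    print(name, fastest[name])
```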

     

  • Original article: https://www.cnblogs.com/8013-cmf/p/7058804.html