• Processing log file data with Python


    Below are the problem statement and requirements for this Python data-processing exercise:
    The attachment is a log file showing the running status of a set-top box. Each line in the file follows the format “LineNumber + Time + ProcessName + (ProcessID) + Logs”, and the logs are currently in time order. Please write a Python script that supports the following features:

    1. Sort the logs alphabetically by process name, e.g. halserver, processman, etc.
    2. Filter the logs by process name so that the output shows only the logs of interest (e.g. “procman”) and hides the rest.
    3. Count the number of log lines for each process.
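    The line format in the task statement can be captured by a single regular expression with named groups. The sketch below assumes a line looks like `12 08:15:30.123 halserver(215) init done`. That means an integer line number, an `HH:MM:SS.mmm` time, a process name made of lowercase letters, `.` and `-`, a process ID in parentheses, and then free text. The sample line, the `LINE_RE` pattern and the `parse_line` helper are hypothetical, so adjust the pattern to the real file.

```python
import re

# Hypothetical pattern for "LineNumber + Time + ProcessName + (ProcessID) + Logs".
# Assumes the time is HH:MM:SS.mmm and process names use lowercase letters, '.' and '-'.
LINE_RE = re.compile(
    r'^\s*(?P<lineno>\d+)\s+'
    r'(?P<time>\d\d:\d\d:\d\d\.\d{3})\s+'
    r'(?P<process>[a-z.-]+)\s*'
    r'\((?P<pid>\d+)\)\s*'
    r'(?P<log>.*)$'
)


def parse_line(line):
    """Return the five fields of a log line as a dict, or None if it does not match."""
    m = LINE_RE.match(line)
    return m.groupdict() if m else None


print(parse_line("12 08:15:30.123 halserver(215) init done"))
```

    Once every line is parsed into fields, sorting, filtering and counting reduce to simple operations on the `process` field.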

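    This is the log text file produced by the set-top box. Part of it, as it appears when opened, is shown below:
    [Screenshot: part of the log file opened in Windows Notepad]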

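    At first glance it looks very messy. It really should not be opened with Microsoft Notepad (most likely because the file uses Unix LF line endings, which older versions of Notepad do not display as line breaks). Opened in Notepad++ instead, its structure is much clearer. Part of it is shown below:
    [Screenshot: part of the log file opened in Notepad++]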

    The code is given below.
    Code for task 1:

    # coding=utf-8
    import re

    # Matches "<last digit of line number><separator>HH:MM:SS.mmm<whitespace>"
    # and captures the process name that follows it.
    res = r'\d\D\d\d:\d\d:\d\d\.\d{3}\s([a-z.-]+)'

    with open('stblog.txt', 'r') as f1:
        list1 = f1.readlines()

    list_process = []    # process name(s) found on each line
    for line in list1:
        list_process.append(re.findall(res, line))

    for i, found in enumerate(list_process):    # check that the regex is usable
        if len(found) > 1:
            print('regex matched more than once on line', i)

    # Sort the line indices by process name. sorted() is stable, so lines of
    # the same process keep their original time order.
    order = sorted(range(len(list1)), key=lambda i: list_process[i])

    with open('cc1.txt', 'w') as f2:
        f2.writelines(list1[i] for i in order)
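    Sorting by process name needs only `sorted()` with a key function. It runs in O(n log n), compared with the O(n²) of a pairwise exchange (bubble-style) sort. Because it is stable, lines of the same process also keep their original time order. Below is a minimal sketch that works without files, so the idea can be tried on a few sample lines. The `PROCESS_RE` pattern, the `process_of` and `sort_by_process` names, and the sample lines are illustrative assumptions, not the original author's code.

```python
import re

# Assumed: the process name is the first lowercase word after an HH:MM:SS.mmm timestamp.
PROCESS_RE = re.compile(r'\d\d:\d\d:\d\d\.\d{3}\s+([a-z.-]+)')


def process_of(line):
    """Return the process name of a log line, or '' if none is found."""
    m = PROCESS_RE.search(line)
    return m.group(1) if m else ''


def sort_by_process(lines):
    # sorted() is stable: lines with the same process keep their time order.
    return sorted(lines, key=process_of)


logs = [
    "1 00:00:01.000 procman(10) start\n",
    "2 00:00:02.000 halserver(20) init\n",
    "3 00:00:03.000 procman(10) running\n",
]
for line in sort_by_process(logs):
    print(line, end='')
```

    Lines without a recognizable process name get the key `''` and therefore end up at the top of the output.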
    

    Code for tasks 2 and 3:

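    # coding=utf-8
    import re

    # Captures the process name that follows the HH:MM:SS.mmm timestamp;
    # '.' and '-' are allowed because some process names contain them.
    res = r'\d\D\d\d:\d\d:\d\d\.\d{3}\s([a-z.-]+)'

    with open('stblog.txt', 'r') as f1:
        list1 = f1.readlines()

    list_process = []    # process name(s) found on each line
    for line in list1:
        list_process.append(re.findall(res, line))

    for i, found in enumerate(list_process):    # check that the regex is usable
        if len(found) > 1:
            print('regex matched more than once on line', i)

    s = input("Please enter the process you are interested in: ").strip()

    list2 = []    # lines that belong to the chosen process
    for line, found in zip(list1, list_process):
        if found == [s]:
            list2.append(line)

    print(len(list2))    # number of log lines for that process
    with open('cc2.txt', 'w') as f2:
        f2.writelines(list2)    # write the matching lines to cc2.txt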
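    The script above prints the count only for the one process the user types in, whereas task 3 asks for a count for every process. Here is a sketch using `collections.Counter` that makes one pass over all lines. It assumes, as before, that the process name is the first lowercase word after an `HH:MM:SS.mmm` timestamp. The `count_per_process` helper and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Assumed: the process name is the first lowercase word after an HH:MM:SS.mmm timestamp.
PROCESS_RE = re.compile(r'\d\d:\d\d:\d\d\.\d{3}\s+([a-z.-]+)')


def count_per_process(lines):
    """Map each process name to its number of log lines; unmatched lines are skipped."""
    counts = Counter()
    for line in lines:
        m = PROCESS_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


logs = [
    "1 00:00:01.000 procman(10) start\n",
    "2 00:00:02.000 halserver(20) init\n",
    "3 00:00:03.000 procman(10) running\n",
]
for name, n in sorted(count_per_process(logs).items()):
    print(name, n)
```

    To run it on the real file, pass the list returned by `open('stblog.txt').readlines()` in place of the sample `logs`.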
    
  • Original article: https://www.cnblogs.com/jhcelue/p/7202740.html