• BS4 crawler example: CISP


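    Crawl the CISP certificate numbers and validity periods that can currently be looked up on the official site, and store them in a database.

    This amounts to brute-force enumeration; Burp's grep feature can achieve the same thing.

    The Python code is below: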

    # coding=utf-8
    import requests
    from bs4 import BeautifulSoup

    # demourl = 'http://www.itsec.gov.cn/export/sites/itsec/person/peregester/CNITSEC2012CISE01098/'
    CERT_TYPES = ['CISE', 'CISA', 'CISO', 'CISM', 'CISE-E', 'CISO-E', 'CISM-E', 'CISA-E', 'CISP-Auditor']
    counter = 1
    for i in range(2000, 2017):
        for t in CERT_TYPES:
            for j in range(10000):
                # Certificate number: prefix + year + type + "0" + 4-digit zero-padded serial
                SNum = "CNITSEC" + str(i) + t + "0" + str(j).zfill(4)
                url = "http://www.itsec.gov.cn/export/sites/itsec/person/peregester/%s/" % SNum
                print(counter, SNum, '  Checking .........')
                try:
                    # A timeout keeps one stalled request from hanging the whole run
                    res = requests.get(url, timeout=10)
                    res.encoding = 'utf-8'
                    soup = BeautifulSoup(res.text, 'html.parser')
                    # Content-Length can be absent (e.g. chunked responses), so do not index it directly
                    clength = res.headers.get('content-length', '')

                    if 200 <= res.status_code <= 210:
                        # The detail page keeps the certificate id in .detail_title and the fields in .tdm cells
                        cells = soup.select('.tdm')
                        itsecid = soup.select('.detail_title')[0].text.strip()
                        starttime = cells[0].text.strip().replace("\n", "").replace("                ", "")
                        endtime = cells[1].text.strip().replace("\n", "").replace("                ", "")
                        username = cells[2].text.strip()
                        authlevel = cells[3].text.strip()
                        for value in (clength, itsecid, starttime, endtime, username, authlevel):
                            print(value)
                        # Append one record per line; write everything as UTF-8
                        with open('cispall.txt', 'a', encoding='utf-8') as f:
                            f.write("%s%s%s%s%s  %s\n" % (itsecid, starttime, endtime, username, authlevel, clength))
                    else:
                        print(SNum, 'Non-existent ########')
                    counter += 1
                except Exception as e:
                    # Network errors and missing page elements both end up here
                    print('except error')
                    print(type(e), ":", e)
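    To get a feel for how large this brute-force search is, the sketch below reproduces the numbering scheme from the loops above without sending any requests. The names `candidate_ids` and `total_candidates` are made up for this illustration and are not part of the original script.

```python
from itertools import product

YEARS = range(2000, 2017)      # 2000..2016, same bounds as the crawler loop
CERT_TYPES = ['CISE', 'CISA', 'CISO', 'CISM',
              'CISE-E', 'CISO-E', 'CISM-E', 'CISA-E', 'CISP-Auditor']
SERIALS = range(10000)         # serials 0000..9999


def candidate_ids():
    """Yield every certificate number the crawler would try, in loop order."""
    for year, cert_type, serial in product(YEARS, CERT_TYPES, SERIALS):
        yield "CNITSEC%d%s0%s" % (year, cert_type, str(serial).zfill(4))


def total_candidates():
    """Number of URLs the three nested loops request in a full run."""
    return len(YEARS) * len(CERT_TYPES) * len(SERIALS)


if __name__ == "__main__":
    print(total_candidates())       # 17 years * 9 types * 10000 serials
    print(next(candidate_ids()))    # first number that gets checked
```

    A full run therefore issues about 1.53 million requests, one at a time, so it is worth checking a small slice of years or types first.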

    Process:

    The saved records can be split according to their field patterns and then loaded into a database.
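    The post does not say which database is used. As one possible sketch, assuming each record has already been split into its five fields, the standard-library `sqlite3` module is enough. The table name `cisp` and the helpers `open_store` and `save_record` are hypothetical, chosen only for this example.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS cisp (
               itsecid   TEXT PRIMARY KEY,
               starttime TEXT,
               endtime   TEXT,
               username  TEXT,
               authlevel TEXT
           )"""
    )
    return conn


def save_record(conn, record):
    """Store one (itsecid, starttime, endtime, username, authlevel) tuple.

    INSERT OR REPLACE keys on itsecid, so re-running the crawler
    overwrites a certificate's row instead of duplicating it.
    """
    conn.execute("INSERT OR REPLACE INTO cisp VALUES (?, ?, ?, ?, ?)", record)
    conn.commit()


if __name__ == "__main__":
    conn = open_store()
    # Placeholder values only; real ones come from the crawled page
    save_record(conn, ("CNITSEC2012CISE01098", "<start>", "<end>", "<name>", "<level>"))
    print(conn.execute("SELECT COUNT(*) FROM cisp").fetchone()[0])
```

    Using the certificate number as the primary key makes the load idempotent, which matters because a crawl of this size is likely to be restarted.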

  • Original article: https://www.cnblogs.com/shellr00t/p/Crawler.html