Someone reminded me that I forgot to post how the IDs in the page URLs are crawled. Take this URL:
http://www.beijing.gov.cn/hudong/hdjl/com.web.consult.consultDetail.flow?originalId=AH20021300174
AH20021300174 (the value of originalId) is what we want to crawl.
The current code is as follows:
import json
import requests

# Endpoint that returns the mail list as JSON
url = "http://www.beijing.gov.cn/hudong/hdjl/com.web.search.mailList.mailList.biz.ext"

# Headers copied from the browser request.
# Content-Length is left out: requests computes it from the body.
kv = {
    'Host': 'www.beijing.gov.cn',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:71.0) Gecko/20100101 Firefox/71.0',
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
    'Accept-Encoding': 'gzip, deflate',
    'Content-Type': 'text/json',
    'X-Requested-With': 'XMLHttpRequest',
    'Origin': 'http://www.beijing.gov.cn',
    'Connection': 'keep-alive',
    'Referer': 'http://www.beijing.gov.cn/hudong/hdjl/'}


def page(begin):
    # Request one page of 6 entries starting at offset `begin`
    query = {
        'PageCond/begin': begin,
        'PageCond/isCount': 'true',
        'PageCond/length': 6,
    }
    datas = json.dumps(query)
    r = requests.post(url, data=datas, headers=kv)
    print(r.status_code)
    print(r.text)
    js = json.loads(r.text)
    # Each entry in "mailList" carries the ID under "original_id"
    for j in js["mailList"]:
        print(j)
        print(j.get("original_id"))


def href():
    # Walk the offsets 0, 6, 12, ... below 5584, one request per page
    for begin in range(0, 5584, 6):
        page(begin)


if __name__ == "__main__":
    href()
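
Once the IDs are printed, the natural next step is to turn each one back into a detail-page URL like the one at the top of this post. Below is a minimal sketch of that step, kept free of network calls so it can be tried offline. The helper names (`page_offsets`, `extract_ids`, `detail_url`) are my own, the response shape `{"mailList": [{"original_id": ...}]}` is assumed from the code above, and the sample payload is made up for illustration.

```python
from urllib.parse import urlencode

# Detail-page address, taken from the URL quoted at the top of the post
DETAIL_URL = "http://www.beijing.gov.cn/hudong/hdjl/com.web.consult.consultDetail.flow"


def page_offsets(total, length=6):
    """Return the 'PageCond/begin' values needed to cover `total` entries."""
    return list(range(0, total, length))


def extract_ids(payload):
    """Collect the original_id of every entry in a decoded response."""
    # Assumed shape: {"mailList": [{"original_id": "..."}, ...]}
    # Entries without an original_id are skipped.
    return [m["original_id"] for m in payload.get("mailList", []) if m.get("original_id")]


def detail_url(original_id):
    """Build the detail-page URL for one ID."""
    return DETAIL_URL + "?" + urlencode({"originalId": original_id})


# Hypothetical payload for illustration; the second ID is invented
sample = {"mailList": [{"original_id": "AH20021300174"},
                       {"original_id": "AH20021300175"},
                       {}]}

for oid in extract_ids(sample):
    print(detail_url(oid))
```

In the real crawler, `extract_ids` would be called on `json.loads(r.text)` inside `page`, and the resulting URLs could be saved to a file instead of printed.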