Reposted from: 静觅 » [Python3 Web Crawler Development in Practice] 6.3 - Ajax Result Extraction
Notes on a few things the original article's code does well:
    from urllib.parse import urlencode

    import requests

    base_url = 'https://m.weibo.cn/api/container/getIndex?'

    headers = {
        'Host': 'm.weibo.cn',
        'Referer': 'https://m.weibo.cn/u/2830678474',
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
        'X-Requested-With': 'XMLHttpRequest',
    }


    def get_page(page):
        params = {
            'type': 'uid',
            'value': '2830678474',
            'containerid': '1076032830678474',
            'page': page
        }

        # The URL is split into a base path and a parameter dict;
        # urlencode serializes the parameters into the query string.
        url = base_url + urlencode(params)
        try:
            response = requests.get(url, headers=headers)
            # Check the status code and only handle the normal (200) case.
            if response.status_code == 200:
                # The body is JSON, so call response.json() directly
                # instead of json.loads(response.content).
                return response.json()
        except requests.ConnectionError as e:
            print('Error', e.args)
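To see what `urlencode` contributes in the step above, here is a minimal sketch that builds the same request URL without sending anything. The helper name `build_url` is hypothetical; the base URL and parameters are the ones from the code above.

```python
from urllib.parse import urlencode

base_url = 'https://m.weibo.cn/api/container/getIndex?'


def build_url(page):
    """Hypothetical helper: return the request URL for a given page number."""
    params = {
        'type': 'uid',
        'value': '2830678474',
        'containerid': '1076032830678474',
        'page': page,
    }
    # urlencode turns the dict into 'key=value' pairs joined by '&',
    # percent-encoding any characters that are not URL-safe.
    return base_url + urlencode(params)


if __name__ == '__main__':
    print(build_url(2))
```

As an alternative, requests can do this encoding itself when the dict is passed through the `params` argument of `requests.get`.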
My own code:
    import requests

    headers = {
        "Referer": "https://m.weibo.cn/u/2830678474?sudaref=cuiqingcai.com&display=0&retcode=6102",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36",
        "X-Requested-With": "XMLHttpRequest",
        "X-XSRF-TOKEN": "609539",
    }

    base_url = "https://m.weibo.cn/api/container/getIndex?sudaref=cuiqingcai.com&display=0&retcode=6102&type=uid&value=2830678474&containerid=1076032830678474"
    url = base_url
    while True:
        response = requests.get(url, headers=headers)
        try:
            # Parse the JSON body once and reuse it below.
            data = response.json()["data"]
        except (ValueError, KeyError, TypeError):
            break
        # Print the text of every card that carries a post (mblog);
        # cards without one are skipped.
        for card in data.get("cards", []):
            if "mblog" in card:
                print(card["mblog"]["text"])
        # since_id is the cursor for the next page. Read it after printing,
        # so the last page is not dropped when the cursor is missing.
        since_id = data.get("cardlistInfo", {}).get("since_id")
        if not since_id:
            break
        url = base_url + "&since_id=" + str(since_id)
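The cursor-style paging in the loop above can also be isolated from the network code, which makes it easy to try offline. The following is only a sketch under stated assumptions: `iter_post_texts` and `fetch_page` are hypothetical names, and the field names (`data`, `cardlistInfo`, `since_id`, `cards`, `mblog`, `text`) are taken from the code above rather than from any API documentation.

```python
def iter_post_texts(fetch_page):
    """Yield post texts page by page, following the since_id cursor.

    fetch_page is a hypothetical callable: it takes a since_id (or None for
    the first page) and returns the decoded JSON response as a dict.
    """
    since_id = None
    while True:
        data = fetch_page(since_id).get('data', {})
        for card in data.get('cards', []):
            mblog = card.get('mblog')
            if mblog:  # some cards carry no post
                yield mblog['text']
        since_id = data.get('cardlistInfo', {}).get('since_id')
        if not since_id:  # no cursor means this was the last page
            break


if __name__ == '__main__':
    # Two fake pages stand in for real API responses.
    pages = {
        None: {'data': {'cardlistInfo': {'since_id': 42},
                        'cards': [{'mblog': {'text': 'first'}}, {}]}},
        42: {'data': {'cardlistInfo': {},
                      'cards': [{'mblog': {'text': 'second'}}]}},
    }
    print(list(iter_post_texts(pages.get)))
```

In real use, `fetch_page` would wrap the `requests.get` call and append `&since_id=...` to the URL when a cursor is given.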
Partial output:
    每当我颓废的时候,看看这个视频,我就浑身充满了斗志!为了我和我老婆的小米之家!我可以!我能行!加油! <a data-url="http://t.cn/A6hrPmIS" href="https://m.weibo.cn/p/index?containerid=2304444475185156522026&url_type=39&object_type=video&pos=1&luicode=10000011&lfid=1076032830678474" data-hide=""><span class='url-icon'><img style=' 1rem;height: 1rem' src='https://h5.sinaimg.cn/upload/2015/09/25/3/timeline_card_small_video_default.png'></span><span class="surl-text">崔庆才丨静觅的微博视频</span></a>
    <span class="url-icon"><img alt=[doge] src="//h5.sinaimg.cn/m/emoticon/icon/others/d_doge-861403219c.png" style="1em; height:1em;" /></span>
    转发微博
    今天我和我老婆都是健康饮食的好仔仔。<span class="url-icon"><img alt=[馋嘴] src="//h5.sinaimg.cn/m/emoticon/icon/default/d_chanzui-01ee2388fd.png" style="1em; height:1em;" /></span>
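The `text` field shown above still contains HTML markup. If only the plain text is wanted, one possible approach is to drop the tags with the standard library's `html.parser`. This is a rough sketch, not the original article's method; `strip_tags` is a hypothetical helper, and because emoticons are `<img>` tags, their `alt` text is discarded along with the markup.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for every run of text between tags.
        self.parts.append(data)


def strip_tags(html_text):
    """Hypothetical helper: return html_text with all tags removed."""
    parser = _TextExtractor()
    parser.feed(html_text)
    parser.close()  # flush any text still buffered by the parser
    return ''.join(parser.parts).strip()


if __name__ == '__main__':
    print(strip_tags('hello <a href="x"><span>video</span></a>'))
```

Calling `strip_tags(card["mblog"]["text"])` in place of the raw `text` value would print each post without its markup.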