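1. Install and open Charles, then configure it.
1.1 Click Proxy -> macOS Proxy to turn off Charles's default macOS proxy setting.
1.2 Click Proxy -> Proxy Settings to open the settings dialog, and set the HTTP proxy port to 6666 (the default is 8888; any free port can be used).
1.3 Click Proxy -> SSL Proxying Settings and add the domains to capture. Adding * matches every domain.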
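Before moving on to the phone, the proxy from 1.2 can be checked from the computer itself. The following is a minimal standard-library sketch, assuming Charles runs on this machine and listens on 127.0.0.1:6666; charles_opener is a hypothetical helper name, not part of Charles.

```python
import urllib.request

# Assumption: Charles runs locally with its HTTP proxy port set to 6666 (step 1.2).
CHARLES_PROXY = "http://127.0.0.1:6666"


def charles_opener(proxy: str = CHARLES_PROXY) -> urllib.request.OpenerDirector:
    """Build a urllib opener that sends both HTTP and HTTPS requests through Charles."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    # Passing our own ProxyHandler replaces the default one that reads env variables.
    return urllib.request.build_opener(handler)
```

For example, `charles_opener().open("http://example.com/", timeout=10)` should appear as a new entry in the Charles session. For HTTPS hosts covered by SSL Proxying, Python will likely reject Charles's certificate unless it is explicitly told to trust the Charles root certificate, so plain HTTP is the simpler first check.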
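2. Configure the phone (iOS is used as the example).
2.1 Tap the info icon next to the connected Wi-Fi network, go to the last item, HTTP Proxy -> Configure Proxy, choose Manual, and enter the computer's IP address and the port set in Charles earlier: 6666.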
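The "computer's IP address" in 2.1 is its address on the Wi-Fi network the phone shares. A small standard-library sketch for looking it up; lan_ip is a hypothetical helper, and the probe address 8.8.8.8 is only used to let the OS choose the outgoing interface (connecting a UDP socket sends no packets).

```python
import socket


def lan_ip(probe_host: str = "8.8.8.8", probe_port: int = 80) -> str:
    """Return the IPv4 address of the interface this computer would use to go online."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            # UDP connect() only fixes the destination; the OS picks a local address.
            s.connect((probe_host, probe_port))
            return s.getsockname()[0]
        except OSError:
            # No route (e.g. offline): fall back to loopback, which the phone cannot use.
            return "127.0.0.1"


print(lan_ip())
```

On macOS the same address can also be read with `ipconfig getifaddr en0`, where en0 is usually the Wi-Fi interface on laptops.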
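3. Configure HTTPS capture.
3.1 Because the requests to capture use HTTPS, a certificate must also be installed. Click Help -> SSL Proxying -> Install Charles Root Certificate.
3.2 On the computer, double-click the Charles certificate that was just added and set it to Always Trust.
3.3 Install the certificate on the phone. Click Help -> SSL Proxying -> Install Charles Root Certificate on a Mobile Device or Remote Browser, then, following the instructions shown, open chls.pro/ssl on the phone.
3.4 Follow the pop-up prompts on the phone to install the certificate.
3.5 Under General -> About -> Certificate Trust Settings, turn on full trust for the certificate. (Charles decrypts HTTPS by sitting in the middle of the connection and presenting certificates that it signs with its own root certificate, which is why that root certificate is installed and trusted on both the computer and the phone.)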
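4.1 Click the round record button in Charles to start capturing the phone's traffic.
The example in this article selects a Walmart store and scrapes data from that store.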
4.2 Analyzing the captured requests reveals how the URL that returns the store's product categories is built:
url1 = 'https://daojia.jd.com/client?lat=22.56705&lng=113.95371&city_id=1607&deviceToken=b2e951ed-e72e-4a9a-b9ca-cd69348c3337&deviceId=b2e951ed-e72e-4a9a-b9ca-cd69348c3337&channel=wx_xcx&platform=5.0.0&platCode=H5&appVersion=5.0.0&xcxVersion=3.6.2&appName=paidaojia&deviceModel=appmodel&functionId=station%2FgetStationDetail&isForbiddenDialog=false&isNeedDealError=false&isNeedDealLogin=false&body=%7B%22storeId%22%3A%2211653731%22%2C%22skuId%22%3A%22%22%2C%22orgCode%22%3A%2281372%22%2C%22activityId%22%3A%22%22%2C%22promotionType%22%3A%22%22%2C%22lgt%22%3A113.95371%2C%22lat%22%3A22.56705%7D&afsImg=&business='
The body parameter, once URL-decoded, is:
body = {"storeId":"11653731","skuId":"","orgCode":"81372","activityId":"","promotionType":"","lgt":113.95371,"lat":22.56705}
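The decoding step can be reproduced with the standard library instead of an online decoder. A minimal sketch; decode_body is a hypothetical helper, and the URL below is url1 trimmed to the parameters it needs.

```python
import json
from urllib.parse import parse_qs, urlsplit


def decode_body(url: str) -> dict:
    """Return the `body` query parameter of a captured URL, percent-decoded and JSON-parsed."""
    query = parse_qs(urlsplit(url).query)  # parse_qs already percent-decodes the values
    return json.loads(query["body"][0])


# url1 from step 4.2, trimmed to functionId and body
url1 = ("https://daojia.jd.com/client?functionId=station%2FgetStationDetail"
        "&body=%7B%22storeId%22%3A%2211653731%22%2C%22skuId%22%3A%22%22%2C"
        "%22orgCode%22%3A%2281372%22%2C%22activityId%22%3A%22%22%2C"
        "%22promotionType%22%3A%22%22%2C%22lgt%22%3A113.95371%2C"
        "%22lat%22%3A22.56705%7D")

print(decode_body(url1))
```

The printed dict matches the decoded body above, so the same helper can be pointed at any other request copied out of Charles.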
The values in body do not affect retrieval of the categories, so a plain GET request to url1 is enough to get the data.
import requests

# url1 from step 4.2 (functionId=station/getStationDetail), which returns the store's category data
url = 'https://daojia.jd.com/client?lat=22.56705&lng=113.95371&city_id=1607&deviceToken=b2e951ed-e72e-4a9a-b9ca-cd69348c3337&deviceId=b2e951ed-e72e-4a9a-b9ca-cd69348c3337&channel=wx_xcx&platform=5.0.0&platCode=H5&appVersion=5.0.0&xcxVersion=3.6.2&appName=paidaojia&deviceModel=appmodel&functionId=station%2FgetStationDetail&isForbiddenDialog=false&isNeedDealError=false&isNeedDealLogin=false&body=%7B%22storeId%22%3A%2211653731%22%2C%22skuId%22%3A%22%22%2C%22orgCode%22%3A%2281372%22%2C%22activityId%22%3A%22%22%2C%22promotionType%22%3A%22%22%2C%22lgt%22%3A113.95371%2C%22lat%22%3A22.56705%7D&afsImg=&business='
ua = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'
headers = {'User-Agent': ua}
res = requests.get(url, headers=headers, timeout=10)
res.raise_for_status()  # stop early on an HTTP error status
print(res.text)  # the returned data
[Screenshot: partial sample of the returned data]
4.3 Analysis likewise reveals how the URL that returns the products of a given category is built:
url2 = 'https://daojia.jd.com/client?lat=22.51424&lng=113.93068&city_id=1607&deviceToken=b2e951ed-e72e-4a9a-b9ca-cd69348c3337&deviceId=b2e951ed-e72e-4a9a-b9ca-cd69348c3337&channel=wx_xcx&platform=5.0.0&platCode=H5&appVersion=5.0.0&xcxVersion=3.6.2&appName=paidaojia&deviceModel=appmodel&functionId=storeIndexSearch%2FsearchByCategory&isForbiddenDialog=false&isNeedDealError=false&isNeedDealLogin=false&body=%7B%22storeId%22%3A%2211653731%22%2C%22orgCode%22%3A%2281372%22%2C%22skuId%22%3A%22%22%2C%22catIds%22%3A%5B%7B%22catId%22%3A%224644376%22%2C%22type%22%3A2%7D%5D%7D&afsImg=&business=undefined'
The body parameter, once URL-decoded, is:
body = {"storeId":"11653731","orgCode":"81372","skuId":"","catIds":[{"catId":"4644376","type":2}]}
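Going the other way, the encoded URL can be generated from the decoded body instead of by string formatting. A standard-library sketch, assuming every query parameter other than the catId stays exactly as captured in url2; COMMON_PARAMS and build_category_url are hypothetical names.

```python
import json
from urllib.parse import urlencode

BASE = "https://daojia.jd.com/client"

# Query parameters copied from the captured url2 (step 4.3), in their original order.
COMMON_PARAMS = {
    "lat": "22.51424", "lng": "113.93068", "city_id": "1607",
    "deviceToken": "b2e951ed-e72e-4a9a-b9ca-cd69348c3337",
    "deviceId": "b2e951ed-e72e-4a9a-b9ca-cd69348c3337",
    "channel": "wx_xcx", "platform": "5.0.0", "platCode": "H5",
    "appVersion": "5.0.0", "xcxVersion": "3.6.2", "appName": "paidaojia",
    "deviceModel": "appmodel", "functionId": "storeIndexSearch/searchByCategory",
    "isForbiddenDialog": "false", "isNeedDealError": "false",
    "isNeedDealLogin": "false",
}


def build_category_url(cat_id: str, store_id: str = "11653731",
                       org_code: str = "81372") -> str:
    """Build the searchByCategory URL for one level-2 category."""
    body = {"storeId": store_id, "orgCode": org_code, "skuId": "",
            "catIds": [{"catId": cat_id, "type": 2}]}
    params = dict(COMMON_PARAMS)
    # Compact separators match the captured body, which contains no spaces.
    params["body"] = json.dumps(body, separators=(",", ":"))
    params["afsImg"] = ""
    params["business"] = "undefined"
    # urlencode percent-encodes every key and value, e.g. "/" -> %2F, "{" -> %7B.
    return BASE + "?" + urlencode(params)
```

With cat_id="4644376" this reproduces url2 character for character, which makes it easy to compare a generated URL against the request recorded in Charles.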
The catId value can be extracted from the data returned by url1; passing in a different catId returns the product information for that category.
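The layout of url1's response is not reproduced here, so the sketch below does not assume any particular field path: it walks the parsed JSON and collects every value stored under a given key. That the response uses the same key name, catId, as the request body is an assumption to check against the real response in Charles; collect_values is a hypothetical helper.

```python
from typing import Any, List


def collect_values(data: Any, key: str = "catId") -> List[Any]:
    """Recursively collect every value stored under `key` in nested dicts and lists."""
    found: List[Any] = []
    if isinstance(data, dict):
        for k, v in data.items():
            if k == key:
                found.append(v)
            # Keep descending: nested category levels may hold more matches.
            found.extend(collect_values(v, key))
    elif isinstance(data, list):
        for item in data:
            found.extend(collect_values(item, key))
    return found
```

Assuming the url1 response is JSON, `collect_values(res.json())` on the response from 4.2 would return the candidate catIds in document order.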
import json
from urllib.parse import quote

import requests


def get_product(cateid2):
    """Fetch the products of one level-2 category; cateid2 is its catId."""
    body = {
        "storeId": "11653731",
        "orgCode": "81372",
        "skuId": "",
        "catIds": [{"catId": cateid2, "type": 2}],
    }
    # Compact separators reproduce the captured body (no spaces), then percent-encode it
    body = quote(json.dumps(body, separators=(',', ':')))
    # Build the URL for this catId
    base_url = 'https://daojia.jd.com/client?lat=22.51424&lng=113.93068&city_id=1607&deviceToken=b2e951ed-e72e-4a9a-b9ca-cd69348c3337&deviceId=b2e951ed-e72e-4a9a-b9ca-cd69348c3337&channel=wx_xcx&platform=5.0.0&platCode=H5&appVersion=5.0.0&xcxVersion=3.6.2&appName=paidaojia&deviceModel=appmodel&functionId=storeIndexSearch%2FsearchByCategory&isForbiddenDialog=false&isNeedDealError=false&isNeedDealLogin=false&body={}&afsImg=&business=undefined'.format(body)
    print(base_url)
    ua = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'
    headers = {'User-Agent': ua}
    res = requests.get(base_url, headers=headers, timeout=10)
    print(res.text)
    return res.text
[Screenshot: partial sample of the returned data]
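Fetching many categories in a row sends a burst of requests to the server, so pausing between them is a common courtesy. A sketch under these assumptions: crawl_categories is a hypothetical helper, fetch is any function that retrieves one category (get_product above, for instance), and the 1-second default delay is arbitrary rather than a limit known for this API.

```python
import time
from typing import Callable, Dict, Iterable, TypeVar

T = TypeVar("T")


def crawl_categories(cat_ids: Iterable[str], fetch: Callable[[str], T],
                     delay: float = 1.0) -> Dict[str, T]:
    """Call fetch(cat_id) once per distinct catId, sleeping `delay` seconds between calls."""
    results: Dict[str, T] = {}
    for cat_id in cat_ids:
        if cat_id in results:
            continue  # skip catIds that were already fetched
        if results and delay > 0:
            time.sleep(delay)  # throttle every request after the first
        results[cat_id] = fetch(cat_id)
    return results
```

Passing the fetch function in, rather than calling get_product directly, lets the loop be tried offline with a stub before pointing it at the real API.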
4.4 Organize the data and write it out as a table:
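import csv
import os


def save_products(searchResultVOList, catename1, catename2):
    # searchResultVOList: product list from a category response
    # catename1 / catename2: level-1 / level-2 category names
    filename = '{}.csv'.format(catename1)
    # The file is opened in append mode, so write the header only when it is new
    is_new = not os.path.exists(filename) or os.path.getsize(filename) == 0
    with open(filename, 'a', newline='', encoding='utf-8') as csvfile:
        writer = csv.writer(csvfile)
        if is_new:
            writer.writerow(['Product name', 'Price (yuan)', 'Monthly sales',
                             'Image', 'Level-2 category', 'Level-1 category'])
        for product in searchResultVOList:
            writer.writerow([product['skuName'], product['realTimePrice'],
                             product['monthSales'], product['imgUrl'],
                             catename2, catename1])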
[Screenshot: partial sample of the output table]