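Whether you are on Linux or Mac, installing pyquery is no easy task.
The main reason is that lxml has a lot of dependencies, and they have to be installed by hand.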
cat /etc/redhat-release
CentOS release 6.6 (Final)
1>>python-dev
yum install gcc libffi-devel python-devel openssl-devel
2>>libxslt, libxml
curl -o libxslt-1.1.29.zip https://git.gnome.org/browse/libxslt/snapshot/libxslt-1.1.29.zip
curl -o libxml-1.7.3.tar.gz http://xmlsoft.org/sources/old/libxml-1.7.3.tar.gz
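But why were there still all sorts of errors after installing everything?
A search on StackOverflow turned up this:
yum install libxslt-devel
After that, the installation finally succeeded.
On Ubuntu it is much simpler: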
sudo apt-get install python-lxml
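Whichever system you are on, it is worth confirming that the packages actually import before going further, since a missing libxslt or libxml2 shared library typically only shows up when lxml's compiled extension is loaded. The following is a minimal sketch, assuming Python 3; the helper name check_modules is made up for this example and is not part of lxml or pyquery.

```python
import importlib


def check_modules(names):
    """Return {module name: version string, or None if the import fails}."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Covers both a missing package and a missing shared library.
            found[name] = None
        else:
            # Not every module exposes __version__, so fall back to a marker.
            found[name] = str(getattr(module, "__version__", "unknown"))
    return found


if __name__ == "__main__":
    # lxml.etree is the compiled part of lxml that links against libxml2/libxslt.
    for name, version in check_modules(["lxml.etree", "pyquery"]).items():
        print(name, "missing" if version is None else version)
```

If either line reports "missing", the dependency step above probably needs another look before pyquery will work.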
Once installed, pyquery is really convenient to use:
# -*- coding:utf-8 -*-
from __future__ import print_function

import requests
from pyquery import PyQuery as pq

headers = {
    'User-Agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Firefox/45.0",
    'Cookie': "UOR=y.qq.com,widget.weibo.com,y.qq.com; SINAGLOBAL=8818090954579.496.1461652869389; ULV=1466417097935:5:3:2:4821515600800.323.1466417097929:1466410712129; SUB=_2AkMgEzhXf8NhqwJRmP0WyGPjbol3yw7EieLBAH7sJRMxHRl-yT83qnEItRAP-q6huRFAXpidMwh9ScHwNyuDMw..; SUBP=0033WrSXqPxfM72-Ws9jqgMF55529P9D9WFzaPQwqvOukuGw_aqZrYOD; YF-V5-G0=c998e7c570da2f8537944063e27af755; YF-Page-G0=ffe43932f05408fcdf32c673d8997f97"
}

s = requests.Session()
url = 'http://weibo.com/aj/v6/comment/big?ajwvr=6&id=3995838911192732&page={}&__rnd=1468201664629'


def save(page):
    # Fetch one page of comments; stop the script if the request fails.
    try:
        r = s.get(url.format(page), headers=headers)
        print(r.url)
    except Exception as e:
        print(e)
        exit(0)
    # The JSON response carries the comment list as an HTML fragment.
    v_source = pq(r.json()['data']['html'])
    datas = v_source('.list_ul .list_li')
    for data in datas:
        # Avatar URL, then timestamp and comment text.
        print(pq(data)('.WB_face a img').attr('src'))
        print(pq(data)('.WB_from').text(), pq(data).find('.WB_text').text())
        # pq(data)('.WB_text').find('a').text(),


if __name__ == '__main__':
    for page in range(1, 5):
        save(page)
;)