General-purpose crawler

XPath parsing

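import requests
from lxml import etree

url = 'https://bj.58.com/shunyi/ershoufang/?PGTID=0d30000c-0047-6aa6-0218-69d1ed59a77b&ClickID=3'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36'
}

# Fetch the listing page and parse the HTML into an element tree
data = requests.get(url=url, headers=headers).text
tree = etree.HTML(data)
# Each <li> under the list wrapper is one listing
em = tree.xpath('//ul[@class="house-list-wrap"]/li')
for i in em:
    # A leading "./" makes the XPath relative to the current <li>; xpath() returns a list
    title = i.xpath('./div[@class="list-info"]/h2/a/text()')
    # //text() collects every text fragment under the price block, so join them
    price = i.xpath('./div[@class="price"]//text()')
    price = ''.join(price)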
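
The same extraction can be tried offline. The sketch below is only an illustration: the HTML snippet is made up to mimic the class names used in the XPath expressions above, and `parse_listings` is a hypothetical helper name, not part of the original notes.

```python
from lxml import etree

# Made-up markup that mimics the class names used in the XPath expressions above
SAMPLE_HTML = """
<ul class="house-list-wrap">
  <li>
    <div class="list-info"><h2><a href="#"> Two-bedroom flat </a></h2></div>
    <div class="price"><p class="sum"><b>320</b>万</p><p class="unit">40000元/㎡</p></div>
  </li>
  <li>
    <div class="list-info"><h2><a href="#">Studio near subway</a></h2></div>
    <div class="price"><p class="sum"><b>150</b>万</p></div>
  </li>
</ul>
"""


def parse_listings(html):
    """Return a list of {'title', 'price'} dicts, one per <li> in the listing."""
    tree = etree.HTML(html)
    results = []
    for li in tree.xpath('//ul[@class="house-list-wrap"]/li'):
        # xpath() always returns a list, so take the first match if there is one
        titles = li.xpath('./div[@class="list-info"]/h2/a/text()')
        title = titles[0].strip() if titles else ''
        # Join all text fragments under the price block, trimming stray whitespace
        parts = li.xpath('./div[@class="price"]//text()')
        price = ''.join(p.strip() for p in parts)
        results.append({'title': title, 'price': price})
    return results


listings = parse_listings(SAMPLE_HTML)
for item in listings:
    print(item['title'], item['price'])
```

Taking `titles[0]` only after checking that the list is non-empty avoids an IndexError on listings that lack a title link.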





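Garbled Chinese text (mojibake)
     Option 1: set the response encoding before reading response.text
     response=requests.get(url=url,headers=headers)
     response.encoding = 'utf-8'
     Option 2 (works on any single string; assumes the text was wrongly decoded as ISO-8859-1 and the page is actually GBK):
     image_name=image_name.encode('iso-8859-1').decode('gbk')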
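
Why the second option works can be seen without any network access. The sketch below simulates the problem under one assumption: the server sends GBK bytes and they get decoded as ISO-8859-1, which is the fallback requests uses for text responses that declare no charset. The sample string is arbitrary.

```python
original = '二手房'

# Bytes as a GBK-encoded page would send them
raw = original.encode('gbk')

# Decoding with the wrong charset produces the garbled text seen in practice
garbled = raw.decode('iso-8859-1')

# Option 2: undo the wrong decode to recover the raw bytes, then decode as GBK
fixed = garbled.encode('iso-8859-1').decode('gbk')

print(fixed == original)
```

ISO-8859-1 maps every byte to exactly one character, so the wrong decode loses nothing and can always be reversed. If the page's real charset is unknown, `response.apparent_encoding` in requests offers a guess based on the body.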

Request headers

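headers = {
    # 'Connection': 'close' asks the server to close the connection after each response instead of keeping it alive
    'Connection': 'close',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36'
}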
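
When many requests share the same headers, one possible approach is to set them once on a `requests.Session`. The sketch below only prepares a request and inspects it, so nothing is sent; `https://example.com/` is a placeholder URL, not one from the original notes.

```python
import requests

headers = {
    'Connection': 'close',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36',
}

session = requests.Session()
# Session-level headers are merged into every request made through this session
session.headers.update(headers)

# Build the request without sending it, to see which headers would go out
prepared = session.prepare_request(requests.Request('GET', 'https://example.com/'))
print(prepared.headers['Connection'])
print(prepared.headers['User-Agent'])
```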

Proxy IP

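# The 'http' key only applies to http:// URLs; add an 'https' key as well to proxy https:// requests
requests.get(url=url,headers=headers,proxies={'http':'http://111.1.111.1:8080'})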
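
Since the target URL in these notes is https, a proxies dict that covers both schemes may be more useful. This is a minimal sketch: `build_proxies` and `fetch` are hypothetical helper names, the address is the placeholder from the line above, and the proxy is assumed to be a plain HTTP proxy.

```python
import requests


def build_proxies(host_port):
    """Route both http:// and https:// requests through the same HTTP proxy."""
    proxy_url = 'http://' + host_port
    return {'http': proxy_url, 'https': proxy_url}


def fetch(url, headers, proxies, timeout=10):
    # A timeout keeps a dead proxy from hanging the script indefinitely
    return requests.get(url, headers=headers, proxies=proxies, timeout=timeout)


proxies = build_proxies('111.1.111.1:8080')
print(proxies)
```

`fetch` is defined but not called here, because the placeholder address is not a working proxy.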


Original post: https://www.cnblogs.com/ls1997/p/10847390.html