• General-purpose crawler


    XPath parsing

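    import requests
    from lxml import etree

    url = 'https://bj.58.com/shunyi/ershoufang/?PGTID=0d30000c-0047-6aa6-0218-69d1ed59a77b&ClickID=3'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36'
    }

    # Fetch the page and parse the HTML into an element tree
    data = requests.get(url=url, headers=headers).text
    tree = etree.HTML(data)

    # Each <li> under the listing <ul> is one property
    house_list = tree.xpath('//ul[@class="house-list-wrap"]/li')
    for li in house_list:
        # Paths starting with ./ are relative to the current <li>; xpath() returns a list
        title = li.xpath('./div[@class="list-info"]/h2/a/text()')
        # //text() collects every text node under the price div, so join the pieces
        price = li.xpath('./div[@class="price"]//text()')
        price = ''.join(price)
        print(title, price)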
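The per-item pattern above (select the list rows, then query each row with a relative path and join its text nodes) can be tried offline. The following is only a sketch: it swaps lxml for the standard library's `xml.etree.ElementTree`, which supports a limited XPath subset, and the `SAMPLE` markup and the `parse_listings` helper are made up for illustration. Only the class names are taken from the XPath expressions above.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed sample markup that reuses the class names from the XPath above.
SAMPLE = """
<html><body>
<ul class="house-list-wrap">
  <li>
    <div class="list-info"><h2><a>Two-bedroom flat</a></h2></div>
    <div class="price"><p><b>320</b> wan</p><p> (40000 yuan/m2)</p></div>
  </li>
  <li>
    <div class="list-info"><h2><a>Studio near subway</a></h2></div>
    <div class="price"><p><b>150</b> wan</p><p> (38000 yuan/m2)</p></div>
  </li>
</ul>
</body></html>
"""

def parse_listings(markup):
    """Return a list of (title, price) tuples, one per <li> row."""
    root = ET.fromstring(markup.strip())
    rows = []
    for li in root.findall(".//ul[@class='house-list-wrap']/li"):
        # Relative paths are evaluated against the current <li>
        link = li.find("./div[@class='list-info']/h2/a")
        price_div = li.find("./div[@class='price']")
        title = (link.text or '').strip() if link is not None else ''
        # itertext() walks all nested text nodes, similar to //text() plus ''.join
        price = ''.join(price_div.itertext()).strip() if price_div is not None else ''
        rows.append((title, price))
    return rows

for title, price in parse_listings(SAMPLE):
    print(title, price)
```

Real pages are rarely well-formed XML, which is why the crawler itself uses `etree.HTML` from lxml; the sketch only illustrates the row-by-row extraction logic.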





    Garbled Chinese text
    Method 1:

    response = requests.get(url=url, headers=headers)
    response.encoding = 'utf-8'  # set this before reading response.text
    Method 2 (general-purpose):
    image_name = image_name.encode('iso-8859-1').decode('gbk')
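The encode/decode round trip in the second method can be checked without any network access. The sketch below simulates a GBK page whose bytes were wrongly decoded as ISO-8859-1 and then repairs the string; `repair_mojibake` is a hypothetical helper name, and the approach assumes the real encoding is GBK and the wrong guess was ISO-8859-1.

```python
def repair_mojibake(text, wrong='iso-8859-1', actual='gbk'):
    """Undo a wrong decode: recover the raw bytes, then decode them correctly."""
    return text.encode(wrong).decode(actual)

original = '图片'                           # what the page really says
raw_bytes = original.encode('gbk')          # bytes as sent by a GBK-encoded page
garbled = raw_bytes.decode('iso-8859-1')    # what you see after the wrong decode
repaired = repair_mojibake(garbled)

print(repr(garbled))
print(repaired)
```

This works because ISO-8859-1 maps every byte value to exactly one character, so encoding back to it restores the original bytes without loss. If the page is actually UTF-8, pass `actual='utf-8'` instead.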
    Request headers

    headers = {
        # Ask the server to close the connection after each response
        'Connection': 'close',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36'
    }

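    Proxy IP

    # requests chooses the proxy by the scheme of the target URL, so an https:// URL needs an 'https' entry
    proxies = {'http': 'http://111.1.111.1:8080', 'https': 'http://111.1.111.1:8080'}
    requests.get(url=url, headers=headers, proxies=proxies)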
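Because requests looks up the proxy by the scheme of the URL being fetched, a mapping with only an `'http'` key is not applied to `https://` pages. A minimal sketch of building a mapping that covers both schemes follows; `build_proxies` is a hypothetical helper, and the address is the placeholder used above, not a working proxy.

```python
def build_proxies(host, port, scheme='http'):
    """Build a proxies mapping that applies to both http:// and https:// URLs."""
    proxy_url = f'{scheme}://{host}:{port}'
    return {'http': proxy_url, 'https': proxy_url}

proxies = build_proxies('111.1.111.1', 8080)
print(proxies)

# Usage with the crawler above (not executed here, since the proxy is a placeholder):
# requests.get(url=url, headers=headers, proxies=proxies, timeout=10)
```

Passing a `timeout` is worthwhile with free proxies, which often stop responding.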

    
    


     

  • Original post: https://www.cnblogs.com/ls1997/p/10847390.html