• [python]赶集网二手房爬虫插件【可用任意扩展】


        最近应一个老铁的要求,人家是搞房产的,所以就写了这个二手房的爬虫,因为初版,所以比较简单,有能力的老铁可用进行扩展。

       

    import requests
    import os
     
    from bs4 import BeautifulSoup
     
     
     
    class GanJi():
        """docstring for GanJi"""
     
        def __init__(self):
            super(GanJi, self).__init__()
     
        def get(self,url):
     
            user_agent = 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.82 Safari/537.36'
            headers    = {'User-Agent':user_agent}
             
            webData    = requests.get(url + 'o1',headers=headers).text
            soup       = BeautifulSoup(webData,'lxml')
             
             
            sum        = soup.find('span',class_="num").text.replace("套","")
            ave        = int(sum) / 32
            forNum     = int(ave)
     
            if forNum < ave:
                forNum = forNum + 1
     
     
            for x in range(forNum):
                webData    = requests.get(url + 'o' + str(x + 1),headers=headers).text
                soup       = BeautifulSoup(webData,'lxml')
                find_list  = soup.find('div',class_="f-main-list").find_all('div',class_="f-list-item ershoufang-list")
     
                for dl in find_list:
                     
                    print(dl.find('a',class_="js-title value title-font").text,end='|') # 名称
     
                    # 中间 5 个信息
                    tempDD = dl.find('dd',class_="dd-item size").find_all('span')
                    for tempSpan in tempDD:
                        if not tempSpan.text == '' : 
                            print(tempSpan.text.replace("
    ", ""),end='|')
     
                     
                    print(dl.find('span',class_="area").text.replace(" ","").replace("
    ",""),end='|') # 地址
                     
                    print(dl.find('div',class_="price").text.replace(" ","").replace("
    ",""),end='|') # 价钱
                     
                    print(dl.find('div',class_="time").text.replace(" ","").replace("
    ",""),end="|") # 平均
                     
                    print("http://chaozhou.ganji.com" + dl['href'],end="|") # 地址
     
                    print(str(x + 1))
     
    if __name__ == '__main__':
        temp = GanJi()
        temp.get("http://chaozhou.ganji.com/fang5/xiangqiao/")
    

      

  • 相关阅读:
    《第一行代码》学习笔记18-广播接收器Broadcast_Receiver(1)
    《第一行代码》学习笔记17-碎片Fragment(2)
    《第一行代码》学习笔记16-碎片Fragment(1)
    《第一行代码》学习笔记15-UI(4)
    《第一行代码》学习笔记14-UI(3)
    《第一行代码》学习笔记13-UI(2)
    《第一行代码》学习笔记12-UI(1)
    ios-NSMutableAttributedString 更改文本字符串颜色、大小
    ios-实现项目在开发、测试、正式环境快速部署
    ios-滚动导航条页面
  • 原文地址:https://www.cnblogs.com/68xi/p/9486957.html
Copyright © 2020-2023  润新知