• dygod.net


    # -*- coding: utf-8 -*-
    import scrapy
    from scrapy.linkextractors import LinkExtractor
    from scrapy.spiders import CrawlSpider, Rule
    
    
    class DgSpider(CrawlSpider):
        name = 'dg'
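        # Bare domain names only: a full URL such as 'https://www.dygod.net'
        # makes OffsiteMiddleware drop every link extracted from the start page
        allowed_domains = ['dygod.net']
        # No trailing slash after index.html
        start_urls = ['https://www.dygod.net/html/gndy/dyzz/index.html']
        # Write Chinese text as UTF-8 in JSON output instead of \uXXXX escapes
        custom_settings = {'FEED_EXPORT_ENCODING': 'utf-8'}
    
        rules = (
            # List pages: index_<digits>.html (\d+ = one or more digits, \. = literal dot)
            Rule(LinkExtractor(allow=r'https://www\.dygod\.net/html/gndy/dyzz/index_\d+\.html')),
            # Detail pages: .../dyzz/<digits>/<digits>.html, parsed by parse_item
            Rule(LinkExtractor(allow=r'https://www\.dygod\.net/html/gndy/dyzz/\d+/\d+\.html'), callback='parse_item', follow=True),
        )
    
        def parse_item(self, response):
            item = {}
            #item['domain_id'] = response.xpath('//input[@id="sid"]/@value').get()
            # Text of the <p> that is the 3rd child of its parent, inside the div whose id contains "Zoom"
            item['name'] = response.css('div[id*=Zoom] p:nth-child(3)::text').get()
            # item['time'] = response.xpath('//div[@id="description"]').get()
            return item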
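
    The `allow` patterns of the two rules need `\d` (a digit) and escaped dots; written as plain `d+` they only match the literal letter d, so no page link would ever be extracted. A quick check with Python's `re` module, using sample URLs that are assumed here for illustration (shaped like the site's list and detail pages, not taken from the original post):

```python
import re

# Corrected LinkExtractor patterns: \d+ matches one or more digits,
# \. matches a literal dot
LIST_PAGE = r'https://www\.dygod\.net/html/gndy/dyzz/index_\d+\.html'
DETAIL_PAGE = r'https://www\.dygod\.net/html/gndy/dyzz/\d+/\d+\.html'
# The detail-page pattern as it looks when the backslashes are lost
BROKEN_DETAIL_PAGE = r'https://www.dygod.net/html/gndy/dyzz/d+/d+.html'

# Hypothetical sample URLs, shaped like the site's list and detail pages
list_url = 'https://www.dygod.net/html/gndy/dyzz/index_2.html'
detail_url = 'https://www.dygod.net/html/gndy/dyzz/20190301/58283.html'


def matches(pattern, url):
    # LinkExtractor tests allow= patterns with re.search, so a partial match is enough
    return re.search(pattern, url) is not None


print(matches(LIST_PAGE, list_url))             # True
print(matches(DETAIL_PAGE, detail_url))         # True
print(matches(DETAIL_PAGE, list_url))           # False
print(matches(BROKEN_DETAIL_PAGE, detail_url))  # False
```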

    At first the spider failed because the start_urls entry https://www.dygod.net/html/gndy/dyzz/index.html had an extra / at the end.

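    Then it kept failing with "Filtered offsite request ... dygod.net". Without working out why, I simply commented out allowed_domains, and that made it work. The cause: allowed_domains was set to a full URL ('https://www.dygod.net'), but Scrapy expects bare domain names such as 'dygod.net', so its offsite filter rejected every link extracted from the start page.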
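
    The offsite check can be illustrated with a simplified model. The hypothetical `is_allowed` helper below is not Scrapy's code; it assumes the rule Scrapy documents for allowed_domains (a request passes when its host is an allowed domain or a subdomain of one) and shows why a full URL in that list never matches:

```python
from urllib.parse import urlparse


def is_allowed(url, allowed_domains):
    """Hypothetical, simplified offsite check: True when the URL's host
    equals one of allowed_domains or is a subdomain of one of them."""
    host = urlparse(url).hostname or ''
    return any(host == d or host.endswith('.' + d) for d in allowed_domains)


# A list-page URL of the shape used in the spider's rules (assumed example)
url = 'https://www.dygod.net/html/gndy/dyzz/index_2.html'

print(is_allowed(url, ['https://www.dygod.net']))         # False: a URL is not a hostname
print(is_allowed(url, ['www.dygod.net']))                 # True
print(is_allowed(url, ['dygod.net']))                     # True: www. is a subdomain
print(is_allowed('https://example.com/', ['dygod.net']))  # False
```

    Listing the bare domain 'dygod.net' is the simplest fix, since it also covers www.dygod.net.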

    However, the scraped Chinese text all came out as Unicode escape sequences: \u25ce\u7247\u3000\u3000\u540d\u3000.
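
    Assuming the items were exported to a JSON file (e.g. with -o), those sequences are JSON escapes rather than broken data: they decode to ◎, 片, two ideographic spaces (U+3000), 名 and one more U+3000. Scrapy's JSON export escapes non-ASCII characters unless FEED_EXPORT_ENCODING is set (for example to 'utf-8' in settings.py or in the spider's custom_settings). Python's standard json module shows the same behavior through its ensure_ascii flag:

```python
import json

# The escaped text from the post, with its backslashes restored,
# wrapped in quotes so it is a JSON string literal
escaped = '"\\u25ce\\u7247\\u3000\\u3000\\u540d\\u3000"'

# Decoding recovers the original characters: ◎ 片 (2x U+3000) 名 (U+3000)
name = json.loads(escaped)
print(name == '◎片\u3000\u3000名\u3000')  # True

# json.dumps escapes all non-ASCII characters by default,
# producing exactly the text seen in the scraped output
print(json.dumps(name) == escaped)  # True

# ensure_ascii=False writes the characters themselves instead of \uXXXX escapes
readable = json.dumps(name, ensure_ascii=False)
print('\\u' in readable)  # False
```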

  • Original post: https://www.cnblogs.com/bamboozone/p/10464146.html
Copyright © 2020-2023  润新知