• Using the Scrapy shell


    Note: requests made from the shell often come back with a 403 error that does not occur during an actual crawl.
    response - a Response object containing the last fetched page
    >>> response.xpath('//title/text()').extract()
    response.xpath() returns a list of selectors; calling .extract() on it returns the matched data as a list of strings.
    In the loop below, links is assumed to be a selector list of the image links on the page, for example the result of response.xpath('//a[contains(@href, "image")]'):
    >>> for index, link in enumerate(links):
    ...     args = (index, link.xpath('@href').extract(), link.xpath('img/@src').extract())
    ...     print('Link number %d points to url %s and image %s' % args)
    Link number 0 points to url ['image1.html'] and image ['image1_thumb.jpg']
    Link number 1 points to url ['image2.html'] and image ['image2_thumb.jpg']
    Link number 2 points to url ['image3.html'] and image ['image3_thumb.jpg']
    Link number 3 points to url ['image4.html'] and image ['image4_thumb.jpg']
    Link number 4 points to url ['image5.html'] and image ['image5_thumb.jpg']
    The enumerate() function is typically used in for loops.
    A plain for loop:
    >>> i = 0
    >>> seq = ['one', 'two', 'three']
    >>> for element in seq:
    ...     print(i, seq[i])
    ...     i += 1
    ...
    0 one
    1 two
    2 three
    A for loop using enumerate:
    >>> seq = ['one', 'two', 'three']
    >>> for i, element in enumerate(seq):
    ...     print(i, element)
    ...
    0 one
    1 two
    2 three
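As a small supplement to the enumerate() notes, the sketch below shows the optional second argument, which sets the first index (it defaults to 0). The variable names pairs_from_zero and pairs_from_one are illustrative only.

```python
# enumerate() yields (index, element) pairs; the optional second
# argument sets the first index (it defaults to 0).
seq = ['one', 'two', 'three']

pairs_from_zero = list(enumerate(seq))
pairs_from_one = list(enumerate(seq, 1))

# Number the elements starting from 1 instead of 0.
for number, element in pairs_from_one:
    print(number, element)
```

Starting the count at 1 can be handy when the index is shown to a person rather than used to subscript the list.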
    Suppose you want to extract all <p> elements inside <div> elements. First, you would get all <div> elements:
    >>> divs = response.xpath('//div')
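    Then select the <p> elements relative to those <div> elements (note the dot prefixing the .//p XPath; a plain //p would match every <p> in the document):
    >>> for p in divs.xpath('.//p'):  # extracts all <p> inside
    ...     print(p.extract())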
    Another common case would be to extract all direct <p> children:
    >>> for p in divs.xpath('p'):
    ...     print(p.extract())
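The difference between .//p and p can be tried offline. The sketch below does not use Scrapy; it swaps in the standard library's xml.etree.ElementTree, which supports only a subset of XPath, and the SAMPLE markup is a made-up document for illustration. Searching from the root with .//p stands in for a document-wide //p, which ElementTree does not accept directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, not taken from any real page.
SAMPLE = """
<html><body>
  <p>outside</p>
  <div>
    <p>direct child</p>
    <blockquote><p>nested</p></blockquote>
  </div>
</body></html>
""".strip()

root = ET.fromstring(SAMPLE)
div = root.find('.//div')

# './/p' is relative to div: every <p> descendant of this <div>.
descendants = [p.text for p in div.findall('.//p')]

# 'p' is also relative to div: only its direct <p> children.
children = [p.text for p in div.findall('p')]

# Searching from the document root finds every <p>,
# including the one outside the <div>.
everywhere = [p.text for p in root.findall('.//p')]

print(descendants)
print(children)
print(everywhere)
```

The same relative-versus-document-wide distinction is what the dot in the Scrapy selector expression controls.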
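    Using the shell from within a spider
    Call inspect_response inside a spider callback to open the shell at that point:
    from scrapy.shell import inspect_response
    inspect_response(response, self)
    Press Ctrl-D (or Ctrl-Z on Windows) to exit the shell and resume the crawl.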
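    Prefer single quotes as the outermost quotes of an XPath expression, so that double quotes can be used inside it.
    The shell can also open a local HTML file, which is convenient for debugging (but do not name the file index.html):
    scrapy shell ./path/to/file.html
    scrapy shell ../other/path/to/file.html
    scrapy shell /absolute/path/to/file.html
    Even for a file in the current directory the ./ prefix is required; scrapy shell file.html does not work.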
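One way to avoid any ambiguity between a local path and a domain name is to pass the shell a file:// URI instead of a bare path. The sketch below builds such a URI with the standard library only; local_file_uri is a hypothetical helper name, and the path is the placeholder used in the commands above, not a real file.

```python
import os
from pathlib import Path


def local_file_uri(path):
    """Return an absolute file:// URI for a local path (hypothetical helper)."""
    # abspath() resolves the path against the current directory and
    # collapses any '.' or '..' segments; the file need not exist.
    return Path(os.path.abspath(path)).as_uri()


uri = local_file_uri('./path/to/file.html')
print(uri)
```

The printed URI can then be given to scrapy shell in place of the relative path.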
  • Original article: https://www.cnblogs.com/elesos/p/7885474.html