The Scrapy framework
Parsing the response
>>> response.css('title::text').extract()
['Quotes to Scrape']
There are two things to note here:
(1) We added ::text to the CSS query, which means we want to select only the text nodes directly inside the <title> element. Without ::text, we would get the full title element, including its tags: ['<title>Quotes to Scrape</title>'].
(2) The result of calling .extract() is a list, because we are dealing with an instance of SelectorList.
When you know you just want the first result, as in this case, you can do:
>>> response.css('title::text').extract_first()
'Quotes to Scrape'
Besides the extract() and extract_first() methods, you can also use the re() method to extract using regular expressions:
>>> response.css('title::text').re(r'Quotes.*')
['Quotes to Scrape']
>>> response.css('title::text').re(r'Q\w+')
['Quotes']
>>> response.css('title::text').re(r'(\w+) to (\w+)')
['Quotes', 'Scrape']