1. Start Docker, then launch the Splash engine in a container from the command line:
docker run -p 8050:8050 scrapinghub/splash
This runs Splash inside Docker, listening on port 8050.
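Once the container is up, Splash exposes a plain HTTP API; its render.html endpoint returns a page's HTML after JavaScript has executed. A minimal sketch of how a request URL for that endpoint is built (the host/port assume the default docker mapping above; `render_url` is an illustrative helper, not part of any library):

```python
from urllib.parse import urlencode

# Base address of the Splash HTTP API (default docker port mapping).
SPLASH = "http://localhost:8050"

def render_url(target: str, wait: float = 0.5) -> str:
    """Build a Splash render.html URL for a target page.

    'wait' asks Splash to pause that many seconds so JavaScript can run.
    """
    query = urlencode({"url": target, "wait": wait})
    return f"{SPLASH}/render.html?{query}"

print(render_url("http://example.com"))
```

Fetching that URL with any HTTP client returns the rendered HTML, which is exactly what scrapy-splash automates behind the scenes.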
2. Now we can write the crawler. First, configure settings.py:
SPLASH_URL = 'http://localhost:8050'
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
Also enable your item pipeline (ITEM_PIPELINES) if the project uses one.
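Putting the settings together, a minimal settings.py fragment looks like the sketch below. The SPIDER_MIDDLEWARES and HTTPCACHE_STORAGE entries are the additional settings recommended by the scrapy-splash README, beyond those listed above:

```python
# settings.py -- minimal scrapy-splash configuration sketch

# Address of the Splash instance started with docker above.
SPLASH_URL = 'http://localhost:8050'

# Splash downloader middlewares, at the priorities the README specifies.
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

# Deduplicates Splash arguments so identical render requests are not repeated.
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

# Duplicate filter and cache storage that understand Splash requests.
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
```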
3. When writing the spider file, add at the top:
from scrapy_splash import SplashRequest
We then use SplashRequest (in place of scrapy.Request) to submit the pages we want rendered to the Splash engine.