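Scrapyd is a service for deploying and running Scrapy spiders. It exposes an HTTP JSON API for the commands involved in scheduling spiders. Scrapyd is open source, and its code is hosted on GitHub.
The official documentation is at https://scrapyd.readthedocs.io/en/stable/. Gerapy is a distributed crawler management framework that supports Python 3 and is built on Scrapy, Scrapyd, Scrapyd-Client, Scrapy-Redis, Scrapyd-API, Scrapy-Splash, Jinja2, Django, and Vue.js. This post briefly walks through the steps to install and run Scrapyd on Windows.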
1. Installing and deploying Scrapyd
1.1 Installing Scrapyd
Install it with the following commands:
pip install scrapyd
pip install scrapyd-client
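Once installation finishes, type scrapyd at the Windows command prompt to start the Scrapyd service. You can then open http://127.0.0.1:6800/ in a browser to reach the web console.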
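Besides opening the console page in a browser, you can check the service from a script. The following is a minimal sketch, assuming the default address and port shown above and the daemonstatus.json endpoint; the function names are illustrative and not part of Scrapyd.

```python
import json
import urllib.request


def daemonstatus_url(host="127.0.0.1", port=6800):
    """Build the URL of Scrapyd's daemonstatus.json endpoint."""
    return f"http://{host}:{port}/daemonstatus.json"


def is_running(status):
    """Return True if a decoded daemonstatus.json response reports status "ok"."""
    return status.get("status") == "ok"


def fetch_status(host="127.0.0.1", port=6800, timeout=5):
    """Query a running Scrapyd service and return the decoded JSON response."""
    url = daemonstatus_url(host, port)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the service started in another window, is_running(fetch_status()) should return True. If the connection is refused, the service is probably not running or is bound to a different address or port.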
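1.2 Configuration file
After step 1.1, the directory C:\Users\shixianqing\AppData\Local\Programs\Python\Python36\Lib\site-packages\scrapyd contains a configuration file named default_scrapyd.conf. The two settings you are most likely to change are bind_address and http_port.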
[scrapyd]
eggs_dir = eggs
logs_dir = logs
items_dir =
jobs_to_keep = 5
dbs_dir = dbs
max_proc = 0
max_proc_per_cpu = 4
finished_to_keep = 100
poll_interval = 5.0
bind_address = 127.0.0.1
http_port = 6800
debug = off
runner = scrapyd.runner
application = scrapyd.app.application
launcher = scrapyd.launcher.Launcher
webroot = scrapyd.website.Root

[services]
schedule.json = scrapyd.webservice.Schedule
cancel.json = scrapyd.webservice.Cancel
addversion.json = scrapyd.webservice.AddVersion
listprojects.json = scrapyd.webservice.ListProjects
listversions.json = scrapyd.webservice.ListVersions
listspiders.json = scrapyd.webservice.ListSpiders
delproject.json = scrapyd.webservice.DeleteProject
delversion.json = scrapyd.webservice.DeleteVersion
listjobs.json = scrapyd.webservice.ListJobs
daemonstatus.json = scrapyd.webservice.DaemonStatus
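Instead of editing the packaged default file, you may prefer a small override file. The sketch below generates one with Python's configparser. It assumes that Scrapyd also reads a scrapyd.conf from the directory it is started in, as its documentation describes; check this against your version. The address and port are example values only.

```python
import configparser
import io


def build_override(bind_address, http_port):
    """Return the text of a minimal scrapyd.conf that overrides two settings."""
    config = configparser.ConfigParser()
    config["scrapyd"] = {
        "bind_address": bind_address,
        "http_port": str(http_port),
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Example values (assumptions): listen on all interfaces, on port 6801.
override_text = build_override("0.0.0.0", 6801)
```

Saving override_text as scrapyd.conf in the directory where you run scrapyd should then change only those two settings and leave the rest at their defaults. Binding to 0.0.0.0 exposes the service to other machines, so only do this on a trusted network.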
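1.3 Deploying a Scrapy spider
1.3.1 Letting the spider find Scrapyd
For a Scrapy spider to run on the Scrapyd service, the Scrapy project has to know where Scrapyd is. Open the scrapy.cfg configuration file in the project you created; its contents look like this: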
[settings]
default = hickey.settings

[deploy:hickey]
url = http://localhost:6800/
username = hickey
password = 123456
project = hickey
In deploy:hickey, hickey is the service name (the name of the deploy target), and url is the address where the Scrapyd service is running.
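To see which deploy targets a scrapy.cfg defines, you can read it with Python's configparser. This is only an illustrative sketch using the example file above; list_deploy_targets is a hypothetical helper, not part of Scrapy or scrapyd-client.

```python
import configparser

# The example scrapy.cfg from this section.
SCRAPY_CFG = """\
[settings]
default = hickey.settings

[deploy:hickey]
url = http://localhost:6800/
username = hickey
password = 123456
project = hickey
"""


def list_deploy_targets(cfg_text):
    """Map each deploy target name to its (url, project) pair."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    targets = {}
    for section in parser.sections():
        # Deploy targets live in sections named "deploy:<name>".
        if section.startswith("deploy:"):
            name = section.split(":", 1)[1]
            url = parser.get(section, "url")
            project = parser.get(section, "project", fallback=None)
            targets[name] = (url, project)
    return targets
```

Calling list_deploy_targets(SCRAPY_CFG) returns a dictionary keyed by target name, which is the name you pass to scrapyd-deploy.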
1.3.2 Deploying
Run the deploy command from the project directory. Its form is scrapyd-deploy <service name> -p <project name>, for example:
scrapyd-deploy hickey -p hickey
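Problem:
Windows reports that scrapyd-deploy is not recognized as an internal or external command.
Solution:
Go to the directory C:\Users\shixianqing\AppData\Local\Programs\Python\Python36\Scripts, create a file named scrapyd-deploy.bat in it, and write the following into the file (adjust the paths to match your own Python installation):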
@echo off
"C:\Users\shixianqing\AppData\Local\Programs\Python\Python36\python.exe" "C:\Users\shixianqing\AppData\Local\Programs\Python\Python36\Scripts\scrapyd-deploy" %1 %2 %3 %4 %5 %6 %7 %8 %9
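1.4 Running
Spiders are controlled through the HTTP API. For example, this call uploads a project version with addversion.json:
curl http://localhost:6800/addversion.json -F project=myproject -F version=r23 -F egg=@myproject.egg
To schedule a spider run, send a POST request to schedule.json with the project and spider parameters.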
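The same kind of call can be made from Python. The sketch below builds and sends a schedule.json request using only the standard library. It assumes the project hickey deployed above and a spider named myspider, which is a placeholder. It does not handle authentication, and the helper names are illustrative.

```python
import json
import urllib.parse
import urllib.request


def build_schedule_request(base_url, project, spider):
    """Build a POST request for Scrapyd's schedule.json endpoint."""
    url = urllib.parse.urljoin(base_url, "schedule.json")
    form = {"project": project, "spider": spider}
    data = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(url, data=data, method="POST")


def schedule(base_url, project, spider, timeout=10):
    """Send the request to a running Scrapyd service and return the decoded JSON."""
    request = build_schedule_request(base_url, project, spider)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the service running and the project deployed, schedule("http://localhost:6800/", "hickey", "myspider") should return a JSON object describing the result. Replace myspider with the name of a spider that exists in your project.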
For the other API endpoints, see the API section of the official Scrapyd documentation.
2. Installing Gerapy
For installation and usage, see this blog post: https://blog.csdn.net/fengltxx/article/details/79894839