• Installing pyspider on Ubuntu 14.04


    sudo apt-get install libcurl4-openssl-dev libxml2-dev libxslt1-dev

    sudo apt-get install phantomjs

    Activate the virtual environment (Python 3.6.7)

    pip install pyspider

    Then run pyspider to start it.
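    Before starting pyspider, it can help to confirm that the packages built against the system libraries installed above are importable inside the activated virtual environment. The following is a minimal sketch, assuming the usual module names pycurl (built on libcurl), lxml (built on libxml2/libxslt) and pyspider; the helper missing_modules is a hypothetical name used only for illustration.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed module names: pycurl needs libcurl, lxml needs libxml2/libxslt.
    missing = missing_modules(["pycurl", "lxml", "pyspider"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All modules found.")
```

    Run it with the virtual environment's Python; any module it lists probably needs to be reinstalled after the apt-get step.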

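    If you hit MySQL-related errors, run the following commands first. Note that purge removes the installed MySQL packages together with their configuration files.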

    sudo apt-get purge 'mysql*'

    sudo apt-get autoremove 

    sudo apt-get autoclean

    sudo apt-get dist-upgrade

    Deployment

    This document is based on MySQL + RabbitMQ.

    config.json

    Although you can specify the parameters on the command line, a config file is a better choice.

    {
      "taskdb": "mysql+taskdb://username:password@host:port/taskdb",
      "projectdb": "mysql+projectdb://username:password@host:port/projectdb",
      "resultdb": "mysql+resultdb://username:password@host:port/resultdb",
      "message_queue": "amqp://username:password@host:port/%2F",
      "webui": {
        "username": "some_name",
        "password": "some_passwd",
        "need-auth": true
      }
    }

    Database connection URI type (the part after the + in the scheme, e.g. mysql+taskdb): should be one of `taskdb`, `projectdb`, `resultdb`.
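    The config file above can also be generated and checked from a script. This is a minimal sketch, not part of pyspider: build_config, uri_type and check_config are hypothetical helper names, the same credentials are assumed for MySQL and RabbitMQ, and each key is assumed to carry the matching type suffix as in the example. Credentials containing characters such as @ or / are not handled.

```python
import json

DB_TYPES = ("taskdb", "projectdb", "resultdb")


def build_config(user, password, db_host, db_port, mq_host, mq_port,
                 webui_user, webui_password):
    """Build a dict shaped like the config.json above (MySQL + RabbitMQ)."""
    cred = "{}:{}".format(user, password)
    config = {
        db: "mysql+{db}://{cred}@{host}:{port}/{db}".format(
            db=db, cred=cred, host=db_host, port=db_port)
        for db in DB_TYPES
    }
    config["message_queue"] = "amqp://{}@{}:{}/%2F".format(cred, mq_host, mq_port)
    config["webui"] = {
        "username": webui_user,
        "password": webui_password,
        "need-auth": True,
    }
    return config


def uri_type(uri):
    """Return the type suffix of a database URI such as 'mysql+taskdb://...'."""
    scheme = uri.split("://", 1)[0]
    return scheme.rsplit("+", 1)[-1]


def check_config(config):
    """Return a list of problems found in the database URIs of `config`."""
    problems = []
    for key in DB_TYPES:
        uri = config.get(key)
        if uri is None:
            problems.append("missing key: " + key)
        elif uri_type(uri) != key:
            problems.append("{}: URI type is {!r}, expected {!r}".format(
                key, uri_type(uri), key))
    return problems


if __name__ == "__main__":
    # Placeholder values; replace them with your own hosts and credentials.
    cfg = build_config("username", "password", "127.0.0.1", 3306,
                       "127.0.0.1", 5672, "some_name", "some_passwd")
    print(json.dumps(cfg, indent=2))
    print("problems:", check_config(cfg))
```

    Redirecting the printed JSON to config.json gives a file in the same shape as the hand-written one.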

    Running

    You should run each component on its own using subcommands. You can append & to a command to run it in the background, and use screen or nohup to keep it running after your SSH session ends. Managing the components with Supervisor is recommended.

    # start **only one** scheduler instance
    pyspider -c config.json scheduler
    
    # phantomjs
    pyspider -c config.json phantomjs
    
    # start as many fetcher / processor / result_worker instances as you need
    pyspider -c config.json --phantomjs-proxy="localhost:25555" fetcher
    pyspider -c config.json processor
    pyspider -c config.json result_worker
    
    # start webui, set `--scheduler-rpc` if scheduler is not running on the same host as webui
    pyspider -c config.json webui

    You can see all global options by running pyspider --help, and the options of a subcommand with, for example, pyspider webui --help.
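    Since Supervisor is the recommended way to manage the components, the commands above can be turned into Supervisor program sections. The sketch below only generates the text of such a config; it is an illustration under assumptions, not an official pyspider setup. The section names (pyspider_...), the working directory /srv/pyspider and the instance counts are hypothetical, and the pyspider executable is assumed to be on Supervisor's PATH (otherwise pass the full path inside the virtual environment).

```python
# (component, extra global options, number of processes); counts are illustrative.
COMPONENTS = [
    ("scheduler", "", 1),  # start only one scheduler instance
    ("phantomjs", "", 1),
    ("fetcher", "--phantomjs-proxy=localhost:25555", 1),
    ("processor", "", 2),
    ("result_worker", "", 1),
    ("webui", "", 1),
]


def supervisor_section(name, options, numprocs, workdir,
                       config="config.json", executable="pyspider"):
    """Return one [program:...] section that runs a pyspider component."""
    parts = (executable, "-c", config, options, name)
    command = " ".join(part for part in parts if part)
    lines = [
        "[program:pyspider_{}]".format(name),
        "command={}".format(command),
        "directory={}".format(workdir),  # config.json is resolved from here
        "autostart=true",
        "autorestart=true",
    ]
    if numprocs > 1:
        # Supervisor requires process_name to include process_num
        # whenever numprocs is greater than 1.
        lines.append("numprocs={}".format(numprocs))
        lines.append("process_name=%(program_name)s_%(process_num)02d")
    return "\n".join(lines)


def supervisor_config(workdir):
    """Return sections for all components, separated by blank lines."""
    return "\n\n".join(
        supervisor_section(name, options, numprocs, workdir)
        for name, options, numprocs in COMPONENTS
    )


if __name__ == "__main__":
    print(supervisor_config("/srv/pyspider"))
```

    The printed text can be saved as a file in Supervisor's include directory and loaded with supervisorctl reread followed by supervisorctl update.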

    "webui" in the JSON holds the options for the webui subcommand. You can add parameters for other components in the same way.

    To deploy the pyspider components as separate processes, you need at least one database service. pyspider currently supports MySQL, MongoDB and PostgreSQL; choose one of them.

    You also need a message queue service to connect the components together. You can use RabbitMQ, Beanstalk or Redis as the message queue.
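    Because every component depends on the database and the message queue, a quick connectivity check before starting them can save some debugging. The sketch below extracts host and port from a URI and tries a plain TCP connection. It is an assumption-laden illustration: endpoint and reachable are hypothetical helper names, the default ports are the well-known ones for each service, and an open port only shows that something is listening, not that the credentials are valid.

```python
import socket
from urllib.parse import urlsplit

# Well-known default ports, matched against the "+"-separated scheme parts.
DEFAULT_PORTS = {
    "mysql": 3306,
    "mongodb": 27017,
    "postgresql": 5432,
    "amqp": 5672,
    "beanstalk": 11300,
    "redis": 6379,
}


def endpoint(uri):
    """Return (host, port) for a database or message queue URI."""
    parts = urlsplit(uri)
    default = next(
        (DEFAULT_PORTS[p] for p in parts.scheme.split("+") if p in DEFAULT_PORTS),
        None,
    )
    port = parts.port or default
    if parts.hostname is None or port is None:
        raise ValueError("cannot determine host and port from " + uri)
    return parts.hostname, port


def reachable(uri, timeout=3.0):
    """Return True if a TCP connection to the URI's host and port succeeds."""
    host, port = endpoint(uri)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Example URIs with placeholder credentials on the local machine.
    for uri in ("mysql+taskdb://username:password@127.0.0.1:3306/taskdb",
                "amqp://username:password@127.0.0.1:5672/%2F"):
        print(endpoint(uri), "reachable" if reachable(uri, 1.0) else "unreachable")
```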

    pip install "pyspider[all]"

    Even if you have already installed pyspider with pip, installing pyspider[all] is still necessary to get the requirements for MySQL/MongoDB/RabbitMQ.

  • Original article: https://www.cnblogs.com/zxpo/p/10012330.html