• Python Web Scraping: Installing Scrapy (Part 1)


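     Introduction:

      Scrapy is a fast, high-level screen-scraping and web-crawling framework written in Python. It crawls websites and extracts structured data from their pages, and it is used for a wide range of tasks, including data mining, monitoring, and automated testing.
      Part of Scrapy's appeal is that it is a framework, so anyone can adapt it to their own needs. It also provides base classes for several kinds of spiders, such as BaseSpider (renamed Spider in later releases) and sitemap spiders, and recent versions add support for crawling Web 2.0 sites.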

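     Installation environment:

      Installing on Windows:

      Notes:

        1. Install the dependency packages one at a time, from top to bottom. A .whl file can be installed directly by running pip3 install followed by the file's absolute path and file name.

        2. Check your pip version; use version 9.0 or later.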
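
The first note can be scripted. The following is only a sketch under some assumptions: the wheel file names are passed on the command line in the order they should be installed, and `sys.executable -m pip` is used in place of `pip3` so that the packages go into the interpreter running the script. The function names are made up for this example.

```python
import os
import subprocess
import sys


def build_install_commands(wheel_paths):
    """Return one pip install command per wheel, keeping the given order."""
    commands = []
    for wheel in wheel_paths:
        # pip is given the absolute path, including the file name
        absolute = os.path.abspath(wheel)
        commands.append([sys.executable, "-m", "pip", "install", absolute])
    return commands


def install_in_order(wheel_paths):
    """Install the wheels one by one and stop at the first failure."""
    for command in build_install_commands(wheel_paths):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical usage: python install_wheels.py first.whl second.whl
    install_in_order(sys.argv[1:])
```

Stopping at the first failure matters here because a later wheel may depend on an earlier one.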

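      Linux

      The steps below cover installation on CentOS 6.5. Make sure your yum repositories are configured correctly.

      Install the dependency packages. On CentOS the development packages use the -devel suffix (the -dev names belong to Debian/Ubuntu), and python-pip comes from the EPEL repository. Python 3.6 in this walkthrough lives under /usr/local, so its headers are already in place and no separate python3 development package is installed.

    yum install -y python-devel python-pip libxml2-devel libxslt-devel zlib-devel libffi-devel openssl-devel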
    

      Check the pip version (9.0.1 in this example):

    [root@localhost /]# pip3 --version
    pip 9.0.1 from /usr/local/lib/python3.6/site-packages (python 3.6)
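
If you would rather check the version in a script than by eye, something like the following could work. It is a minimal sketch: the function names are invented for this example, and it assumes the output starts with `pip <version>`, as in the line shown above.

```python
import re


def parse_pip_version(version_output):
    """Extract the version from `pip --version` output as a tuple of ints."""
    match = re.match(r"pip (\d+(?:\.\d+)*)", version_output.strip())
    if match is None:
        raise ValueError("unrecognized pip output: %r" % version_output)
    return tuple(int(part) for part in match.group(1).split("."))


def pip_is_recent_enough(version_output, minimum=(9, 0)):
    """Return True when the reported pip version is at least `minimum`."""
    return parse_pip_version(version_output) >= minimum


if __name__ == "__main__":
    # Sample line copied from the terminal session above
    sample = "pip 9.0.1 from /usr/local/lib/python3.6/site-packages (python 3.6)"
    print(parse_pip_version(sample))
    print(pip_is_recent_enough(sample))
```

Comparing tuples of integers avoids the trap of comparing version strings, where "10.0" would sort before "9.0".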
    

      

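      Now install Scrapy. (On Linux, unlike on Windows, we do not have to install the dependencies one by one. pip installs every required dependency automatically, which saves many steps and avoids many problems.)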

    [root@localhost /]# pip3 install scrapy
    Collecting scrapy
      Downloading Scrapy-1.4.0-py2.py3-none-any.whl (248kB)
        100% |████████████████████████████████| 256kB 29kB/s
    Collecting PyDispatcher>=2.0.5 (from scrapy)
      Downloading PyDispatcher-2.0.5.tar.gz
    Collecting parsel>=1.1 (from scrapy)
      Downloading parsel-1.2.0-py2.py3-none-any.whl
    Collecting service-identity (from scrapy)
      Downloading service_identity-17.0.0-py2.py3-none-any.whl
    Collecting w3lib>=1.17.0 (from scrapy)
      Downloading w3lib-1.18.0-py2.py3-none-any.whl
    Collecting queuelib (from scrapy)
      Downloading queuelib-1.4.2-py2.py3-none-any.whl
    Collecting lxml (from scrapy)
      Downloading lxml-4.1.0-cp36-cp36m-manylinux1_x86_64.whl (5.6MB)
        100% |████████████████████████████████| 5.6MB 13kB/s
    Collecting cssselect>=0.9 (from scrapy)
      Downloading cssselect-1.0.1-py2.py3-none-any.whl
    Collecting six>=1.5.2 (from scrapy)
      Downloading six-1.11.0-py2.py3-none-any.whl
    Collecting pyOpenSSL (from scrapy)
      Downloading pyOpenSSL-17.3.0-py2.py3-none-any.whl (51kB)
        100% |████████████████████████████████| 51kB 11kB/s
    Collecting Twisted>=13.1.0 (from scrapy)
      Downloading Twisted-17.9.0.tar.bz2 (3.0MB)
        100% |████████████████████████████████| 3.0MB 20kB/s
    Collecting attrs (from service-identity->scrapy)
      Downloading attrs-17.2.0-py2.py3-none-any.whl
    Collecting pyasn1-modules (from service-identity->scrapy)
      Downloading pyasn1_modules-0.1.5-py2.py3-none-any.whl (60kB)
        100% |████████████████████████████████| 61kB 79kB/s
    Collecting pyasn1 (from service-identity->scrapy)
      Downloading pyasn1-0.3.7-py2.py3-none-any.whl (63kB)
        100% |████████████████████████████████| 71kB 87kB/s
    Collecting cryptography>=1.9 (from pyOpenSSL->scrapy)
      Downloading cryptography-2.1.2-cp36-cp36m-manylinux1_x86_64.whl (2.2MB)
        100% |████████████████████████████████| 2.2MB 16kB/s
    Collecting zope.interface>=4.0.2 (from Twisted>=13.1.0->scrapy)
      Downloading zope.interface-4.4.3-cp36-cp36m-manylinux1_x86_64.whl (173kB)
        100% |████████████████████████████████| 174kB 20kB/s
    Collecting constantly>=15.1 (from Twisted>=13.1.0->scrapy)
      Downloading constantly-15.1.0-py2.py3-none-any.whl
    Collecting incremental>=16.10.1 (from Twisted>=13.1.0->scrapy)
      Downloading incremental-17.5.0-py2.py3-none-any.whl
    Collecting Automat>=0.3.0 (from Twisted>=13.1.0->scrapy)
      Downloading Automat-0.6.0-py2.py3-none-any.whl
    Collecting hyperlink>=17.1.1 (from Twisted>=13.1.0->scrapy)
      Downloading hyperlink-17.3.1-py2.py3-none-any.whl (73kB)
        100% |████████████████████████████████| 81kB 46kB/s
    Collecting idna>=2.1 (from cryptography>=1.9->pyOpenSSL->scrapy)
      Downloading idna-2.6-py2.py3-none-any.whl (56kB)
        100% |████████████████████████████████| 61kB 48kB/s
    Collecting asn1crypto>=0.21.0 (from cryptography>=1.9->pyOpenSSL->scrapy)
      Downloading asn1crypto-0.23.0-py2.py3-none-any.whl (99kB)
        100% |████████████████████████████████| 102kB 19kB/s
    Collecting cffi>=1.7; platform_python_implementation != "PyPy" (from cryptography>=1.9->pyOpenSSL->scrapy)
      Downloading cffi-1.11.2-cp36-cp36m-manylinux1_x86_64.whl (419kB)
        100% |████████████████████████████████| 430kB 18kB/s
    Requirement already satisfied: setuptools in /usr/local/lib/python3.6/site-packages (from zope.interface>=4.0.2->Twisted>=13.1.0->scrapy)
    Collecting pycparser (from cffi>=1.7; platform_python_implementation != "PyPy"->cryptography>=1.9->pyOpenSSL->scrapy)
      Downloading pycparser-2.18.tar.gz (245kB)
        100% |████████████████████████████████| 256kB 62kB/s
    Installing collected packages: PyDispatcher, six, w3lib, lxml, cssselect, parsel, idna, asn1crypto, pycparser, cffi, cryptography, pyOpenSSL, attrs, pyasn1, pyasn1-modules, service-identity, queuelib, zope.interface, constantly, incremental, Automat, hyperlink, Twisted, scrapy
      Running setup.py install for PyDispatcher ... done
      Running setup.py install for pycparser ... done
      Running setup.py install for Twisted ... done
    Successfully installed Automat-0.6.0 PyDispatcher-2.0.5 Twisted-17.9.0 asn1crypto-0.23.0 attrs-17.2.0 cffi-1.11.2 constantly-15.1.0 cryptography-2.1.2 cssselect-1.0.1 hyperlink-17.3.1 idna-2.6 incremental-17.5.0 lxml-4.1.0 parsel-1.2.0 pyOpenSSL-17.3.0 pyasn1-0.3.7 pyasn1-modules-0.1.5 pycparser-2.18 queuelib-1.4.2 scrapy-1.4.0 service-identity-17.0.0 six-1.11.0 w3lib-1.18.0 zope.interface-4.4.3
    [root@localhost /]# python3
    Python 3.6.3 (default, Oct 25 2017, 10:18:57)
    [GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import scrapy
    >>>
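
An `import scrapy` that returns without an error, as in the session above, means the installation worked. The same check can be extended to a few of the dependencies from the pip log. This is a sketch only: the helper name is made up, and the list holds the import names of those packages (for example, pyOpenSSL is imported as `OpenSSL`).

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names of Scrapy and some packages listed in the pip log
    wanted = ["scrapy", "twisted", "lxml", "parsel", "w3lib", "OpenSSL"]
    missing = missing_modules(wanted)
    print("missing:", ", ".join(missing) if missing else "none")
```

`find_spec` only locates a module without importing it, so the check is fast and has no side effects.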
    

      

  • Original article: https://www.cnblogs.com/lei0213/p/7727120.html