• Pitfalls I ran into when getting started with Python web scraping


    1. Environment

    - Python
      The Python preinstalled on macOS:

    $ python -V  
    Python 2.7.10
    $ where python
    /usr/bin/python
    $ ls /System/Library/Frameworks/Python.framework/Versions
    2.3     2.5     2.6     2.7     Current
    $ ls /Library/Frameworks/Python.framework/Versions   # directory for user-installed Python versions

    - IDE
      PyCharm
    - Tooling
      Install pip:

    sudo easy_install pip

    - Python libraries

    sudo pip install requests        # installed requests 2.13.0 at the time
    sudo pip install BeautifulSoup   # installed BeautifulSoup 3.2.1 (the old v3 package)
    sudo pip install lxml            # installed lxml 3.7.3
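    With several Pythons on one Mac (the system one under /System/Library and user-installed ones under /Library), it is easy to install a package into one interpreter and run another. The script below is a minimal sketch for checking which interpreter is running and which of the packages above it can see. It assumes Python 3.8+ (for `importlib.metadata`); on the preinstalled Python 2.7, `pip list` gives similar information. The names `describe_interpreter` and `installed_versions` are illustrative, not from the original post.

```python
# Sketch: report the running interpreter and the installed versions of the
# packages used in this post. Assumes Python 3.8+ (importlib.metadata).
import sys
import platform
from importlib import metadata

# Distribution names as pip knows them: "BeautifulSoup" is the v3 package,
# "beautifulsoup4" is the v4 package.
PACKAGES = ("requests", "BeautifulSoup", "beautifulsoup4", "lxml")


def describe_interpreter():
    """Return (path, version) of this interpreter, like `where python` plus `python -V`."""
    return sys.executable, platform.python_version()


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version string, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    path, version = describe_interpreter()
    print("interpreter:", path, version)
    for name, ver in installed_versions().items():
        print(f"{name:15} {ver or 'not installed'}")
```

    If `BeautifulSoup` shows a version and `beautifulsoup4` does not, the interpreter only has v3, which leads directly to Problem 2 below.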

    2. Problems

    - Problem 1

    Code:
    soup = BeautifulSoup(html, 'lxml')
    Error:
    Traceback (most recent call last):
    File "/Users/cuizhenyu/Documents/Codes/Python/DownloadMeitu/LibBeautifulSoupTest.py", line 15, in <module>
    soup = BeautifulSoup(html) # soup = BeautifulSoup(html, 'lxml') also raised an error
    TypeError: 'module' object is not callable
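    Fix:
    BeautifulSoup 3 ships a module named BeautifulSoup that contains a class of the same name. `import BeautifulSoup` binds the module, and calling a module raises this TypeError, so import the class itself:
    from BeautifulSoup import BeautifulSoup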
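    The error comes from the name overlap rather than from BeautifulSoup itself, so it can be reproduced with the standard library alone. The sketch below uses `datetime` as an analogy (my choice, not part of the original setup): the module `datetime` contains a class `datetime`, just as BeautifulSoup 3's module contains a class `BeautifulSoup`. It assumes Python 3; the function names are illustrative.

```python
# Reproduce the "'module' object is not callable" pitfall with the stdlib
# datetime module, which (like BeautifulSoup 3) holds a class of its own name.
import datetime  # binds the *module* named datetime


def call_module_by_mistake():
    """Call the module object itself, as `BeautifulSoup(html)` did; return the error text."""
    try:
        datetime(2017, 1, 1)  # the module is not callable
    except TypeError as exc:
        return str(exc)
    return None


def call_class():
    """The fix: import the class out of the module, the same idea as `from X import X`."""
    from datetime import datetime  # inside this function, `datetime` names the class
    return datetime(2017, 1, 1)
```

    The same rule applies to any package whose module and main class share a name: `import X` gives you the module, `from X import X` gives you the class.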

    - Problem 2

    Code:
    soup = BeautifulSoup(html, 'lxml')
    Error:
    Traceback (most recent call last):
    File "/Users/cuizhenyu/Documents/Codes/Python/DownloadMeitu/LibBeautifulSoupTest.py", line 15, in <module>
    soup = BeautifulSoup(html, 'lxml') # soup = BeautifulSoup(html, 'lxml') raised an error
    File "/Library/Python/2.7/site-packages/BeautifulSoup.py", line 1522, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
    File "/Library/Python/2.7/site-packages/BeautifulSoup.py", line 1147, in __init__
    self._feed(isHTML=isHTML)
    File "/Library/Python/2.7/site-packages/BeautifulSoup.py", line 1189, in _feed
    SGMLParser.feed(self, markup)
    File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 104, in feed
    self.goahead(0)
    File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 138, in goahead
    k = self.parse_starttag(i)
    File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 296, in parse_starttag
    self.finish_starttag(tag, attrs)
    File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 338, in finish_starttag
    self.unknown_starttag(tag, attrs)
    File "/Library/Python/2.7/site-packages/BeautifulSoup.py", line 1338, in unknown_starttag
    self.endData()
    File "/Library/Python/2.7/site-packages/BeautifulSoup.py", line 1251, in endData
    (not self.parseOnlyThese.text or
    AttributeError: 'str' object has no attribute 'text'
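    Fix:
    The installed package is BeautifulSoup 3, which does not support pluggable parsers such as lxml; BeautifulSoup 4 is required. In v3 the second positional argument of the constructor is parseOnlyThese, so the string 'lxml' was treated as a parse filter, which is why the traceback ends at `self.parseOnlyThese.text`. Install v4 with `sudo pip install beautifulsoup4` and import it with `from bs4 import BeautifulSoup`.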
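    With BeautifulSoup 4 installed, the parser is named explicitly as the second argument. The sketch below is one way to do that while still working on a machine without lxml: it picks "lxml" when that package is importable and otherwise falls back to Python's built-in "html.parser". It assumes Python 3 and `pip install beautifulsoup4` (lxml optional); `choose_parser` and `make_soup` are hypothetical helper names.

```python
# Sketch: build a soup with BeautifulSoup 4, preferring lxml when installed and
# falling back to the stdlib "html.parser" otherwise.
# Assumes: Python 3, pip install beautifulsoup4 (lxml optional).
import importlib.util


def choose_parser():
    """Return "lxml" if the lxml package can be imported, else "html.parser"."""
    if importlib.util.find_spec("lxml") is not None:
        return "lxml"
    return "html.parser"


def make_soup(html):
    """Parse `html` with BeautifulSoup 4 (import path `bs4`, not `BeautifulSoup`)."""
    try:
        from bs4 import BeautifulSoup
    except ImportError as exc:
        raise ImportError("BeautifulSoup 4 is missing: pip install beautifulsoup4") from exc
    # In bs4 the second positional argument is the parser name ("features"),
    # unlike v3, where it was parseOnlyThese and produced the AttributeError above.
    return BeautifulSoup(html, choose_parser())
```

    Naming the parser explicitly also avoids bs4's warning about guessing a parser, and keeps results consistent across machines.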

     

  • Original article: https://www.cnblogs.com/mulisheng/p/6665350.html