• BeautifulSoup_python3


    1. Troubleshooting

    bsObj = BeautifulSoup(html.read())

    This raises a warning:

     UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

    Fix: name the parser explicitly.

    bsObj = BeautifulSoup(html.read(),"html.parser")
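
    The difference between the two calls can be checked without fetching a page. The following is a minimal sketch, assuming bs4 is installed; the inline markup and the helper name parser_warnings are made up for illustration and are not part of the original post.

```python
import warnings

from bs4 import BeautifulSoup

# Hypothetical inline markup, so the sketch runs without network access.
html = "<html><body><h1>Hello</h1></body></html>"


def parser_warnings(*args):
    """Count 'No parser was explicitly specified' warnings for one constructor call."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeated ones
        BeautifulSoup(html, *args)
    return sum(
        "No parser was explicitly specified" in str(w.message) for w in caught
    )


without_parser = parser_warnings()             # parser left for bs4 to guess
with_parser = parser_warnings("html.parser")   # parser named explicitly

print(without_parser, with_parser)
```

    Naming the parser also keeps the parse result consistent across machines, since bs4 no longer picks whichever parser happens to be installed.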

    BeautifulSoup

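    Overview: BeautifulSoup formats and organizes messy web content by locating HTML tags, and exposes the document's XML structure as simple Python objects.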
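
    To make "simple Python objects" concrete, here is a small sketch under the assumption that bs4 is installed. The markup is invented for illustration; Tag, NavigableString, find_all and get_text are standard bs4 names.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Hypothetical markup used only for this illustration.
html = """
<html><body>
  <h1>Title</h1>
  <ul>
    <li><a href="/a">First</a></li>
    <li><a href="/b">Second</a></li>
  </ul>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

heading = soup.h1                        # first <h1>, returned as a Tag object
links = soup.find_all("a")               # every <a>, as a list-like collection of Tags
hrefs = [a["href"] for a in links]       # attributes are read like dictionary keys
texts = [a.get_text() for a in links]    # text content of each link

print(isinstance(heading, Tag), isinstance(heading.string, NavigableString))
print(hrefs, texts)
```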

    On Python 3, install version 4 of the library, BeautifulSoup4 (BS4). The package name is beautifulsoup4 and it is imported as bs4.

    Example run:

     1 #!/usr/bin/env python
     2 # encoding: utf-8
     3 """
     4 @author: 侠之大者kamil
     5 @file: beautifulsoup.py
     6 @time: 2016/4/19 16:36
     7 """
     8 from bs4 import BeautifulSoup
     9 from urllib.request import urlopen
    10 html = urlopen('http://www.cnblogs.com/kamil/')
    11 print(type(html))
    12 bsObj = BeautifulSoup(html.read(), "html.parser")  # html.read() returns the page content, which is passed to the BeautifulSoup object
    13 print(type(bsObj))
    14 print(bsObj.h1)

     Note on line 12: pass "html.parser" explicitly to avoid the warning described in the troubleshooting section.

    Output:

    ssh://kamil@xzdz.hk:22/usr/bin/python3 -u /home/kamil/windows_python3/python3/Day11/day12/beautifulsoup.py
    <class 'http.client.HTTPResponse'>
    <class 'bs4.BeautifulSoup'>
    <h1><a class="headermaintitle" href="http://www.cnblogs.com/kamil/" id="Header1_HeaderTitle">侠之大者kamil</a></h1>
    
    Process finished with exit code 0
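
    The printed <h1> is itself an object that can be inspected further. The sketch below reuses the <h1> markup from the output above as an inline string, so it runs offline; the variable names and the <h2> lookup are illustrative additions, not part of the original script.

```python
from bs4 import BeautifulSoup

# The <h1> markup is copied from the output above; parsing it inline avoids a network call.
html = (
    '<h1><a class="headermaintitle" href="http://www.cnblogs.com/kamil/" '
    'id="Header1_HeaderTitle">侠之大者kamil</a></h1>'
)
bsObj = BeautifulSoup(html, "html.parser")

link = bsObj.h1.a                # the <a> nested inside the first <h1>
title = link.get_text()          # text content of the link
href = link["href"]              # a single-valued attribute comes back as a string
css_classes = link["class"]      # class is multi-valued, so bs4 returns a list

# Accessing a tag that does not exist yields None instead of raising,
# so check for None before calling methods on the result.
missing = bsObj.h2
subtitle = missing.get_text() if missing is not None else ""

print(title, href, css_classes, repr(subtitle))
```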

     Official documentation

  • Original article: https://www.cnblogs.com/kamil/p/5408986.html