• Python's BeautifulSoup module


    # The package has been renamed (bs4)
    from bs4 import BeautifulSoup

     Help documentation

    Beautiful Soup parses a (possibly invalid) XML or HTML document into a
    tree representation. It provides methods and Pythonic idioms that make
    it easy to navigate, search, and modify the tree.
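
The three operations named above (navigate, search, modify) can be sketched as follows. This is a minimal illustration that assumes the `bs4` package and the standard-library `html.parser` backend; the sample markup and variable names are made up for the example.

```python
from bs4 import BeautifulSoup

# Sample markup invented for this sketch.
html = "<html><body><p class='intro'>Hello, <b>world</b></p><a href='/x'>link</a></body></html>"
soup = BeautifulSoup(html, "html.parser")

# Navigate: walk down the tree by tag name.
bold_text = soup.body.p.b.string

# Search: collect the href attribute of every <a> tag.
links = [a["href"] for a in soup.find_all("a")]

# Modify: change an attribute and replace a tag's text in place.
soup.p["class"] = "greeting"
soup.b.string = "soup"
paragraph_text = soup.p.get_text()

print(bold_text, links, paragraph_text)
```

Attribute-style access such as `soup.body.p` returns the first matching descendant, while `find_all` returns every match.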

    A well-formed XML/HTML document yields a well-formed data
    structure. An ill-formed XML/HTML document yields a correspondingly
    ill-formed data structure. If your document is only locally
    well-formed, you can use this library to find and process the
    well-formed part of it.
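
As a rough sketch of the point above, the snippet below feeds the parser a fragment whose last `<p>` and enclosing `<div>` are never closed, then reads the paragraphs back. It assumes `bs4` with the built-in `html.parser` backend; other backends such as lxml or html5lib may repair broken markup differently.

```python
from bs4 import BeautifulSoup

# Invented fragment: the second <p> and the <div> are left unclosed.
broken = "<div><p>first</p><p>second"
soup = BeautifulSoup(broken, "html.parser")

# Open tags are closed when the input ends, so both paragraphs are reachable.
texts = [p.get_text() for p in soup.find_all("p")]
print(texts)
```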

    Beautiful Soup works with Python 2.2 and up. It has no external
    dependencies, but you'll have more success at converting data to UTF-8
    if you also install these three packages:

    * chardet, for auto-detecting character encodings
      http://chardet.feedparser.org/
    * cjkcodecs and iconv_codec, which add more encodings to the ones supported
      by stock Python.
      http://cjkpython.i18n.org/

    Beautiful Soup defines classes for two main parsing strategies:

     * BeautifulStoneSoup, for parsing XML, SGML, or your domain-specific
       language that kind of looks like XML.

     * BeautifulSoup, for parsing run-of-the-mill HTML code, be it valid
       or invalid. This class has web browser-like heuristics for
       obtaining a sensible parse tree in the face of common HTML errors.
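
The help text above describes the older class layout. In `bs4`, the parsing strategy is usually selected through the second argument to `BeautifulSoup` instead of a separate class. The sketch below assumes that `"html.parser"` is available (it ships with Python) and that the `"xml"` feature works only when the optional lxml package is installed, so that part is wrapped in a fallback.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# HTML strategy: the standard-library parser, no extra packages needed.
html_soup = BeautifulSoup("<p>Hi<br>there", "html.parser")
html_tags = [tag.name for tag in html_soup.find_all(True)]

# XML strategy: needs lxml; fall back gracefully if it is missing.
try:
    xml_soup = BeautifulSoup("<feed><entry id='1'/></feed>", "xml")
    entry_id = xml_soup.find("entry")["id"]
except FeatureNotFound:
    entry_id = None  # lxml is not installed in this environment

print(html_tags, entry_id)
```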

    Beautiful Soup also defines a class (UnicodeDammit) for autodetecting
    the encoding of an HTML or XML document, and converting it to
    Unicode. Much of this code is taken from Mark Pilgrim's Universal Feed Parser.
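
A small sketch of `UnicodeDammit` as exposed by `bs4`: the byte string and the list of candidate encodings are assumptions chosen for the example, and the candidate list is passed positionally because the keyword name has changed between releases.

```python
from bs4 import UnicodeDammit

# Bytes in Latin-1 that are not valid UTF-8 (sample data for this sketch).
raw = "Sacré bleu!".encode("latin-1")

# Suggest encodings to try first; UnicodeDammit converts the bytes to str.
dammit = UnicodeDammit(raw, ["latin-1"])
text = dammit.unicode_markup
encoding = dammit.original_encoding

print(text, encoding)
```

Without any suggested encodings, detection quality depends on which optional detection libraries are installed.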

    For more than you ever wanted to know about Beautiful Soup, see the
    documentation:
    http://www.crummy.com/software/BeautifulSoup/documentation.html

    Here, have some legalese:

    Copyright (c) 2004-2010, Leonard Richardson

    All rights reserved.

  • Original article: https://www.cnblogs.com/jinhh/p/8032286.html