• Python Learning Path: Basics (3), Modules (continued)


    re (regular expressions), shutil, configparser, xml

    I. re

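    • Regex metacharacters and syntax:

    Syntax          Description                                         Example pattern         Fully matches

    Characters
    ordinary chars  Match themselves                                    abc                     abc
    .               Any single character except the newline "\n"        a.c                     abc
    \               Escape: makes the next special character literal    a\.c                    a.c
                                                                        a\\c                    a\c
    [...]           Character set: matches any one character in it.     [abc]efg                aefg, befg, cefg
                    "-" gives a range, e.g. [a-zA-Z0-9]; a leading
                    "^", as in [^...], negates the set.

    Predefined character sets
    \d              Digit: [0-9]                                        a\dc                    a1c
    \D              Non-digit: [^\d]                                    a\Dc                    abc
    \s              Whitespace: [<space>\t\r\n\f\v]                     a\sc                    a c
    \S              Non-whitespace: [^\s]                               a\Sc                    abc
    \w              Word character: [a-zA-Z0-9_]                        a\wc                    abc
    \W              Non-word character: [^\w]                           a\Wc                    a c

    Quantifiers
    *               Previous character 0 or more times                  a*b                     aab, ab, b
    +               Previous character 1 or more times                  a+b                     aab, aaaab
    ?               Previous character 0 or 1 times                     a?b                     b, ab
    {m}             Previous character exactly m times                  a{2}c                   aac
    {m,n}           Previous character m to n times; m and n may be     a{1,2}c                 ac, aac
                    omitted. Without m: 0 to n times; without n: m
                    to unlimited times.
    *? +? ??        Make *, +, ? and {m,n} non-greedy                   see below
    {m,n}?

    Boundary matching
    ^               Start of the string                                 ^abc                    abc
    $               End of the string                                   abc$                    abc
    \A              Start of the string only                            \Aabc                   abc
    \Z              End of the string only                              abc\Z                   abc
    \b              Word boundary, i.e. the position between a word     ab\b                    ab
                    character and a non-word character. "er\b"
                    matches the "er" in "never" but not in "verb".
    \B              Non-word boundary. "er\B" matches the "er" in       ab\Bc                   abc
                    "verb" but not in "never".

    Logic and grouping
    |               Alternation: matches either the left or the right   abc|def                 abc, def
                    expression. The left side is tried first; if it
                    matches, the right side is skipped. When | is not
                    inside (), its scope is the whole expression.
    ()              Group: the parenthesized expression is a group.     (abc){2}                abcabc
                    Groups are numbered from the left by each "(".      (abc|bcd)               abc
                    A group is treated as a unit and may be followed
                    by a quantifier. A | inside a group only applies
                    within that group.
    (?P<name>...)   Named group: adds the alias name on top of the      (?P<id>abc)             abc
                    group number, so group(1) == group("name").
    (?P=name)       Backreference to the text matched by the group      (?P<id>123)abc(?P=id)   123abc123
                    named name.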
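
    The named-group and backreference syntax can be tried out with a short sketch. The pattern and test strings come from the table; the variable names are only illustrative.

```python
import re

# Named group "id" followed by a backreference to whatever that group captured.
pattern = re.compile(r"(?P<id>123)abc(?P=id)")
m = pattern.match("123abc123")

whole = m.group(0)        # the entire match
by_number = m.group(1)    # the group addressed by its number
by_name = m.group("id")   # the same group addressed by its alias
print(whole, by_number, by_name)

# The backreference must repeat the captured text exactly, so this fails.
mismatch = pattern.match("123abc456")
print(mismatch)
```

    Addressing a group by name keeps working when more groups are added to the pattern later, which is the main reason to prefer it over positional numbers.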
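    • Greedy and non-greedy quantifiers

      Regular expressions are usually used to find matching strings in text. In Python, quantifiers are greedy by default (a few languages may default to non-greedy): they try to match as many characters as possible. Non-greedy quantifiers do the opposite and match as few characters as possible. For example, the regex "ab*" applied to "abbbc" finds "abbb", whereas the non-greedy "ab*?" finds "a".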
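
      A minimal sketch of the difference, using the "abbbc" example from the text. The HTML-like string is an extra sample added here for illustration.

```python
import re

text = "abbbc"
greedy = re.search(r"ab*", text).group()   # takes as many "b" as possible
lazy = re.search(r"ab*?", text).group()    # takes as few "b" as possible
print(greedy, lazy)

# Illustrative second sample: greedy ".*" runs to the last closing tag,
# while lazy ".*?" stops at the first one.
html = "<b>one</b><b>two</b>"
greedy_tags = re.findall(r"<b>.*</b>", html)
lazy_tags = re.findall(r"<b>.*?</b>", html)
print(greedy_tags)
print(lazy_tags)
```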

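    • The backslash plague

      Like most programming languages, regular expressions use "\" as the escape character, which can lead to a pile-up of backslashes. To match a literal "\" in text, the regular expression written as an ordinary string literal needs four backslashes, "\\\\": each pair is first turned into a single backslash by the programming language, and the resulting two backslashes are then interpreted by the regex engine as one literal backslash. Python's raw strings solve this neatly: the regex in this example can be written as r"\\". Likewise, "\\d", which matches a digit, can be written as r"\d". With raw strings you no longer need to worry about a missing backslash, and the expressions are easier to read.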
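
      The two spellings can be compared directly. This is a small sketch; the sample path string is made up for illustration.

```python
import re

text = "C:\\temp"  # the actual string is C:\temp, with a single backslash

# Ordinary literal: four backslashes in source become the regex \\ (one literal backslash).
normal = re.search("\\\\", text)
# Raw string: the same regex, written with two backslashes.
raw = re.search(r"\\", text)

same_pattern = "\\\\" == r"\\"     # both literals produce the identical pattern
digits = re.findall(r"\d", "a1b22")
print(normal.group(), raw.group(), same_pattern, digits)
```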

    • Matching functions in re
    1. match

        match tries to match from the start of the string. It returns a match object on success and None on failure.

    import re

    text = "the Attila the Hun show"
    m = re.match(".", text)
    print(m.group())  # 't'; group() is the same as group(0), the whole match

    m = re.match("(.)(.)(.)", text)
    print(m.group(0))  # 'the'

    # Groups
    print(m.group(1, 2, 3))  # ('t', 'h', 'e'), the text captured by each group

    # Compile the regex into a Pattern object
    pattern = re.compile(".")
    m = pattern.match(text)
    print(m.group())  # 't'

    2. search

        search scans the whole string and returns the first match, or None if nothing matches.

    import re

    text = "Example 3:there is 1 date 11/5/2016 in here"
    m = re.search(r"(\d{1,2})/(\d{1,2})/(\d{2,4})", text)
    print(m.group(1), m.group(2), m.group(3))  # 11 5 2016
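
    The difference between match and search shows up when the same pattern is applied to the same text. A small sketch, assuming the date pattern is written with \d and a raw string:

```python
import re

text = "Example 3:there is 1 date 11/5/2016 in here"
date_pattern = re.compile(r"(\d{1,2})/(\d{1,2})/(\d{2,4})")

at_start = date_pattern.match(text)    # anchored at position 0, where there is no date
anywhere = date_pattern.search(text)   # scans forward until the first match

# Both functions can return None, so check before calling group().
if anywhere is not None:
    parts = anywhere.groups()
    print(parts)
```

    Checking for None first avoids the AttributeError that m.group() raises when nothing matched.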

    3. sub

        Replaces the parts of the string that match the pattern.

    import re
    # sub(pattern, repl, string, count=0, flags=0)
    # pattern: the regular expression
    # repl   : the replacement string, or a callable
    # string : the string to search
    # count  : maximum number of replacements (0 means replace all)
    # flags  : matching flags
    text = "you're no fun anymore fun"
    m = re.sub("fun", "entertaining", text, count=2)
    print(m)
    # "you're no entertaining anymore entertaining"
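
    The repl argument may also be a callable, as the parameter list notes. A minimal sketch; the helper name shout is hypothetical.

```python
import re

def shout(match):
    """Hypothetical helper: return the matched text in upper case."""
    return match.group(0).upper()

text = "you're no fun anymore fun"

# The callable receives each match object and returns its replacement.
first_only = re.sub(r"fun", shout, text, count=1)

# subn works like sub but also reports how many replacements were made.
replaced, n = re.subn(r"fun", "entertaining", text)
print(first_only)
print(replaced, n)
```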

    4. split

        Splits a string wherever the pattern matches.

    import re
    # split(pattern, string, maxsplit=0, flags=0)
    # pattern : the regular expression
    # string  : the string to split
    # maxsplit: maximum number of splits (0 means no limit)
    # flags   : matching flags

    # Without groups
    origin = "hello alex bcd alex lge alex acd 19"
    r = re.split("alex", origin, maxsplit=1)
    print(r)  # ['hello ', ' bcd alex lge alex acd 19']

    # With groups: the captured text is included in the result
    origin = "hello alex bcd alex lge alex acd 19"
    r1 = re.split("(alex)", origin, maxsplit=1)
    print(r1)  # ['hello ', 'alex', ' bcd alex lge alex acd 19']

    r2 = re.split("(al(ex))", origin, maxsplit=1)
    print(r2)  # ['hello ', 'alex', 'ex', ' bcd alex lge alex acd 19']

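    5. findall

        Returns a list of all non-overlapping matches. With no groups, each item is the whole matched string; with one group, each item is the string captured by that group; with several groups, each item is a tuple of the captured strings.

        Empty matches are included in the result.

    import re

    # Without groups
    origin = "hello alex bcd abcd lge acd 19"
    r = re.findall(r"a\w+", origin)
    print(r)  # ['alex', 'abcd', 'acd']

    # With groups
    origin = "hello alex bcd abcd lge acd 19"
    r = re.findall(r"a((\w*)c)(d)", origin)
    # Two substrings match: "abcd" and "acd". For each match the outer group comes
    # first in the tuple, followed by the inner groups in order of their "(".
    print(r)  # [('bc', 'b', 'd'), ('c', '', 'd')]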
    IP address:
    ^(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}$
    Mobile phone number:
    ^1[3458][0-9]\d{8}$
    Email:
    [a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+
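
    These patterns can be wrapped in small validation helpers. The sketch below writes them as raw strings with explicit backslashes and uses [3458] for the second digit of the phone number; the function names and sample inputs are hypothetical, and the patterns are simple checks rather than complete validators.

```python
import re

IP_RE = re.compile(
    r"^(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}$"
)
PHONE_RE = re.compile(r"^1[3458][0-9]\d{8}$")
EMAIL_RE = re.compile(r"[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+")

def is_ip(value):
    return IP_RE.match(value) is not None

def is_phone(value):
    return PHONE_RE.match(value) is not None

def is_email(value):
    # The email pattern has no anchors, so fullmatch requires the whole string to fit.
    return EMAIL_RE.fullmatch(value) is not None

print(is_ip("192.168.1.1"), is_ip("256.1.1.1"))
print(is_phone("13812345678"), is_phone("12345678901"))
print(is_email("user_1@example.com"), is_email("no-at-sign.com"))
```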

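    II. shutil

    A module of high-level operations on files, directories, and archives.

    1. Copy the contents of one file object to another

        shutil.copyfileobj(fsrc, fdst[, length])

    import shutil

    # Open both files in a with block so they are closed afterwards
    with open('old.xml', 'r') as fsrc, open('new.xml', 'w') as fdst:
        shutil.copyfileobj(fsrc, fdst)

    2. Copy a file

        shutil.copyfile(src, dst)

    shutil.copyfile('f1.log', 'f2.log')

    3. Copy the permission bits only; content, group, and owner are unchanged

        shutil.copymode(src, dst)

    shutil.copymode('f1.log', 'f2.log')

    4. Copy the status information only: mode bits, atime, mtime, flags

        shutil.copystat(src, dst)

    shutil.copystat('f1.log', 'f2.log')

    5. Copy a file and its permission bits

        shutil.copy(src, dst)

    shutil.copy('f1.log', 'f2.log')

    6. Copy a file and its status information

        shutil.copy2(src, dst)

    shutil.copy2('f1.log', 'f2.log')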

    7. Recursively copy a directory

        shutil.ignore_patterns(*patterns) skips files whose names match the given patterns
        shutil.copytree(src, dst, symlinks=False, ignore=None)

    import shutil

    shutil.copytree('folder1', 'folder2', ignore=shutil.ignore_patterns('*.pyc', 'tmp*'))

    # symlinks=True copies symbolic links as links instead of copying their targets
    shutil.copytree('f1', 'f2', symlinks=True, ignore=shutil.ignore_patterns('*.pyc', 'tmp*'))
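
    To see what ignore_patterns filters out without touching real data, the copy can be run inside a temporary directory. A self-contained sketch; the file names are made up.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as work:
    src = os.path.join(work, "folder1")
    dst = os.path.join(work, "folder2")   # must not exist before copytree runs

    # Build a small source tree: one file to keep, one .pyc, one tmp* directory.
    os.makedirs(os.path.join(src, "tmp_cache"))
    for name in ("app.py", "app.pyc", os.path.join("tmp_cache", "x.txt")):
        with open(os.path.join(src, name), "w") as f:
            f.write("demo")

    shutil.copytree(src, dst, ignore=shutil.ignore_patterns("*.pyc", "tmp*"))
    copied = sorted(os.listdir(dst))

print(copied)
```

    The patterns are matched against the names inside each directory, so a matching directory is skipped together with everything below it.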

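    8. Recursively delete a directory tree

        shutil.rmtree(path[, ignore_errors[, onerror]])

    import shutil

    shutil.rmtree('folder1')

    9. Recursively move a file or directory. This is similar to the mv command; on the same filesystem it is essentially a rename.

        shutil.move(src, dst)

    import shutil

    shutil.move('folder1', 'folder3')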

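    10. Create an archive (for example zip or tar) and return its path

        shutil.make_archive(base_name, format, ...) archives the contents of a single directory

    • base_name: the name of the archive file without extension, optionally with a path. A bare name is saved in the current directory, otherwise in the given location, e.g. www => saved in the current directory; /Users/lcy/www => saved in /Users/lcy/
    • format: the archive type: "zip", "tar", "bztar", "gztar"
    • root_dir: the directory to archive (defaults to the current directory)
    • owner: the owner, defaults to the current user
    • group: the group, defaults to the current group
    • logger: used for logging, usually a logging.Logger object
    # Archive the files under /Users/lcy/Downloads/test into the program's current directory
    import shutil
    ret = shutil.make_archive("www", 'gztar', root_dir='/Users/lcy/Downloads/test')
    # Archive the files under /Users/lcy/Downloads/test into /Users/lcy/
    import shutil
    ret = shutil.make_archive("/Users/lcy/www", 'gztar', root_dir='/Users/lcy/Downloads/test')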
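
    The round trip can be checked in a temporary directory, pairing make_archive with shutil.unpack_archive. This is a sketch with made-up file names, not the author's setup.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as work:
    # Directory whose contents will be archived.
    root_dir = os.path.join(work, "test")
    os.makedirs(root_dir)
    with open(os.path.join(root_dir, "a.log"), "w") as f:
        f.write("hello")

    # base_name carries the target location; the extension is added by make_archive.
    base_name = os.path.join(work, "www")
    archive_path = shutil.make_archive(base_name, "gztar", root_dir=root_dir)
    archive_name = os.path.basename(archive_path)

    # Unpack into a fresh directory and list what came out.
    out_dir = os.path.join(work, "unpacked")
    shutil.unpack_archive(archive_path, out_dir)
    unpacked = sorted(os.listdir(out_dir))

print(archive_name, unpacked)
```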

    Addendum: ZipFile and TarFile, which are used more often in practice

    import zipfile

    # Compress
    z = zipfile.ZipFile('laxi.zip', 'w')  # create an archive; mode 'a' appends files to an existing archive
    z.write('a.log')  # add a file to the archive
    z.write('data.data')
    z.close()
    # Extract
    z = zipfile.ZipFile('laxi.zip', 'r')
    z.extractall()  # extract everything
    z.close()

    # Extract a single file
    z = zipfile.ZipFile("la.zip", "r")
    for item in z.namelist():  # print the names of the archive members
        print(item)
    member = z.namelist()[0]  # pick a member name, here simply the first one
    z.extract(member)  # extract that one member by name
    z.close()
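
    ZipFile also works as a context manager, which closes the archive even if an error occurs. A self-contained sketch using a temporary directory; the file contents are made up.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as work:
    log_path = os.path.join(work, "a.log")
    with open(log_path, "w") as f:
        f.write("first line\n")

    zip_path = os.path.join(work, "laxi.zip")
    with zipfile.ZipFile(zip_path, "w") as z:
        z.write(log_path, arcname="a.log")   # store under a short name, not the full path
        z.writestr("data.data", "payload")   # add a member directly from a string

    with zipfile.ZipFile(zip_path, "r") as z:
        names = z.namelist()
        out_dir = os.path.join(work, "out")
        extracted = z.extract("data.data", path=out_dir)  # returns the path of the new file
        with open(extracted) as f:
            content = f.read()

print(names, content)
```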

    The tarfile module

    import tarfile

    # Compress
    tar = tarfile.open('your.tar', 'w')
    tar.add('/Users/wupeiqi/PycharmProjects/bbs2.log', arcname='bbs3.log')  # arcname sets the file's name inside the archive
    tar.add('/Users/wupeiqi/PycharmProjects/cmdb.log', arcname='cmdb2.log')
    tar.close()

    # Extract
    tar = tarfile.open('your.tar', 'r')
    tar.extractall()  # a target directory can be passed
    # tar.getmembers() returns all archive members as tarfile.TarInfo objects.
    # To extract a single file, get its member object with obj = tar.getmember("file name")
    # and then call tar.extract(obj).
    tar.close()
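
    The single-member extraction described in the comments can be sketched as follows, again inside a temporary directory with made-up content.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as work:
    src = os.path.join(work, "bbs2.log")
    with open(src, "w") as f:
        f.write("log data")

    tar_path = os.path.join(work, "your.tar")
    with tarfile.open(tar_path, "w") as tar:
        tar.add(src, arcname="bbs3.log")      # stored in the archive as bbs3.log

    out_dir = os.path.join(work, "out")
    with tarfile.open(tar_path, "r") as tar:
        member_names = tar.getnames()
        member = tar.getmember("bbs3.log")    # a tarfile.TarInfo object
        tar.extract(member, path=out_dir)     # extract only this member

    with open(os.path.join(out_dir, "bbs3.log")) as f:
        content = f.read()

print(member_names, content)
```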

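    III. configparser

      configparser handles files in a specific (INI-style) format; under the hood it simply uses open to work with the file.

    # The expected file format looks like this
    # ("[...]" lines are sections; "key = value" and "key:value" lines are values)
    [section1]
    k1 = v1
    k2:v2
    [section2]
    k1 = v1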
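
    The format can be tried without creating a file by feeding the text to read_string. In this sketch the sample text mirrors the sections above; the second part shows that a trailing "# ..." on a value line is kept as part of the value unless inline_comment_prefixes is set.

```python
import configparser

SAMPLE = """\
[section1]
k1 = v1
k2:v2

[section2]
k1 = v1
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)
sections = config.sections()
items = config.items("section1")   # both "=" and ":" are accepted as delimiters

# By default a trailing "# ..." on a value line belongs to the value.
plain = configparser.ConfigParser()
plain.read_string("[s]\nk = v  # note\n")
kept = plain.get("s", "k")

# With inline_comment_prefixes the trailing comment is stripped.
stripping = configparser.ConfigParser(inline_comment_prefixes=("#",))
stripping.read_string("[s]\nk = v  # note\n")
stripped = stripping.get("s", "k")

print(sections, items, repr(kept), repr(stripped))
```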
    1. Get all sections

    import configparser

    config = configparser.ConfigParser()
    config.read('xxxooo', encoding='utf-8')
    ret = config.sections()
    print(ret)

    2. Get all key-value pairs in a section

    import configparser

    config = configparser.ConfigParser()
    config.read('conf', encoding='utf-8')
    ret = config.items('section1')
    print(ret)

    3. Get all keys in a section

    import configparser

    config = configparser.ConfigParser()
    config.read('conf', encoding='utf-8')
    ret = config.options('section1')
    print(ret)

    4. Get the value of a given key in a section

    import configparser

    config = configparser.ConfigParser()
    config.read('conf', encoding='utf-8')

    v = config.get('section1', 'k1')
    # v = config.getint('section1', 'k1')
    # v = config.getfloat('section1', 'k1')
    # v = config.getboolean('section1', 'k1')

    5. Check, add, and remove sections

    import configparser

    config = configparser.ConfigParser()
    config.read('conf', encoding='utf-8')

    # Check
    has_sec = config.has_section('section1')
    print(has_sec)

    # Add a section
    config.add_section("SEC_1")
    with open('conf', 'w') as f:
        config.write(f)

    # Remove a section
    config.remove_section("SEC_1")
    with open('conf', 'w') as f:
        config.write(f)

    6. Check, remove, and set key-value pairs in a section

    import configparser

    config = configparser.ConfigParser()
    config.read('conf', encoding='utf-8')

    # Check
    has_opt = config.has_option('section1', 'k1')
    print(has_opt)

    # Remove
    config.remove_option('section1', 'k1')
    with open('conf', 'w') as f:
        config.write(f)

    # Set
    config.set('section1', 'k10', "123")
    with open('conf', 'w') as f:
        config.write(f)  # write the in-memory state back to the file
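
    The changes only exist in memory until write is called, which a round trip makes visible. A self-contained sketch in a temporary directory; the initial file content is made up.

```python
import configparser
import os
import tempfile

with tempfile.TemporaryDirectory() as work:
    path = os.path.join(work, "conf")
    with open(path, "w", encoding="utf-8") as f:
        f.write("[section1]\nk1 = v1\nk2 = 10\n")

    config = configparser.ConfigParser()
    config.read(path, encoding="utf-8")

    # Modify the in-memory configuration.
    config.remove_option("section1", "k1")
    config.set("section1", "k10", "123")
    if not config.has_section("SEC_1"):
        config.add_section("SEC_1")

    # Persist it, closing the file handle afterwards.
    with open(path, "w", encoding="utf-8") as f:
        config.write(f)

    # Read the file again to confirm what was saved.
    reloaded = configparser.ConfigParser()
    reloaded.read(path, encoding="utf-8")
    sections = reloaded.sections()
    options = reloaded.options("section1")
    k10 = reloaded.getint("section1", "k10")

print(sections, options, k10)
```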

    IV. XML

    XML file format:

    <data>
        <country name="Liechtenstein">
            <rank updated="yes">2</rank>
            <year>2023</year>
            <gdppc>141100</gdppc>
            <neighbor direction="E" name="Austria" />
            <neighbor direction="W" name="Switzerland" />
        </country>
        <country name="Singapore">
            <rank updated="yes">5</rank>
            <year>2026</year>
            <gdppc>59900</gdppc>
            <neighbor direction="N" name="Malaysia" />
        </country>
        <country name="Panama">
            <rank updated="yes">69</rank>
            <year>2026</year>
            <gdppc>13600</gdppc>
            <neighbor direction="W" name="Costa Rica" />
            <neighbor direction="E" name="Colombia" />
        </country>
    </data>
    
    Overview of node (Element) functionality, shown as an annotated excerpt of the Element class from the standard library module xml.etree.ElementTree. The excerpt comes from an older Python 3 release; getchildren() and getiterator() were removed in Python 3.9.

    class Element:
        """An XML element.

        This class is the reference implementation of the Element interface.

        An element's length is its number of subelements.  That means if you
        want to check if an element is truly empty, you should check BOTH
        its length AND its text attribute.

        The element tag, attribute names, and attribute values can be either
        bytes or strings.

        *tag* is the element name.  *attrib* is an optional dictionary containing
        element attributes. *extra* are additional element attributes given as
        keyword arguments.

        Example form:
            <tag attrib>text<child/>...</tag>tail

        """

        # Tag name of the current node
        tag = None
        """The element's name."""

        # Attributes of the current node
        attrib = None
        """Dictionary of the element's attributes."""

        # Text content of the current node
        text = None
        """
        Text before first subelement. This is either a string or the value None.
        Note that if there is no text, this attribute may be either
        None or the empty string, depending on the parser.

        """

        tail = None
        """
        Text after this element's end tag, but before the next sibling element's
        start tag.  This is either a string or the value None.  Note that if there
        was no text, this attribute may be either None or an empty string,
        depending on the parser.

        """

        def __init__(self, tag, attrib={}, **extra):
            if not isinstance(attrib, dict):
                raise TypeError("attrib must be dict, not %s" % (
                    attrib.__class__.__name__,))
            attrib = attrib.copy()
            attrib.update(extra)
            self.tag = tag
            self.attrib = attrib
            self._children = []

        def __repr__(self):
            return "<%s %r at %#x>" % (self.__class__.__name__, self.tag, id(self))

        def makeelement(self, tag, attrib):
            # Create a new node
            """Create a new element with the same type.

            *tag* is a string containing the element name.
            *attrib* is a dictionary containing the element attributes.

            Do not call this method, use the SubElement factory function instead.

            """
            return self.__class__(tag, attrib)

        def copy(self):
            """Return copy of current element.

            This creates a shallow copy. Subelements will be shared with the
            original tree.

            """
            elem = self.makeelement(self.tag, self.attrib)
            elem.text = self.text
            elem.tail = self.tail
            elem[:] = self
            return elem

        def __len__(self):
            return len(self._children)

        def __bool__(self):
            warnings.warn(
                "The behavior of this method will change in future versions.  "
                "Use specific 'len(elem)' or 'elem is not None' test instead.",
                FutureWarning, stacklevel=2
                )
            return len(self._children) != 0 # emulate old behaviour, for now

        def __getitem__(self, index):
            return self._children[index]

        def __setitem__(self, index, element):
            # if isinstance(index, slice):
            #     for elt in element:
            #         assert iselement(elt)
            # else:
            #     assert iselement(element)
            self._children[index] = element

        def __delitem__(self, index):
            del self._children[index]

        def append(self, subelement):
            # Append a child node to the current node
            """Add *subelement* to the end of this element.

            The new element will appear in document order after the last existing
            subelement (or directly after the text, if it's the first subelement),
            but before the end tag for this element.

            """
            self._assert_is_element(subelement)
            self._children.append(subelement)

        def extend(self, elements):
            # Extend the current node with n child nodes
            """Append subelements from a sequence.

            *elements* is a sequence with zero or more elements.

            """
            for element in elements:
                self._assert_is_element(element)
            self._children.extend(elements)

        def insert(self, index, subelement):
            # Insert a node among the children of the current node at the given position
            """Insert *subelement* at position *index*."""
            self._assert_is_element(subelement)
            self._children.insert(index, subelement)

        def _assert_is_element(self, e):
            # Need to refer to the actual Python implementation, not the
            # shadowing C implementation.
            if not isinstance(e, _Element_Py):
                raise TypeError('expected an Element, not %s' % type(e).__name__)

        def remove(self, subelement):
            # Remove a node from the children of the current node
            """Remove matching subelement.

            Unlike the find methods, this method compares elements based on
            identity, NOT ON tag value or contents.  To remove subelements by
            other means, the easiest way is to use a list comprehension to
            select what elements to keep, and then use slice assignment to update
            the parent element.

            ValueError is raised if a matching element could not be found.

            """
            # assert iselement(element)
            self._children.remove(subelement)

        def getchildren(self):
            # Get all child nodes (deprecated)
            """(Deprecated) Return all subelements.

            Elements are returned in document order.

            """
            warnings.warn(
                "This method will be removed in future versions.  "
                "Use 'list(elem)' or iteration over elem instead.",
                DeprecationWarning, stacklevel=2
                )
            return self._children

        def find(self, path, namespaces=None):
            # Get the first matching child node
            """Find first matching element by tag name or path.

            *path* is a string having either an element tag or an XPath,
            *namespaces* is an optional mapping from namespace prefix to full name.

            Return the first matching element, or None if no element was found.

            """
            return ElementPath.find(self, path, namespaces)

        def findtext(self, path, default=None, namespaces=None):
            # Get the text content of the first matching child node
            """Find text for first matching element by tag name or path.

            *path* is a string having either an element tag or an XPath,
            *default* is the value to return if the element was not found,
            *namespaces* is an optional mapping from namespace prefix to full name.

            Return text content of first matching element, or default value if
            none was found.  Note that if an element is found having no text
            content, the empty string is returned.

            """
            return ElementPath.findtext(self, path, default, namespaces)

        def findall(self, path, namespaces=None):
            # Get all matching child nodes
            """Find all matching subelements by tag name or path.

            *path* is a string having either an element tag or an XPath,
            *namespaces* is an optional mapping from namespace prefix to full name.

            Returns list containing all matching elements in document order.

            """
            return ElementPath.findall(self, path, namespaces)

        def iterfind(self, path, namespaces=None):
            # Get all matching nodes as an iterator (usable in a for loop)
            """Find all matching subelements by tag name or path.

            *path* is a string having either an element tag or an XPath,
            *namespaces* is an optional mapping from namespace prefix to full name.

            Return an iterable yielding all matching elements in document order.

            """
            return ElementPath.iterfind(self, path, namespaces)

        def clear(self):
            # Clear the node
            """Reset element.

            This function removes all subelements, clears all attributes, and sets
            the text and tail attributes to None.

            """
            self.attrib.clear()
            self._children = []
            self.text = self.tail = None

        def get(self, key, default=None):
            # Get an attribute value of the current node
            """Get element attribute.

            Equivalent to attrib.get, but some implementations may handle this a
            bit more efficiently.  *key* is what attribute to look for, and
            *default* is what to return if the attribute was not found.

            Returns a string containing the attribute value, or the default if
            attribute was not found.

            """
            return self.attrib.get(key, default)

        def set(self, key, value):
            # Set an attribute value on the current node
            """Set element attribute.

            Equivalent to attrib[key] = value, but some implementations may handle
            this a bit more efficiently.  *key* is what attribute to set, and
            *value* is the attribute value to set it to.

            """
            self.attrib[key] = value

        def keys(self):
            # Get the keys of all attributes of the current node
            """Get list of attribute names.

            Names are returned in an arbitrary order, just like an ordinary
            Python dict.  Equivalent to attrib.keys()

            """
            return self.attrib.keys()

        def items(self):
            # Get all attributes of the current node as key-value pairs
            """Get element attributes as a sequence.

            The attributes are returned in arbitrary order.  Equivalent to
            attrib.items().

            Return a list of (name, value) tuples.

            """
            return self.attrib.items()

        def iter(self, tag=None):
            # Find all nodes with the given tag among the current node and its
            # descendants; returns an iterator (usable in a for loop)
            """Create tree iterator.

            The iterator loops over the element and all subelements in document
            order, returning all elements with a matching tag.

            If the tree structure is modified during iteration, new or removed
            elements may or may not be included.  To get a stable set, use the
            list() function on the iterator, and loop over the resulting list.

            *tag* is what tags to look for (default is to return all elements)

            Return an iterator containing all the matching elements.

            """
            if tag == "*":
                tag = None
            if tag is None or self.tag == tag:
                yield self
            for e in self._children:
                yield from e.iter(tag)

        # compatibility
        def getiterator(self, tag=None):
            # Change for a DeprecationWarning in 1.4
            warnings.warn(
                "This method will be removed in future versions.  "
                "Use 'elem.iter()' or 'list(elem.iter())' instead.",
                PendingDeprecationWarning, stacklevel=2
            )
            return list(self.iter(tag))

        def itertext(self):
            # Iterate over the text content of the current node and all of its
            # descendants; returns an iterator (usable in a for loop)
            """Create text iterator.

            The iterator loops over the element and all subelements in document
            order, returning all inner text.

            """
            tag = self.tag
            if not isinstance(tag, str) and tag is not None:
                return
            if self.text:
                yield self.text
            for e in self:
                yield from e.itertext()
                if e.tail:
                    yield e.tail
    1. Parsing XML
    • Parse a file into an XML object
    from xml.etree import ElementTree as ET

    # Parse the XML file directly
    tree = ET.parse("xo.xml")

    # Get the root node of the XML document
    root = tree.getroot()
    • Parse a string
    from xml.etree import ElementTree as ET

    # Open the file and read the XML content
    with open('xo.xml', 'r') as f:
        str_xml = f.read()

    # Parse the string into an Element; root refers to the root node of the XML document
    root = ET.XML(str_xml)

    2. Working with XML

    • Traverse all nodes of the XML document
    from xml.etree import ElementTree as ET

    # Open the XML file; the returned tree object is used for the operations below
    tree = ET.parse("xo.xml")  # tree is an ElementTree

    # Get the root node. A node exposes tag (its name), attrib (its attributes),
    # and text (the content between the tags, as in <xml>text</xml>)
    root = tree.getroot()
    print(root, root.tag, root.attrib)

    # Traverse the nodes
    for child in root:
        print(child, child.tag, child.attrib)
        for grandchild in child:
            print(grandchild, grandchild.tag, grandchild.attrib, grandchild.text)
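    • Iterate over specific nodes; modify and delete
    from xml.etree import ElementTree as ET

    # Parse from a string to get an XML object
    # Open the file and read the XML content
    with open('xo.xml', 'r') as f:
        str_xml = f.read()

    # Parse the string into an Element; root refers to the root node of the XML document
    root = ET.XML(str_xml)

    ############ Operations ############

    # Top-level tag
    print(root.tag)

    # Loop over all year nodes
    for node in root.iter('year'):
        # Increment the content of the year node by one
        new_year = int(node.text) + 1
        node.text = str(new_year)

        # Set attributes
        node.set('name', 'alex')
        node.set('age', '18')
        # Delete an attribute
        del node.attrib['name']

    ############ Save to file ############
    # Only an ElementTree has write(), so wrap root in a tree before saving.
    # This works whether the XML was parsed from a file or from a string.
    tree = ET.ElementTree(root)
    tree.write("newnew.xml", encoding='utf-8')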
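
    Modifying and deleting nodes can be tried on an in-memory document. The sketch below uses a shortened copy of the sample data; the rank threshold of 50 is an arbitrary choice for illustration.

```python
import xml.etree.ElementTree as ET

XML_TEXT = """<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2023</year>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2026</year>
    </country>
</data>"""

root = ET.fromstring(XML_TEXT)

# Modify: bump every year by one and mark it as updated.
for year in root.iter("year"):
    year.text = str(int(year.text) + 1)
    year.set("updated", "yes")

# Delete: findall returns a list, so removing while looping over it is safe.
for country in root.findall("country"):
    if int(country.find("rank").text) > 50:
        root.remove(country)

names = [c.get("name") for c in root.findall("country")]
years = [y.text for y in root.iter("year")]
output = ET.tostring(root, encoding="unicode")
print(names, years)
```

    remove only works on direct children, which is why it is called on root, the parent of the country nodes.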

    3. Creating XML

    • With SubElement
    • With Element and append
    • With makeelement and append
    from xml.etree import ElementTree as ET

    from xml.dom import minidom


    def prettify(elem):
        """Convert a node to a string and add indentation."""
        rough_string = ET.tostring(elem, 'utf-8')
        reparsed = minidom.parseString(rough_string)
        return reparsed.toprettyxml(indent="\t")

    # Create the root node
    root = ET.Element("family")

    # Create the first son
    son1 = ET.SubElement(root, "son", attrib={'name': 'son1'})
    # Create the second son
    son2 = ET.SubElement(root, "son", attrib={"name": "son2"})

    # Create a grandson under the first son
    grandson1 = ET.SubElement(son1, "age", attrib={'name': 'son11'})
    grandson1.text = 'grandson'

    # Alternative without indentation: write the tree directly
    # et = ET.ElementTree(root)  # create the document object
    # et.write("test.xml", encoding="utf-8", xml_declaration=True, short_empty_elements=False)

    # Call the indentation helper so the resulting string is indented
    raw_str = prettify(root)

    with open("xxxoo.xml", 'w', encoding='utf-8') as f:
        f.write(raw_str)
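
    On Python 3.9 and later, ElementTree provides ET.indent, which adds indentation in place without going through minidom. A minimal sketch as an alternative to the helper above; the element and attribute names are only illustrative.

```python
import xml.etree.ElementTree as ET

root = ET.Element("family")
son1 = ET.SubElement(root, "son", attrib={"name": "son1"})
ET.SubElement(root, "son", attrib={"name": "son2"})
grandson = ET.SubElement(son1, "grandson", attrib={"name": "son11"})
grandson.text = "grandson"

# ET.indent (Python 3.9+) rewrites the whitespace between nodes in place.
ET.indent(root, space="    ")
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```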

    makeelement

    from xml.etree import ElementTree as ET

    # Create the root node
    root = ET.Element("family")

    # Create the first son
    # son1 = ET.Element('son', {'name': 'son1'})
    son1 = root.makeelement('son', {'name': 'son1'})
    # Create the second son
    # son2 = ET.Element('son', {"name": 'son2'})
    son2 = root.makeelement('son', {"name": 'son2'})

    # Create two grandsons under the first son
    # grandson1 = ET.Element('grandson', {'name': 'son11'})
    grandson1 = son1.makeelement('grandson', {'name': 'son11'})
    # grandson2 = ET.Element('grandson', {'name': 'son12'})
    grandson2 = son1.makeelement('grandson', {'name': 'son12'})

    son1.append(grandson1)
    son1.append(grandson2)

    # Add both sons to the root node
    root.append(son1)
    root.append(son2)

    tree = ET.ElementTree(root)
    tree.write('oooo.xml', encoding='utf-8', short_empty_elements=False)

    Element

    from xml.etree import ElementTree as ET

    # Create the root node
    root = ET.Element("family")

    # Create the first son
    son1 = ET.Element('son', {'name': 'son1'})
    # Create the second son
    son2 = ET.Element('son', {"name": 'son2'})

    # Create two grandsons under the first son
    grandson1 = ET.Element('grandson', {'name': 'son11'})
    grandson2 = ET.Element('grandson', {'name': 'son12'})
    son1.append(grandson1)
    son1.append(grandson2)

    # Add both sons to the root node
    root.append(son1)
    root.append(son2)

    tree = ET.ElementTree(root)
    tree.write('oooo.xml', encoding='utf-8', short_empty_elements=False)
  • Original article: https://www.cnblogs.com/lcysen/p/6032932.html
Copyright © 2020-2023  润新知