• Python XML Parsing and Processing


    movies.xml

    <collection shelf = "New Arrivals">
    <movie title = "Enemy Behind">
       <type>War, Thriller</type>
       <format>DVD</format>
       <year>2013</year>
       <rating>PG</rating>
       <stars>10</stars>
       <description>Talk about a US-Japan war</description>
    </movie>
    <movie title = "Transformers">
       <type>Anime, Science Fiction</type>
       <format>DVD</format>
       <year>1989</year>
       <rating>R</rating>
       <stars>8</stars>
       <description>A schientific fiction</description>
    </movie>
    <movie title = "Trigun">
       <type>Anime, Action</type>
       <format>DVD</format>
       <episodes>4</episodes>
       <rating>PG</rating>
       <stars>10</stars>
       <description>Vash the Stampede!</description>
    </movie>
    <movie title = "Ishtar">
       <type>Comedy</type>
       <format>VHS</format>
       <rating>PG</rating>
       <stars>2</stars>
       <description>Viewable boredom</description>
    </movie>
    </collection>

    Parsing XML with the SAX API

    #!/usr/bin/python3
    
    import xml.sax
    
    class MovieHandler( xml.sax.ContentHandler ):
       def __init__(self):
          # Initialize the base ContentHandler before adding our own state
          super().__init__()
          self.CurrentData = ""
          self.type = ""
          self.format = ""
          self.year = ""
          self.rating = ""
          self.stars = ""
          self.description = ""
    
       # Called when an element starts
       def startElement(self, tag, attributes):
          self.CurrentData = tag
          if tag == "movie":
             print ("*****Movie*****")
             title = attributes["title"]
             print ("Title:", title)
    
       # Called when an element ends
       def endElement(self, tag):
          if self.CurrentData == "type":
             print ("Type:", self.type)
          elif self.CurrentData == "format":
             print ("Format:", self.format)
          elif self.CurrentData == "year":
             print ("Year:", self.year)
          elif self.CurrentData == "rating":
             print ("Rating:", self.rating)
          elif self.CurrentData == "stars":
             print ("Stars:", self.stars)
          elif self.CurrentData == "description":
             print ("Description:", self.description)
          self.CurrentData = ""
    
       # Called when character data is read
       def characters(self, content):
          if self.CurrentData == "type":
             self.type = content
          elif self.CurrentData == "format":
             self.format = content
          elif self.CurrentData == "year":
             self.year = content
          elif self.CurrentData == "rating":
             self.rating = content
          elif self.CurrentData == "stars":
             self.stars = content
          elif self.CurrentData == "description":
             self.description = content
    
    if __name__ == "__main__":
    
       # create an XMLReader
       parser = xml.sax.make_parser()
       # turn off namespace processing
       parser.setFeature(xml.sax.handler.feature_namespaces, 0)
    
       # override the default ContentHandler
       Handler = MovieHandler()
       parser.setContentHandler( Handler )
    
       parser.parse("movies.xml")

    Output

    *****Movie*****
    Title: Enemy Behind
    Type: War, Thriller
    Format: DVD
    Year: 2013
    Rating: PG
    Stars: 10
    Description: Talk about a US-Japan war
    *****Movie*****
    Title: Transformers
    Type: Anime, Science Fiction
    Format: DVD
    Year: 1989
    Rating: R
    Stars: 8
    Description: A schientific fiction
    *****Movie*****
    Title: Trigun
    Type: Anime, Action
    Format: DVD
    Rating: PG
    Stars: 10
    Description: Vash the Stampede!
    *****Movie*****
    Title: Ishtar
    Type: Comedy
    Format: VHS
    Rating: PG
    Stars: 2
    Description: Viewable boredom
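
    The handler above assigns each `characters()` call straight to a field, which relies on the parser delivering every text node in a single chunk. SAX parsers are allowed to split text across several calls, so a more defensive variant buffers the chunks and joins them in `endElement`. The following is a minimal sketch under that assumption, not the original author's code: the class name `MovieCollector`, the inline `SAMPLE` document, and the choice to store each movie as a dict are all illustrative.

```python
import xml.sax


class MovieCollector(xml.sax.ContentHandler):
    """Collect each <movie> into a dict, buffering text across characters() calls."""

    def __init__(self):
        super().__init__()
        self.movies = []      # finished movie records
        self._current = None  # record being built, or None outside <movie>
        self._buffer = []     # text chunks of the element currently open

    def startElement(self, tag, attributes):
        if tag == "movie":
            self._current = {"title": attributes.get("title", "")}
        self._buffer = []

    def characters(self, content):
        # May be called several times for one text node, so append.
        self._buffer.append(content)

    def endElement(self, tag):
        if tag == "movie":
            self.movies.append(self._current)
            self._current = None
        elif self._current is not None:
            # Key the value by the closing tag, not by the last opened one.
            self._current[tag] = "".join(self._buffer).strip()
        self._buffer = []


# Hypothetical, shortened stand-in for movies.xml.
SAMPLE = b"""<collection shelf="New Arrivals">
<movie title="Enemy Behind">
   <type>War, Thriller</type>
   <year>2013</year>
</movie>
<movie title="Trigun">
   <type>Anime, Action</type>
   <episodes>4</episodes>
</movie>
</collection>"""

handler = MovieCollector()
xml.sax.parseString(SAMPLE, handler)
for movie in handler.movies:
    print(movie)
```

    Because the record is keyed by tag name, optional elements such as `episodes` are kept without extra branches, and elements that a movie lacks are simply absent from its dict.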

    Parsing XML with the DOM API

    #!/usr/bin/python3
    
    import xml.dom.minidom
    
    # Open XML document using minidom parser
    DOMTree = xml.dom.minidom.parse("movies.xml")
    collection = DOMTree.documentElement
    if collection.hasAttribute("shelf"):
       print ("Root element : %s" % collection.getAttribute("shelf"))
    
    # Get all the movies in the collection
    movies = collection.getElementsByTagName("movie")
    
    # Print detail of each movie.
    for movie in movies:
       print ("*****Movie*****")
       if movie.hasAttribute("title"):
          print ("Title: %s" % movie.getAttribute("title"))
    
       # Avoid shadowing the built-ins type() and format()
       type_node = movie.getElementsByTagName('type')[0]
       print ("Type: %s" % type_node.childNodes[0].data)
       format_node = movie.getElementsByTagName('format')[0]
       print ("Format: %s" % format_node.childNodes[0].data)
       rating = movie.getElementsByTagName('rating')[0]
       print ("Rating: %s" % rating.childNodes[0].data)
       description = movie.getElementsByTagName('description')[0]
       print ("Description: %s" % description.childNodes[0].data)

    Output

    Root element : New Arrivals
    *****Movie*****
    Title: Enemy Behind
    Type: War, Thriller
    Format: DVD
    Rating: PG
    Description: Talk about a US-Japan war
    *****Movie*****
    Title: Transformers
    Type: Anime, Science Fiction
    Format: DVD
    Rating: R
    Description: A schientific fiction
    *****Movie*****
    Title: Trigun
    Type: Anime, Action
    Format: DVD
    Rating: PG
    Description: Vash the Stampede!
    *****Movie*****
    Title: Ishtar
    Type: Comedy
    Format: VHS
    Rating: PG
    Description: Viewable boredom
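
    The DOM script indexes `getElementsByTagName(...)[0]` directly, which raises `IndexError` for elements that not every movie has (in movies.xml, `year` and `episodes` are optional). One way to read such fields is a small lookup helper that returns a default. This is a sketch under stated assumptions: `child_text`, the `"n/a"` default, and the inline `SAMPLE` document are hypothetical names chosen for illustration.

```python
import xml.dom.minidom

# Hypothetical, shortened stand-in for movies.xml.
SAMPLE = """<collection shelf="New Arrivals">
<movie title="Enemy Behind">
   <type>War, Thriller</type>
   <year>2013</year>
</movie>
<movie title="Ishtar">
   <type>Comedy</type>
</movie>
</collection>"""


def child_text(element, tag, default=None):
    """Return the text of the first <tag> descendant, or default if absent or empty."""
    nodes = element.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return default
    return nodes[0].firstChild.data


dom = xml.dom.minidom.parseString(SAMPLE)
rows = []
for movie in dom.documentElement.getElementsByTagName("movie"):
    rows.append((movie.getAttribute("title"),
                 child_text(movie, "type"),
                 child_text(movie, "year", default="n/a")))
for row in rows:
    print(row)
```

    The helper assumes the element's first child is a text node, which holds for the simple leaf elements in this file.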

  • Original article: https://www.cnblogs.com/sea-stream/p/10178995.html