• XMLREADER/DOM/SIMPLEXML 解析大文件


    DOM和simplexml处理xml非常的灵活方便,它们的内存组织结构与xml文件格式很相近。但是同时它们也有一个缺点,对于大文件处理起来力不从心,太耗内存了。

    还好有xmlreader,基于流的解析器,(什么是基于流)。它可以对于xml大文件进行解析,采用一边读取一边解析的方法,而不是一股脑儿都加载到内存去处理。但是它也有缺点,不够灵活方便(这是DOM和simplexml擅长的)。

    那些把他们结合起来,不就可以很好的解析大文件了吗? 我写了一个简单的类实现了一点点鸡肋般的功能。

    xml文件

    <?xml version='1.0' standalone='yes'?>
    <movies>
     <movie>
      <title>PHP: Behind the Parser</title>
      <characters>
       <character>
        <name>Ms. Coder</name>
        <actor>Onlivia Actora</actor>
       </character>
       <character>
        <name>Mr. Coder</name>
        <actor>El Act&#211;r</actor>
       </character>
      </characters>
      <plot>
       So, this language. It's like, a programming language. Or is it a
       scripting language? All is revealed in this thrilling horror spoof
       of a documentary.
      </plot>
      <great-lines>
       <line>PHP solves all my web problems</line>
      </great-lines>
      <rating type="thumbs">7</rating>
      <rating type="stars">5</rating>
    </movie>
    <
    movie> <title>PHP: Behind the Parser</title> <characters> <character> <name>Ms. Coder</name> <actor>Onlivia Actora</actor> </character> <character> <name>Mr. Coder</name> <actor>El Act&#211;r</actor> </character> </characters> <plot> So, this language. It's like, a programming language. Or is it a scripting language? All is revealed in this thrilling horror spoof of a documentary. </plot> <great-lines> <line>PHP solves all my web problems</line> </great-lines> <rating type="thumbs">7</rating> <rating type="stars">5</rating> </movie> </movies>

    实现类

    class SimpleXmlReader extends XMLReader{
    
        public function __construct($source, $isfile = false){
            if($isfile){
                $this->open($source);
            }else{
                $this->XML($source);
            }
        }
    
        public function getElement($nodename, $depth = 0){
            if($this->localName == $nodename && $this->nodeType == self::ELEMENT){
                if(!$depth || ($depth && $depth == $this->depth)){
                    $this->next();
                }
            }
            while($this->read()){
                if($this->localName == $nodename && $this->nodeType == self::ELEMENT){
                    if(!$depth || ($depth && $depth == $this->depth)){
                        return true;
                    }
                }
            }
            return false;
        }
    
        public function expandNodeToSimpleXml(){
           if($this->nodeType == self::ELEMENT){
               $node = $this->expand();
               $dom = new DomDocument();
               $n = $dom->importNode($node, true);
               $sxe = simplexml_import_dom($n);
               return $sxe;
           }
           return false;
        }
    }
    

     实例代码:

    $xmlhl = new SimpleXmlReader('test.xml', true);
    while($xmlhl->getElement('movie')){
        $sxe = $xmlhl->expandNodeToSimpleXml();
        foreach($sxe->characters[0] as $character){
            echo "
     name -> "  . $character->name;
            echo "
     actor -> " . $character->actor;
        }
    }
    

     结构:

     name -> Ms. Coder
     actor -> Onlivia Actora
     name -> Mr. Coder
     actor -> El ActÓr
     name -> Ms. Coder
     actor -> Onlivia Actora
     name -> Mr. Coder
     actor -> El ActÓr
  • 相关阅读:
    iOS之蓝牙开发—CoreBluetooth详解
    iOS-GCD使用详解
    iOS—Mask属性的使用
    idea导入eclipse中的maven项目
    SQL Server 查找字符串中指定字符出现的次数
    lLinux的常用命令
    从excel表中生成批量SQL
    ORA-00911: invalid character 错误解决
    sqlserver sp_who2和inputbuffer的使用,连接数
    如果存在这个表,则删除这个表的sql
  • 原文地址:https://www.cnblogs.com/mengzhongshi/p/3433229.html
Copyright © 2020-2023  润新知