• Parsing HTML tags with PHP Simple HTML DOM Parser


    Reposted from: http://www.blhere.com/1243.html

    I tried PHP Simple HTML DOM Parser for parsing HTML pages and found it quite good: it builds a DOM tree that makes it easy to pull content out of the HTML. It works well for scraping.

    An example follows; you can also download the archive from SourceForge and look at the examples it contains:
    Scraping data with PHP Simple HTML DOM Parser

    PHP Simple HTML DOM Parser, written for PHP5+, lets you manipulate HTML very easily. Because it tolerates invalid HTML, it is a better choice than PHP scripts that rely on complicated regexes to extract information from web pages.
    Before extracting anything, create a DOM from either a URL or a file. The following script extracts the links and images from a website:

    // Create DOM from URL or file
    $html = file_get_html('http://www.microsoft.com/');

    // Extract links
    foreach ($html->find('a') as $element)
        echo $element->href . '<br>';

    // Extract images
    foreach ($html->find('img') as $element)
        echo $element->src . '<br>';
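
    The link and image loops assume the page was fetched and parsed successfully. The following is a minimal sketch of the same extraction with a failure check, run against an inline string so it needs no network access. The helper name extract_links_and_images and the sample markup are hypothetical, and the script assumes simple_html_dom.php sits next to it. In the releases I am aware of (1.5 and later), str_get_html() and file_get_html() return false for empty or oversized input, so treat that check as an assumption to confirm against your copy.

```php
<?php
// Assumption: simple_html_dom.php is in the same directory as this script.
require_once 'simple_html_dom.php';

/**
 * Hypothetical helper: collect the href of every <a> and the src of
 * every <img> in a piece of markup.
 */
function extract_links_and_images(string $markup): array
{
    $html = str_get_html($markup);
    if (!$html) {
        // Parsing failed (for example, empty input): return empty lists.
        return ['links' => [], 'images' => []];
    }

    $links = [];
    foreach ($html->find('a') as $element) {
        $links[] = $element->href;
    }

    $images = [];
    foreach ($html->find('img') as $element) {
        $images[] = $element->src;
    }

    // Release the parser's internal references once we are done.
    $html->clear();

    return ['links' => $links, 'images' => $images];
}

$result = extract_links_and_images(
    '<a href="/a">A</a><img src="x.png"><a href="/b">B</a>'
);
print_r($result);
```

    Returning arrays instead of echoing keeps the extraction separate from the output, which makes it easier to reuse the result or check it.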

    The parser can also be used to modify HTML elements:

    // Create DOM from string
    $html = str_get_html('<div id="simple">Simple</div><div id="parser">Parser</div>');

    $html->find('div', 1)->class = 'bar';

    $html->find('div[id=simple]', 0)->innertext = 'Foo';

    // Output: <div id="simple">Foo</div><div id="parser" class="bar">Parser</div>
    echo $html;
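
    When find() is called with an index it returns a single node, or null if nothing matches, so assigning a property to the result of a failed lookup would raise an error. Here is a small sketch of a guarded modification; the id "absent" and the replacement text are made up for illustration, and the script again assumes simple_html_dom.php is in the same directory.

```php
<?php
// Assumption: simple_html_dom.php is in the same directory as this script.
require_once 'simple_html_dom.php';

$html = str_get_html('<div id="simple">Simple</div><div id="parser">Parser</div>');

// A lookup that matches nothing yields null rather than a node.
$missing = $html->find('div[id=absent]', 0);
$missingIsNull = ($missing === null);

// Only modify the element when the lookup succeeded.
$target = $html->find('div[id=parser]', 0);
if ($target !== null) {
    $target->class = 'bar';          // add a new attribute
    $target->innertext = 'Changed';  // replace the element's contents
}

// Casting the DOM object to a string serializes the modified markup.
$output = (string) $html;
echo $output, "\n";
```

    Checking for null costs one line per lookup and avoids a fatal error when the page layout differs from what the selector expects.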

    Do you wish to retrieve content without any tags?

    echo file_get_html('http://www.yahoo.com/')->plaintext;

    The parser's package files (http://simplehtmldom.sourceforge.net/) include some scraping examples for Digg, IMDb, and Slashdot. Let's create one that extracts the first 10 results (titles only) for the keyword "php" from Google:

    $url = 'http://www.google.com/search?hl=en&q=php&btnG=Search';

    // Create DOM from URL
    $html = file_get_html($url);

    // Match all 'a' tags whose class attribute equals 'l'
    // (Google's markup changes over time, so this selector may need updating)
    foreach ($html->find('a[class=l]') as $key => $info)
    {
        echo ($key + 1).'. '.$info->plaintext."<br /> ";
    }

    NOTE: Make sure to include the parser before using any of its functions:

    include 'simple_html_dom.php';

    For more information on using the parser, see the 'PHP Simple HTML DOM Parser' manual. To download the package files use the following URL: [Download]

  • Original URL: https://www.cnblogs.com/lvchenfeng/p/5261199.html