• PHP curl_setopt函数用法介绍中篇


    此篇已实例为主。

    一.一般的实例

    demo1.php

    <?php
        $user = "admin123";
        $pass = "admin456";
        // $curlPost = "user=$user&pass=$pass";    ####  测试一
    ######测试二
    $curlPost = array( 'a'=>123, 'b'=>456, 'c'=>789 ); $ch = curl_init(); //初始化一个CURL对象 curl_setopt($ch, CURLOPT_URL, "demo2.php"); //设置你所需要抓取的URL curl_setopt($ch, CURLOPT_RETURNTRANSFER, 0); //设置curl参数,要求结果是否输出到屏幕上,为true的时候是不返回到网页中 // 假设上面的0换成1的话,那么接下来的$data就需要echo一下。 curl_setopt($ch, CURLOPT_POST, 1); //post提交 curl_setopt($ch, CURLOPT_POSTFIELDS, $curlPost); $data = curl_exec($ch); // echo $data; //运行curl,请求网页。 curl_close($ch); ?>

    查看结果:

    <?php
    // echo "aaaaa";
    echo "<pre>";
    var_dump($_POST);
    echo "<br/>";
    echo $_POST['b'];
    
    
    echo "<br/>";
    
    $a = array('1'=>1234,'2'=>567);
    var_dump($a);
    
    ?>
    <?php
    // 初始化一个 cURL 对象
    $curl = curl_init();
    
    // 设置你需要抓取的URL
    curl_setopt($curl, CURLOPT_URL, 'http://www.baidu.com');
    
    // 设置header
    curl_setopt($curl, CURLOPT_HEADER, 1);
    
    // 设置cURL 参数,要求结果保存到字符串中还是输出到屏幕上。
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    
    // 运行cURL,请求网页
    $data = curl_exec($curl);
    
    // 关闭URL请求
    curl_close($curl);
    
    // 显示获得的数据
    var_dump($data);
    ?>

     二.采集数据信息

    在实践的项目中,要求采集某网站产品列表的信息。右击网站,查看源码,发现并无产品信息。通过分析该网站,产品信息是通过ajax调用的。

    利用跑数据的形式,获取产品信息,分析url地址,得到的json形式的产品数据,通过转换,将产品属性“对坐入号”获取,我采用的是插入到数据库当中。

    代码如下:

    <?php
    header('Content-Type:text/html;charset=UTF-8');
    
    set_time_limit(0);
    
    $id = isset($_GET['id']) ? intval($_GET['id']) : 1;
    
    $listurl = "http://www.ptgcn.com/Handler/PublicationHandler.ashx?r=0.38488502931227453&func=GetDataByPCD&index={$id}&size=20&sort=&dir=asc&param=";
    // echo $listurl;
    exit;    ####默认结束状态
    $ch = curl_init();
    // var_dump($ch);
    // 2. 设置选项,包括URL
    curl_setopt($ch, CURLOPT_URL, $listurl);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch,CURLOPT_HTTPHEADER,array ("Content-Type: text/xml; charset=utf-8","Expect: 100-continue"));
    
    // 3. 执行并获取HTML文档内容
    $output = curl_exec($ch);
    
    
    $httpCode = curl_getinfo($ch,CURLINFO_HTTP_CODE);
    
    curl_close($ch);
    // var_dump($httpCode);
    if($httpCode == '200'){
    // echo "123456";
        // var_dump($output);
        $json_b = json_decode($output,true);
        // var_dump($json_b);
        $new = $json_b['rows'];
        // var_dump($new);
        foreach($new as $row){
    
          $catalog ='';
          $catalog = addslashes(trim($row['CatalogNo']));
          $catalog = "'". $catalog."'";
    
          $url = '';
          $url = "http://www.ptgcn.com/Products/".trim($row['PermaLink']).".html";
          $catalog_addr = $url;
          $catalog_addr = "'".$catalog_addr."'";
          
          $pubmed_id ='';
          $pubmed_id = addslashes(trim($row['PMID']));
          $pubmed_id = "'".$pubmed_id."'";
          
          $author = '';
          $author = addslashes(trim($row['Author']));
          $author = "'".$author."'";
          
          $journal_a = '';
          $journal_b = '';
          $journal = '';
          $journal_a = addslashes(trim($row['Journal']));
          $journal_b = addslashes(trim($row['PubDate']));
          $journal = $journal_a.",".$journal_b;
          $journal = "'".$journal."'";
          // $journal = addslashes(trim($row['CatalogNo']));
          $application = '';
          $application = addslashes(trim($row['App']));
          $application = "'".$application."'";
    
          $species = '';
          $species = addslashes(trim($row['Species']));
          $species = "'".$species."'";
          
          $title = '';
          $title = addslashes(trim($row['Subject']));
          $title = "'".$title."'";
    
         var_dump($row);
         exit;
          $con = mysql_connect('localhost',"root","root");
          mysql_set_charset("utf8");
          $select = mysql_select_db("wh");
    
          $sql = '';
          $sql = "insert into ptg(catalog,catalog_addr,pubmed_id,author,journal,application,species,title)values({$catalog},{$catalog_addr},
    {$pubmed_id},{$author},{$journal},{$application},{$species},{$title})"; echo $sql; $query_insert = mysql_query($sql); if($query_insert){ echo "success!"; }else{ $html = ''; $html = $html.$catalog." "; file_put_contents('error.txt',$html,FILE_APPEND); //用于装载执行失败的数据 } mysql_close($con); // var_dump($con,$select); // exit; } }else{ echo 'finished !'; exit; } ?> <script> function JumpUrl() { location.href='?id=<?php echo ($id+1);?>'; } setTimeout('JumpUrl()',1); </script>

    说到采集再介绍一种方法:

    三.利用phpQuery.采集数据信息

    下载地址:http://www.jb51.net/article/59522.htm

           git:https://github.com/TobiaszCudnik/phpquery

    1.使用场合,右击某网站,查看源码,如果源码信息与网站本身一样,可以考虑使用phpQuery进行采集

    2.phpQuery,顾名思义:及将源码的编译方式转换为jQuery的形式。然后一切调用都使用jQuery风格即可

    3.说明:先要引用  include 'phpQuery/phpQuery.php';

    如:

    简单的三行代码,就可以获取头条内容。首先在程序中包含phpQuery.php核心程序,然后调用读取目标网页,最后输出对应标签下的内容。

    pq()是一个功能强大的方法,跟jQuery的$()如出一辙,jQuery的选择器基本上都能使用在phpQuery上,只要把“.”变成 “->”。

    如上例中,pq(".blkTop h1:eq(0)")抓取了页面class属性为blkTop的DIV元素,并找到该DIV内部的第一个h1标签,

    然后用html()方法获取h1标签里 的内容(带html标签),也就是我们要获取的头条信息,如果使用text()方法,则只获取头条的文本内容。

    当然要使用好phpQuery,关键是要找 对文档中对应内容的节点。

    4.具体的采集代码如下:

    <?php
    header('Content-Type:text/html;charset=UTF-8');
    include 'phpQuery/phpQuery.php';
    
    set_time_limit(0);
    
    $id = isset($_GET['id']) ? intval($_GET['id']) : 1;
    
    
    if($id > 14172){
       echo "finish!";
       exit;
    }
    
    
    
    $con = mysql_connect('localhost',"root","root");
    mysql_set_charset("utf8");
    $select = mysql_select_db("wh");
    
    $sql = "select catalog_addr from ptg where id = {$id}";
    $query = mysql_query($sql);
    $row = mysql_fetch_assoc($query);
    // var_dump($row);
    $url = stripslashes(trim($row['catalog_addr']));
    $url = str_replace("html","htm",$url); 
    // echo $url;
    // exit;
    
    
    // echo "aaaaa";
    // phpQuery::newDocumentFile('http://helloweba.com/blog.html');
    ####$url,就是单纯的url地址
    phpQuery::newDocumentFile($url); // echo "aaaaa"; // exit; $artList = pq(".proTitle"); // var_dump($artList); #### No.1 foreach($artList as $li){ $one = pq($li)->find('h1')->html(); $one = strip_tags($one); $one = trim($one); echo $one.'<br/>'; } #### No.2 $artList_a = pq("#dvProApp"); foreach($artList_a as $li){ $two_a = pq($li) -> find('tr td')->eq(1) ->html(); $two_b = pq($li) -> find('tr td')->eq(1) ->find('a') ->html(); $two_a = strip_tags($two_a); $two_a = trim($two_a); // var_dump($two_a); // echo '<br/>'; $two_b = strip_tags($two_b); $two_b = trim($two_b); // var_dump($two_b); // echo '<br/>'; $two = str_replace($two_b,"",$two_a); $two = strip_tags($two); $two = str_replace(" ","",$two); // $two = str_replace(" ","",$two); $two = trim($two); echo $two.'<br/>'; // $two = preg_replace('/<(/?a.*?)>/si','',$two); // echo $two; // exit; } ##### N0.3 $artList_b = pq("#dvProApp"); foreach($artList_b as $li){ $three = pq($li) -> find('tr')->eq(1)->find('td')->eq(1)->html(); $three = trim(strip_tags($three)); echo $three.'<br/>'; } ##### N0.4 $artList_c = pq("#dvProApp"); foreach($artList_c as $li){ $four = pq($li) -> find('tr')->eq(2)->find('td')->eq(1)->html(); $four = trim(strip_tags($four)); echo $four.'<br/>'; } ##### N0.5 $artList_d = pq("#dvProApp"); foreach($artList_d as $li){ $five = pq($li) -> find('tr')->eq(3)->find('td')->eq(1)->html(); $five = trim(strip_tags($five)); echo $five.'<br/>'; } ##### N0.6 $artList_e = pq("#dvProImm"); foreach($artList_e as $li){ $six = pq($li) -> find('tr')->eq(2)->find('td')->eq(1)->html(); $six = trim(strip_tags($six)); echo $six.'<br/>'; } ##### N0.7 $artList_f = pq("#dvProImm"); foreach($artList_f as $li){ $seven = pq($li) -> find('tr')->eq(2)->find('td')->eq(3)->html(); $seven = trim(strip_tags($seven)); echo $seven.'<br/>'; } ##### N0.8 $artList_g = pq("#dvProImm"); foreach($artList_g as $li){ $ba = pq($li) -> find('tr')->eq(3)->find('td')->eq(1)->html(); $ba = trim(strip_tags($ba)); echo $ba.'<br/>'; } ##### N0.9 $artList_h = pq("#dvProImm"); foreach($artList_h as $li){ $nine = pq($li) -> find('tr')->eq(3)->find('td')->eq(3)->html(); $nine = trim(strip_tags($nine)); echo $nine.'<br/>'; } ####执行插入操作 $one = "'".addslashes(trim($one))."'"; $two = "'".addslashes(trim($two))."'"; $three = "'".addslashes(trim($three))."'"; $four = "'".addslashes(trim($four))."'"; $five = "'".addslashes(trim($five))."'"; $six = "'".addslashes(trim($six))."'"; $seven = "'".addslashes(trim($seven))."'"; $ba = "'".addslashes(trim($ba))."'"; $nine = "'".addslashes(trim($nine))."'"; $sql_in = "insert into ptg_page(title,house_app,pub_app,spe_spe,pub_spe,gen_num,gen_id,gene_sym,syn)values({$one},{$two},{$three},{$four},{$five},{$six},{$seven},{$ba},{$nine})"; echo $sql_in.'<br/>'; $query_in = mysql_query($sql_in); if($query_in){ echo "success!"; }else{ $html = ''; $html = $html.$url." "; file_put_contents('error_url.txt',$html,FILE_APPEND); //用于装载执行失败的数据 } mysql_close($con); ?> <script> function JumpUrl(){ location.href='?id=<?php echo ($id+1);?>'; } setTimeout('JumpUrl()',1); </script>
  • 相关阅读:
    微信JSSDK使用指南
    安装eclipse中html/jsp/xml editor插件以及改动html页面的字体
    OpenLayers 3+Geoserver+PostGIS实现点击查询
    编程算法
    javascript闭包具体解释
    网络安全基本概念
    Android 5.1 Settings源代码简要分析
    Linq 使用注意
    父类引用指向子类对象
    CPU使用率
  • 原文地址:https://www.cnblogs.com/wuheng1991/p/5145398.html
Copyright © 2020-2023  润新知