• 使用PHP下载文件


    使用PHP脚本来下载文件,无非是通过两种方式,一种是使用systemexec等即有的函数调用系统自带的下载工具,比如 wget 之类的来下载文件,还有一种是使用php本身利用Socket来下载文件,我选择第二种方式。

    使用Socket下载文件,首先如果是http协议的文件,必须明白HTTP协议的运行过程,如果是FTP协议的则要了解ftp协议运行过程,比较繁琐。比如HTTP协议访问一个文件的代码:(来自手册)


    <?php
    $fp fsockopen("www.example.com"80$errno$errstr30);
    if (!
    $fp
    ) {
            echo 
    "$errstr ($errno)<br />\n"
    ;
    } else {
        
    $out "GET / HTTP/1.1\r\n"
    ;
        
    $out .= "Host: www.example.com\r\n"
    ;
        
    $out .= "Connection: Close\r\n\r\n"
    ;

        
    fwrite($fp$out
    );
            while (!
    feof($fp
    )) {
                echo 
    fgets($fp128
    );
            }
        
    fclose($fp
    );
    }
    ?>

    我们为了简单起见,使用fopen直接访问远程文件来达到目的,同事又能够访问http,也能访问ftp,比较合适。当然,如果按照上面的思路来说,也可以使用ftp的函数库来实现。

    我们使用fopen函数来完成我们的工作,实现了如下代码: 



    #! /usr/bin/php
    <?php
    error_reporting
    (0);
    set_time_limit(0);

    //无参数则给出提示
    if ($argc 2){
            echo 
    "Usage: "$argv[0] ." URL [Destination]\n\n";
            exit();
    }

    //设置获取基本变量
    $url $argv[1];
    $save_path $argv[2] ? $argv[2] : "./";
    $file_name array_pop(explode("/"$url));
    $localfile $save_path $file_name;

    //检查变量
    if (!check_url($url)){
            exit(
    "Error: URL "$url ." invalid.\n\n");
    }
    if (
    file_exists($localfile)){
            exit(
    "Error: local file "$localfile ." exists.\n\n");
    }

    //打开远程文件
    $fp fopen($url"rb");
    if (!
    $fp){
            exit(
    "Error: Download "$url ." failed.\n\n");
    }

    //打开本地文件
    $sp fopen($localfile"wb");
    if (!
    $sp){
            exit(
    "Error: Open local file "$localfile ." failed.\n\n");
    }

    //下载远程文件
    echo "Downloading, please waiting...\n\n";
    while (!
    feof($fp)){
        
    $tmpfile .= fread($fp1024);
    }

    //保存文件到本地
    fwrite($sp$tmpfile);
    fclose($fp);
    fclose($sp);
    echo 
    "Download file "$file_name ." succeed!\n\n";

    /* 检查URL合法性函数 */
    function check_url($url){
            return 
    preg_match("/^(http|ftp)(:\/\/)([a-zA-Z0-9-_]+[\.\/]+[\w\-_\/]+.*)+$/i"$url);    
    }

    ?> 




    我们把以上代码保存为 download.php 文件,在Linux/Unix下记得要加上可执行属性:
    chmod +x download.php

    另外,PHP脚本引擎的路径必须是 /usr/bin/php ,如果不是,请自行修改第一行为实际的PHP引擎路径,比如:
    #! /usr/local/php/bin/php


    使用上面的脚本来下载文件:
    download.php       远程文件      保存路径

    如把Google Talk程序下载到我们的 /tmp 目录下:
    download.php  http://dl.google.com/googletalk/googletalk-setup.exe     /tmp/

    如果不出错,等待一会就能够在 /tmp/ 目录下看到 googletalk-setup.exe 文件。

    能够改进的就是支持更多协议、需要验证的能够输入用户名密码、有下载进度条。至于断点续传和多线程对于PHP来说还不太现实,有兴趣的可以自己加深一步。

    PS: 我另外发现一个更强的HTTP下载类,是dedeCMS的作者IT柏拉图写的:

    <?
    /*=======================================
    // 织梦Http下载类
    // 织梦之旅 www.dedecms.com 
    =======================================*/
    class DedeHttpDown
    {
    var $m_url = "";
    var $m_urlpath = "";
    var $m_scheme = "http";
    var $m_host = "";
    var $m_port = "80";
    var $m_user = "";
    var $m_pass = "";
    var $m_path = "/";
    var $m_query = "";
    var $m_fp = "";
    var $m_error = "";
    var $m_httphead = "" ;
    var $m_html = "";
    //
    //初始化系统
    //
    function PrivateInit($url)
    {
          $urls = "";
          $urls = @parse_url($url);
          $this->m_url = $url;
            if(is_array($urls))
            {
            $this->m_host = $urls["host"];
            if(!empty($urls["scheme"])) $this->m_scheme = $urls["scheme"];
         
            if(!empty($urls["user"])){
            $this->m_user = $urls["user"];
            }
         
            if(!empty($urls["pass"])){
            $this->m_pass = $urls["pass"];
            }

            if(!empty($urls["port"])){
            $this->m_port = $urls["port"];
            }
         
            if(!empty($urls["path"])) $this->m_path = $urls["path"];
            $this->m_urlpath = $this->m_path;
         
            if(!empty($urls["query"]))
            {
            $this->m_query = $urls["query"];
            $this->m_urlpath .= "?".$this->m_query;
            }
         }
    }
    //
    //打开指定网址
    //
    function OpenUrl($url)
    {
         //重设各参数
         $this->m_url = "";
         $this->m_urlpath = "";
         $this->m_scheme = "http";
         $this->m_host = "";
         $this->m_port = "80";
         $this->m_user = "";
         $this->m_pass = "";
         $this->m_path = "/";
         $this->m_query = "";
         $this->m_error = "";
         $this->m_httphead = "" ;
         $this->m_html = "";
         $this->Close();
         //初始化系统
         $this->PrivateInit($url);
         $this->PrivateStartSession();
    }
    //
    //获得某操作错误的原因
    //
    function printError()
    {
         echo "错误信息:".$this->m_error;
         echo "具体返回头:<br>";
         foreach($this->m_httphead as $k=>$v)
         { echo "$k => $v <br>\r\n"; }
    }
    //
    //判别用Get方法发送的头的应答结果是否正确
    //
    function IsGetOK()
    {
         if( ereg("^2",$this->GetHead("http-state")) )
         { return true; }
         else
         {
          $this->m_error .= $this->GetHead("http-state")." - ".$this->GetHead("http-describe")."<br>";
          return false;
         }
    }
    //
    //看看返回的网页是否是text类型
    //
    function IsText()
    {
         if(ereg("^2",$this->GetHead("http-state"))
          && eregi("^text",$this->GetHead("content-type")))
         { return true; }
         else
         {
          $this->m_error .= "内容为非文本类型<br>";
          return false;
         }
    }
    //
    //判断返回的网页是否是特定的类型
    //
    function IsContentType($ctype)
    {
         if(ereg("^2",$this->GetHead("http-state"))
          && $this->GetHead("content-type")==strtolower($ctype))
         { return true; }
         else
         {
          $this->m_error .= "类型不对 ".$this->GetHead("content-type")."<br>";
          return false;
         }
    }
    //
    //用Http协议下载文件
    //
    function SaveToBin($savefilename)
    {
         if(!$this->IsGetOK()) return false;
         if(@feof($this->m_fp))
         { $this->m_error = "连接已经关闭!"; return false; }
         $fp = fopen($savefilename,"w") or die("写入文件 $savefilename 失败!");
         while(!feof($this->m_fp)){
          @fwrite($fp,fgets($this->m_fp,256));
         }
         @fclose($this->m_fp);
         return true;
    }
    //
    //保存网页内容为Text文件
    //
    function SaveToText($savefilename)
    {
         if($this->IsText()) $this->SaveBinFile($savefilename);
         else return "";
    }
    //
    //用Http协议获得一个网页的内容
    //
    function GetHtml()
    {
         if(!$this->IsText()) return "";
         if($this->m_html!="") return $this->m_html;
         if(!$this->m_fp||@feof($this->m_fp)) return "";
         while(!feof($this->m_fp)){
          $this->m_html .= fgets($this->m_fp,256);
         }
         @fclose($this->m_fp);
         return $this->m_html;
    }
    //
    //开始HTTP会话
    //
    function PrivateStartSession()
    {
         if(!$this->PrivateOpenHost()){
          $this->m_error .= "打开远程主机出错!";
          return false;
         }
         if($this->GetHead("http-edition")=="HTTP/1.1") $httpv = "HTTP/1.1";
         else $httpv = "HTTP/1.0";
         fputs($this->m_fp,"GET ".$this->m_urlpath." $httpv\r\n");
         fputs($this->m_fp,"Host: ".$this->m_host."\r\n");
         fputs($this->m_fp,"Accept: */*\r\n");
         fputs($this->m_fp,"User-Agent: Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.2)\r\n");
         //HTTP1.1协议必须指定文档结束后关闭链接,否则读取文档时无法使用feof判断结束
         if($httpv=="HTTP/1.1") fputs($this->m_fp,"Connection: Close\r\n\r\n");
         else fputs($this->m_fp,"\r\n");
         $httpstas = fgets($this->m_fp,256);
         $httpstas = split(" ",$httpstas);
         $this->m_httphead["http-edition"] = trim($httpstas[0]);
         $this->m_httphead["http-state"] = trim($httpstas[1]);
         $this->m_httphead["http-describe"] = "";
         for($i=2;$i<count($httpstas);$i++){
          $this->m_httphead["http-describe"] .= " ".trim($httpstas[$i]);
         }
         while(!feof($this->m_fp)){
          $line = str_replace("\"","",trim(fgets($this->m_fp,256)));
          if($line == "") break;
          if(ereg(":",$line)){
           $lines = split(":",$line);
           $this->m_httphead[strtolower(trim($lines[0]))] = trim($lines[1]);
          }
         }
    }
    //
    //获得一个Http头的值
    //
    function GetHead($headname)
    {
         $headname = strtolower($headname);
         if(isset($this->m_httphead[$headname]))
          return $this->m_httphead[$headname];
         else
          return "";
    }
    //
    //打开连接
    //
    function PrivateOpenHost()
    {
         if($this->m_host=="") return false;
         $this->m_fp = @fsockopen($this->m_host, $this->m_port, &$errno, &$errstr,10);
         if(!$this->m_fp){
          $this->m_error = $errstr;
          return false;
         }
         else{
          return true;
         }
    }
    //
    //关闭连接
    //
    function Close(){
         @fclose($this->m_fp);
    }
    }

    ?>

    ----------------------------------------------------------------------------------

    这个类的使用方法:

    下载网页
    <?
    $httpdown = new DedeHttpDown();
    $httpdown->OpenUrl("http://www.dedecms.com");
    echo $httpdown->GetHtml();
    $httpdown->Close();
    ?>
    如果下载图片并保存,可以用
    <?
    $httpdown = new DedeHttpDown();
    $httpdown->OpenUrl("http://prato.bokele.com/0/0/399/bGluMi5qcGc=.jpg");
    echo $httpdown->SaveBin("test.jpg");
    $httpdown->Close();
    echo "<img src='test.jpg'>";
    ?>

  • 相关阅读:
    Python 实践
    Keras实践
    NLP S实践
    Spark java 实践
    Seaborn数据探索可视化
    Linux实践
    Redis
    ML算法选型
    Elasticsearch issue
    牛客练习赛37
  • 原文地址:https://www.cnblogs.com/mfryf/p/2527302.html
Copyright © 2020-2023  润新知