• PHP采集淘宝商品


    项目需求:

      1.通过PHP程序更新所采集淘宝商品的价格以及是否停售

    数据表:

      

    CREATE TABLE `goods` (
    `id`  int(11) NOT NULL AUTO_INCREMENT ,
    `type`  varchar(30) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT '' ,
    `keyid`  varchar(60) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT '' ,
    `shop_id`  int(11) NULL DEFAULT 0 ,
    `cid`  smallint(6) NULL DEFAULT 0 ,
    `img_id`  int(11) NULL DEFAULT 0 ,
    `imgs`  varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL ,
    `name`  varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT '' ,
    `url`  varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT '' ,
    `taoke_url`  varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT '' ,
    `price`  decimal(10,2) NULL DEFAULT 0.00 ,
    `sellerid`  int(11) UNSIGNED NULL DEFAULT NULL ,
    `is_off_sale`  tinyint(1) UNSIGNED NULL DEFAULT 0 ,
    `delist_time`  int(11) NULL DEFAULT 0 ,
    `create_time`  int(11) NULL DEFAULT 0 ,
    `ctime`  int(11) NULL DEFAULT NULL ,
    `cache_data`  text CHARACTER SET utf8 COLLATE utf8_general_ci NULL ,
    `create_day`  int(11) NULL DEFAULT 0 ,
    `commission`  decimal(10,2) NULL DEFAULT 0.00 ,
    `comment_collect_time`  int(11) NULL DEFAULT 0 ,
    `color`  smallint(6) NULL DEFAULT 0 ,
    `content`  text CHARACTER SET utf8 COLLATE utf8_general_ci NULL ,
    `taobao_desc_url`  varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL DEFAULT '''' COMMENT '淘宝商品详细介绍地址' ,
    PRIMARY KEY (`id`),
    UNIQUE INDEX `keyid` (`keyid`) USING BTREE ,
    INDEX `shop_id` (`shop_id`) USING BTREE ,
    INDEX `delist_time` (`delist_time`) USING BTREE ,
    INDEX `cid` (`cid`) USING BTREE ,
    INDEX `create_day` (`create_day`) USING BTREE 
    )
    ENGINE=InnoDB
    DEFAULT CHARACTER SET=utf8 COLLATE=utf8_general_ci
    AUTO_INCREMENT=40614
    ROW_FORMAT=COMPACT
    ;

    PHP文件:

    <?php
    $con=mysql_connect('localhost','root',''); 
    mysql_select_db("fanwe", $con);
    $start_time=microtime(1);
    $sql="SELECT id,price,url FROM s_goods WHERE url!='' AND keyid like 'taobao%' AND is_off_sale=0 ORDER BY id LIMIT 10";//更新前10个
    $rs=mysql_query($sql);
    echo 'COUNT:'.mysql_num_rows($rs)."
    ";
    $error=array();
    $i=0;
    $h=fopen('d:/mydomain/updateprice.log','a+');
    while ($r=mysql_fetch_array($rs)){
        $s=microtime(1);
        echo $i++.':'.$r['id']."	";
        $price=getPrice($r['url']);
        $t=microtime(1)-$s;
        echo (ceil($t*1000)/1000)."	";
        if($price===false) echo 'FALSE';
        elseif(bccomp($price,$r['price'])==0) echo 'EQUAL';
        else{
            echo "UPDATE	".$r['price']."	".$price;
            mysql_query("UPDATE s_goods SET price=".$price." WHERE id=".$r['id']);
            fputcsv($h,array(date('Y-m-d H:i:s'),$r['id'],$r['price'],$price));
        }
        echo "
    ";
    }
    fclose($h);
    echo 'COUNT:'.mysql_num_rows($rs)."	TIME:".ceil(microtime(1)-$start_time);
    function getPrice($url,$time=1){
        $des_url='';
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0');
        curl_setopt($ch, CURLOPT_REFERER,'http://www.tmall.com/');
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION,1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);//设置输出方式, 0为自动输出返回的内容, 1为返回输出的内容,但不自动输出.
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30); //timeout on connect
        curl_setopt($ch, CURLOPT_TIMEOUT, 30); //timeout on response
        curl_setopt($ch, CURLOPT_HEADER, 1);//是否输出头信息,0为不输出,非零则输出
        curl_setopt($ch, CURLOPT_MAXREDIRS, 10 );
        curl_setopt($ch, CURLOPT_URL, $url);
        $content = curl_exec($ch);
        curl_close($ch);
        if(preg_match("/'reservePrice's*:s*'([d.]+?)',/",$content,$price)){
            $price = (float)$price[1];
        }elseif(preg_match('/price:([d.]+?),/',$content,$price)){
            $price = (float)$price[1];
        }
        if(!$price&&preg_match('/tmall/',$url)){//天猫促销价 add LiuYang 2014-02-24 15:10
            preg_match('/id=(d+)+/',$url,$temp);
            $url2="http://mdskip.taobao.com/core/initItemDetail.htm?itemId=".$temp[1];
            $ch = curl_init();
            curl_setopt( $ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1" );
            curl_setopt( $ch, CURLOPT_URL, $url2 );
            curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, true );
            curl_setopt( $ch, CURLOPT_ENCODING, "" );
            curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
            curl_setopt( $ch, CURLOPT_REFERER, 'http://www.tmall.com' );
            curl_setopt( $ch, CURLOPT_AUTOREFERER, true );
            curl_setopt( $ch, CURLOPT_SSL_VERIFYPEER, false );   
            curl_setopt( $ch, CURLOPT_CONNECTTIMEOUT, 10 );
            curl_setopt( $ch, CURLOPT_TIMEOUT, 10 );
            curl_setopt( $ch, CURLOPT_MAXREDIRS, 10 );
            $price_content = curl_exec( $ch );
            $response = curl_getinfo( $ch );
            curl_close ( $ch );
            $price_content=json_decode(iconv('gbk','utf-8',preg_replace('/(d{10,}):/','"${1}":',$price_content)),true);
            $priceinfo=$price_content['defaultModel']['itemPriceResultDO']['priceInfo'];
            $price=array();
            foreach ($priceinfo as $v){
                $price[]=$v['price'];
                if(is_array($v['promotionList'])){
                    foreach ($v['promotionList'] as $v2){
                        $price[]=$v2['extraPromPrice']?$v2['extraPromPrice']:$v2['price'];
                    }
                }
                if(is_array($v['suggestivePromotionList'])){
                    foreach ($v['suggestivePromotionList'] as $v2){
                        $price[]=$v2['extraPromPrice']?$v2['extraPromPrice']:$v2['price'];
                    }
                }
            }
            $price=count($price)>0?min($price):false;
        }
        if($price) return $price;
        elseif($time==1) return getPrice($url,2);
        else return false;
    }
    ?>

    执行方式如果采用apache或nginx等服务器,会因为各个服务器的最大响应时间而受影响.如果只更新10个那可能会完成,如果是上百个肯定是不能完全更新的.

    可以采用PHP命令的等式来执行.

    D:mydomain>php updateprice.php
    COUNT:10
    0:36599 0.232   EQUAL
    1:36600 1.091   EQUAL
    2:36601 1.039   EQUAL
    3:36603 1.08    EQUAL
    4:36604 0.984   EQUAL
    5:36605 0.972   EQUAL
    6:36610 1.019   EQUAL
    7:36611 0.971   EQUAL
    8:36612 1.048   EQUAL
    9:36613 1.149   EQUAL
    COUNT:10        TIME:10

    如此既清晰又明了,而且会一直执行到程序完成,不会担心服务器没有响应.

    上面是在windows下面执行,如果输入php提示:

    'php' 不是内部或外部命令,也不是可运行的程序
    或批处理文件。

    就必须把php.exe所在的地址补全

    D:mydomain>D:AppServphp5php updatePrice.php

    或者把php.exe所在的地址加入全局变量

    此方法在Linux下同样有用只用修改对应的地址即可,在linux中php命令是可以直接用的.

    [root@liu ~]# php updatePrice.php

    此方法有一个缺点,就是执行效率问题.一个商品采集平均需要0.8秒.那10000个商品采集完需要2个半小时.

    如想继续了解,请看下篇.

  • 相关阅读:
    [转]Splay算法
    [转]模拟退火算法
    [转]九种背包问题
    [转]C++实现平衡二叉树
    关于set
    __builtin_popcount
    为什么调用线程的join方法,等待的是执行join方法的线程
    volatile and 指令重排序 (单例模式用volatile来声明单例对象)
    线程池
    sts 转用 IDEA随记
  • 原文地址:https://www.cnblogs.com/lywy510/p/3613528.html
Copyright © 2020-2023  润新知