• Little Crawler 5: key review && mobile data crawling, part 1


    1.

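    (1) What is selenium?
        - A module for browser automation
    (2) Why use selenium in a crawler, and how does it relate to crawling?
        - It makes it easy to get dynamically loaded data
        - It can simulate logging in
    (3) Common selenium methods and what they do
        - get(url)                            # open the URL in the browser
        - the find family of functions        # locate tags; remember the few common ones
        - send_keys('key')                    # type a value into the located input
        - click()                             # click
        - execute_script('jsCode')            # run JavaScript code
        - page_source                         # get the page's source
        - switch_to.frame('iframeID')         # switch into an iframe before locating tags inside it
        - quit()                              # close the browser
        - save_screenshot(filename)           # save what is on screen
        - a = ActionChains(bro)               # instantiate an action chain
        - a.click_and_hold(tag)               # click and hold the given tag
        - a.move_by_offset(x,y).perform()     # move by an offset and execute immediately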
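A minimal sketch of how several of the calls listed above fit together, assuming Selenium 4 and an already-created driver passed in as `bro`. The helper name `search_and_capture`, the file name `result.png`, and the element ids `kw` and `su` (assumed here to be Baidu's search box and button) are illustrative and may not match the live page.

```python
def search_and_capture(bro, keyword):
    """Drive an already-created Selenium WebDriver `bro` through one search.

    Hypothetical helper: the ids "kw" and "su" are assumptions about the
    target page, not something the original post specifies.
    """
    bro.get("https://www.baidu.com")                 # open the page
    box = bro.find_element("id", "kw")               # "id" is the value of By.ID
    box.send_keys(keyword)                           # type into the input
    bro.find_element("id", "su").click()             # click the search button
    # Run JavaScript in the page: scroll to the bottom to trigger lazy loading.
    bro.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    bro.save_screenshot("result.png")                # save what is on screen
    return bro.page_source                           # HTML as the browser currently sees it


# Usage (needs `pip install selenium` and a local Chrome):
#   from selenium import webdriver
#   bro = webdriver.Chrome()
#   try:
#       html = search_and_capture(bro, "selenium")
#   finally:
#       bro.quit()                                   # always close the browser
```

Passing the driver in keeps the browser's lifetime in one place, so `quit()` runs even when a locator fails. A real crawler would probably also wait for the results to load before reading `page_source`.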
    
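    (4) What the loop does:
        Multiple task objects can be registered with the loop.
        The loop then runs in a continuous cycle, executing the task objects asynchronously.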
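The following sketch illustrates that behaviour using only the standard library. `asyncio.sleep` stands in for network waits, and the names `job` and `log` are made up for the illustration.

```python
import asyncio

log = []  # records the order in which the loop runs each piece of work

async def job(name, delay):
    log.append(f"{name} start")
    await asyncio.sleep(delay)   # hand control back to the loop while "waiting"
    log.append(f"{name} end")
    return name

loop = asyncio.new_event_loop()
try:
    # Register three task objects with the loop.
    tasks = [loop.create_task(job(name, delay))
             for name, delay in [("a", 0.15), ("b", 0.10), ("c", 0.05)]]
    # The loop keeps cycling over the registered tasks until all are done.
    loop.run_until_complete(asyncio.wait(tasks))
    results = [t.result() for t in tasks]
finally:
    loop.close()

print(log)
print(results)
```

All three tasks start before any of them finishes, and they finish in order of their delays rather than in registration order, which is what interleaved execution on a single thread looks like.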
    
    (5) How multi-task asynchronous coroutines achieve asynchrony
        - coroutine
        - task object
        - loop
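To make the three pieces concrete, here is a small standard-library sketch. The coroutine `fetch` and the callback `on_done` are hypothetical names, and `asyncio.sleep` stands in for a real request.

```python
import asyncio
import inspect

seen = []  # filled in by the done callback

async def fetch(n):
    await asyncio.sleep(0.01)    # stand-in for a network request
    return n * 2

def on_done(task):
    seen.append(task.result())   # the callback receives the finished task

async def main():
    coro = fetch(21)                       # 1. coroutine object: nothing has run yet
    print(inspect.iscoroutine(coro))
    task = asyncio.ensure_future(coro)     # 2. task object: the coroutine is now scheduled
    task.add_done_callback(on_done)
    print(task.done())
    return await task                      # 3. the loop drives the task to completion

answer = asyncio.run(main())               # asyncio.run creates and closes the loop
print(answer, seen)
```

Calling `fetch(21)` only builds a coroutine object. It does no work until it is wrapped in a task and the loop gets a chance to run it.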

    2. Review of single-threaded, multi-task asynchronous coroutines

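    # Author: studybrother sun
    import asyncio
    import aiohttp

    # Everything inside this coroutine must be asynchronous: a call into a
    # blocking (non-async) module here would stall the whole event loop.
    async def request(url):
        async with aiohttp.ClientSession() as s:
            async with s.get(url=url) as response:
                page_text = await response.text()   # body of the fetched page
                return page_text

    def callback(task):   # done callback: receives the finished task
        print(task.result())

    def callback1(task):
        print(task.result())

    # Event loop object (created explicitly; newer Python versions no longer
    # create one implicitly in get_event_loop()):
    loop = asyncio.new_event_loop()
    c = request('https://www.baidu.com')
    c1 = request('https://www.sogou.com')

    # Wrap each coroutine in a task object and attach its callback.
    task = asyncio.ensure_future(c, loop=loop)
    task.add_done_callback(callback)

    task1 = asyncio.ensure_future(c1, loop=loop)
    task1.add_done_callback(callback1)

    # Register both tasks with the loop and run until they finish.
    tasks = [task, task1]
    loop.run_until_complete(asyncio.wait(tasks))
    loop.close()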

    Running it produces the following output:

    <html>
    <head>
        <script>
            location.replace(location.href.replace("https://","http://"));
        </script>
    </head>
    <body>
        <noscript><meta http-equiv="refresh" content="0;url=http://www.baidu.com/"></noscript>
    </body>
    </html>
    <!DOCTYPE html>
    <html lang="cn">
    <head>
        <script>window._speedMark = new Date();
         window.lead_ip = '221.218.208.77';window.now = 1559307496193;</script>    <meta charset="utf-8">
    <link rel="dns-prefetch" href="//img01.sogoucdn.com"><link rel="dns-prefetch" href="//img02.sogoucdn.com"><link rel="dns-prefetch" href="//img03.sogoucdn.com"><link rel="dns-prefetch" href="//img04.sogoucdn.com"><link rel="dns-prefetch" href="//dlweb.sogoucdn.com">
    <title>搜狗搜索引擎 - 上网从搜狗开始</title>
    <link rel="shortcut icon" href="/images/logo/new/favicon.ico?v=4" type="image/x-icon">
    <meta http-equiv="X-UA-Compatible" content="IE=Edge">
    <link rel="search" type="application/opensearchdescription+xml" href="/content-search.xml" title="搜狗搜索">
    <meta name="keywords" content="搜狗搜索,网页搜索,微信搜索,视频搜索,图片搜索,音乐搜索,新闻搜索,软件搜索,问答搜索,百科搜索,购物搜索">
    <meta name="description" content="搜狗搜索是全球第三代互动式搜索引擎,支持微信公众号和文章搜索、知乎搜索、英文搜索及翻译等,通过自主研发的人工智能算法为用户提供专业、精准、便捷的搜索服务。">    <link rel="stylesheet" type="text/css" href="/web/index/css/base.v.1.4.12.css">
    <style>.wrapper .suggestion{border: 1px solid #e8e8e8; 622px;-moz-box-shadow: 0px 1px 8px rgba(0,0,0,0.1);-webkit-box-shadow: 0px 1px 8px rgba(0,0,0,0.1);box-shadow: 0px 1px 8px rgba(0,0,0,0.1);border-top-left-radius: 0px;border-top-right-radius: 0px;border-bottom-right-radius: 2px;border-bottom-left-radius: 2px; top:43px;}  .wrapper .suglist{ 206px;}  .wrapper .suglist .keyword {color: #7a77c8;}  .big-scn .suggestion { 654px;}  .big-scn .suglist{236px;}  .wrapper .suglist{ padding:4px 0}</style></head>
    <body >
            <div class="bg-gj-w" id="settings-mask" style="display: none;"></div>
    <div class="gjss" id="settings-advanced" style="display: none;top:-240px;">
        <div class="hf-box" id="settings-save-layer">
            <div class="hf-def">已保存设置</div>
        </div>
        <div class="gjss-tab">
            <a uigs-id="tab_set" href="javascript:void(0);" class="js-settings-tab tab-a cur">搜索设置</a>
            <a uigs-id="tab_adv" href="javascript:void(0);" class="js-settings-tab tab-a">高级搜索</a>
            <a href="javascript:void(0);" class="close-btn" id="settings-close"></a>
        </div>
        <div class="gjss-main">
            <div class="gjss-sz js-settings-content">
                <p class="gjss-err js-settings-mask" style="display: none;">搜索设置暂不可用,请启用浏览器的Cookie功能,然后刷新本页。</p>
                <div class="bg-wkq js-settings-mask" id="settings-tips" style="display: none;"></div>
    
                <dl class="js-as-select">
                    <dt>搜索结果显示条数</dt>
                    <dd>
                        <a href="javascript:void(0);" class="xz" id="settings-number" data-value="10">每页显示10条</a>
                        <ul id="settings-number-list">
                            <li><a uigs-id="set_10" href="javascript:void(0);" data-value="10">每页显示10条</a></li>
                            <li><a uigs-id="set-20" href="javascript:void(0);" data-value="20">每页显示20条</a></li>
                            <li><a uigs-id="set-50" href="javascript:void(0);" data-value="50">每页显示50条</a></li>
                            <li><a uigs-id="set-100" href="javascript:void(0);" data-value="100">每页显示100条</a></li>
                        </ul>
                    </dd>
                    <input type="hidden" name="pageNum" id="settings-show-number" value="10">
                </dl>
                <p class="enter" style="padding-top: 20px;">
                    <a href="javascript:void(0);" id="settings-save" uigs-id="set-save" class="a1">保存</a>
                    <a href="javascript:void(0);" id="settings-reset" uigs-id="set-reset" class="a2">恢复默认</a>
                </p>
            </div>
            <div class="gjss-sz js-settings-content" style="display: none;">
                <form action="/web" target="_blank" id="advanced-search-form">
                    <input type="hidden" name="query" value="">
                    <input name="fieldtitle" type="hidden" value=""/>
                    <input name="fieldcontent" type="hidden" value=""/>
                    <input name="fieldstripurl" type="hidden" value=""/>
                    <input name="bstype" type="hidden" value=""/>
                    <input name="ie" type="hidden" value="utf8"/>
                    <dl>
                        <dt>搜索关键词</dt>
                        <dd class="js-as-radio">
                                                    <div class="input-box js-input-box" id="advanced-query-box">
                                <input name="q" type="text" must="1" size="42" maxlength="100" autocomplete="off" placeholder="例如:搜狗真棒(多个关键词可用空格区分)">
                                <span class="err-word">* 请输入搜索关键词</span>
                            </div>
                            <a uigs-id="adv_split-query" href="javascript:void(0);" data-value="checkbox" class="dk-btn cur">拆分关键词</a>
                            <a uigs-id="adv_no-split-query" href="javascript:void(0);" data-value="" class="dk-btn">不拆分关键词</a>
                            <input type="hidden" name="include" value="checkbox">
                        </dd>
                    </dl>
                    <dl>
                        <dt>在指定站内搜索</dt>
                        <dd>
                            <div class="input-box js-input-box"><input name="sitequery" type="text" size="40" autocomplete="off" placeholder="例如:www.sogou.com"></div>
                        </dd>
                    </dl>
                    <dl class="js-as-select" style="padding-top:16px">
                        <dt>搜索词位于</dt>
                        <dd>
                            <a href="javascript:void(0);" class="xz">网页中任何地方</a>
                            <ul>
                                <li><a href="javascript:void(0);" data-value="0">网页中任何地方</a></li>
                                <li><a href="javascript:void(0);" data-value="1">仅在标题中</a></li>
                                <li><a href="javascript:void(0);" data-value="2">仅在正文中</a></li>
                                <li><a href="javascript:void(0);" data-value="3">仅在网址中</a></li>
                            </ul>
                        </dd>
                        <input type="hidden" name="located" value="0">
                    </dl>
                    <dl class="js-as-select" style="padding-top:16px">
                        <dt>需要搜索的文件格式</dt>
                        <dd >
                            <a href="javascript:void(0);" class="xz">全部网页</a>
                            <ul>
                                <li><a href="javascript:void(0);" data-value="">全部网页</a></li>
                                <li><a href="javascript:void(0);" data-value="doc">Microsoft Word (.doc)</a></li>
                                <li><a href="javascript:void(0);" data-value="xls">Microsoft Excel (.xls)</a></li>
                                <li><a href="javascript:void(0);" data-value="ppt">Microsoft Powerpoint (.ppt)</a></li>
                                <li><a href="javascript:void(0);" data-value="pdf">Adobe Acrobat PDF (.pdf)</a></li>
                                <li><a href="javascript:void(0);" data-value="rtf">RTF (.rtf)</a></li>
                                <li><a href="javascript:void(0);" data-value="all">全部文档</a></li>
                            </ul>
                        </dd>
                        <input type="hidden" name="filetype" value="">
                    </dl>
                    <dl>
                        <dt>搜索结果排序方式</dt>
                        <dd class="js-as-radio">
                            <a uigs-id="adv_relevance-ranking" href="javascript:void(0);" data-value="off" class="dk-btn cur">按相关性排序</a>
                            <a uigs-id="adv_time-sort" href="javascript:void(0);" data-value="on" class="dk-btn">按时间排序</a>
                            <input type="hidden" name="tro" value="off">
                        </dd>
                    </dl>
                    <p class="enter"><input id="adv-search-btn" uigs-id="adv_search-btn" type="submit" class="a1" value="开始搜索"></p>
                </form>
            </div>
        </div>
    </div>
        <div class="wrapper" id="wrap">
            <div class="header">
                <div class="top-nav">
        <ul>
            <li><a onclick="st(this,'40030300','news')" href="http://news.sogou.com" uigs-id="nav_news" id="news">新闻</a></li>
            <li class="cur"><span>网页</span></li>
            <li><a onclick="st(this,'73141200','weixin')" href="http://weixin.sogou.com/" uigs-id="nav_weixin" id="weixinch">微信</a></li>
            <li><a onclick="st(this,'40051200','zhihu')" href="http://zhihu.sogou.com/" uigs-id="nav_zhihu" id="zhihu">知乎</a></li>
            <li><a onclick="st(this,'40030500','pic')" href="http://pic.sogou.com" uigs-id="nav_pic" id="pic">图片</a></li>
            <li><a onclick="st(this,'40030600','video')" href="https://v.sogou.com/" uigs-id="nav_v" id="video">视频</a></li>
            <li><a href="http://mingyi.sogou.com?fr=common_index_nav" uigs-id="nav_mingyi" id="mingyi" onclick="st(this,'','myingyi')">明医</a></li>
            <li><a href="http://english.sogou.com?fr=pcweb_index_nav" uigs-id="nav_overseas" id="overseas" onclick="st(this,'','overseas')" >英文</a></li>
            <li><a onclick="st(this,'web2ww','wenwen')" href="https://wenwen.sogou.com/?ch=websearch" uigs-id="nav_wenwen" id="index_more_wenwen">问问</a></li>
            <li><a href="http://scholar.sogou.com?fr=common_index_nav" uigs-id="nav_scholar" id="scholar" onclick="st(this,'','scholar')">学术</a></li>
            <li class="show-more">
                <a href="javascript:void(0);" id="more-product">更多<i class="m-arr"></i></a>
                <div class="pos-more" id="products-box" style="top: 40px;">
                    <span class="ico-san"></span>
    
                    <a onclick="st(this,'40031000')" href="http://map.sogou.com" uigs-id="nav_map" id="map">地图</a>
                    <a onclick="st(this,'40031500')" href="http://gouwu.sogou.com/" uigs-id="nav_gouwu" id="index_more_gouwu">购物</a>
                    <a onclick="st(this,'40051203')" href="http://baike.sogou.com/Home.v" uigs-id="nav_baike" id="index_more_baike">百科</a>
                    <a onclick="st(this)" href="http://zhishi.sogou.com" uigs-id="nav_zhishi" id="index_more_zhishi">知识</a>
                    <a onclick="st(this,'40051205')" href="http://as.sogou.com/" uigs-id="nav_app" id="index_more_appli">应用</a>
                    <a onclick="st(this,'40051205','fanyi')" href="http://fanyi.sogou.com?fr=common_index_nav_pc" uigs-id="nav_fanyi" id="index_more_fanyi">翻译</a>
                    <a href="http://index.sogou.com" uigs-id="nav_index" id="index_more_index">指数</a>
                                        <a href="http://dangjian.sogou.com" uigs-id="nav_dangjian" id="dangjian" onclick="st(this,'','dangjian')">党建</a>
                                    <span class="all"><a onclick="st(this,'40051206')" href="http://www.sogou.com/docs/more.htm?v=1" uigs-id="nav_all" target="_blank">全部</a></span>
                </div>
            </li>
        </ul>
    </div>            <div class="user-box">
        <div class="local-weather" id="local-weather">
            <div class="wea-box" id="cur-weather" style="display: none;"></div>
            <div class="pos-more" id="detail-weather" style="top:40px;"></div>
        </div>
        <span class="line" id="user-box-line" style="display: none;"></span>
        <div class="user-enter">
            <a href="javascript:void(0);" id="show-card" style="display: none" uigs-id="settings_show-card">显示卡片</a>
                        <a href="javascript:void(0);" uigs-id="settings_change-skin" id="changeSkinBtn" >换肤</a>
                    <span class="s-dw">
                <a href="javascript:void(0);" id="settings">设置</a>
                <div class="pos-more" id="settings-box" style="top:40px;">
                    <span class="ico-san"></span>
                    <a href="javascript:void(0);" id="search-settings" uigs-id="settings_config">搜索设置</a>
                    <a href="javascript:void(0);" id="advanced-search" uigs-id="settings_advanced">高级搜索</a>
                    <a href="http://help.sogou.com/?w=01091500&v=1" uigs-id="settings_help">帮助</a>
                </div>
            </span>
                        <a href="javascript:void(0);" class="enter" id="loginBtn">登录</a>            </div>
    </div>
            </div>
            <div class="content" id="content">
                <div class="pos-header" id="top-float-bar">
        <div class="part-one"></div>
        <div class="part-two" id="card-tab-layer">
            <div class="c-top" id="top-card-tab"></div>
        </div>
    </div>
    <div class="logo2" id="logo-s"><span></span></div>            <div class="logo" id="logo-l"><span></span></div>            <div class="search-box" id="search-box">
        <form action="/web" name="sf" id="sf">
            <span class="sec-input-box">
                <input type="text" class="sec-input active" name="query" id="query" maxlength="100" len="80" autocomplete="off" />
            </span>
            <span class="enter-input"><input type="submit" value="" id="stb"></span>
            <input type="hidden" name="_asf" value="www.sogou.com" />
            <input type="hidden" name="_ast" />
            <input type="hidden" name="w" value="01019900" />
            <input type="hidden" name="p" value="40040100" />
            <input type="hidden" name="ie" value="utf8" />
                    <input type="hidden" name="from" value="index-nologin" />
                    <input type="hidden" name="s_from" value="index" />
            <div class="keywords-tips" id="keywordsTips" style="display:none">
                <i></i><p>搜狗的查询限制在"<strong>40个汉字</strong>"以内。</p>
            </div>
        </form>
    </div>
            </div>
                <div class="card-box" id="card-box" style="display: none;">
        <div class="card-box2" id="card-box2">
            <div class="c-top" id="card-tab-box">
                <a href="javascript:void(0);" id="card-settings" uigs-id="settings_settings-btn" class="shezhi"></a>
                <div class="pos-more" id="card-options">
                    <span class="ico-san"></span>
                    <a href="javascript:void(0);" uigs-id="settings_close-card" id="close-card">关闭卡片</a>
                </div>
            </div>
            <div class="c-main" id="card-content"></div>
        </div>
    </div>
    <div class="loog-more" id="scroll-more" style="display: none;">
        <a href="javascript:void(0);" uigs-id="scroll-more">滚动查看更多<br><span class="ico_san"></span></a>
    </div>            <div class="ft" id="footer" style="display: none;">
        <a href="http://fuwu.sogou.com/" target="_blank" uigs-id="footer_tuiguang">企业推广</a><span class="line"></span><a href="http://corp.sogou.com/" target="_blank" uigs-id="footer_about">关于搜狗</a><span class="line"></span><a href="http://ir.sogou.com/" target="_blank" uigs-id="footer_aboutEnglish">About Sogou</a><span class="line"></span><a href="http://www.sogou.com/docs/terms.htm?v=1" target="_blank" uigs-id="footer_disclaimer">免责声明</a><span class="line"></span><a href="http://fankui.help.sogou.com/index.php/web/web/index/type/4" target="_blank"  uigs-id="footer_feedback">意见反馈及投诉</a><span class="line"></span><a href="http://corp.sogou.com/private.html" target="_blank" uigs-id="footer_private">隐私政策</a><br>
        &copy;&nbsp;2004-2019&nbsp;Sogou.com&nbsp;/&nbsp;<span class="g">京网文 (2016) 6432-852号</span>&nbsp;/&nbsp;<a href="http://www.miibeian.gov.cn" target="_blank" class="g">京ICP证050897号</a><br>
        <span class="g">(京)-经营性-2016-0019</span>&nbsp;/&nbsp;<a href="http://www.miibeian.gov.cn/" target="_blank" class="g">京ICP备11001839号-1</a>&nbsp;/&nbsp;<a href="http://www.beian.gov.cn/portal/registerSystemInfo?recordcode=11000002000025" class="ba" target="_blank">京公网安备11000002000025号</a>
    </div>
    <div class="ft-v1" id="QRcode-footer" style="padding-bottom:53px; ">
        <div class="erwm-box">
            <span class="ewm"></span>
            <div class="erwx">
                <p>搜狗搜索APP</p>
                <p class="p2">搜你所想</p>
            </div>
        </div>
        <div class="ft-info">
            <a uigs-id="mid_pinyin" href="http://pinyin.sogou.com/" target="_blank"><i class="i1"></i>搜狗输入法</a><span class="line"></span><a uigs-id="mid_liulanqi" href="http://ie.sogou.com/" target="_blank"><i class="i2"></i>浏览器</a><span class="line"></span><a uigs-id="mid_daohang" href="http://123.sogou.com/" target="_blank"><i class="i3"></i>网址导航</a><br> <a href="http://corp.sogou.com/" target="_blank" class="g">关于搜狗</a>&nbsp;-&nbsp;<a href="http://ir.sogou.com/" target="_blank" class="g">About Sogou</a>&nbsp;-&nbsp;<a href="http://fuwu.sogou.com/" target="_blank" class="g">企业推广</a>&nbsp;-&nbsp;<a href="http://www.sogou.com/docs/terms.htm?v=1" target="_blank" class="g">免责声明</a>&nbsp;-&nbsp;<a href="http://fankui.help.sogou.com/index.php/web/web/index/type/4" target="_blank" class="g">意见反馈及投诉</a>&nbsp;-&nbsp;<a href="http://corp.sogou.com/private.html" target="_blank" class="g" uigs-id="footer_private">隐私政策</a><br>
            &copy;&nbsp;2004-2019&nbsp;Sogou.com&nbsp;/&nbsp;<span class="g">京网文 (2016) 6432-852号</span>&nbsp;/&nbsp;<span class="g">(京)-经营性-2016-0019</span><br>
            <a href="http://www.miibeian.gov.cn" target="_blank" class="g">京ICP证050897号</a>&nbsp;/&nbsp;<a href="http://www.miibeian.gov.cn/" target="_blank" class="g">京ICP备11001839号-1</a>&nbsp;/&nbsp;<a href="http://www.beian.gov.cn/portal/registerSystemInfo?recordcode=11000002000025" class="ba" target="_blank">京公网安备11000002000025号</a>
        </div>
    </div>            <div class="kuozhan" id="QRcode-box" style="display: none;">
        <a href="javascript:void(0);" id="miniQRcode"></a>
        <span id="QRcode"></span>
    </div>
    <a href="javascript:void(0);" class="back-top" id="back-top"></a>    </div>
            <script>
        var SugPara, uigs_para,
            msBrowserName = navigator.userAgent.toLowerCase(),
            msIsSe = false,
            msIsMSearch = false,
            hasDoodle = false,
            queryinput = document.getElementById('query');
    
        uigs_para={
            "uigs_productid": "webapp",
            "type": "webindex_new",
            "stype": "nologin",
            "scrnwi": screen.width,
            "scrnhi": screen.height,
            "uigs_pbtag": "A",
            "uigs_cookie": "SUID,sct",
                    "protocol": location.protocol.toLowerCase() == "https:" ? "https" : "http"
        };
    
        SugPara = {"enableSug":true,"sugType":"web","domain":"w.sugg.sogou.com","productId":"web","sugFormName":"sf","inputid":"query","submitId":"stb","suggestRid":"01015002","normalRid":"01019900","useParent":0 ,"sugglocation":"index","showVr":true,"showHotwords":true,"suggAbtestObject":{"suggestHistoryStrategy1":"","suggestHistoryStrategy2":"0|1|2|3|4|5|6|7|8","suggHistoryAbtest":""}};
    
            
        function mk_con() {
            try {
                window.external.metasearch('make_connection', 'www.google.com.hk');
            } catch (e) {}
        }
    
        if (/se 2.x/i.test(msBrowserName)) {
            msIsSe = true;
        }
    
        if (/metasr/i.test(msBrowserName)) {
            msIsMSearch = true;
        }
    
        if (queryinput) {
            if (msIsSe && msIsMSearch) {
                if (queryinput.addEventListener) {
                    queryinput.addEventListener('keypress', mk_con, false);
                    queryinput.addEventListener('keydown', mk_con, false)
                } else if (queryinput.attachEvent) {
                    queryinput.attachEvent('onkeypress', mk_con);
                    queryinput.attachEvent('onkeydown', mk_con);
                } else {
                    queryinput.onkeypress = mk_con;
                    queryinput.onkeydown = mk_con;
                }
            }
        }
        function getDomain(){
            var domainName = document.domain;
            if(domainName.indexOf("sogou.com")==(domainName.length-9)){
                return ".sogou.com";
            }else if(domainName.indexOf("soso.com")==(domainName.length-8)){
                return ".soso.com";
            }else if(domainName.indexOf("sogo.com") != -1){
                return ".sogo.com"
            }
        }
        window.m_s_index = function() {
            var w = document.sf.query,
                    c = Math.round((new Date().getTime() + Math.random()) * 1000);
    
            w.focus();
    
            if(new RegExp("kw=([^&]+)").test(location.search)) {
                if(w.value.length == 0) {
                    w.value = decodeURIComponent(RegExp.$1);
                }
            }
    
            if (document.cookie.indexOf("SUV=") < 0) {
                document.cookie = "SUV=" + c + ";path=/;expires=Sun, 29 July 2026 00:00:00 UTC;domain="+getDomain();
            }
    
                                (new Image).src = '//pb6.sogou.com/v6';
            
        };
    
        function st(self, p, product, anchor) {
            var searchBox = document.sf.query,
                query = encodeURIComponent(searchBox.value),
    
                productUrl = {
                    "news": 'http://news.sogou.com/news?ie=utf8&query=',
                    "web": 'web?ie=utf8&query=',
                    "weixin": 'http://weixin.sogou.com/weixin?type=2&ie=utf8&query=',
                    "zhihu": 'http://zhihu.sogou.com/zhihu?ie=utf8&query=',
                    "pic": 'http://pic.sogou.com/pics?ie=utf8&query=',
                    "video": 'https://v.sogou.com/v?ie=utf8&query=',
                    "myingyi": 'https://www.sogou.com/web?m2web=mingyi.sogou.com&ie=utf8&query=',
                    "overseas": 'http://english.sogou.com?b_o_e=1&ie=utf8&fr=pcweb_index_nav&query=',
                    "scholar": 'http://scholar.sogou.com?ie=utf8&fr=common_index_nav&query=',
                    "fanyi": 'http://fanyi.sogou.com/?fr=common_index_nav_pc&ie=utf8&keyword=',
                    "wenwen":'http://wenwen.sogou.com/s/?ch=websearch&w=',
                    "dangjian":'http://dangjian.sogou.com/dangjian?query='
                },
                newHref = productUrl[product] || self.href;
    
            function getConnectSymbol(url) {
                return url.indexOf("?") > -1 ? '&' : '?';
            }
    
            if(searchBox && searchBox.value !== ''){
    
                if(productUrl[product]) {
                    newHref = productUrl[product] + query;
                } else if(newHref.indexOf("kw=") > 0) {
                    newHref = newHref.replace(new RegExp("kw=[^&$]*"), "kw=" + query)
                } else {
                    newHref += getConnectSymbol(newHref) + 'kw=' + query;
                }
            }
    
            if(p){
                newHref += getConnectSymbol(newHref) + "p=" + p;
            }
    
            if (anchor && anchor.length > 0){
                newHref += "#" + anchor;
            }
    
            if (searchBox && searchBox.value == '' && (product == 'wenwen' || product == 'dangjian')){//问问首页链接单独处理
                newHref = self.href;
            }
    
            self.href = newHref;
        }
    
        window.cid = function(o, p) {
            var w = document.sf.query,
                q = encodeURIComponent(w.value);
    
            if (!q) {
                o.href += "?cid=" + p
            } else {
                if (p === "web2ww") {
                    o.href += "s/?cid=web2ww&w=" + q
                } else if (p === "web2bk") {
                    o.href += "Search.e?sp=S" + q + "&cid=web2bk"
                }
            }
        };
    
        window.m_s_index();
    </script>
    <script src="//dlweb.sogoucdn.com/common/lib/jquery/jquery-1.11.0.min.js"></script>
    <script charset="gbk" type="text/javascript" src="/js/sugg_new.v.104.js"></script>
    <script src="/js/pb_v.1.9.6.min.js"></script>
    <script src="/js/lib/jquery.mousewheel.min.js"></script>
    <script src="/js/lib/juicer-min.js"></script>
    <script src="/js/common/widget/login_new.min.v.0.5.js"></script>
    <script src="//account.sogou.com/static/api/passport-async.js"></script>
    <script src="/web/index/js/base.v.1.1.14.js"></script>
    <script src="/web/js/voice.min.v.0.0.6.js"></script>
    <script src="/web/js/taspeed.min.v.0.0.1.js"></script>
    </body>
    </html>
    <!--zly-->
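The comment on `request` in section 2 says the coroutine must not contain code from non-async modules. The sketch below suggests why, using `time.sleep` as a stand-in for any blocking call (a synchronous HTTP client would behave the same way) and `asyncio.sleep` as the asynchronous counterpart. The function names are made up for the comparison.

```python
import asyncio
import time

async def blocking_job():
    time.sleep(0.2)              # synchronous call: the whole loop is stuck here

async def async_job():
    await asyncio.sleep(0.2)     # asynchronous call: control returns to the loop

async def run_three(job):
    # Run three copies of the job "concurrently" and time the batch.
    start = time.perf_counter()
    await asyncio.gather(job(), job(), job())
    return time.perf_counter() - start

blocking_time = asyncio.run(run_three(blocking_job))
async_time = asyncio.run(run_three(async_job))
print(f"blocking: {blocking_time:.2f}s, async: {async_time:.2f}s")
```

With the blocking version the three waits run back to back, so the batch takes roughly three times as long as one job. With the asynchronous version the waits overlap.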

    3. Mobile data crawling && environment setup

    Lab: see the blog post below

    https://www.cnblogs.com/bobo-zhang/p/10068994.html

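    - Mobile data crawling:
        - Packet capture tools (by definition, proxy servers):
            Windows: Fiddler, mitmproxy (both are proxy servers)
            Mac: Charles (nicknamed 青花瓷)
    - Installing the certificate on the phone:
        - Have the PC open a Wi-Fi hotspot and connect the phone to it (so the phone and the PC are on the same network segment)
        - In the phone's browser, visit ip:8888 and tap the link to download the certificate
        - Turn on the phone's proxy: set the proxy IP to the IP of the machine running Fiddler and the port to Fiddler's port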
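As a rough aid for the setup above, the sketch below checks whether two addresses fall in the same network segment and derives the values to enter on the phone. The helper names, the sample addresses, and the /24 prefix length are assumptions for illustration. Port 8888 is the one used in the text.

```python
import ipaddress

def same_subnet(pc_ip, phone_ip, prefix_len=24):
    """Return True if both addresses fall in the PC's network (assumed /24 by default)."""
    network = ipaddress.ip_network(f"{pc_ip}/{prefix_len}", strict=False)
    return ipaddress.ip_address(phone_ip) in network

def phone_settings(pc_ip, port=8888):
    """Values to type into the phone: the proxy host and port, plus the certificate page."""
    return {
        "proxy_host": pc_ip,
        "proxy_port": port,
        "cert_page": f"http://{pc_ip}:{port}",
    }

# Made-up addresses for illustration only.
print(same_subnet("192.168.1.10", "192.168.1.57"))
print(phone_settings("192.168.1.10"))
```

If `same_subnet` returns False for the real addresses, the phone probably cannot reach the proxy, and the certificate page will not load either.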

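    (1) Send the certificate to the "phone"

    (2) In Fiddler, click Tools => Options =>

    Next, "allow" other devices to connect: => "Confirm" => OK

    In a browser, visit: http://localhost:8888/

    This gives the result below

    We can download the "certificate" from the last line of the figure above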
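Once the phone's traffic flows through the proxy, the captured responses can also be processed in code. The sketch below is one possible approach that uses mitmproxy (listed among the tools above) rather than Fiddler. It assumes mitmproxy's addon convention of a module-level `addons` list and a `response(self, flow)` hook. The class name `ResponseLogger` and the 200-character preview are illustrative choices.

```python
class ResponseLogger:
    """Collect (url, status, body preview) for each JSON response that passes the proxy."""

    def __init__(self):
        self.records = []

    def response(self, flow):
        # mitmproxy calls this hook once for every completed HTTP response.
        content_type = flow.response.headers.get("content-type", "")
        if "json" not in content_type:
            return  # skip pages, images and so on; keep API-style responses only
        body = flow.response.text or ""
        self.records.append(
            (flow.request.pretty_url, flow.response.status_code, body[:200])
        )


# mitmproxy looks for this list when the file is loaded as a script,
# for example with: mitmdump -s this_file.py
addons = [ResponseLogger()]
```

Filtering on the content type is only a heuristic for picking out the data interfaces a mobile app calls. A real crawler would more likely match on the specific host or path it cares about.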

  • Original post: https://www.cnblogs.com/studybrother/p/10957649.html