• Little Crawler 5: key-point review && mobile data scraping (part 1)


    1.

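    (1) What is selenium?
        - A module for browser automation
    (2) Why use selenium in a crawler, and how does it relate to crawling?
        - It makes it easy to obtain dynamically loaded data
        - It can simulate logging in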
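    (3) Common selenium methods and what they do
        - get(url)                           # open the given URL in the browser
        - the find family of functions       # locate tags; memorize the few commonly used ones
        - send_keys('key')                   # type text into the located input tag
        - click()                            # click the tag
        - execute_script('jsCode')           # execute JavaScript code in the page
        - page_source                        # get the page's source data
        - switch_to.frame('iframeID')        # tags inside an iframe require switching into it first
        - quit()                             # close the browser
        - save_screenshot()                  # save a screenshot of the current window
        - a = ActionChains(bro)              # instantiate an action chain object
        - a.click_and_hold(tag)              # click and hold on the given tag
        - a.move_by_offset(x, y).perform()   # move the mouse (and the held tag) by the offset, then run the chain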
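A minimal sketch of how several of the methods listed above fit together. The function name `search_and_snapshot` and all of its parameters are hypothetical, and it assumes Selenium plus Chrome with a matching driver are installed; it is an illustration, not the code used in the original course.

```python
def search_and_snapshot(url, input_id, button_id, keyword, shot_path="page.png"):
    """Open url, type keyword into an input, click a button, return the page source."""
    # Imported here so that defining the function does not itself require
    # Selenium or a browser driver to be installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    bro = webdriver.Chrome()  # assumption: Chrome and its driver are available
    try:
        bro.get(url)                                    # open the page
        box = bro.find_element(By.ID, input_id)         # locate the input tag
        box.send_keys(keyword)                          # type the keyword
        bro.find_element(By.ID, button_id).click()      # click the submit button
        # Scroll to the bottom so lazily loaded content has a chance to appear.
        bro.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        bro.save_screenshot(shot_path)                  # save what the window shows
        return bro.page_source                          # page source after interaction
    finally:
        bro.quit()                                      # always close the browser
```

A call might look like `search_and_snapshot("https://www.baidu.com", "kw", "su", "selenium")`, where the ids `kw` and `su` are assumptions about the target page and may not match the live site.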
    
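    (4) What the loop does:
        Multiple task objects can be registered with the loop
        The loop then cycles continuously, executing the registered task objects asynchronously
    
    (5) How multi-task asynchronous coroutines achieve asynchrony
        - coroutines
        - task objects
        - the loop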
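The three pieces named above (coroutine, task object, loop) can be sketched with the standard library alone. In this sketch `fake_request` and `run_all` are hypothetical names, and `asyncio.sleep` stands in for a real network wait.

```python
import asyncio
import time

async def fake_request(name, delay):
    # Coroutine: awaiting asyncio.sleep hands control back to the loop,
    # which is what lets the other tasks make progress in the meantime.
    await asyncio.sleep(delay)
    return f"{name} done"

def run_all():
    loop = asyncio.new_event_loop()
    try:
        # Task objects: each coroutine object is wrapped in a task
        # and thereby registered with the loop.
        tasks = [loop.create_task(fake_request(f"task{i}", 0.2)) for i in range(3)]
        start = time.perf_counter()
        # Loop: runs until every registered task has finished.
        loop.run_until_complete(asyncio.wait(tasks))
        elapsed = time.perf_counter() - start
        return [t.result() for t in tasks], elapsed
    finally:
        loop.close()

results, elapsed = run_all()
print(results, f"{elapsed:.2f}s")
```

Because the three waits overlap, the total time should be close to one delay rather than the sum of all three.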

    2. Review: single-threaded multi-task asynchronous coroutines

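    # Author: studybrother sun
    import asyncio
    import aiohttp
    # The body of this coroutine function must not contain code from non-async (blocking) modules
    async def request(url):
        async with aiohttp.ClientSession() as s:
            # s.get() is itself an async context manager, so no extra await is needed here
            async with s.get(url=url) as response:
                page_text = await response.text()     # source of the requested search page
    
                return page_text
    
    def callback(task):  # callback, invoked when the task finishes
        print(task.result())
    def callback1(task):
        print(task.result())
    
    # The event loop object (created explicitly, since get_event_loop() without a running loop is deprecated):
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    c = request('https://www.baidu.com')
    c1 = request('https://www.sogou.com')
    
    # Wrap each coroutine object in a task object and bind a callback to it
    task = asyncio.ensure_future(c, loop=loop)
    task.add_done_callback(callback)
    
    task1 = asyncio.ensure_future(c1, loop=loop)
    task1.add_done_callback(callback1)
    
    tasks = [task,task1]
    loop.run_until_complete(asyncio.wait(tasks))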

    Running the code produces the following output:

    <html>
    <head>
        <script>
            location.replace(location.href.replace("https://","http://"));
        </script>
    </head>
    <body>
        <noscript><meta http-equiv="refresh" content="0;url=http://www.baidu.com/"></noscript>
    </body>
    </html>
    <!DOCTYPE html>
    <html lang="cn">
    <head>
        <script>window._speedMark = new Date();
         window.lead_ip = '221.218.208.77';window.now = 1559307496193;</script>    <meta charset="utf-8">
    <link rel="dns-prefetch" href="//img01.sogoucdn.com"><link rel="dns-prefetch" href="//img02.sogoucdn.com"><link rel="dns-prefetch" href="//img03.sogoucdn.com"><link rel="dns-prefetch" href="//img04.sogoucdn.com"><link rel="dns-prefetch" href="//dlweb.sogoucdn.com">
    <title>搜狗搜索引擎 - 上网从搜狗开始</title>
    <link rel="shortcut icon" href="/images/logo/new/favicon.ico?v=4" type="image/x-icon">
    <meta http-equiv="X-UA-Compatible" content="IE=Edge">
    <link rel="search" type="application/opensearchdescription+xml" href="/content-search.xml" title="搜狗搜索">
    <meta name="keywords" content="搜狗搜索,网页搜索,微信搜索,视频搜索,图片搜索,音乐搜索,新闻搜索,软件搜索,问答搜索,百科搜索,购物搜索">
    <meta name="description" content="搜狗搜索是全球第三代互动式搜索引擎,支持微信公众号和文章搜索、知乎搜索、英文搜索及翻译等,通过自主研发的人工智能算法为用户提供专业、精准、便捷的搜索服务。">    <link rel="stylesheet" type="text/css" href="/web/index/css/base.v.1.4.12.css">
    <style>.wrapper .suggestion{border: 1px solid #e8e8e8; 622px;-moz-box-shadow: 0px 1px 8px rgba(0,0,0,0.1);-webkit-box-shadow: 0px 1px 8px rgba(0,0,0,0.1);box-shadow: 0px 1px 8px rgba(0,0,0,0.1);border-top-left-radius: 0px;border-top-right-radius: 0px;border-bottom-right-radius: 2px;border-bottom-left-radius: 2px; top:43px;}  .wrapper .suglist{ 206px;}  .wrapper .suglist .keyword {color: #7a77c8;}  .big-scn .suggestion { 654px;}  .big-scn .suglist{236px;}  .wrapper .suglist{ padding:4px 0}</style></head>
    <body >
            <div class="bg-gj-w" id="settings-mask" style="display: none;"></div>
    <div class="gjss" id="settings-advanced" style="display: none;top:-240px;">
        <div class="hf-box" id="settings-save-layer">
            <div class="hf-def">已保存设置</div>
        </div>
        <div class="gjss-tab">
            <a uigs-id="tab_set" href="javascript:void(0);" class="js-settings-tab tab-a cur">搜索设置</a>
            <a uigs-id="tab_adv" href="javascript:void(0);" class="js-settings-tab tab-a">高级搜索</a>
            <a href="javascript:void(0);" class="close-btn" id="settings-close"></a>
        </div>
        <div class="gjss-main">
            <div class="gjss-sz js-settings-content">
                <p class="gjss-err js-settings-mask" style="display: none;">搜索设置暂不可用,请启用浏览器的Cookie功能,然后刷新本页。</p>
                <div class="bg-wkq js-settings-mask" id="settings-tips" style="display: none;"></div>
    
                <dl class="js-as-select">
                    <dt>搜索结果显示条数</dt>
                    <dd>
                        <a href="javascript:void(0);" class="xz" id="settings-number" data-value="10">每页显示10条</a>
                        <ul id="settings-number-list">
                            <li><a uigs-id="set_10" href="javascript:void(0);" data-value="10">每页显示10条</a></li>
                            <li><a uigs-id="set-20" href="javascript:void(0);" data-value="20">每页显示20条</a></li>
                            <li><a uigs-id="set-50" href="javascript:void(0);" data-value="50">每页显示50条</a></li>
                            <li><a uigs-id="set-100" href="javascript:void(0);" data-value="100">每页显示100条</a></li>
                        </ul>
                    </dd>
                    <input type="hidden" name="pageNum" id="settings-show-number" value="10">
                </dl>
                <p class="enter" style="padding-top: 20px;">
                    <a href="javascript:void(0);" id="settings-save" uigs-id="set-save" class="a1">保存</a>
                    <a href="javascript:void(0);" id="settings-reset" uigs-id="set-reset" class="a2">恢复默认</a>
                </p>
            </div>
            <div class="gjss-sz js-settings-content" style="display: none;">
                <form action="/web" target="_blank" id="advanced-search-form">
                    <input type="hidden" name="query" value="">
                    <input name="fieldtitle" type="hidden" value=""/>
                    <input name="fieldcontent" type="hidden" value=""/>
                    <input name="fieldstripurl" type="hidden" value=""/>
                    <input name="bstype" type="hidden" value=""/>
                    <input name="ie" type="hidden" value="utf8"/>
                    <dl>
                        <dt>搜索关键词</dt>
                        <dd class="js-as-radio">
                                                    <div class="input-box js-input-box" id="advanced-query-box">
                                <input name="q" type="text" must="1" size="42" maxlength="100" autocomplete="off" placeholder="例如:搜狗真棒(多个关键词可用空格区分)">
                                <span class="err-word">* 请输入搜索关键词</span>
                            </div>
                            <a uigs-id="adv_split-query" href="javascript:void(0);" data-value="checkbox" class="dk-btn cur">拆分关键词</a>
                            <a uigs-id="adv_no-split-query" href="javascript:void(0);" data-value="" class="dk-btn">不拆分关键词</a>
                            <input type="hidden" name="include" value="checkbox">
                        </dd>
                    </dl>
                    <dl>
                        <dt>在指定站内搜索</dt>
                        <dd>
                            <div class="input-box js-input-box"><input name="sitequery" type="text" size="40" autocomplete="off" placeholder="例如:www.sogou.com"></div>
                        </dd>
                    </dl>
                    <dl class="js-as-select" style="padding-top:16px">
                        <dt>搜索词位于</dt>
                        <dd>
                            <a href="javascript:void(0);" class="xz">网页中任何地方</a>
                            <ul>
                                <li><a href="javascript:void(0);" data-value="0">网页中任何地方</a></li>
                                <li><a href="javascript:void(0);" data-value="1">仅在标题中</a></li>
                                <li><a href="javascript:void(0);" data-value="2">仅在正文中</a></li>
                                <li><a href="javascript:void(0);" data-value="3">仅在网址中</a></li>
                            </ul>
                        </dd>
                        <input type="hidden" name="located" value="0">
                    </dl>
                    <dl class="js-as-select" style="padding-top:16px">
                        <dt>需要搜索的文件格式</dt>
                        <dd >
                            <a href="javascript:void(0);" class="xz">全部网页</a>
                            <ul>
                                <li><a href="javascript:void(0);" data-value="">全部网页</a></li>
                                <li><a href="javascript:void(0);" data-value="doc">Microsoft Word (.doc)</a></li>
                                <li><a href="javascript:void(0);" data-value="xls">Microsoft Excel (.xls)</a></li>
                                <li><a href="javascript:void(0);" data-value="ppt">Microsoft Powerpoint (.ppt)</a></li>
                                <li><a href="javascript:void(0);" data-value="pdf">Adobe Acrobat PDF (.pdf)</a></li>
                                <li><a href="javascript:void(0);" data-value="rtf">RTF (.rtf)</a></li>
                                <li><a href="javascript:void(0);" data-value="all">全部文档</a></li>
                            </ul>
                        </dd>
                        <input type="hidden" name="filetype" value="">
                    </dl>
                    <dl>
                        <dt>搜索结果排序方式</dt>
                        <dd class="js-as-radio">
                            <a uigs-id="adv_relevance-ranking" href="javascript:void(0);" data-value="off" class="dk-btn cur">按相关性排序</a>
                            <a uigs-id="adv_time-sort" href="javascript:void(0);" data-value="on" class="dk-btn">按时间排序</a>
                            <input type="hidden" name="tro" value="off">
                        </dd>
                    </dl>
                    <p class="enter"><input id="adv-search-btn" uigs-id="adv_search-btn" type="submit" class="a1" value="开始搜索"></p>
                </form>
            </div>
        </div>
    </div>
        <div class="wrapper" id="wrap">
            <div class="header">
                <div class="top-nav">
        <ul>
            <li><a onclick="st(this,'40030300','news')" href="http://news.sogou.com" uigs-id="nav_news" id="news">新闻</a></li>
            <li class="cur"><span>网页</span></li>
            <li><a onclick="st(this,'73141200','weixin')" href="http://weixin.sogou.com/" uigs-id="nav_weixin" id="weixinch">微信</a></li>
            <li><a onclick="st(this,'40051200','zhihu')" href="http://zhihu.sogou.com/" uigs-id="nav_zhihu" id="zhihu">知乎</a></li>
            <li><a onclick="st(this,'40030500','pic')" href="http://pic.sogou.com" uigs-id="nav_pic" id="pic">图片</a></li>
            <li><a onclick="st(this,'40030600','video')" href="https://v.sogou.com/" uigs-id="nav_v" id="video">视频</a></li>
            <li><a href="http://mingyi.sogou.com?fr=common_index_nav" uigs-id="nav_mingyi" id="mingyi" onclick="st(this,'','myingyi')">明医</a></li>
            <li><a href="http://english.sogou.com?fr=pcweb_index_nav" uigs-id="nav_overseas" id="overseas" onclick="st(this,'','overseas')" >英文</a></li>
            <li><a onclick="st(this,'web2ww','wenwen')" href="https://wenwen.sogou.com/?ch=websearch" uigs-id="nav_wenwen" id="index_more_wenwen">问问</a></li>
            <li><a href="http://scholar.sogou.com?fr=common_index_nav" uigs-id="nav_scholar" id="scholar" onclick="st(this,'','scholar')">学术</a></li>
            <li class="show-more">
                <a href="javascript:void(0);" id="more-product">更多<i class="m-arr"></i></a>
                <div class="pos-more" id="products-box" style="top: 40px;">
                    <span class="ico-san"></span>
    
                    <a onclick="st(this,'40031000')" href="http://map.sogou.com" uigs-id="nav_map" id="map">地图</a>
                    <a onclick="st(this,'40031500')" href="http://gouwu.sogou.com/" uigs-id="nav_gouwu" id="index_more_gouwu">购物</a>
                    <a onclick="st(this,'40051203')" href="http://baike.sogou.com/Home.v" uigs-id="nav_baike" id="index_more_baike">百科</a>
                    <a onclick="st(this)" href="http://zhishi.sogou.com" uigs-id="nav_zhishi" id="index_more_zhishi">知识</a>
                    <a onclick="st(this,'40051205')" href="http://as.sogou.com/" uigs-id="nav_app" id="index_more_appli">应用</a>
                    <a onclick="st(this,'40051205','fanyi')" href="http://fanyi.sogou.com?fr=common_index_nav_pc" uigs-id="nav_fanyi" id="index_more_fanyi">翻译</a>
                    <a href="http://index.sogou.com" uigs-id="nav_index" id="index_more_index">指数</a>
                                        <a href="http://dangjian.sogou.com" uigs-id="nav_dangjian" id="dangjian" onclick="st(this,'','dangjian')">党建</a>
                                    <span class="all"><a onclick="st(this,'40051206')" href="http://www.sogou.com/docs/more.htm?v=1" uigs-id="nav_all" target="_blank">全部</a></span>
                </div>
            </li>
        </ul>
    </div>            <div class="user-box">
        <div class="local-weather" id="local-weather">
            <div class="wea-box" id="cur-weather" style="display: none;"></div>
            <div class="pos-more" id="detail-weather" style="top:40px;"></div>
        </div>
        <span class="line" id="user-box-line" style="display: none;"></span>
        <div class="user-enter">
            <a href="javascript:void(0);" id="show-card" style="display: none" uigs-id="settings_show-card">显示卡片</a>
                        <a href="javascript:void(0);" uigs-id="settings_change-skin" id="changeSkinBtn" >换肤</a>
                    <span class="s-dw">
                <a href="javascript:void(0);" id="settings">设置</a>
                <div class="pos-more" id="settings-box" style="top:40px;">
                    <span class="ico-san"></span>
                    <a href="javascript:void(0);" id="search-settings" uigs-id="settings_config">搜索设置</a>
                    <a href="javascript:void(0);" id="advanced-search" uigs-id="settings_advanced">高级搜索</a>
                    <a href="http://help.sogou.com/?w=01091500&v=1" uigs-id="settings_help">帮助</a>
                </div>
            </span>
                        <a href="javascript:void(0);" class="enter" id="loginBtn">登录</a>            </div>
    </div>
            </div>
            <div class="content" id="content">
                <div class="pos-header" id="top-float-bar">
        <div class="part-one"></div>
        <div class="part-two" id="card-tab-layer">
            <div class="c-top" id="top-card-tab"></div>
        </div>
    </div>
    <div class="logo2" id="logo-s"><span></span></div>            <div class="logo" id="logo-l"><span></span></div>            <div class="search-box" id="search-box">
        <form action="/web" name="sf" id="sf">
            <span class="sec-input-box">
                <input type="text" class="sec-input active" name="query" id="query" maxlength="100" len="80" autocomplete="off" />
            </span>
            <span class="enter-input"><input type="submit" value="" id="stb"></span>
            <input type="hidden" name="_asf" value="www.sogou.com" />
            <input type="hidden" name="_ast" />
            <input type="hidden" name="w" value="01019900" />
            <input type="hidden" name="p" value="40040100" />
            <input type="hidden" name="ie" value="utf8" />
                    <input type="hidden" name="from" value="index-nologin" />
                    <input type="hidden" name="s_from" value="index" />
            <div class="keywords-tips" id="keywordsTips" style="display:none">
                <i></i><p>搜狗的查询限制在"<strong>40个汉字</strong>"以内。</p>
            </div>
        </form>
    </div>
            </div>
                <div class="card-box" id="card-box" style="display: none;">
        <div class="card-box2" id="card-box2">
            <div class="c-top" id="card-tab-box">
                <a href="javascript:void(0);" id="card-settings" uigs-id="settings_settings-btn" class="shezhi"></a>
                <div class="pos-more" id="card-options">
                    <span class="ico-san"></span>
                    <a href="javascript:void(0);" uigs-id="settings_close-card" id="close-card">关闭卡片</a>
                </div>
            </div>
            <div class="c-main" id="card-content"></div>
        </div>
    </div>
    <div class="loog-more" id="scroll-more" style="display: none;">
        <a href="javascript:void(0);" uigs-id="scroll-more">滚动查看更多<br><span class="ico_san"></span></a>
    </div>            <div class="ft" id="footer" style="display: none;">
        <a href="http://fuwu.sogou.com/" target="_blank" uigs-id="footer_tuiguang">企业推广</a><span class="line"></span><a href="http://corp.sogou.com/" target="_blank" uigs-id="footer_about">关于搜狗</a><span class="line"></span><a href="http://ir.sogou.com/" target="_blank" uigs-id="footer_aboutEnglish">About Sogou</a><span class="line"></span><a href="http://www.sogou.com/docs/terms.htm?v=1" target="_blank" uigs-id="footer_disclaimer">免责声明</a><span class="line"></span><a href="http://fankui.help.sogou.com/index.php/web/web/index/type/4" target="_blank"  uigs-id="footer_feedback">意见反馈及投诉</a><span class="line"></span><a href="http://corp.sogou.com/private.html" target="_blank" uigs-id="footer_private">隐私政策</a><br>
        &copy;&nbsp;2004-2019&nbsp;Sogou.com&nbsp;/&nbsp;<span class="g">京网文 (2016) 6432-852号</span>&nbsp;/&nbsp;<a href="http://www.miibeian.gov.cn" target="_blank" class="g">京ICP证050897号</a><br>
        <span class="g">(京)-经营性-2016-0019</span>&nbsp;/&nbsp;<a href="http://www.miibeian.gov.cn/" target="_blank" class="g">京ICP备11001839号-1</a>&nbsp;/&nbsp;<a href="http://www.beian.gov.cn/portal/registerSystemInfo?recordcode=11000002000025" class="ba" target="_blank">京公网安备11000002000025号</a>
    </div>
    <div class="ft-v1" id="QRcode-footer" style="padding-bottom:53px; ">
        <div class="erwm-box">
            <span class="ewm"></span>
            <div class="erwx">
                <p>搜狗搜索APP</p>
                <p class="p2">搜你所想</p>
            </div>
        </div>
        <div class="ft-info">
            <a uigs-id="mid_pinyin" href="http://pinyin.sogou.com/" target="_blank"><i class="i1"></i>搜狗输入法</a><span class="line"></span><a uigs-id="mid_liulanqi" href="http://ie.sogou.com/" target="_blank"><i class="i2"></i>浏览器</a><span class="line"></span><a uigs-id="mid_daohang" href="http://123.sogou.com/" target="_blank"><i class="i3"></i>网址导航</a><br> <a href="http://corp.sogou.com/" target="_blank" class="g">关于搜狗</a>&nbsp;-&nbsp;<a href="http://ir.sogou.com/" target="_blank" class="g">About Sogou</a>&nbsp;-&nbsp;<a href="http://fuwu.sogou.com/" target="_blank" class="g">企业推广</a>&nbsp;-&nbsp;<a href="http://www.sogou.com/docs/terms.htm?v=1" target="_blank" class="g">免责声明</a>&nbsp;-&nbsp;<a href="http://fankui.help.sogou.com/index.php/web/web/index/type/4" target="_blank" class="g">意见反馈及投诉</a>&nbsp;-&nbsp;<a href="http://corp.sogou.com/private.html" target="_blank" class="g" uigs-id="footer_private">隐私政策</a><br>
            &copy;&nbsp;2004-2019&nbsp;Sogou.com&nbsp;/&nbsp;<span class="g">京网文 (2016) 6432-852号</span>&nbsp;/&nbsp;<span class="g">(京)-经营性-2016-0019</span><br>
            <a href="http://www.miibeian.gov.cn" target="_blank" class="g">京ICP证050897号</a>&nbsp;/&nbsp;<a href="http://www.miibeian.gov.cn/" target="_blank" class="g">京ICP备11001839号-1</a>&nbsp;/&nbsp;<a href="http://www.beian.gov.cn/portal/registerSystemInfo?recordcode=11000002000025" class="ba" target="_blank">京公网安备11000002000025号</a>
        </div>
    </div>            <div class="kuozhan" id="QRcode-box" style="display: none;">
        <a href="javascript:void(0);" id="miniQRcode"></a>
        <span id="QRcode"></span>
    </div>
    <a href="javascript:void(0);" class="back-top" id="back-top"></a>    </div>
            <script>
        var SugPara, uigs_para,
            msBrowserName = navigator.userAgent.toLowerCase(),
            msIsSe = false,
            msIsMSearch = false,
            hasDoodle = false,
            queryinput = document.getElementById('query');
    
        uigs_para={
            "uigs_productid": "webapp",
            "type": "webindex_new",
            "stype": "nologin",
            "scrnwi": screen.width,
            "scrnhi": screen.height,
            "uigs_pbtag": "A",
            "uigs_cookie": "SUID,sct",
                    "protocol": location.protocol.toLowerCase() == "https:" ? "https" : "http"
        };
    
        SugPara = {"enableSug":true,"sugType":"web","domain":"w.sugg.sogou.com","productId":"web","sugFormName":"sf","inputid":"query","submitId":"stb","suggestRid":"01015002","normalRid":"01019900","useParent":0 ,"sugglocation":"index","showVr":true,"showHotwords":true,"suggAbtestObject":{"suggestHistoryStrategy1":"","suggestHistoryStrategy2":"0|1|2|3|4|5|6|7|8","suggHistoryAbtest":""}};
    
            
        function mk_con() {
            try {
                window.external.metasearch('make_connection', 'www.google.com.hk');
            } catch (e) {}
        }
    
        if (/se 2.x/i.test(msBrowserName)) {
            msIsSe = true;
        }
    
        if (/metasr/i.test(msBrowserName)) {
            msIsMSearch = true;
        }
    
        if (queryinput) {
            if (msIsSe && msIsMSearch) {
                if (queryinput.addEventListener) {
                    queryinput.addEventListener('keypress', mk_con, false);
                    queryinput.addEventListener('keydown', mk_con, false)
                } else if (queryinput.attachEvent) {
                    queryinput.attachEvent('onkeypress', mk_con);
                    queryinput.attachEvent('onkeydown', mk_con);
                } else {
                    queryinput.onkeypress = mk_con;
                    queryinput.onkeydown = mk_con;
                }
            }
        }
        function getDomain(){
            var domainName = document.domain;
            if(domainName.indexOf("sogou.com")==(domainName.length-9)){
                return ".sogou.com";
            }else if(domainName.indexOf("soso.com")==(domainName.length-8)){
                return ".soso.com";
            }else if(domainName.indexOf("sogo.com") != -1){
                return ".sogo.com"
            }
        }
        window.m_s_index = function() {
            var w = document.sf.query,
                    c = Math.round((new Date().getTime() + Math.random()) * 1000);
    
            w.focus();
    
            if(new RegExp("kw=([^&]+)").test(location.search)) {
                if(w.value.length == 0) {
                    w.value = decodeURIComponent(RegExp.$1);
                }
            }
    
            if (document.cookie.indexOf("SUV=") < 0) {
                document.cookie = "SUV=" + c + ";path=/;expires=Sun, 29 July 2026 00:00:00 UTC;domain="+getDomain();
            }
    
                                (new Image).src = '//pb6.sogou.com/v6';
            
        };
    
        function st(self, p, product, anchor) {
            var searchBox = document.sf.query,
                query = encodeURIComponent(searchBox.value),
    
                productUrl = {
                    "news": 'http://news.sogou.com/news?ie=utf8&query=',
                    "web": 'web?ie=utf8&query=',
                    "weixin": 'http://weixin.sogou.com/weixin?type=2&ie=utf8&query=',
                    "zhihu": 'http://zhihu.sogou.com/zhihu?ie=utf8&query=',
                    "pic": 'http://pic.sogou.com/pics?ie=utf8&query=',
                    "video": 'https://v.sogou.com/v?ie=utf8&query=',
                    "myingyi": 'https://www.sogou.com/web?m2web=mingyi.sogou.com&ie=utf8&query=',
                    "overseas": 'http://english.sogou.com?b_o_e=1&ie=utf8&fr=pcweb_index_nav&query=',
                    "scholar": 'http://scholar.sogou.com?ie=utf8&fr=common_index_nav&query=',
                    "fanyi": 'http://fanyi.sogou.com/?fr=common_index_nav_pc&ie=utf8&keyword=',
                    "wenwen":'http://wenwen.sogou.com/s/?ch=websearch&w=',
                    "dangjian":'http://dangjian.sogou.com/dangjian?query='
                },
                newHref = productUrl[product] || self.href;
    
            function getConnectSymbol(url) {
                return url.indexOf("?") > -1 ? '&' : '?';
            }
    
            if(searchBox && searchBox.value !== ''){
    
                if(productUrl[product]) {
                    newHref = productUrl[product] + query;
                } else if(newHref.indexOf("kw=") > 0) {
                    newHref = newHref.replace(new RegExp("kw=[^&$]*"), "kw=" + query)
                } else {
                    newHref += getConnectSymbol(newHref) + 'kw=' + query;
                }
            }
    
            if(p){
                newHref += getConnectSymbol(newHref) + "p=" + p;
            }
    
            if (anchor && anchor.length > 0){
                newHref += "#" + anchor;
            }
    
            if (searchBox && searchBox.value == '' && (product == 'wenwen' || product == 'dangjian')){//问问首页链接单独处理
                newHref = self.href;
            }
    
            self.href = newHref;
        }
    
        window.cid = function(o, p) {
            var w = document.sf.query,
                q = encodeURIComponent(w.value);
    
            if (!q) {
                o.href += "?cid=" + p
            } else {
                if (p === "web2ww") {
                    o.href += "s/?cid=web2ww&w=" + q
                } else if (p === "web2bk") {
                    o.href += "Search.e?sp=S" + q + "&cid=web2bk"
                }
            }
        };
    
        window.m_s_index();
    </script>
    <script src="//dlweb.sogoucdn.com/common/lib/jquery/jquery-1.11.0.min.js"></script>
    <script charset="gbk" type="text/javascript" src="/js/sugg_new.v.104.js"></script>
    <script src="/js/pb_v.1.9.6.min.js"></script>
    <script src="/js/lib/jquery.mousewheel.min.js"></script>
    <script src="/js/lib/juicer-min.js"></script>
    <script src="/js/common/widget/login_new.min.v.0.5.js"></script>
    <script src="//account.sogou.com/static/api/passport-async.js"></script>
    <script src="/web/index/js/base.v.1.1.14.js"></script>
    <script src="/web/js/voice.min.v.0.0.6.js"></script>
    <script src="/web/js/taspeed.min.v.0.0.1.js"></script>
    </body>
    </html>
    <!--zly-->
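The comment in the code for section 2 says the coroutine body must not contain code from non-async modules. The sketch below, which uses only the standard library, illustrates why under simple assumptions: `blocking_job`, `async_job`, and `timed` are hypothetical names, and the sleeps stand in for network waits.

```python
import asyncio
import time

async def blocking_job():
    # time.sleep is a blocking call: it freezes the whole event loop,
    # so no other task can run while it waits.
    time.sleep(0.2)

async def async_job():
    # asyncio.sleep yields to the event loop, so other tasks run meanwhile.
    await asyncio.sleep(0.2)

async def timed(job, n=3):
    # Run n copies of the job "concurrently" and measure the wall-clock time.
    start = time.perf_counter()
    await asyncio.gather(*(job() for _ in range(n)))
    return time.perf_counter() - start

blocking_time = asyncio.run(timed(blocking_job))
async_time = asyncio.run(timed(async_job))
print(f"blocking: {blocking_time:.2f}s, async: {async_time:.2f}s")
```

With the blocking call the three jobs run one after another, while the awaitable version lets their waits overlap. The same reasoning is why the example above uses aiohttp rather than a blocking HTTP module inside `request`.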

    3. Mobile data scraping && environment setup

    Lab: follow the blog post below

    https://www.cnblogs.com/bobo-zhang/p/10068994.html

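    - Mobile data scraping:
        - Packet-capture tools (by definition, proxy servers):
            Windows: Fiddler, mitmproxy (both are proxy servers)
            Mac: Charles (nicknamed 青花瓷)
    - Installing the certificate on the phone:
        - Have the computer open a Wi-Fi hotspot and connect the phone to it (so the phone and the computer are on the same network segment)
        - In the phone's browser, visit ip:8888 and tap the link to download the certificate
        - Turn on the phone's proxy: set the proxy IP and port to the IP of the machine running Fiddler and Fiddler's port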
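Before configuring the phone, it can help to confirm which IP and port it should point at and that the proxy is actually listening. The following is a rough sketch using only the standard library; the function names are hypothetical, port 8888 is taken from the notes above, and the address 8.8.8.8 is only an arbitrary routable address used to pick the outgoing network interface.

```python
import socket

FIDDLER_PORT = 8888  # the port used in the notes above

def lan_ip():
    """Best-effort guess of this machine's LAN IP; falls back to 127.0.0.1."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket sends no packets; it only selects the
        # local interface that would be used to reach that address.
        s.connect(("8.8.8.8", 80))
        return s.getsockname()[0]
    except OSError:
        return "127.0.0.1"
    finally:
        s.close()

def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def phone_proxy_settings(host, port=FIDDLER_PORT):
    """Values to enter in the phone's Wi-Fi proxy settings, plus the certificate page."""
    return {"proxy_host": host, "proxy_port": port, "cert_page": f"http://{host}:{port}"}

ip = lan_ip()
print(phone_proxy_settings(ip))
print("proxy reachable:", is_port_open(ip, FIDDLER_PORT))
```

If the last line prints `False`, the proxy is probably not running or is not accepting connections from other devices yet.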

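    (1) Send the certificate to the phone

    (2) In Fiddler, click Tools => Options =>

    Next, "allow" other devices to connect: => "确定" (Confirm) => OK

    In a browser, visit: http://localhost:8888/

    This gives the result below

    The certificate can be downloaded from the last line of the page shown above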

  • Original article: https://www.cnblogs.com/studybrother/p/10957649.html