• Selenium: CSS locators and paging through web results


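    Assignment:

    Open Baidu, search for selenium, find the result titles on pages 1 through 10 that do not contain "selenium", count them, and print the titles.

    Difficulty hit during implementation: elements go stale while paging, so the next-page element can no longer be located.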

    Solution one:

    http://blog.sina.com.cn/s/blog_65bc768e0102ve78.html
    Solution two: refresh the page and fetch the element again. Idea taken from: https://blog.csdn.net/freesigefei/article/details/50501961
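
    The "fetch the element again" idea in solution two can be sketched as a small retry helper. This is only an illustration, not the code from either linked post: the name `click_with_relocate` and its `attempts` and `pause` parameters are made up here, and the fallback exception class exists only so the sketch runs where Selenium is not installed.

```python
import time

try:
    # Real exception class when Selenium is installed.
    from selenium.common.exceptions import StaleElementReferenceException
except ImportError:
    # Stand-in (assumption) so the sketch can run without Selenium.
    class StaleElementReferenceException(Exception):
        pass


def click_with_relocate(locate, attempts=3, pause=0.5):
    """Look the element up again on every try, then click it.

    locate: zero-argument callable returning an object with click().
    Returns the number of the attempt that succeeded.
    """
    for attempt in range(1, attempts + 1):
        try:
            # A fresh lookup each time avoids reusing a stale reference.
            locate().click()
            return attempt
        except StaleElementReferenceException:
            if attempt == attempts:
                raise  # give up after the last allowed attempt
            time.sleep(pause)
```

    In the script, `locate` would be a lambda that performs the `div#page> :last-child` lookup on the driver, so the next-page link is located again after every page change instead of being held across pages.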

    Implementation:

    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    # @Time    : 2018/5/7 22:19
    # @Author  : fguo1029
    # @File    : findkw.py
    import time

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By


    # Type the keyword into the Baidu search box and submit it (done once)
    def search(kw):
        driver.find_element(By.CSS_SELECTOR, '.s_ipt').send_keys(kw)
        driver.find_element(By.CSS_SELECTOR, '#su').click()
        time.sleep(2)  # crude wait for the result page to load


    # Collect the titles on the current result page that do not contain kw
    def find_one_page(list_one_page, kw):
        # string_all = driver.find_elements(By.CSS_SELECTOR, "div[srcid='1599']")
        string_all = driver.find_elements(By.CSS_SELECTOR, "h3>a[target='_blank']")
        for string_one in string_all:
            baidu_href = string_one.get_attribute('href') or ''
            kw_string = string_one.get_attribute('text') or ''
            # Only consider links that go through Baidu, and skip blank titles
            if 'baidu' not in baidu_href or kw_string.strip() == '':
                continue
            # Compare case-insensitively on both sides
            if kw.lower() not in kw_string.lower():
                list_one_page.append(kw_string)
        return list_one_page


    # Paging: walk through `page` result pages and gather the titles
    def find_kw_not_in_page(kw='', page=0):
        flag = 0
        list_all_page = []
        search(kw)
        while flag < page:
            find_one_page(list_all_page, kw)
            flag += 1
            print(' looking for %s in page %d ' % (kw, flag))
            if flag < page:
                # Solution two: refresh, then locate the next-page link again
                # so that a stale reference is never reused
                driver.refresh()
                time.sleep(2)
                driver.find_element(By.CSS_SELECTOR, "div#page> :last-child").click()
                time.sleep(2)
        return list_all_page


    if __name__ == '__main__':
        # Selenium 4 style; path to the local chromedriver, adjust for your machine
        driver = webdriver.Chrome(service=Service(r'G:\Python\python3\chromedriver.exe'))
        driver.get('http://www.baidu.com')
        driver.implicitly_wait(2)  # set once; applies to every element lookup

        titles = find_kw_not_in_page('selenium', 10)

        for i in titles:
            print(i)
        print('total number of titles is %d' % len(titles))

        time.sleep(2)
        print('try to close chrome')
        driver.quit()
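
    The filtering rule at the heart of the assignment (keep non-blank titles that do not contain the keyword, ignoring case) can be tried without a browser. A minimal sketch, assuming the titles have already been collected as plain strings; the function name `titles_without_keyword` and the sample titles are made up for illustration.

```python
def titles_without_keyword(titles, kw):
    """Return the non-blank titles that do not contain kw (case-insensitive)."""
    kw_lower = kw.lower()
    result = []
    for title in titles:
        cleaned = title.strip()
        # Skip blank titles and titles that mention the keyword.
        if cleaned and kw_lower not in cleaned.lower():
            result.append(cleaned)
    return result


# Hypothetical sample data standing in for scraped result titles.
sample = [
    "Selenium WebDriver docs",
    "  ",
    "Web automation basics",
    "About SELENIUM IDE",
]
missing = titles_without_keyword(sample, "selenium")
print(len(missing), missing)  # 1 ['Web automation basics']
```

    Keeping this check in a plain function makes it easy to verify the counting logic separately from the page-turning code, which depends on Baidu's current markup.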
    

      

    Learn Python automation well, and your work efficiency soars
  • Original post: https://www.cnblogs.com/fenglovellx/p/9022643.html