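Purpose:
The previous article also covered scraping 51job (前程无忧). This article revises parts of that script to make it more flexible:
1. Use Selenium's implicit wait instead of time.sleep. An implicit wait returns as soon as the element appears rather than always pausing for a fixed interval, which saves time.
2. Take the username, password, and job name as parameters, so the script is no longer tied to one account or one search.
3. Add a next-page check, so that when a job search returns several pages of results, the information on every page is collected.
Code: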
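The three changes above can be sketched in isolation as follows. This is a minimal illustration, not the script itself: the function names `parse_args`, `collect_all_pages`, and `make_driver`, the `--username` and `--job` flags, the `max_pages` cap, and the choice of the Firefox driver are all assumptions made for the example.

```python
import argparse


def parse_args(argv=None):
    """Read the login name and the job keyword from the command line (hypothetical flags)."""
    parser = argparse.ArgumentParser(description="Scrape job details from 51job")
    parser.add_argument("--username", required=True)
    parser.add_argument("--job", required=True)
    return parser.parse_args(argv)


def collect_all_pages(get_links, go_to_next_page, max_pages=50):
    """Gather the links from every result page.

    get_links() returns the links on the current page.
    go_to_next_page() moves to the next page and returns False when there is none.
    max_pages is a safety cap (an assumption) so the loop always terminates.
    """
    links = []
    for _ in range(max_pages):
        links.extend(get_links())
        if not go_to_next_page():
            break
    return links


def make_driver(wait_seconds=10):
    """Create a browser driver that relies on an implicit wait instead of time.sleep."""
    # Imported here so the helpers above can be used without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Firefox()  # assumption: any other Selenium driver works the same way
    # Element lookups now poll for up to wait_seconds and return as soon as the element exists.
    driver.implicitly_wait(wait_seconds)
    return driver
```

The password is deliberately not a command-line flag here. Reading it with `getpass("Password: ")` keeps it out of the shell history, and `getpass` is already imported by the script.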
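#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Purpose:
Extract the detailed description of a given job from the 51job (前程无忧) website.
"""
import time
from getpass import getpass

import requests
from bs4 import BeautifulSoup
from selenium import webdriver


def get_soup(url):
    """Download url and return the parsed page, or None if the request fails."""
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:76.0) Gecko/20100101 Firefox/76.0"
    }
    # Start with None so the return below never hits an unassigned variable
    # when the request fails or the status code is not 200.
    soup = None
    try:
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            # response.apparent_encoding = "utf-8"
            html = response.content
            soup = BeautifulSoup(html, 'html.parser')
    except requests.RequestException:
        print("Scrape failed")
    return soup


def get_content(soup):
    """Return the text of the job description block, stripped of surrounding whitespace."""
    content = soup.find("div", class_="bmsg job_msg inbox").text
    # print(content)
    return content.strip()


def final_result(url):
    """Fetch a job page and return its description text ("" if the page could not be fetched)."""
    soup = get_soup(url)
    if soup is None:
        return ""
    result = get_content(soup)
    return result


def next_page():
    print("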