Simulated Login + Data Scraping (Python + Selenium)

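The script in this post scrapes the career histories of some scholars from LinkedIn. It is for reference only. Note: do not scrape a large number of profiles in one run, or your account will get banned. Don't ask me how I know.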
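
One way to act on that warning is to cap how many profiles a single run visits and to pause for a random interval between visits. The helper below is a minimal sketch of that idea and is not part of the original script: the names `paced`, `max_items`, `min_delay` and `max_delay` are made up for illustration, and the default numbers are guesses, not limits that LinkedIn documents.

```python
import itertools
import random
import time


def paced(items, max_items=20, min_delay=3.0, max_delay=8.0, sleep=time.sleep):
    """Yield at most max_items items, pausing a random time between them.

    All names and default values here are illustrative assumptions.
    The sleep argument exists only so the pause can be swapped out in tests.
    """
    for index, item in enumerate(itertools.islice(items, max_items)):
        if index > 0:
            # Random pause so the requests are not evenly spaced
            sleep(random.uniform(min_delay, max_delay))
        yield item
```

In the script, the loop over the search results could then be written as `for i in paced(items):`, which would stop after 20 profiles and wait a few seconds before each new one.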

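# -*- coding: utf-8 -*-
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# Placeholders: fill in your own account and the scholar to look up
username = 'your-email@example.com'
password = 'your-password'
scholar_name = 'scholar name'

# The element ids below are the ones from the time of writing; LinkedIn may have changed them
diver = webdriver.Chrome()
diver.get('https://www.linkedin.com/')
# Wait for the page to finish loading
time.sleep(1)
# Simulate the login
diver.find_element(By.ID, 'login-email').send_keys(username)
diver.find_element(By.ID, 'login-password').send_keys(password)
# Submit the form to move on to the next page
diver.find_element(By.ID, 'login-submit').send_keys(Keys.ENTER)
time.sleep(1)
# Search: the first <input> on the page is taken to be the search box
diver.find_element(By.TAG_NAME, 'input').send_keys(scholar_name)
diver.find_element(By.TAG_NAME, 'input').send_keys(Keys.ENTER)
time.sleep(1)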
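# Collect every candidate person on the current results page
soup = BeautifulSoup(diver.page_source, 'lxml')
items = soup.find_all('div', {'class': 'search-result__wrapper'})
for n, item in enumerate(items, start=1):
    # Relative link to this person's profile
    href = item.find('div', {'class': 'search-result__image-wrapper'}).find('a')['href']
    diver.get('https://www.linkedin.com' + href)
    time.sleep(3)
    profile_soup = BeautifulSoup(diver.page_source, 'lxml')
    # print(profile_soup)
    positions = profile_soup.find_all('li', {'class': 'pv-profile-section__card-item pv-position-entity ember-view'})
    print(str(n) + ':')
    for position in positions:
        # Print each position summary on a single line
        print(position.find('div', {'class': 'pv-entity__summary-info'}).get_text().replace('\n', ''))
# quit() closes the window and also shuts down the chromedriver process
diver.quit()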
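
The fixed `time.sleep` calls above assume the page is ready after one to three seconds, which may not hold on a slow connection. Selenium ships `WebDriverWait` in `selenium.webdriver.support.ui` for this purpose. The sketch below shows the same polling idea in plain Python so it can be tried without a browser. `wait_until`, `timeout` and `poll` are hypothetical names, not part of Selenium or of the script above.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once timeout seconds have passed.
    This is a hand-rolled illustration, not Selenium's own implementation.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError('condition not met within %.1f s' % timeout)
        # Pause briefly before checking again
        time.sleep(poll)
```

With a live driver, something like `wait_until(lambda: diver.find_elements(By.ID, 'login-email'))` could stand in for the first `time.sleep(1)`. That assumes `By` has been imported from `selenium.webdriver.common.by`; `find_elements` returns an empty list until the element exists, so the wait keeps polling until the list is non-empty.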

Original post: https://www.cnblogs.com/ybf-yyj/p/8059171.html