<head><meta content="IE=Edge" http-equiv="X-UA-Compatible"/><link href="//static.youku.com" rel="dns-prefetch"/><link href="//static.ykimg.com" rel="dns-prefetch"/><link href="//r1.ykimg.com" rel="dns-prefetch"/><link href="//r2.ykimg.com" rel="dns-prefetch"/><link href="//r3.ykimg.com" rel="dns-prefetch"/><link href="//r4.ykimg.com" rel="dns-prefetch"/><link href="//g1.ykimg.com" rel="dns-prefetch"/><link href="//g2.ykimg.com" rel="dns-prefetch"/><link href="//g3.ykimg.com" rel="dns-prefetch"/><link href="//g4.ykimg.com" rel="dns-prefetch"/><link href="//p.l.youku.com" rel="dns-prefetch"/><link href="//urchin.lstat.youku.com" rel="dns-prefetch"/><link href="//html.atm.youku.com" rel="dns-prefetch"/><meta content="text/html; charset=utf-8" http-equiv="content-type"/><meta content="zh-cn" http-equiv="content-language"/><title>剧综影漫_土豆视频</title><meta content="视频,视频分享,视频搜索,视频播放,土豆视频" name="keywords"/><meta content="土豆-中国第一视频网站,提供视频播放,视频发布,视频搜索 - 视频服务平台,提供视频播放,视频发布,视频搜索,视频分享 - 土豆视频" name="description"/><meta content="a2h28" name="data-spm"/><link href="/favicon.ico" rel="shortcut icon"/><link href="//static.youku.com/yk/lib/css/tudou.8f50c0ed37.css" rel="stylesheet"/><link href="//static.youku.com/yk/newtudou/css/pc/category/category.340b6db21c.css" rel="stylesheet"/><script>var Local={"domain":{"default":"www.youku.com","test":"test.youku.com","subscribe":"ding.youku.com","uc":"i.youku.com","video":"v.youku.com","rz":"rz.youku.com","userlive":"userlive.youku.com","esign":"hetong.youku.com","listpage":"list.youku.com","xinterest":"x.youku.com","ypartner":"yp.youku.com","interact":"hudong.pl.youku.com","creation":"mp.tudou.com","uctg":"uctg.youku.com","playlists":"playlists.youku.com","static":"static.youku.com","passport":"account.youku.com","static_ext":"static.ykimg.com","static_ext_js":"js.ykimg.com","static_ext_css":"css.ykimg.com"},"service":{"push":"push.youku.com","interact":"hudong.pl.youku.com"},"debug":false};</script><script>var require = {"baseUrl": 
"//static.youku.com/newtudou/js/"};</script><script>if(require){require.paths={"main.category": "//static.youku.com/yk/newtudou/js/pc/category/main.category.f22a91da07"};}</script><script data-main="main.category" src="//static.youku.com/yk/lib/js/base.tudou.464a1349ea.js"></script></head>
print(soup.title)
<title>剧综影漫_土豆视频</title>
print(soup.title.text)
剧综影漫_土豆视频
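Here is a handy trick for getting at a tag: attribute-style access can be chained repeatedly on tags in the document tree. The code below retrieves the first <b> tag inside the <body> tag: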
print(soup.body.b)
<b class="line-after"></b>
print(soup.body.b['class'])
['line-after']
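As a minimal sketch of the chained access shown above, the snippet below parses a small inline document instead of a live page. The HTML string and its contents are made up for illustration, and the built-in "html.parser" is used only so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical document, invented for this illustration only.
html = """
<html><head><title>Demo page</title></head>
<body>
<p>Intro <b class="line-after">first bold</b></p>
<p><b class="other">second bold</b></p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# soup.title is the first <title> tag; .text is its text content.
title_text = soup.title.text

# Chained access: the first <b> found anywhere inside <body>.
first_b = soup.body.b

# "class" is a multi-valued attribute, so it comes back as a list.
first_b_classes = first_b["class"]

# A tag name that does not occur yields None rather than raising.
missing = soup.body.table

print(title_text)
print(first_b)
print(first_b_classes)
print(missing)
```

Because a missing tag yields None, a longer chain such as soup.body.table.tr would raise AttributeError on a page without a table, so it can be worth checking intermediate results when the page structure is not guaranteed.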
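Searching for tags by CSS class name is very useful, but class, the keyword that identifies a CSS class, is a reserved word in Python, so using class as a keyword argument causes a syntax error. Starting with Beautiful Soup 4.1.1, you can search by CSS class through the class_ keyword argument instead.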
<div class="hd">
<a class="" href="https://movie.douban.com/subject/1291546/">
<span class="title">霸王别姬</span>
<span class="other"> / 再见,我的妾 / Farewell My Concubine</span>
</a>
<span class="playable">[可播放]</span>
</div>
<div class="hd">
<a class="" href="https://movie.douban.com/subject/1295644/">
<span class="title">这个杀手不太冷</span>
<span class="title"> / Léon</span>
<span class="other"> / 杀手莱昂 / 终极追杀令(台)</span>
</a>
<span class="playable">[可播放]</span>
</div>
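The fragment above can be parsed offline to see how a class-based search behaves before any request is sent. This is only a sketch: the fragment is copied into a string, "html.parser" is chosen to avoid an extra dependency, and the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# The two sample entries from the fragment above, embedded as a string.
fragment = """
<div class="hd">
<a class="" href="https://movie.douban.com/subject/1291546/">
<span class="title">霸王别姬</span>
<span class="other"> / 再见,我的妾 / Farewell My Concubine</span>
</a>
<span class="playable">[可播放]</span>
</div>
<div class="hd">
<a class="" href="https://movie.douban.com/subject/1295644/">
<span class="title">这个杀手不太冷</span>
<span class="title"> / Léon</span>
<span class="other"> / 杀手莱昂 / 终极追杀令(台)</span>
</a>
<span class="playable">[可播放]</span>
</div>
"""

soup = BeautifulSoup(fragment, "html.parser")

# class_ (with a trailing underscore) sidesteps the reserved word "class".
blocks = soup.find_all("div", class_="hd")

# Passing an attrs dictionary is an equivalent spelling.
same_blocks = soup.find_all("div", attrs={"class": "hd"})

# div.a.span is the FIRST <span> inside the <a>, i.e. the primary title.
titles = [div.a.span.text for div in blocks]
links = [div.a["href"] for div in blocks]

for title, link in zip(titles, links):
    print(title, link)
```

Note that the second entry has two spans with class "title"; chained access returns only the first, so collecting every title variant would need something like div.find_all("span", class_="title").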
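import requests
from bs4 import BeautifulSoup

link = "https://movie.douban.com/top250?start=1"
# Send a browser-like User-Agent instead of the default one used by requests.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0'}
req = requests.get(link, headers=headers)
# Name the parser explicitly; without it Beautiful Soup picks one itself and emits a warning.
soup = BeautifulSoup(req.text, "lxml")
# Each movie entry sits in a <div class="hd">.
div_list = soup.find_all("div", class_="hd")
for ls in div_list:
    # The first <span> inside the <a> holds the primary title.
    print(ls.a.span.text)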
霸王别姬
这个杀手不太冷
阿甘正传
美丽人生
泰坦尼克号
千与千寻
辛德勒的名单
盗梦空间
忠犬八公的故事
机器人总动员
三傻大闹宝莱坞
.......
import requests
from bs4 import BeautifulSoup

link = "http://category.tudou.com/category/c_96_r_2019_p_1.html"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0'}
req = requests.get(link, headers=headers)
soup = BeautifulSoup(req.text, "lxml")
# print(soup.p)
# Each video title is an <a class="v-meta__title__link">.
a_list = soup.find_all('a', class_='v-meta__title__link')
for ls in a_list:
    print(ls.text)            # link text
    print(ls.attrs['href'])   # protocol-relative video URL
    print(ls.attrs['title'])  # value of the title attribute
    print(ls.attrs)           # all attributes as a dict
    print(ls)                 # the whole tag
雪暴
//video.tudou.com/v/XNDIyNjAzNzg0OA==.html
雪暴
{'href': '//video.tudou.com/v/XNDIyNjAzNzg0OA==.html', 'target': 'video', 'title': '雪暴', 'class': ['v-meta__title__link'], 'data-spm': ''}
<a class="v-meta__title__link" data-spm="" href="//video.tudou.com/v/XNDIyNjAzNzg0OA==.html" target="video" title="雪暴">雪暴</a>
流浪地球
//video.tudou.com/v/XNDE0ODQ5NzczNg==.html
流浪地球
{'href': '//video.tudou.com/v/XNDE0ODQ5NzczNg==.html', 'target': 'video', 'title': '流浪地球', 'class': ['v-meta__title__link'], 'data-spm': ''}
<a class="v-meta__title__link" data-spm="" href="//video.tudou.com/v/XNDE0ODQ5NzczNg==.html" target="video" title="流浪地球">流浪地球</a>
..........................
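The href values printed above are protocol-relative (they start with //), so they cannot be passed to requests.get as they are. One possible way to turn them into absolute URLs is urllib.parse.urljoin, sketched below on two anchors copied from the output above. The record layout (a list of dicts with "title" and "url" keys) is an assumption made for this illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# URL of the page the anchors were scraped from.
page_url = "http://category.tudou.com/category/c_96_r_2019_p_1.html"

# Two anchors copied from the printed output above.
html = """
<a class="v-meta__title__link" data-spm="" href="//video.tudou.com/v/XNDIyNjAzNzg0OA==.html" target="video" title="雪暴">雪暴</a>
<a class="v-meta__title__link" data-spm="" href="//video.tudou.com/v/XNDE0ODQ5NzczNg==.html" target="video" title="流浪地球">流浪地球</a>
"""

soup = BeautifulSoup(html, "html.parser")

videos = []
for a in soup.find_all("a", class_="v-meta__title__link"):
    videos.append({
        # .get returns None instead of raising if the attribute is absent.
        "title": a.get("title"),
        # urljoin fills in the scheme of page_url for a "//host/path" href.
        "url": urljoin(page_url, a["href"]),
    })

for video in videos:
    print(video["title"], video["url"])
```

Resolving against the page URL rather than hard-coding "http:" keeps the scheme consistent with however the listing page itself was fetched.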