Using BeautifulSoup to Generate Blog Post Summaries and Filter Malicious Tags (XSS Attacks)

1. The BeautifulSoup Module

2. Blog Post Summary

3. Filtering Malicious Tags

1. The BeautifulSoup Module

pip install beautifulsoup4  # install Beautiful Soup (provides the bs4 package)

from bs4 import BeautifulSoup  # import the BeautifulSoup class
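As a quick check that the install and import work, the following minimal sketch parses a small fragment and reads back a tag name, an attribute, and the plain text. The sample markup and the variable names (`html`, `first_p`) are made up for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup used only for this illustration.
html = '<p class="intro">Hello, <b>world</b>!</p>'

# 'html.parser' is the parser bundled with the Python standard library.
soup = BeautifulSoup(html, 'html.parser')

first_p = soup.find('p')     # first <p> element in the document
print(first_p.name)          # p
print(first_p['class'])      # class is multi-valued, so this is a list: ['intro']
print(first_p.get_text())    # Hello, world!
```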

2. Blog Post Summary

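from bs4 import BeautifulSoup

content = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(content, 'html.parser')

# soup.text is the document text with all tags stripped;
# its first 9 characters serve as the summary.
overview = soup.text[0:9]
print(overview)  # prints 'I linked ' (trailing space included)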
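The same idea can be wrapped in a reusable helper. This is only a sketch: the function name `make_summary` and its parameters `max_chars` and `suffix` are hypothetical. Collapsing whitespace and appending an ellipsis are additions that the original snippet does not perform.

```python
from bs4 import BeautifulSoup


def make_summary(html, max_chars=9, suffix='...'):
    """Return the plain text of `html`, cut to at most `max_chars` characters.

    `make_summary`, `max_chars` and `suffix` are hypothetical names for this sketch.
    """
    text = BeautifulSoup(html, 'html.parser').get_text()
    # Collapse runs of whitespace (newlines, tabs) into single spaces.
    text = ' '.join(text.split())
    if len(text) <= max_chars:
        return text
    # Trim whitespace left at the cut point before appending the suffix.
    return text[:max_chars].rstrip() + suffix


content = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
print(make_summary(content))        # I linked...
print(make_summary(content, 100))   # I linked to example.com
```

Because the text is extracted before truncation, the cut can never land in the middle of a tag, which is the main reason to slice the parsed text and not the raw HTML string.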

3. Filtering Malicious Tags

from bs4 import BeautifulSoup

content = '<a href="http://example.com/">I linked to <i>example.com</i></a><div><img src=""></img>image</div><a>link</a><script>alert(123)</script>'
soup = BeautifulSoup(content, 'html.parser')
print(soup)  # the output still contains the <script> element

# Remove every <script> and <link> element together with its contents.
for tag in soup.find_all():
    if tag.name in ['script', 'link']:
        tag.decompose()

print(soup)  # the <script> element has been removed
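Removing a fixed list of tags leaves other vectors in place, such as an `onclick` attribute on an allowed tag. A stricter variant keeps only tags and attributes that are explicitly allowed. The sketch below is one possible way to do that with the same BeautifulSoup calls. The `ALLOWED_TAGS` mapping and the `sanitize` function are hypothetical names, the whitelist contents are an assumption chosen to match the sample markup, and URL schemes such as `javascript:` inside `href` are not checked here.

```python
from bs4 import BeautifulSoup

# Hypothetical whitelist for this sketch: tag name -> attributes allowed on it.
ALLOWED_TAGS = {
    'a': ['href'],
    'i': [],
    'div': [],
    'img': ['src'],
}


def sanitize(html):
    """Keep only whitelisted tags and attributes; remove everything else."""
    soup = BeautifulSoup(html, 'html.parser')

    # Repeatedly remove the first disallowed element (with its contents).
    # Searching again after each removal avoids touching tags that were
    # already destroyed as children of a removed parent.
    while True:
        bad = soup.find(lambda t: t.name not in ALLOWED_TAGS)
        if bad is None:
            break
        bad.decompose()

    # Drop attributes (for example onclick) that are not on the whitelist.
    for tag in soup.find_all():
        for attr in list(tag.attrs):
            if attr not in ALLOWED_TAGS[tag.name]:
                del tag.attrs[attr]

    return str(soup)


content = ('<a href="http://example.com/" onclick="steal()">I linked to '
           '<i>example.com</i></a><div><img src=""></img>image</div>'
           '<a>link</a><script>alert(123)</script>')
cleaned = sanitize(content)
print(cleaned)
```

For production use, a maintained sanitizing library is usually a safer choice than a hand-written filter; this sketch only illustrates the whitelist idea.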
Original article: https://www.cnblogs.com/changwoo/p/9623487.html