• Python爬虫之正则表达式(3)


     1 # re.sub
     2 # 替换字符串中每一个匹配的子串后返回替换后的字符串
     3 import re
     4 content = 'Extra strings Hello 1234567 World_This is a Regex Demo Extra strings'
     5 content = re.sub('d+', '', content)
     6 print(content)
     7 
     8 import re
     9 content = 'Extra strings Hello 1234567 World_This is a Regex Demo Extra strings'
    10 content = re.sub('d+', 'Replacement', content)
    11 print(content)
    12 
    13 # 1 是转义字符
    14 import re
    15 content = 'Extra strings Hello 1234567 World_This is a Regex Demo Extra strings'
    16 content = re.sub('(d+)', r'1 8910', content)
    17 print(content)
    18 
    19 # re.compile
    20 # 将正则字符串编译成正则表达式对象
    21 # 将一个正则表达式串编译成正则对象,以便于复用该匹配模式
    22 import re
    23 content = '''Hello 1234567 World_This
    24 is a Regex Demo'''
    25 pattern = re.compile('Hello.*Demo', re.S)
    26 result = re.match(pattern, content)
    27 print(result)

    下面是爬取豆瓣图书的实战代码

     1 import requests
     2 import re
     3 content = requests.get('https://book.douban.com/').text
     4 # print(content)
     5 pattern = re.compile('<li.*?cover.*?title="(.*?)".*?author">(.*?)</div>.*?year">(.*?)</span>.*?</li>', re.S)
     6 results = re.findall(pattern, content)
     7 for result in results:
     8     name, author, date = result
     9     author = re.sub("s", "", author)
    10     date = re.sub("s", "", date)
    11     print("【书名】:", name, " 【作者】:", author, " 【出版年】:", date)

    本篇内容为:崔庆才爬虫学习笔记

  • 相关阅读:
    centos7上修改lv逻辑卷的大小
    centos6上调整lv逻辑卷
    nginx的日志配置
    修改Linux系统默认编辑器
    mysqldump命令的安装
    centos7上设置中文字符集
    nginx的80端口跳转到443
    ubuntu上安装docker和docker-compose
    javascript递归、循环、迭代、遍历和枚举概念
    Lattice 开发工具Diamond 相关版本下载地址
  • 原文地址:https://www.cnblogs.com/duxie/p/10033388.html
Copyright © 2020-2023  润新知