[Python Crawler] Lesson 5 (Bilibili Danmaku)

First of all, many thanks to the author of this article: https://www.cnblogs.com/LexMoon/p/pyspider03.html#4361286

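import re

import requests

# ID of the video whose danmaku (bullet comments) we want to fetch
av_id = '67946325'
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
    'Accept': 'text/html',
    # Placeholder: paste the Cookie string from your own browser session here.
    # Keep it ASCII; non-Latin-1 text in a header value fails when the request is sent.
    'Cookie': 'your-cookie-here',
}
# Fetch the video page; the cid (used as oid below) is embedded in its HTML
resp = requests.get('https://www.bilibili.com/video/av' + av_id, headers=headers)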

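# Capture whatever sits between 'cid=' and '&aid'; group(1) returns only the captured part
match_rule = r'cid=(.*?)&aid'
match = re.search(match_rule, resp.text)
if match is None:
    raise SystemExit('cid not found in the page source')
oid = match.group(1)
print('oid=' + oid)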

xml_url = 'https://api.bilibili.com/x/v1/dm/list.so?oid='+oid

resp = requests.get(xml_url,headers=headers)


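# requests falls back to ISO-8859-1 when a text response declares no charset,
# which garbles Chinese text. In that case, look for an encoding declared in the
# body first, then fall back to the detected (apparent) encoding.
if resp.encoding == 'ISO-8859-1':
    encodings = requests.utils.get_encodings_from_content(resp.text)
    encoding = encodings[0] if encodings else resp.apparent_encoding
    encode_content = resp.content.decode(encoding, 'replace')
else:
    # The declared encoding is usable, so resp.text is already decoded correctly
    encode_content = resp.text

print(encode_content)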

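# Which headers does a crawler need to send to avoid getting a 404? I tried filling in all 7, and that turned out not to work.
# I have almost forgotten how regular expressions work...
# The last part is the fix for garbled (mis-decoded) text.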
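
The two steps noted above, pulling the ID out with a regular expression and decoding the response bytes with the right encoding, can be tried offline without touching the network. The following is only a minimal sketch: the helper names `extract_cid` and `decode_body`, the sample HTML fragment, the sample XML, and the ID `12345` are all made up for illustration and are not taken from Bilibili. The decoding helper also swaps in a simpler technique than the script above: it reads the encoding from the XML declaration itself instead of calling `requests.utils.get_encodings_from_content`.

```python
import re
from typing import Optional

# Same rule as in the script: capture what sits between 'cid=' and '&aid'.
CID_PATTERN = re.compile(r'cid=(.*?)&aid')

# Encoding named in an XML declaration, e.g. <?xml version="1.0" encoding="UTF-8"?>
XML_ENCODING = re.compile(rb'encoding=["\']([\w-]+)["\']', re.IGNORECASE)


def extract_cid(html: str) -> Optional[str]:
    """Return the captured ID, or None when the pattern is absent."""
    match = CID_PATTERN.search(html)
    # group(1) is only the parenthesised part, so no .replace() calls are needed.
    return match.group(1) if match else None


def decode_body(raw: bytes, fallback: str = 'utf-8') -> str:
    """Decode bytes using the encoding declared in the body, else the fallback."""
    declared = XML_ENCODING.search(raw[:200])
    encoding = declared.group(1).decode('ascii') if declared else fallback
    try:
        return raw.decode(encoding, 'replace')
    except LookupError:
        # The declared name is not a codec Python knows about.
        return raw.decode(fallback, 'replace')


if __name__ == '__main__':
    # Hypothetical page fragment and response body, for illustration only.
    sample_html = '<script>player("cid=12345&aid=67946325")</script>'
    sample_xml = '<?xml version="1.0" encoding="UTF-8"?><i><d>你好</d></i>'.encode('utf-8')

    print(extract_cid(sample_html))
    print(decode_body(sample_xml))
    # Decoding the same bytes as ISO-8859-1 gives garbled text instead:
    print(sample_xml.decode('ISO-8859-1'))
```

Returning `None` when the pattern is missing lets the caller decide what to do, instead of failing with an `AttributeError` on `.group()`.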

Original article: https://www.cnblogs.com/break03/p/11575327.html