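Second half, in progress. OK, I had planned to write this tomorrow (procrastination-is-bliss.gif). However, when I headed to the office a bit after 8 this morning to put in some overtime (reading + studying), I found a big padlock hanging on the front door. All I want to say is: is the access-control system just for show? Why add a lousy padlock o(╥﹏╥)o. It seriously dampened a good employee's enthusiasm for overtime [○·`Д'·○], so I slunk back home to write this blog post. Sad. I wrote a bit of code and still spent ages stuck in a pitfall (the price of cutting corners). Sad x10086 ...
On the morning of July 31, my last day at the job, it suddenly occurred to me that downloading a file with requests.get() really just means fetching the binary content and writing it into a file you create yourself; that is how images, documents, videos and so on get downloaded. In that case, if I open the existing video file and append each newly fetched piece of binary content to the end of it, that solves the problem of merging the TS segments much more directly. Clever me tried it right away, taking special care with encoding when opening and writing the file (binary mode, so nothing gets re-encoded), and sure enough it worked. Then in the afternoon I went off to sign the resignation papers. A perfect finish.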
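The append idea can be tried without any network access. The following is only a minimal sketch: `append_segment` and `merge_demo` are hypothetical names, and the byte strings stand in for downloaded .ts segments.

```python
import os
import tempfile


def append_segment(file_path, data):
    """Append raw bytes to file_path, creating the file if needed."""
    with open(file_path, "ab") as f:  # binary append: no text encoding involved
        f.write(data)


def merge_demo():
    # Hypothetical stand-ins for three downloaded .ts segments.
    segments = [b"segment-0|", b"segment-1|", b"segment-2"]
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "merged.ts")
        open(target, "wb").close()  # start from an empty file
        for data in segments:
            append_segment(target, data)
        with open(target, "rb") as f:
            return f.read()


print(merge_demo())  # b'segment-0|segment-1|segment-2'
```

Because the file is opened in binary mode, the bytes land in the file exactly as received, in the order they were appended.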
Runtime environment: Windows and Linux (remember to adjust the path), Python 3.6
The code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# au: caopeiya
# 201808011
import os
import shutil
import urllib.request
from urllib.parse import urljoin

import requests


# Open the playlist URL and return the response object (iterable line by line, as bytes)
def getUrlData(url):
    try:
        return urllib.request.urlopen(url, timeout=20)
    except Exception as err:
        print(f'err getUrlData({url}) ', err)
        return None


# Download one file with requests and append its bytes to file_path
def getDown_reqursts(url, file_path):
    try:
        header = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1"}
        response = requests.get(url, timeout=120, headers=header)
        response.raise_for_status()  # never append an HTTP error page to the video
        with open(file_path, mode='ab') as f:
            f.write(response.content)
        # For large files, download in a loop (with stream=True in requests.get):
        # with open(file_path, mode='ab') as f:
        #     for content in response.iter_content(1024):
        #         f.write(content)
        print("down successful!")
    except Exception as e:
        print(e)


def getVideo_requests(url_m3u8, path, videoName):
    print('begin run ~~ ')
    urlData = getUrlData(url_m3u8)
    if urlData is None:  # the playlist could not be fetched
        return
    tempName_video = os.path.join(path, f'{videoName}.ts')  # f'{}' is equivalent to '{}'.format() or '%s' % videoName
    open(tempName_video, "wb").close()  # truncate (and create) the temp file so a restarted run does not write duplicates
    for line in urlData:
        # The response yields bytes, so decode line by line; decoding the whole body
        # up front would give one string, and looping over it yields characters, not lines
        url_ts = line.decode("utf-8").strip()  # important: strip() removes surrounding spaces and the newline
        if '.ts' not in url_ts:
            continue
        if not url_ts.startswith('http'):
            # Not starting with 'http' means the entry is relative: resolve it against the playlist URL
            url_ts = urljoin(url_m3u8, url_ts)
        print(url_ts)
        getDown_reqursts(url=url_ts, file_path=tempName_video)  # download this segment and append it
    filename = os.path.join(path, f'{videoName}.mp4')
    shutil.move(tempName_video, filename)
    print(f'Great, {videoName}.mp4 finish down!')


if __name__ == '__main__':
    url_m3u8 = 'http://wscdn.alhls.xiaoka.tv/201886/2f5/75a/HoHdTc1LjUaBjZbJ/index.m3u8'
    path = 'D:/'  # adjust this path on Linux
    videoName = url_m3u8.split('/')[-2]
    getVideo_requests(url_m3u8, path, videoName)
    # getDown_reqursts('http://wscdn.alhls.xiaoka.tv/201886/2f5/75a/HoHdTc1LjUaBjZbJ/147.ts', f'D:/videos/84.ts')
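For large segments, the looped download mentioned in the code comments can also be written as a chunked copy from any file-like source. This is a rough sketch, not the code above: `append_stream` and `stream_demo` are hypothetical names, and `io.BytesIO` stands in for a network response so the example runs offline. The object returned by `urllib.request.urlopen` also has a `read()` method, so it could be passed as `source` in the same way.

```python
import io
import os
import shutil
import tempfile


def append_stream(source, file_path, chunk_size=1024):
    """Append everything readable from source to file_path, chunk_size bytes at a time."""
    with open(file_path, "ab") as f:
        shutil.copyfileobj(source, f, chunk_size)


def stream_demo():
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "video.ts")
        # io.BytesIO stands in for a network response here (an assumption for the demo).
        append_stream(io.BytesIO(b"a" * 3000), target)
        append_stream(io.BytesIO(b"b" * 500), target)
        return os.path.getsize(target)


print(stream_demo())  # 3500
```

Only one chunk is held in memory at a time, so the size of a segment no longer matters.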
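A few points to note:
1. Decode each line with decode("utf-8"): every line yielded by the loop is a bytes object, and it has to be decoded before it becomes a usable string;
2. Use .strip() to remove the spaces and the newline around each line;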
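Both points can be checked on a handful of sample lines. The sketch below is illustrative only: `iter_segment_urls` is a hypothetical helper, the playlist content and URLs are made up, and `urllib.parse.urljoin` is used to resolve relative entries.

```python
from urllib.parse import urljoin


def iter_segment_urls(lines, playlist_url):
    """Yield absolute .ts URLs from the raw (bytes) lines of a playlist."""
    for raw in lines:
        text = raw.decode("utf-8").strip()  # bytes -> str, then drop spaces and newline
        if not text or text.startswith("#"):  # skip blank lines and #EXT... tags
            continue
        if ".ts" not in text:
            continue
        yield urljoin(playlist_url, text)  # relative entries are resolved; absolute ones pass through


# Made-up playlist content and URL, for illustration only.
sample = [
    b"#EXTM3U\n",
    b"#EXTINF:10.0,\n",
    b"0.ts\r\n",
    b"  http://cdn.example.com/live/1.ts  \n",
    b"\n",
]
urls = list(iter_segment_urls(sample, "http://example.com/video/index.m3u8"))
print(urls)  # ['http://example.com/video/0.ts', 'http://cdn.example.com/live/1.ts']
```

Without the strip() call, the trailing "\r\n" would stay inside the URL and the request for that segment would fail.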