• (7) Web scraping: downloading video and audio files


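      So far I have only scraped text from web pages and never video or audio files, so I tried Bilibili and NetEase Cloud Music and wrote down the whole process for future reference.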

    1. Scraping Bilibili videos

      1.1 Page analysis

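      Python machine learning is a hot topic lately, so let's download some machine learning videos. Open Bilibili and search for "python机器" ("python machine"). Inspecting the elements of the results page shows that every video series has a unique ID; as shown in the figure below, av28879057 is the ID of one such video.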

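      With the ID known, click into a video. Its URL takes one of two forms:

              1. https://www.bilibili.com/video/av28879057 (a single-episode video; the URL is just the ID observed above)

              2. https://www.bilibili.com/video/av30292394/?p=3 (a series; the parameter p=3 selects the third episode under that ID)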
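      The two URL forms can be captured in a small pair of helpers. This is a minimal sketch: build_video_url and parse_video_url are hypothetical names, and the code assumes the av-ID scheme observed above (Bilibili later introduced other ID formats, which it does not handle).

```python
import re
from urllib.parse import urlparse, parse_qs


def build_video_url(av_id, page=None):
    """Build a video page URL from a numeric av ID and an optional episode number."""
    url = "https://www.bilibili.com/video/av{}".format(av_id)
    if page is not None:
        # Series: p=N selects the N-th episode under the same ID
        url += "/?p={}".format(page)
    return url


def parse_video_url(url):
    """Return (av_id, page) from an av-style video URL; page is None when absent."""
    parsed = urlparse(url)
    match = re.search(r"/video/av(\d+)", parsed.path)
    if match is None:
        raise ValueError("not an av-style video URL: " + url)
    pages = parse_qs(parsed.query).get("p")
    return int(match.group(1)), (int(pages[0]) if pages else None)


print(build_video_url(28879057))          # -> https://www.bilibili.com/video/av28879057
print(build_video_url(30292394, page=3))  # -> https://www.bilibili.com/video/av30292394/?p=3
print(parse_video_url("https://www.bilibili.com/video/av30292394/?p=3"))  # -> (30292394, 3)
```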

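      That settles how each video page URL is built; next we need the video's download address. Refresh the page, start playback, open the network panel and sort the requests by size: there is a large transfer of type x-flv, which should be the video download, as shown in the figure below. The request takes 7 parameters, and comparing several videos shows that two of them, ssig and trid, change dynamically.

      None of the other JSON responses contained these two parameters, so I searched the page source for a function that might generate them. Instead, it turned out that the page source already contains the video download addresses directly, inside a JSON object assigned in window.__playinfo__={}. A regular expression is enough to extract it, which makes the job simple.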
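      Extracting that object comes down to a regex plus json.loads. The sketch below runs on a made-up, abbreviated HTML snippet (not a real Bilibili page) so it can be tried offline; extract_playinfo is a hypothetical helper name.

```python
import json
import re

# "." is escaped so it matches a literal dot; re.S lets ".*?" cross newlines.
# The non-greedy group stops at the first "}" that is directly followed by </script>.
PLAYINFO_RE = re.compile(r"<script>window\.__playinfo__=(\{.*?\})</script>", re.S)


def extract_playinfo(html):
    """Return the window.__playinfo__ object embedded in a page, or None."""
    match = PLAYINFO_RE.search(html)
    return json.loads(match.group(1)) if match else None


# Made-up page fragment with the same shape as the real one
sample_html = """<html><head>
<script>window.__playinfo__={"code": 0,
 "data": {"durl": [{"order": 1, "size": 100, "url": "http://example.com/1.flv"}]}}</script>
</head></html>"""

info = extract_playinfo(sample_html)
print(info["data"]["durl"][0]["url"])  # -> http://example.com/1.flv
```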

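          Extracting and printing that dictionary gives the result below. The whole video is split into several smaller segments numbered in sequence: order is the segment number and url is its download address. So we only need to download each segment and then join them.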

    {
        "code": 0,
        "message": "0",
        "ttl": 1,
        "data": {
            "from": "local",
            "result": "suee",
            "message": "",
            "quality": 32,
            "format": "flv480",
            "timelength": 7121936,
            "accept_format": "flv720,flv480,flv360",
            "accept_description": ["高清 720P", "清晰 480P", "流畅 360P"],
            "accept_quality": [64, 32, 16],
            "video_codecid": 7,
            "seek_param": "start",
            "seek_type": "offset",
            "durl": [{
                "order": 1,
                "length": 363246,
                "size": 24653145,
                "ahead": "EZA=",
                "vhead": "AWQAH//hAB5nZAAfrNlAvD3m//DQEM/xAAADAAEAAAMAPA8YMZYBAAVo6+zyPA==",
                "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-1-32.flv?expires=1554535500&platform=pc&ssig=tz7ktrLd7bdj8qukIG9cjQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
                "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-1-32.flv?expires=1554535500&platform=pc&ssig=tz7ktrLd7bdj8qukIG9cjQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-1-32.flv?e=ig8euxZM2rNcNbRj7zUVhoM17buBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=732e5ee7aad2a9a08406b92aa0bb2ca3&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
            }, {
                "order": 2,
                "length": 330944,
                "size": 23865726,
                "ahead": "",
                "vhead": "",
                "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-2-32.flv?expires=1554535500&platform=pc&ssig=LemBQ8rVic-aAAN9iXwWGg&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
                "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-2-32.flv?expires=1554535500&platform=pc&ssig=LemBQ8rVic-aAAN9iXwWGg&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-2-32.flv?e=ig8euxZM2rNcNbR3hwdVhoM1nwdVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=bb0c67342e48e1a8b438dcc9606f9e91&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
            }, {
                "order": 3,
                "length": 352981,
                "size": 25848758,
                "ahead": "",
                "vhead": "",
                "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-3-32.flv?expires=1554535500&platform=pc&ssig=vSDeETHYfUOLYf8caLiW5Q&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
                "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-3-32.flv?expires=1554535500&platform=pc&ssig=vSDeETHYfUOLYf8caLiW5Q&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-3-32.flv?e=ig8euxZM2rNcNbR3hbUVhoM1nwNBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=30faa351c57a559f7b69654809418da9&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
            }, {
                "order": 4,
                "length": 394413,
                "size": 26565740,
                "ahead": "",
                "vhead": "",
                "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-4-32.flv?expires=1554535500&platform=pc&ssig=uaupgm_tbgSyVbou66oO-A&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
                "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-4-32.flv?expires=1554535500&platform=pc&ssig=uaupgm_tbgSyVbou66oO-A&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-4-32.flv?e=ig8euxZM2rNcNbRj7zUVhoM17buBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=2bb21503e670b1a82769ed6524ea7c25&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
            }, {
                "order": 5,
                "length": 388312,
                "size": 26901267,
                "ahead": "",
                "vhead": "",
                "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-5-32.flv?expires=1554535500&platform=pc&ssig=DM7BjFfnFGzoux7NA7Ix5g&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
                "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-5-32.flv?expires=1554535500&platform=pc&ssig=DM7BjFfnFGzoux7NA7Ix5g&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-5-32.flv?e=ig8euxZM2rNcNbRahbUVhoM17zNBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=68a9f6b8213285eb7fba15736e2c683b&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
            }, {
                "order": 6,
                "length": 239979,
                "size": 15473865,
                "ahead": "",
                "vhead": "",
                "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-6-32.flv?expires=1554535500&platform=pc&ssig=KGQ7DIH2XeAfW0QU4C7X7w&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
                "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-6-32.flv?expires=1554535500&platform=pc&ssig=KGQ7DIH2XeAfW0QU4C7X7w&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-6-32.flv?e=ig8euxZM2rNcNbRjhwdVhoM17bdVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=4e27dfa3076edd399b0e6ee547f1dd51&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
            }, {
                "order": 7,
                "length": 426645,
                "size": 29245686,
                "ahead": "",
                "vhead": "",
                "url": "http://cn-hbwh-cmcc-v-04.acgvideo.com/upgcxcode/45/83/52808345/52808345-7-32.flv?expires=1554535500&platform=pc&ssig=X_NsbB2FEjaE4W2yGI2YMQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
                "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-7-32.flv?expires=1554535500&platform=pc&ssig=X_NsbB2FEjaE4W2yGI2YMQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-7-32.flv?e=ig8euxZM2rNcNbRahwdVhoM17zdVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=c3ef5ea3bdd2ab1ac310970d85341c80&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
            }, {
                "order": 8,
                "length": 423211,
                "size": 30372670,
                "ahead": "",
                "vhead": "",
                "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-8-32.flv?expires=1554535500&platform=pc&ssig=rU90cc9rkqn--2je747LAQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
                "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-8-32.flv?expires=1554535500&platform=pc&ssig=rU90cc9rkqn--2je747LAQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-8-32.flv?e=ig8euxZM2rNcNbRa7zUVhoM17zuBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=98d5301937834486e0bd9c2996cd73f4&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
            }, {
                "order": 9,
                "length": 291178,
                "size": 19475045,
                "ahead": "",
                "vhead": "",
                "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-9-32.flv?expires=1554535500&platform=pc&ssig=sMfGnyjVuKCsOzIp9EAanQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
                "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-9-32.flv?expires=1554535500&platform=pc&ssig=sMfGnyjVuKCsOzIp9EAanQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-9-32.flv?e=ig8euxZM2rNcNbRj7WdVhoM17bUVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=08439132f1831b423be6577c7bd5ef89&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
            }, {
                "order": 10,
                "length": 370880,
                "size": 25219151,
                "ahead": "",
                "vhead": "",
                "url": "http://cn-hbwh-cmcc-v-04.acgvideo.com/upgcxcode/45/83/52808345/52808345-10-32.flv?expires=1554535500&platform=pc&ssig=kKqhofi4ayRRMoquCxz-pw&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
                "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-10-32.flv?expires=1554535500&platform=pc&ssig=kKqhofi4ayRRMoquCxz-pw&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-10-32.flv?e=ig8euxZM2rNcNbRj7zUVhoM17buBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=3d5d140e0dd02a83245ae86da23eb8b9&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
            }, {
                "order": 11,
                "length": 381612,
                "size": 26624914,
                "ahead": "",
                "vhead": "",
                "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-11-32.flv?expires=1554535500&platform=pc&ssig=HFFhsFFGyXOV8Q3QmF8sJQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
                "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-11-32.flv?expires=1554535500&platform=pc&ssig=HFFhsFFGyXOV8Q3QmF8sJQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-11-32.flv?e=ig8euxZM2rNcNbRahbUVhoM17zNBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=63f2b71981080c752eed5166a9a85332&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
            }, {
                "order": 12,
                "length": 361344,
                "size": 25254786,
                "ahead": "",
                "vhead": "",
                "url": "http://cn-hbwh-cmcc-v-04.acgvideo.com/upgcxcode/45/83/52808345/52808345-12-32.flv?expires=1554535500&platform=pc&ssig=UuAqqNbr1xC5gMlu5FUYdQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
                "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-12-32.flv?expires=1554535500&platform=pc&ssig=UuAqqNbr1xC5gMlu5FUYdQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-12-32.flv?e=ig8euxZM2rNcNbRahbUVhoM17zNBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=1705e8d3f1075a717c6a91ae018396fe&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
            }, {
                "order": 13,
                "length": 334912,
                "size": 24639608,
                "ahead": "",
                "vhead": "",
                "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-13-32.flv?expires=1554535500&platform=pc&ssig=MQbcDgFo8iqQ2Uf4yO-L0A&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
                "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-13-32.flv?expires=1554535500&platform=pc&ssig=MQbcDgFo8iqQ2Uf4yO-L0A&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-13-32.flv?e=ig8euxZM2rNcNbR3hbUVhoM1nwNBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=a5f1be479528b8a92a462acab849af46&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
            }, {
                "order": 14,
                "length": 365845,
                "size": 24930389,
                "ahead": "",
                "vhead": "",
                "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-14-32.flv?expires=1554535500&platform=pc&ssig=bpVSp4oDvkaLf1HTlWl5xA&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
                "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-14-32.flv?expires=1554535500&platform=pc&ssig=bpVSp4oDvkaLf1HTlWl5xA&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-14-32.flv?e=ig8euxZM2rNcNbRahwdVhoM17zdVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=31a76487e1d32acd5b573f45a4169997&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
            }, {
                "order": 15,
                "length": 338347,
                "size": 23943047,
                "ahead": "",
                "vhead": "",
                "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-15-32.flv?expires=1554535500&platform=pc&ssig=ieioDVAxcZLksQ55egulgg&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
                "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-15-32.flv?expires=1554535500&platform=pc&ssig=ieioDVAxcZLksQ55egulgg&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-15-32.flv?e=ig8euxZM2rNcNbRa7WdVhoM17zUVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=36cdd93257a88fdc90a0c85f2b9babe3&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
            }, {
                "order": 16,
                "length": 475181,
                "size": 34293360,
                "ahead": "",
                "vhead": "",
                "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-16-32.flv?expires=1554535500&platform=pc&ssig=Ps_lae8ZoX800sJZh-eRRA&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
                "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-16-32.flv?expires=1554535500&platform=pc&ssig=Ps_lae8ZoX800sJZh-eRRA&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-16-32.flv?e=ig8euxZM2rNcNbR3hwdVhoM1nwdVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=fe34135d841548c79f78f687282c6bc3&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
            }, {
                "order": 17,
                "length": 204846,
                "size": 13746922,
                "ahead": "",
                "vhead": "",
                "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-17-32.flv?expires=1554535500&platform=pc&ssig=mzbEJYcCFWAO0ioYePxG_Q&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
                "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-17-32.flv?expires=1554535500&platform=pc&ssig=mzbEJYcCFWAO0ioYePxG_Q&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-17-32.flv?e=ig8euxZM2rNcNbRj7zUVhoM17buBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=915e7dc2c91a4072e91bd43988379c8b&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
            }, {
                "order": 18,
                "length": 469078,
                "size": 32875195,
                "ahead": "",
                "vhead": "",
                "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-18-32.flv?expires=1554535500&platform=pc&ssig=gdm21_hyrHYWZfsmgPkMDA&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
                "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-18-32.flv?expires=1554535500&platform=pc&ssig=gdm21_hyrHYWZfsmgPkMDA&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-18-32.flv?e=ig8euxZM2rNcNbRa7WdVhoM17zUVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=e644f16f487b7bd5625326a550716479&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
            }, {
                "order": 19,
                "length": 328213,
                "size": 21350561,
                "ahead": "",
                "vhead": "",
                "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-19-32.flv?expires=1554535500&platform=pc&ssig=3LoFiUwUGXFRJHBpigewOw&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
                "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-19-32.flv?expires=1554535500&platform=pc&ssig=3LoFiUwUGXFRJHBpigewOw&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-19-32.flv?e=ig8euxZM2rNcNbRjhbUVhoM17bNBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=ec0d506311d0efdbfbf576d297c3ebba&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
            }, {
                "order": 20,
                "length": 280769,
                "size": 19777669,
                "ahead": "",
                "vhead": "",
                "url": "http://cn-hbwh-cmcc-v-04.acgvideo.com/upgcxcode/45/83/52808345/52808345-20-32.flv?expires=1554535500&platform=pc&ssig=r8NbvnHMQ58qfdYJHoD4kw&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1",
                "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-20-32.flv?expires=1554535500&platform=pc&ssig=r8NbvnHMQ58qfdYJHoD4kw&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-20-32.flv?e=ig8euxZM2rNcNbRa7WdVhoM17zUVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=1991fbc0d72dfe2aaac05943f26d54e4&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"]
            }]
        },
        "session": "e5c0e030d13633062a9889d1390010d9",
        "videoFrame": {}
    }
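      Given the parsed object, the segments can be put in playback order by their order field, keeping the backup_url mirrors as fallbacks. A sketch that assumes the data.durl layout shown above; segment_plan is a hypothetical helper, and the sample is abbreviated made-up data, not the real response.

```python
def segment_plan(playinfo):
    """Return (segments, total_bytes); each segment lists its primary URL first."""
    durl = playinfo["data"]["durl"]
    segments = []
    for item in sorted(durl, key=lambda x: x["order"]):
        # Primary url first, then any mirrors from backup_url (the key may be absent)
        candidates = [item["url"]] + list(item.get("backup_url") or [])
        segments.append({"order": item["order"], "size": item["size"], "urls": candidates})
    total = sum(item["size"] for item in durl)
    return segments, total


sample = {"data": {"durl": [
    {"order": 2, "size": 300, "url": "http://a/2.flv", "backup_url": ["http://b/2.flv"]},
    {"order": 1, "size": 200, "url": "http://a/1.flv"},
]}}
segments, total = segment_plan(sample)
print([s["order"] for s in segments], total)  # -> [1, 2] 500
```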

     

      1.2 Downloading the video

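       Based on the analysis above, the download steps are:

          1. Build the video page URL from the video's ID.

          2. Request that URL and run a regex over the returned page to get every segment's download address and number.

          3. Download each address and save it locally (include Referer and Origin in the request headers, otherwise the server returns HTTP 458).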

       The code is as follows:

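    # coding: utf-8
    # Python 3 version (the original mixed Python 2 and Python 3 print syntax)
    import json
    import os
    import re
    import subprocess
    import time

    import requests


    def down_video(video_url, path="temp_videos"):
        """
        video_url: URL of the video page to download
        path: directory where the downloaded segments are saved
        """
        # video_url = "https://www.bilibili.com/video/av30292394?p=3"
        # video_url = "https://www.bilibili.com/video/av28879057"

        # The User-Agent is truncated ("r…") as in the original; replace it with a
        # complete browser UA string. Under Python 3, a non-Latin-1 character such
        # as "…" cannot be sent in an HTTP header and raises an encoding error.
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 6.1; r…) Gecko/20100101 Firefox/66.0",
        }
        response = requests.get(video_url, headers=headers)

        # Extract the playback info from the page source.
        # re.S lets "." also match newlines, so the JSON may span several lines.
        match_text = re.search(r'<script>window\.__playinfo__=(\{.*?\})</script>',
                               response.text, re.S)
        if match_text is None:
            raise ValueError("window.__playinfo__ not found in %s" % video_url)

        # json.loads no longer accepts an encoding argument (removed in Python 3.9)
        json_data = json.loads(match_text.group(1))
        urls = json_data["data"]["durl"]  # the video is split into parts; one entry per part
        content_size = sum(item["size"] for item in urls)  # total size in bytes
        print("Total video size: %0.2f MB" % (content_size / (1024 * 1024)))

        if not os.path.exists(path):
            os.mkdir(path)

        # Referer (and Origin) must be sent, otherwise the download is rejected
        headers.update({
            "Origin": "https://www.bilibili.com",
            "Referer": video_url,
        })
        size = 0
        start = time.time()
        for item in sorted(urls, key=lambda x: x["order"]):
            url = item["url"]
            try:
                result = requests.get(url, headers=headers, stream=True)
                print(result.status_code)
                result.raise_for_status()
                # The segments are FLV; zero-padded names keep them in playback order
                video_path = os.path.join(path, "{:03d}.flv".format(item["order"]))
                with open(video_path, "wb") as f:
                    for chunk in result.iter_content(64 * 1024):  # 64 KiB chunks
                        f.write(chunk)
                        size += len(chunk)
                # print("Downloaded: %0.2f MB" % (size / (1024 * 1024)))
            except Exception as e:
                print("Failed to download: %s" % url)
                print(e)
        stop = time.time()
        print("Download finished in %0.2f s" % (stop - start))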
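      The function above only requests each segment's primary url, although every durl entry also lists backup_url mirrors. One way to use them is sketched below with the standard library instead of requests; download_with_fallback is a hypothetical helper, and the assumption that the mirrors accept the same Referer/Origin headers is based only on the response shown earlier. For each durl item, candidate_urls would be [item["url"]] + item.get("backup_url", []).

```python
import shutil
import urllib.request


def download_with_fallback(candidate_urls, dest, referer, user_agent):
    """Try each candidate URL in turn and return the one that succeeded."""
    headers = {
        "User-Agent": user_agent,
        "Origin": "https://www.bilibili.com",
        "Referer": referer,  # the video page URL, as in down_video
    }
    last_error = None
    for url in candidate_urls:
        request = urllib.request.Request(url, headers=headers)
        try:
            # dest is only opened (and truncated) once the request has succeeded
            with urllib.request.urlopen(request, timeout=30) as resp, open(dest, "wb") as f:
                shutil.copyfileobj(resp, f, 64 * 1024)  # stream in 64 KiB chunks
            return url
        except OSError as err:  # URLError and HTTPError are OSError subclasses
            last_error = err
    raise RuntimeError("all candidate URLs failed") from last_error
```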

      1.3 Joining the segments

        The downloaded segments can be played on their own, but playing them one by one is tedious, so we join them with ffmpeg.

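        First download ffmpeg (https://ffmpeg.zeranoe.com/builds/; that build site has since shut down, and current Windows builds are linked from ffmpeg.org), unpack it into a folder of your choice, and add its bin directory (which contains ffmpeg.exe) to the PATH environment variable. If running ffmpeg -version on the command line prints version information, the installation succeeded.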
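      Before calling ffmpeg from a script, it helps to check that the executable can actually be found. A minimal sketch using shutil.which; find_ffmpeg is a hypothetical helper and the D:\ path is only an example install location.

```python
import os
import shutil


def find_ffmpeg(explicit_path=None):
    """Return a usable ffmpeg path: the explicit file if it exists, else one found on PATH."""
    if explicit_path and os.path.isfile(explicit_path):
        return explicit_path
    # shutil.which searches PATH (and PATHEXT on Windows, so "ffmpeg" finds ffmpeg.exe)
    return shutil.which("ffmpeg")


exe = find_ffmpeg(r"D:\ffmpeg-win32-static\bin\ffmpeg.exe")
print(exe or "ffmpeg not found; check the PATH setting")
```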

        The ffmpeg command for joining videos is: ffmpeg -f concat -safe 0 -i path.txt -c copy output.mp4

        Here path.txt lists the paths of the videos to join, one per line, in the format below (the first line refers to v_1.mp4 in the video directory):

    file 'video/v_1.mp4'
    file 'video/v_2.mp4'
    file 'video/v_3.mp4'

        output.mp4 is where the joined video is written; it can also be given as video/output.mp4 to save it in the video folder.
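      The list file can be generated from the segment file names. Two details matter: segments must appear in playback order, and a plain string sort puts 10 before 2, so the sketch sorts numeric names as numbers; and writing absolute paths (which -safe 0 permits) sidesteps any question of how relative entries are resolved. write_concat_list and concat_command are hypothetical helper names; the result can be passed to subprocess.run(..., check=True).

```python
import os


def write_concat_list(segment_paths, list_file="path.txt"):
    """Write an ffmpeg concat-demuxer list, ordering segments numerically by file name."""
    def key(p):
        stem = os.path.splitext(os.path.basename(p))[0]
        # Numeric names sort as numbers and come first; other names sort as text after them
        return (0, int(stem), "") if stem.isdigit() else (1, 0, stem)

    with open(list_file, "w", encoding="utf-8") as f:
        for p in sorted(segment_paths, key=key):
            # Inside single quotes, a literal ' is written as '\'' (close, escaped quote, reopen)
            quoted = os.path.abspath(p).replace("'", "'\\''")
            f.write("file '{}'\n".format(quoted))
    return list_file


def concat_command(list_file, output, ffmpeg="ffmpeg"):
    """Argument list for: ffmpeg -f concat -safe 0 -i <list_file> -c copy <output>."""
    return [ffmpeg, "-f", "concat", "-safe", "0", "-i", list_file, "-c", "copy", output]
```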

        The final concatenation code:

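    import os
    import shutil
    import subprocess


    # Join the downloaded segments into one complete video
    def concatenate(path, title, output="videos"):
        """
        path: directory holding the segments to join
        title: file name (without extension) of the joined video
        output: directory where the joined video is saved
        """
        def segment_key(name):
            # Sort numerically ("2" before "10"); directory listing order is not guaranteed
            stem = os.path.splitext(name)[0]
            return (0, int(stem), "") if stem.isdigit() else (1, 0, stem)

        files = [name for name in os.listdir(path)
                 if os.path.splitext(name)[1] in (".flv", ".mkv", ".mp4")]
        with open("path.txt", "w", encoding="utf-8") as f:
            for name in sorted(files, key=segment_key):
                v_path = os.path.abspath(os.path.join(path, name))
                f.write("file '{}'\n".format(v_path))

        if not os.path.exists(output):
            os.mkdir(output)
        try:
            print("Merging segments")
            path_name = os.path.join(output, title + ".mp4")
            # Full path to ffmpeg; if its bin directory is on PATH, "ffmpeg" is enough
            ffmpeg_exe = r"D:\ffmpeg-win32-static\bin\ffmpeg"
            # An argument list (instead of one command string) also works outside Windows
            subprocess.run([ffmpeg_exe, "-f", "concat", "-safe", "0",
                            "-i", "path.txt", "-c", "copy", path_name], check=True)
            # Delete the temporary files only after ffmpeg succeeded.
            # rmdir/del are cmd built-ins and fail when called without a shell.
            shutil.rmtree(path)
            os.remove("path.txt")
        except Exception as e:
            print(e)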

      The complete code:

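    # coding: utf-8
    # Python 3 version (the original mixed Python 2 and Python 3 print syntax)
    import json
    import os
    import re
    import shutil
    import subprocess
    import time

    import requests


    def down_video(video_url, path="temp_videos"):
        """
        video_url: URL of the video page to download
        path: directory where the downloaded segments are saved
        """
        # video_url = "https://www.bilibili.com/video/av30292394?p=3"
        # video_url = "https://www.bilibili.com/video/av28879057"

        # The User-Agent is truncated ("r…") as in the original; replace it with a
        # complete browser UA string. Under Python 3, a non-Latin-1 character such
        # as "…" cannot be sent in an HTTP header and raises an encoding error.
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 6.1; r…) Gecko/20100101 Firefox/66.0",
        }
        response = requests.get(video_url, headers=headers)

        # Extract the playback info from the page source.
        # re.S lets "." also match newlines, so the JSON may span several lines.
        match_text = re.search(r'<script>window\.__playinfo__=(\{.*?\})</script>',
                               response.text, re.S)
        if match_text is None:
            raise ValueError("window.__playinfo__ not found in %s" % video_url)

        # json.loads no longer accepts an encoding argument (removed in Python 3.9)
        json_data = json.loads(match_text.group(1))
        urls = json_data["data"]["durl"]  # the video is split into parts; one entry per part
        content_size = sum(item["size"] for item in urls)  # total size in bytes
        print("Total video size: %0.2f MB" % (content_size / (1024 * 1024)))

        if not os.path.exists(path):
            os.mkdir(path)

        # Referer (and Origin) must be sent, otherwise the download is rejected
        headers.update({
            "Origin": "https://www.bilibili.com",
            "Referer": video_url,
        })
        size = 0
        start = time.time()
        for item in sorted(urls, key=lambda x: x["order"]):
            url = item["url"]
            try:
                result = requests.get(url, headers=headers, stream=True)
                print(result.status_code)
                result.raise_for_status()
                # The segments are FLV; zero-padded names keep them in playback order
                video_path = os.path.join(path, "{:03d}.flv".format(item["order"]))
                with open(video_path, "wb") as f:
                    for chunk in result.iter_content(64 * 1024):  # 64 KiB chunks
                        f.write(chunk)
                        size += len(chunk)
                # print("Downloaded: %0.2f MB" % (size / (1024 * 1024)))
            except Exception as e:
                print("Failed to download: %s" % url)
                print(e)
        stop = time.time()
        print("Download finished in %0.2f s" % (stop - start))


    # Join the downloaded segments into one complete video
    def concatenate(path, title, output="videos"):
        """
        path: directory holding the segments to join
        title: file name (without extension) of the joined video
        output: directory where the joined video is saved
        """
        def segment_key(name):
            # Sort numerically ("2" before "10"); directory listing order is not guaranteed
            stem = os.path.splitext(name)[0]
            return (0, int(stem), "") if stem.isdigit() else (1, 0, stem)

        files = [name for name in os.listdir(path)
                 if os.path.splitext(name)[1] in (".flv", ".mkv", ".mp4")]
        with open("path.txt", "w", encoding="utf-8") as f:
            for name in sorted(files, key=segment_key):
                v_path = os.path.abspath(os.path.join(path, name))
                f.write("file '{}'\n".format(v_path))

        if not os.path.exists(output):
            os.mkdir(output)
        try:
            print("Merging segments")
            path_name = os.path.join(output, title + ".mp4")
            # Full path to ffmpeg; if its bin directory is on PATH, "ffmpeg" is enough
            ffmpeg_exe = r"D:\ffmpeg-win32-static\bin\ffmpeg"
            # An argument list (instead of one command string) also works outside Windows
            subprocess.run([ffmpeg_exe, "-f", "concat", "-safe", "0",
                            "-i", "path.txt", "-c", "copy", path_name], check=True)
            # Delete the temporary files only after ffmpeg succeeded.
            # rmdir/del are cmd built-ins and fail when called without a shell.
            shutil.rmtree(path)
            os.remove("path.txt")
        except Exception as e:
            print(e)


    if __name__ == "__main__":
        # down_video("https://www.bilibili.com/video/av28879057")
        # concatenate("temp_videos", title="python")
        down_video("https://www.bilibili.com/video/av30292394?p=3")
        concatenate("temp_videos", title="python机器学习与量化分析")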
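The parsing step of down_video can be tried on its own. Below is a minimal sketch, assuming the page embeds `window.__playinfo__={...}` inside a `<script>` tag exactly as observed above; the helper name `parse_playinfo` and the sample HTML are hypothetical. It returns the part URLs ordered by their `order` field together with the total size in bytes.

```python
import json
import re


def parse_playinfo(html):
    """Return (part URLs sorted by "order", total size in bytes) from a page's HTML.

    Assumes the page contains <script>window.__playinfo__={...}</script>
    with a "data" -> "durl" list, as observed on the Bilibili page above.
    """
    match = re.search(r'<script>window\.__playinfo__=(\{.*?\})</script>', html, re.S)
    if match is None:
        raise ValueError("window.__playinfo__ not found in page")
    parts = json.loads(match.group(1))["data"]["durl"]
    parts = sorted(parts, key=lambda item: item["order"])  # keep parts in playback order
    urls = [item["url"] for item in parts]
    total_size = sum(item["size"] for item in parts)
    return urls, total_size


if __name__ == "__main__":
    # Hypothetical, heavily trimmed page with two parts listed out of order
    sample = ('<html><script>window.__playinfo__={"data": {"durl": ['
              '{"order": 2, "size": 50, "url": "http://example.com/2.flv"}, '
              '{"order": 1, "size": 100, "url": "http://example.com/1.flv"}]}}</script></html>')
    print(parse_playinfo(sample))
```

Sorting by `order` keeps the saved part files (0.mp4, 1.mp4, ...) in playback order even if the list were returned out of sequence.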

     References:

    https://amberwest.github.io/2018/09/11/%E7%94%A8python%E4%B8%8B%E8%BD%BD%E5%93%94%E5%93%A9%E5%93%94%E5%93%A9%E8%A7%86%E9%A2%91/

    https://github.com/Henryhaohao/Bilibili_video_download

    2. Scraping NetEase Cloud Music

      2.1 Page analysis

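        Looking at the web version of NetEase Cloud Music, each song again has a unique ID, as below, and its page URL has the form https://music.163.com/song?id=1353372483 (when the page is requested, NetEase automatically inserts a "#", turning it into https://music.163.com/#/song?id=1352541009).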

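        Next, refresh the page and look at the network requests, again sorted by size. A fairly large request transferring an mp3 file shows up, as in the figure below. Its URL is the song's download URL, and sending a request to it directly downloads the song. What remains is to work out how to obtain the download URL for each song.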

     

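       Going through the responses of the other XHR requests turns up the one below, which contains information about the song, including the URL we need. This request is a POST that submits form data with two parameters, 'params' and 'encSecKey'. Both are encrypted, as the second figure below shows, so the encryption method has to be worked out.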

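        To summarize, downloading a song takes three steps:

          1. Send a GET request to https://music.163.com/song?id=1353372483 to get basic information such as the song title and lyrics.

          2. Send a POST request with the two parameters 'params' and 'encSecKey' to https://music.163.com/weapi/song/enhance/player/url/v1?csrf_token=; the returned JSON data contains the song's download URL, size, and other information.

          3. Request the song's download URL (http://m10.music.126.net/20190407154531/74f897c9d014dede19a0905644433907/ymusic/035c/5458/530f/46ebf59083c2f04cc090de3b1e0beaf0.mp3) and write the response to a local file to complete the download.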

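        So the remaining problem is how to construct the two encrypted parameters 'params' and 'encSecKey'. Open the browser's Sources tab and search each JS file for encSecKey (or press Ctrl+Shift+F to search all files at once). The relevant code turns up in the JS file below, and it contains exactly the two parameters we need.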

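        Analyzing this code, the actual work is done by the call var bYl2x = window.asrsea(). Searching for that function turns up the statement window.asrsea = d, so it is the function d, and d calls function a once, function b twice, and function c once.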

     

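           Function a generates a random string; here a(16) produces a random string of 16 characters. The JS code and a corresponding Python implementation (which draws only from hexadecimal characters, a subset of a's alphabet) are: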

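    // function a (JavaScript)
    function a(a) {
        var d, e, b = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789", c = "";
        for (d = 0; a > d; d += 1)
            e = Math.random() * b.length,
            e = Math.floor(e),
            c += b.charAt(e);
        return c
    }
    
    # Corresponding Python code that generates a random string
    import binascii
    import os
    
    def random_str(size):
        # binascii.hexlify() takes bytes and returns their hex (ASCII) representation as bytes;
        # two hex characters per random byte, so slicing to size gives exactly size characters
        return binascii.hexlify(os.urandom(size))[:size]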
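The hexlify-based random_str only produces the characters 0-9 and a-f. For a closer match to the JS function a, which draws from 62 letters and digits, a minimal sketch using the standard `secrets` module could look like the one below. The name `random_str_js` is hypothetical, and the assumption that the server accepts any 16-character key (since it is only used as a random AES key) is not verified here. Note that it returns a `str`, so call `.encode()` on the result before handing it to code that expects bytes.

```python
import secrets
import string

# The same 62-character alphabet as the variable b in the JS function a
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits


def random_str_js(size):
    """Return a random string of `size` characters drawn from ALPHABET.

    Mirrors the JS function a: each character is picked independently.
    secrets.choice is used instead of Math.random to get a cryptographic RNG.
    """
    return "".join(secrets.choice(ALPHABET) for _ in range(size))


if __name__ == "__main__":
    print(random_str_js(16))
```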

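       Function b encrypts data with AES, a symmetric cipher, in CBC mode with the fixed IV "0102030405060708". The JS code and a corresponding Python implementation are: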

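               The Python version needs the Crypto module. Installing it with pip install crypto causes problems; instead, install it as follows (this worked on Windows 7 with Python 2.7):

                               python -m pip install pycrypto

               pycrypto is no longer maintained; on Python 3, the pycryptodome package (python -m pip install pycryptodome) provides the same Crypto.Cipher.AES module.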

    // function b (JavaScript)
    function b(a, b) {
        var c = CryptoJS.enc.Utf8.parse(b)
          , d = CryptoJS.enc.Utf8.parse("0102030405060708")
          , e = CryptoJS.enc.Utf8.parse(a)
          , f = CryptoJS.AES.encrypt(e, c, {
            iv: d,
            mode: CryptoJS.mode.CBC
        });
        return f.toString()
    }
    
    # Python implementation of function b
    from Crypto.Cipher import AES
    import base64
    
    def get_params(text,key):  # AES symmetric encryption (CBC mode, fixed IV)
        iv = b'0102030405060708'
        if isinstance(text, str):
            text = text.encode('utf-8')
        if isinstance(key, str):
            key = key.encode('utf-8')
        pad = 16 - len(text)%16
        text = text + bytes([pad]) * pad   # PKCS#7 padding, CryptoJS's default
        encryptor = AES.new(key, AES.MODE_CBC, iv)
        result = encryptor.encrypt(text)
        result_str = base64.b64encode(result).decode('utf-8')   # Base64 text, like f.toString()
        return result_str
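The only part of get_params that is not a library call is the padding. CryptoJS.AES.encrypt pads with PKCS#7 by default, and `pad = 16 - len(text)%16` followed by appending `pad` copies of the byte `pad` reproduces that: it always adds between 1 and 16 bytes. A standard-library-only sketch of the padding and its inverse (function names hypothetical) that can be checked without Crypto installed:

```python
BLOCK_SIZE = 16  # AES block size in bytes


def pkcs7_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Append n copies of the byte n (1 <= n <= block_size) so len is a multiple of block_size."""
    n = block_size - len(data) % block_size
    return data + bytes([n]) * n


def pkcs7_unpad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Strip PKCS#7 padding added by pkcs7_pad, rejecting malformed input."""
    if not data:
        raise ValueError("empty input")
    n = data[-1]
    if not 1 <= n <= block_size or n > len(data) or data[-n:] != bytes([n]) * n:
        raise ValueError("invalid PKCS#7 padding")
    return data[:-n]


if __name__ == "__main__":
    padded = pkcs7_pad(b'{"ids":"[1353194608]"}')
    print(len(padded), padded[-1])
```

Padding should be computed on the encoded bytes rather than on characters. With the default `json.dumps` (`ensure_ascii=True`) the JSON text is pure ASCII, so the two counts coincide for the first parameter.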

       Function c encrypts data with RSA, an asymmetric cipher. The JS code and a corresponding Python implementation are:

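    // function c (JavaScript)
    function c(a, b, c) {
        var d, e;
        return setMaxDigits(131),
        d = new RSAKeyPair(b,"",c),
        e = encryptedString(d, a)
    }
    
    # Python implementation of function c
    import binascii
    
    def get_encSecKey(text,pubkey,modulus):  # RSA asymmetric encryption, without padding
        if isinstance(text, str):
            text = text.encode('utf-8')
        text = text[::-1]   # reversed, to match the order in which the JS RSA code consumes the string
        rs = pow(int(binascii.hexlify(text),16),int(pubkey,16),int(modulus,16))   # m^e mod n
        return format(rs,'x').zfill(256)   # hex string left-padded to 256 digits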
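get_encSecKey is textbook RSA with no padding: it reverses the key string, reads the bytes as one big hexadecimal integer m, and computes m^e mod n with Python's three-argument `pow`. Because no random padding is involved, the same key always produces the same encSecKey. The standalone sketch below (the name `rsa_no_padding` is hypothetical) repeats that computation with only the standard library so its behavior can be checked on small numbers.

```python
import binascii


def rsa_no_padding(text: bytes, pubkey_hex: str, modulus_hex: str) -> str:
    """Textbook RSA as in get_encSecKey: reverse, read as an integer, pow(m, e, n)."""
    m = int(binascii.hexlify(text[::-1]), 16)   # reversed bytes read as one big integer
    e = int(pubkey_hex, 16)
    n = int(modulus_hex, 16)
    return format(pow(m, e, n), "x").zfill(256)  # left-pad to 256 hex digits


if __name__ == "__main__":
    # With e = 1 and a modulus larger than m, the result is just the reversed input:
    # b"ab" reversed is b"ba", i.e. 0x6261
    print(rsa_no_padding(b"ab", "1", "f" * 64)[-4:])  # prints 6261
```

The code above still draws a fresh random key for every request, as the JS does with a(16), so each request carries a different encSecKey.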

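      Next we need to analyze the four arguments passed to window.asrsea(), which requires a breakpoint. As shown in the figure, click a line to set a breakpoint there, then start playing a song. When execution pauses at the breakpoint, use the two buttons circled in red on the right (the first executes the next procedure, the second executes the next statement). Selecting any of the four arguments (the same way you would select text to copy it) shows its value.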

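       The figure below shows the second argument selected: its value is "010001", which indicates that it is a constant. Checking the others shows that the second, third, and fourth arguments are all constants, while the first is JSON data containing the song ID.

    Example values of the four arguments: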

    first_param = {"ids":"[1353194608]","level":"standard","encodeType":"aac","csrf_token":""}
    second_param = "010001"
    third_param = "00e0b509f6259df8642dbc35662901477df22677ec152b5ff68ace615bb7b725152b3ab17a876aea8a5aa76d2e417629ec4ee341f56135fccf695280104e0312ecbda92557c93870114af6c9d05c4f7f0c3685b7a46bee255932575cce10b424d813cfe4875d3e82047b97ddef52741d546b8e289dc6935b3ece0462db0a22b8e7"
    fourth_param = "0CoJUm6Qyw8W8jud"
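Judging from how d passes them to b and c, the constants appear to play these roles: second_param is the RSA public exponent, third_param the RSA modulus (both hexadecimal), and fourth_param the fixed 16-character AES key for the first encryption pass. This interpretation is an inference from the code, not something the site documents; a quick check in Python:

```python
import json

second_param = "010001"
third_param = "00e0b509f6259df8642dbc35662901477df22677ec152b5ff68ace615bb7b725152b3ab17a876aea8a5aa76d2e417629ec4ee341f56135fccf695280104e0312ecbda92557c93870114af6c9d05c4f7f0c3685b7a46bee255932575cce10b424d813cfe4875d3e82047b97ddef52741d546b8e289dc6935b3ece0462db0a22b8e7"
fourth_param = "0CoJUm6Qyw8W8jud"

e = int(second_param, 16)   # RSA public exponent
n = int(third_param, 16)    # RSA modulus
print(e)                    # 65537, the common RSA public exponent
print(n.bit_length())       # size of the modulus in bits
print(len(fourth_param))    # 16 characters, i.e. a 128-bit AES key

# The first argument is plain JSON built from the song ID (the ids value is itself a string)
first_param = {"ids": "[%s]" % 1353194608, "level": "standard", "encodeType": "aac", "csrf_token": ""}
print(json.dumps(first_param))
```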

        So the final encrypted data can be built from just the song's ID and the three constant parameters above; all that remains is to write the code.

      2.2 Downloading songs

      Based on the analysis above, the code works as follows:

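        1. Using the song ID, request https://music.163.com/song?id=1353372483 and match the page content with a regular expression to get the song title.

        2. Compute the encrypted parameters 'params' and 'encSecKey', then send a POST request to https://music.163.com/weapi/song/enhance/player/url/v1?csrf_token= to get the song's URL and size.

        3. Request the song's download URL and write the result to a local file.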
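Step 2 reads the URL and size from `song_info["data"][0]`. A small helper for that step, hypothetical and not part of the original code, can also report when the response carries no usable URL (an empty or null `url` is an assumed failure case, not one verified against the API). It is shown on hand-written dicts shaped like the response used in the code.

```python
def extract_song_url(song_info):
    """Return (url, size) from the player/url response, or (None, 0) if no url is present.

    Assumes the response shape used in the code: {"data": [{"url": ..., "size": ...}, ...]}.
    """
    items = song_info.get("data") or []
    if not items or not items[0].get("url"):
        return None, 0
    return items[0]["url"], items[0].get("size", 0)


if __name__ == "__main__":
    ok = {"data": [{"id": 1353194608, "url": "http://example.com/song.mp3", "size": 3145728}]}
    missing = {"data": [{"id": 1353194608, "url": None, "size": 0}]}
    print(extract_song_url(ok))
    print(extract_song_url(missing))
```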

      The complete code:

    #coding:utf-8
    import os
    import binascii
    from Crypto.Cipher import AES
    import base64
    import json
    import requests
    import re
    
    first_param = {"ids":"[1353194608]","level":"standard","encodeType":"aac","csrf_token":""}
    second_param = "010001"
    third_param = "00e0b509f6259df8642dbc35662901477df22677ec152b5ff68ace615bb7b725152b3ab17a876aea8a5aa76d2e417629ec4ee341f56135fccf695280104e0312ecbda92557c93870114af6c9d05c4f7f0c3685b7a46bee255932575cce10b424d813cfe4875d3e82047b97ddef52741d546b8e289dc6935b3ece0462db0a22b8e7"
    fourth_param = "0CoJUm6Qyw8W8jud"
    headers={
                "Referer":"https://music.163.com/",
                "User-Agent":"Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Mobile Safari/537.36"
                }
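    def random_str(size):
        # binascii.hexlify() takes bytes and returns their hex (ASCII) representation as bytes;
        # two hex characters per random byte, so slicing to size gives exactly size characters
        return binascii.hexlify(os.urandom(size))[:size]
    
    def get_params(text,key):  # AES symmetric encryption (JS function b)
        iv = b'0102030405060708'
        if isinstance(text, str):
            text = text.encode('utf-8')
        if isinstance(key, str):
            key = key.encode('utf-8')
        pad = 16 - len(text)%16
        text = text + bytes([pad]) * pad   # PKCS#7 padding
        encryptor = AES.new(key, AES.MODE_CBC, iv)
        result = encryptor.encrypt(text)
        result_str = base64.b64encode(result).decode('utf-8')
        return result_str
    
    def get_encSecKey(text,pubkey,modulus):  # RSA asymmetric encryption (JS function c)
        if isinstance(text, str):
            text = text.encode('utf-8')
        text = text[::-1]
        rs = pow(int(binascii.hexlify(text),16),int(pubkey,16),int(modulus,16))
        return format(rs,'x').zfill(256)
        
    def encrypt_data(first_param,second_param,third_param,fourth_param):  # mirrors JS function d (window.asrsea)
        data={}
        i = random_str(16)                                       # random 16-character key, like a(16)
        temp = get_params(json.dumps(first_param),fourth_param)  # first AES pass with the fixed key
        params = get_params(temp,i)                              # second AES pass with the random key
        encSecKey = get_encSecKey(i,second_param,third_param)    # RSA-encrypt the random key
        data['params']=params.encode("utf-8")
        data['encSecKey']=encSecKey
        return data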
        
    # Get the song title
    def get_song_title(id):
        url = "https://music.163.com/song?id=%s"%(id)
        response = requests.get(url,headers=headers)
        title = re.search(r'<title>(.*?)\s-',response.text).group(1)    # match the song title: the text in <title> before the first " -"
        return title
        
    # Get the song's download URL, size and other information
    def get_song_info(id):
        first_param['ids'] = "[%s]"%id
        data = encrypt_data(first_param,second_param,third_param,fourth_param)
        url="https://music.163.com/weapi/song/enhance/player/url/v1?csrf_token="
        response = requests.post(url,headers=headers,data=data)
        json_data = response.json()
        return json_data
        
    # Download the song
    def down_song(id,down_url,song_title,size):
        filename = song_title+str(id)+".mp3"
        print("Song size: %0.2f MB"%(size/(1024*1024)))
        try:
            result = requests.get(down_url,headers=headers,stream=True)  # stream=True downloads in chunks instead of loading the whole file into memory
            with open(filename,"wb") as f:
                for chunk in result.iter_content(1024):
                    f.write(chunk)
            print("Download finished")
        except Exception as e:
            print("Download failed, id: %s"%id)
            print(e)
        
    
    if __name__=="__main__":
        
        id=input("Enter the song id, e.g. 1353194608  ")
        song_title = get_song_title(id)
        song_info=get_song_info(id)
        down_url = song_info["data"][0]["url"]
        size = song_info["data"][0]["size"]
        if down_url:   # guard against an empty url in the response
            down_song(id,down_url,song_title,size)
        else:
            print("No download url returned for id %s"%id)

     References:

      https://blog.csdn.net/qq_38282706/article/details/80251666

      https://github.com/Jack-Cherish/python-spider/blob/master/Netease/Netease.py

  • Original article: https://www.cnblogs.com/silence-cho/p/10663000.html