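I listen to VOK (Voice of Korea) a lot, but clicking through the links one at a time is tedious.
So I plan to write a crawler that automatically fetches VOK's Chinese broadcasts and then combines them into an MP3 with open-source tools.
The value of the data query parameter is encoded with encodeURI.
Before encoding, the data looks like this: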
{"databaseName":"cbc","tableName1":"pd","tableName2":"pro","FIXTITLEID":"34","CHANNEL":"4"}
{"databaseName":"cbc","tableName1":"pd","tableName2":"pro","FIXTITLEID":"13","CHANNEL":"4","LIMIT":"2"}
FIXTITLEID is the news item ID.
CHANNEL selects the language; 4 is Chinese.
The curl commands below fetch the current news content:
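As a rough sketch of the encoding step, the query string can be rebuilt in Python. The helper name build_url and its arguments are my own, not part of the site; quote() with ":" and "," left unescaped is used here as a stand-in for what encodeURI does to this particular JSON.

```python
import json
from urllib.parse import quote

BASE = "http://www.vok.rep.kp/model/"


def build_url(endpoint, fixtitleid, channel="4", limit=None):
    """Build a request URL for view.php or viewlimit.php (hypothetical helper)."""
    params = {
        "databaseName": "cbc",
        "tableName1": "pd",
        "tableName2": "pro",
        "FIXTITLEID": str(fixtitleid),
        "CHANNEL": str(channel),
    }
    if limit is not None:
        params["LIMIT"] = str(limit)
    # Compact JSON without spaces, matching the unencoded data shown above.
    raw = json.dumps(params, separators=(",", ":"))
    # Like encodeURI, leave ':' and ',' alone but escape braces and quotes.
    return BASE + endpoint + "?data=" + quote(raw, safe=":,")


print(build_url("view.php", 34))
print(build_url("viewlimit.php", 13, limit=2))
```

Whether the server insists on this exact escaping has not been checked; it only mirrors the requests the browser sends.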
curl "http://www.vok.rep.kp/model/view.php?data={"%"22databaseName"%"22:"%"22cbc"%"22,"%"22tableName1"%"22:"%"22pd"%"22,"%"22tableName2"%"22:"%"22pro"%"22,"%"22FIXTITLEID"%"22:"%"2234"%"22,"%"22CHANNEL"%"22:"%"224"%"22}" -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0" -H "Accept: application/json, text/javascript, */*; q=0.01" -H "Accept-Language: zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2" --compressed -H "X-Requested-With: XMLHttpRequest" -H "Connection: keep-alive" -H "Referer: http://www.vok.rep.kp/index.php?CHANNEL=4&lang=" -H "Cookie: PHPSESSID=mh0uh8gdhbh68gf3v7f59cdmc3"
curl "http://www.vok.rep.kp/model/viewlimit.php?data={"%"22databaseName"%"22:"%"22cbc"%"22,"%"22tableName1"%"22:"%"22pd"%"22,"%"22tableName2"%"22:"%"22pro"%"22,"%"22FIXTITLEID"%"22:"%"2213"%"22,"%"22CHANNEL"%"22:"%"224"%"22,"%"22LIMIT"%"22:"%"222"%"22}" -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0" -H "Accept: application/json, text/javascript, */*; q=0.01" -H "Accept-Language: zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2" --compressed -H "X-Requested-With: XMLHttpRequest" -H "Connection: keep-alive" -H "Referer: http://www.vok.rep.kp/index.php?CHANNEL=4&lang=" -H "Cookie: PHPSESSID=mh0uh8gdhbh68gf3v7f59cdmc3"
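The same requests could be issued from Python with the standard library. This is only a sketch: build_request and fetch_text are names I made up, and I am assuming the session cookie and User-Agent from the curl commands are optional, which has not been verified.

```python
import urllib.request

REFERER = "http://www.vok.rep.kp/index.php?CHANNEL=4&lang="


def build_request(url):
    """Attach the AJAX-style headers seen in the curl commands (hypothetical helper)."""
    headers = {
        "Accept": "application/json, text/javascript, */*; q=0.01",
        "X-Requested-With": "XMLHttpRequest",
        "Referer": REFERER,
    }
    return urllib.request.Request(url, headers=headers)


def fetch_text(url, timeout=30):
    """Send the request and return the response body decoded as UTF-8."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling fetch_text() with an already-encoded view.php or viewlimit.php URL should return the JSON text, provided the site is reachable from your network.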
The second request (viewlimit.php) returns:
[{"EDATE":"2019-10-06 00:00:00","FIXTITLEID":"13","ORD":"0","REGID":"ice190906039","TITLE":"조중친선의 잊지 못할 자욱들","CHANNEL":"4","FLAG":"0","PDID":"ice190906039","CONTENT":"","CONTENTKIND":"4","FTITLE":"中朝友谊的光辉历史篇章","CTITLE":"中朝友谊的光辉历史篇章","LTITLE":"中朝友谊的光辉历史篇章","KINDID":"50"},{"EDATE":"2019-10-06 00:00:00","FIXTITLEID":"13","ORD":"1","REGID":"ice190809024","TITLE":"조선의 현실을 보다\n-제2회-","CHANNEL":"4","FLAG":"0","PDID":"ice190809024","CONTENT":"","CONTENTKIND":"10","FTITLE":"看到了朝鲜现实（第二回）","CTITLE":"看到了朝鲜现实（第二回）","LTITLE":"看到了朝鲜现实（第二回）","KINDID":"60"}]
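To pull out the useful fields, the response body can be parsed as a JSON array. The sketch below uses an abbreviated copy of the response above; extract_programs is a hypothetical helper, and picking CTITLE as the display title is my assumption, since FTITLE and LTITLE carry the same text in this sample.

```python
import json

# Abbreviated sample of the response above (note the leading space the server sends).
SAMPLE = (
    ' [{"FIXTITLEID":"13","ORD":"0","REGID":"ice190906039",'
    '"CTITLE":"中朝友谊的光辉历史篇章"},'
    '{"FIXTITLEID":"13","ORD":"1","REGID":"ice190809024",'
    '"CTITLE":"看到了朝鲜现实（第二回）"}]'
)


def extract_programs(body):
    """Return (REGID, title) pairs from a viewlimit.php response body."""
    items = json.loads(body)
    return [(item["REGID"], item.get("CTITLE", "")) for item in items]


for regid, title in extract_programs(SAMPLE):
    print(regid, title)
```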
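In the response, REGID is the MP3 file name.
Substitute it for the ID in the URL below (it appears twice, in the directory name and in the file name):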
http://175.45.176.83/vod/media/cbc_pddata/cbc_ice190906039/ice190906039.mp3
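The substitution is simple enough to sketch as a small function. mp3_url is a name I picked, and the host and path layout are taken from the single example URL above, so other programs may not follow the same pattern.

```python
def mp3_url(regid, host="175.45.176.83"):
    """Build the MP3 URL for a REGID, following the example URL's layout."""
    return f"http://{host}/vod/media/cbc_pddata/cbc_{regid}/{regid}.mp3"


print(mp3_url("ice190906039"))
```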
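Note: in my testing, the VOK site has a bug and some MP3 files cannot be downloaded.
My hybrid-app experiments have been going reasonably well lately, so when I have time I'll put together a VOK Android app.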
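Because some files fail, a crawler probably should not abort on the first error. One possible approach, sketched below, records failures and moves on; download_all and fetch_bytes are hypothetical names, and the fetch argument exists only so the network call can be swapped out.

```python
import os
import urllib.request


def fetch_bytes(url, timeout=30):
    """Download a URL and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_all(urls, out_dir=".", fetch=fetch_bytes):
    """Save each URL into out_dir; return (saved_paths, failures)."""
    saved, failed = [], []
    for url in urls:
        path = os.path.join(out_dir, url.rsplit("/", 1)[-1])
        try:
            data = fetch(url)
        except OSError as exc:
            # urllib's URLError and HTTPError, and socket timeouts, are OSError subclasses.
            failed.append((url, str(exc)))
            continue
        with open(path, "wb") as fh:
            fh.write(data)
        saved.append(path)
    return saved, failed
```

The failures list can then be retried later or simply logged.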