转自:
https://blog.csdn.net/felcon/article/details/38524317
JSON(JavaScript Object Notation) 是一种轻量级的数据交换格式。 易于人阅读和编写,同时也易于机器解析和生成。
写爬虫程序时发现页面很多内容都是基于json传输的,而且都是unicode编码,需要读取并转换为汉字,这可以直接使用python的json包处理
python的json.dumps方法默认会输出成这种格式"u535au5ba2u56ed",。
要输出中文需要指定ensure_ascii参数为False,如下代码片段:
json.dumps({'text':"中文"},ensure_ascii=False)
json的一个简单示例为:
{ "firstName":"Bill" , "lastName":"Gates" }
其中“firstName”和”lastName“为健(key),“Bill”和“Gates”为值(value)
首先需要导入json包
import json
使用
info = json.JSONDecoder().decode(info)
可以读取json数据,同时将unicode转换为汉字
使用
info["firstName"]
来读取健”firstName"所对应的值“Bill”
比如最近爬爆米花视频,得到存储数据的是unicode的,要获取里面的list,就可以使用上面的方法
response.text =
{
"Videolist":[{"appID":"35943","appName":"逗影视频","appPic":"//p001.baomihua.com/b94ec748-abcf-4096-ba42-cc986d28ea83_61503983.png","appUrl":"//www.baomihua.com/user/35943","videoId":"38136258","videoTitle":"如今拉客都用这套路?长点心吧!贪小便宜吃大亏","videoCost":"00:22","videoPlayUrl":"//video.baomihua.com/v/38136258","videoImgUrl":"//img04.video.baomihua.com/x/38136258.jpg","companyName":"太原瑞宝达文化传媒有限公司","isRec":"0"},{"appID":"34845","appName":"娱闻壹姐","appPic":"//p001.baomihua.com/1e3abb19-64a7-4c3a-9845-da1e15cec126.jpg","appUrl":"//www.baomihua.com/user/34845","videoId":"38134456","videoTitle":"蔡徐坤微博大方表白!内容让粉丝们彻夜难眠,不愧是偶像!","videoCost":"01:07","videoPlayUrl":"//video.baomihua.com/v/38134456","videoImgUrl":"//img04.video.baomihua.com/x/38134456.jpg","companyName":"杭州匠星影视传媒有限公司","isRec":"0"},{"appID":"34923","appName":"嗨儿原创","appPic":"//p001.baomihua.com/f1613b50-a19a-4ddb-82c8-393edba276ab.jpg","appUrl":"//www.baomihua.com/user/34923","videoId":"38134104","videoTitle":"张馨予结婚嫁给了老实人,李晨没祝福!","videoCost":"04:01","videoPlayUrl":"//video.baomihua.com/v/38134104","videoImgUrl":"//img04.video.baomihua.com/x/38134104.jpg","companyName":"秦皇岛屿海文化传媒有限公司","isRec":"0"}]
}
content = json.JSONDecoder().decode(response.text)
print type(content["Videolist"]) #输出: <type 'list'>