• Crawler: Getting Messages from the WeChat Official Account Platform


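    I was helping a friend scrape user comments from the WeChat Official Account Platform.

    This post covers only the core part: how to get at the comment data.

    Looking at the HTML, there are no tags for the comment section. It appears to be generated dynamically by JavaScript, yet inspecting the AJAX requests did not reveal any response carrying the data either.

    Finally I searched the page source, and there it was, written plainly into the JavaScript: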

       <script type="text/javascript">
            wx.cgiData = {
                total_count : 91,
                latest_msg_id : '200325222',
                count : "20"*1 || 20,
                day : "7",
                frommsgid : "",
                can_search_msg : "1",
                offset : "",
                action : "",
                keyword : "",
                list : ({"msg_item":[{"id":200322761,"type":1,"fakeid":"593656935","nick_name":"Suang 1","date_time":1398854675,"content":"记得帮我查一下是不是这个电话!","source":"","msg_status":4,"has_reply":0,"refuse_reason":"","multi_item":[],"to_uin":3071594631,"send_stat":{"total":0,"succ":0,"fail":0}},{"id":200322760,"type":2,"fakeid":"593656935","nick_name":"Suang 1","date_time":1398854664,"source":"","msg_status":4,"has_reply":0,"refuse_reason":"","multi_item":[],"to_uin":3071594631,"send_stat":{"total":0,"succ":0,"fail":0}},{"id":200322759,"type":1,"fakeid":"593656935","nick_name":"Suang 1","date_time":1398854659,"content":"勐璇,我看到那人了!","source":"","msg_status":4,"has_reply":0,"refuse_reason":"","multi_item":[],"to_uin":3071594631,"send_stat":{"total":0,"succ":0,"fail":0}},{"id":200322344,"type":2,"fakeid":"1994400010","nick_name":"ABC的CBA","date_time":1398839849,"source":"","msg_status":4,"has_reply":0,"refuse_reason":"","multi_item":[],"to_uin":3071594631,"send_stat":{"total":0,"succ":0,"fail":0}},{"id":200321209,"type":1,"fakeid":"1591078101","nick_name":"倚(纺织服装)","date_time":1398788906,"content":"/::<","source":"","msg_status":4,"has_reply":0,"refuse_reason":"","multi_item":[],"to_uin":3071594631,"send_stat":{"total":0,"succ":0,"fail":0}},{"id":200321206,"type":2,"fakeid":"1591078101","nick_name":"倚(纺织服装)","date_time":1398788859,"source":"","msg_status":4,"has_reply":1,"refuse_reason":"","multi_item":[],"to_uin":3071594631,"send_stat":{"total":0,"succ":0,"fail":0}},
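Pulling that object out of the page source can be sketched as follows. This is a minimal sketch, not the approach used in the original post: the class name `CgiDataExtractor` and the method `extractListJson` are hypothetical, and it assumes the page keeps the `list : ({ ... })` layout shown above. It finds the marker with a regular expression and then counts braces (skipping string contents) to locate the end of the object.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the JSON object that follows "list : (" in the page source.
final class CgiDataExtractor {

    // Matches "list : ({" with flexible whitespace; assumes the layout shown in the page above.
    private static final Pattern LIST_START = Pattern.compile("list\\s*:\\s*\\(\\s*\\{");

    /** Returns the balanced JSON object after "list : (", or null if it cannot be found. */
    static String extractListJson(String html) {
        Matcher m = LIST_START.matcher(html);
        if (!m.find()) {
            return null;
        }
        int start = m.end() - 1;          // index of the opening '{'
        int depth = 0;
        boolean inString = false;
        for (int i = start; i < html.length(); i++) {
            char c = html.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++;                  // skip the escaped character
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
            } else if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) {
                    return html.substring(start, i + 1);
                }
            }
        }
        return null;                      // unbalanced braces, e.g. a truncated page
    }

    public static void main(String[] args) {
        String html = "wx.cgiData = { count : 20, list : ({\"msg_item\":[{\"id\":1}]}).msg_item };";
        System.out.println(extractListJson(html));
    }
}
```

Counting braces rather than matching up to the next `)` keeps the sketch from breaking when a comment's text itself contains parentheses or braces.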

    The data is in JSON format. The raw code is hard to read, so I formatted it in Eclipse; the message list looks roughly like this:

    {"msg_item" :[ {
    	"id" : 200322761,
    	"type" : 1,
    	"fakeid" : "593656935",
    	"nick_name" : "Suang 1",
    	"date_time" : 1398854675,
    	"content" : "记得帮我查一下是不是这个电话!",
    	"source" : "",
    	"msg_status" : 4,
    	"has_reply" : 0,
    	"refuse_reason" : "",
    	"multi_item" : [],
    	"to_uin" : 3071594631,
    	"send_stat" : {
    		"total" : 0,
    		"succ" : 0,
    		"fail" : 0
    	}
    }, {
    	"id" : 200322760,
    	"type" : 2,
    	"fakeid" : "593656935",
    	"nick_name" : "Suang 1",
    	"date_time" : 1398854664,
    	"source" : "",
    	"msg_status" : 4,
    	"has_reply" : 0,
    	"refuse_reason" : "",
    	"multi_item" : [],
    	"to_uin" : 3071594631,
    	"send_stat" : {
    		"total" : 0,
    		"succ" : 0,
    		"fail" : 0
    	}
    }
    ]
    }

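    The above shows the objects in the list under msg_item in the JSON string.

    As you can see, it is an array, and each comment is one object in it. How do we generate the corresponding Java classes?


    There is an online tool for this: http://jsongen.byingtondesign.com/

    It generates the corresponding Java classes from a JSON string:

     Class 1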

    import java.util.List;

    public class MessageList {
        private List<Message> msg_item;

        public List<Message> getMsg_item() {
            return msg_item;
        }

        public void setMsg_item(List<Message> msgItem) {
            msg_item = msgItem;
        }
    }
    Class 2. Some fields were not needed, so I removed them.

    public class Message {
    
    	private String content;
    	private long date_time;
    	private String fakeid;
    	private int has_reply;
    	private long id;
    	private int msg_status;
    	private String nick_name;
    	private String refuse_reason;
    	private String source;
    	private long to_uin;
    	private int type;
    // getters and setters omitted
    }


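    Now for a quick test.

    Use Google's Gson to parse the JSON string into Java objects.

    // Requires: import com.google.gson.Gson;
    // jsonstr is the JSON string of the object that holds msg_item
    MessageList msgList = new Gson().fromJson(jsonstr, MessageList.class);
    System.out.println(msgList.getMsg_item().size());

    Parsing succeeds. All of the objects are now in msgList.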
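Once the messages are parsed, the date_time field can be turned into something readable. The sketch below assumes date_time is a Unix timestamp in seconds, which the sample values suggest but the post does not state; the class name `MessageTime` and the choice of the Asia/Shanghai time zone are also assumptions made for illustration.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: formats a message's date_time, assumed to be Unix seconds.
final class MessageTime {

    // The time zone is an assumption; adjust it to wherever the account operates.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
                    .withZone(ZoneId.of("Asia/Shanghai"));

    static String format(long dateTime) {
        return FORMAT.format(Instant.ofEpochSecond(dateTime));
    }

    public static void main(String[] args) {
        // date_time of the first message in the sample data above
        System.out.println(format(1398854675L)); // prints 2014-04-30 18:44:35
    }
}
```

With the classes above, this could be called as `MessageTime.format(msg.getDate_time())` for each element of `msgList.getMsg_item()`, assuming the omitted getter follows the same naming pattern as `getMsg_item()`.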

  • Original article: https://www.cnblogs.com/yjbjingcha/p/8324027.html