• springboot批量爬取微信公众号信息及视频下载


    springboot批量爬取微信公众号信息及视频下载

    1. 准备需要爬取的公众号链接(例如:https://mp.weixin.qq.com/s/GPz-w3_gS8jsgINJH9t6vw).下面的是整合了160多个公众号文章的地址.

    2. 你们应该知道需要获取160个视频,先需要知道这160视频文章的地址.

        a. 搭建springboot框架.demo直通车.https://chenqiwei.lanzoui.com/isaWAschwji

        b.导入爬取网页的依赖在pom文件下.

    <!-- 这个是爬取网页地址 -->
        <dependency>
                <groupId>org.jsoup</groupId>
                <artifactId>jsoup</artifactId>
                <version>1.10.2</version>
            </dependency>
    <!-- 这个主要数据解析存储用到--> <dependency> <groupId>com.alibaba</groupId> <artifactId>fastjson</artifactId> <version>1.2.47</version> </dependency>

    3.获取每个视频文章的地址

    String url="https://mp.weixin.qq.com/s/GPz-w3_gS8jsgINJH9t6vw";
            Document document = Jsoup.parse(new URL(url), 30000);
            Elements section = document.getElementsByTag("a");
            StringBuffer stringBuffer = new StringBuffer();
            for (Element s:section){ //遍历每个视频文章集数
                VideoInfo videoInfo = new VideoInfo();
                String href = s.attr("href");//获取每个文章的地址
                String name = s.text();//获取每个文章的文本
    }

    4.每个的文章的视频,是需要再次请求微信服务器才会给你的 (获取的每个文章地址的mpvip的值替换下面这个地址的vid的地址才会返回真正的视频地址给你,就是文章mpcid替换途中红色部分再次请求,返回的才是视频地址)

    https://mp.weixin.qq.com/mp/videoplayer?action=get_mp_video_play_url&preview=0&__biz=&mid=&idx=&vid=1070106698888740864&uin=&key=&pass_ticket=&wxtoken=&appmsg_token=&x5=0&f=json
    if (href.contains("http")){
                    Document parse = Jsoup.parse(new URL(href), 30000);
                    String mpvid=parse.getElementsByTag("iframe").attr("data-mpvid");
                    String sun="https://mp.weixin.qq.com/mp/videoplayer?action=get_mp_video_play_url&preview=0&__biz=&mid=&idx=&vid=1070106698888740864&uin=&key=&pass_ticket=&wxtoken=&appmsg_token=&x5=0&f=json";
                    sun=sun.replace("1070106698888740864",mpvid);
                    String s1 = sendGet(sun,"");
                    String videourl=JSONObject.parseObject(s1).getJSONArray("url_info").getJSONObject(0).get("url").toString();
                    String title=JSONObject.parseObject(s1).get("title").toString();
                    String sss="<tr>
    " +
                            "        <td>"+name+"</td>
    " +
                            "        <td>"+title+"</td>
    " +
                            "        <td><a href='"+videourl.replace("http://","https://")+"'  target="_blank">"+title+"视频地址点击下载</a></td>
    " +
                            "    </tr>";
                    stringBuffer.append(sss);
                }

    5.出来的效果.(每个视频地址就整理出来了)

     6.完整的源码

     public static void main(String[] args) throws Exception {
            String url="https://mp.weixin.qq.com/s/GPz-w3_gS8jsgINJH9t6vw";
            Document document = Jsoup.parse(new URL(url), 30000);
            Elements section = document.getElementsByTag("a");
            StringBuffer stringBuffer = new StringBuffer();
            for (Element s:section){
                VideoInfo videoInfo = new VideoInfo();
                String href = s.attr("href");
                String name = s.text();
                if (href.contains("http")){
                    Document parse = Jsoup.parse(new URL(href), 30000);
                    String mpvid=parse.getElementsByTag("iframe").attr("data-mpvid");
                    String sun="https://mp.weixin.qq.com/mp/videoplayer?action=get_mp_video_play_url&preview=0&__biz=&mid=&idx=&vid=1070106698888740864&uin=&key=&pass_ticket=&wxtoken=&appmsg_token=&x5=0&f=json";
                    sun=sun.replace("1070106698888740864",mpvid);
                    String s1 = sendGet(sun,"");
                    String videourl=JSONObject.parseObject(s1).getJSONArray("url_info").getJSONObject(0).get("url").toString();
                    String title=JSONObject.parseObject(s1).get("title").toString();
                    String sss="<tr>
    " +
                            "        <td>"+name+"</td>
    " +
                            "        <td>"+title+"</td>
    " +
                            "        <td><a href='"+videourl.replace("http://","https://")+"'  target="_blank">"+title+"视频地址点击下载</a></td>
    " +
                            "    </tr>";
                    stringBuffer.append(sss);
                }
            }
            System.out.println(stringBuffer.toString());
        }
    
        public static String sendGet(String url, String param) {
            String result = "";
            BufferedReader in = null;
            try {
                String urlNameString = url + "?" + param;
                URL realUrl = new URL(urlNameString);
                // 打开和URL之间的连接
                URLConnection connection = realUrl.openConnection();
                // 设置通用的请求属性
                connection.setRequestProperty("accept", "*/*");
                connection.setRequestProperty("connection", "Keep-Alive");
                connection.setRequestProperty("user-agent",
                        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;SV1)");
                // 建立实际的连接
                connection.connect();
                // 获取所有响应头字段
                Map<String, List<String>> map = connection.getHeaderFields();
                // 遍历所有的响应头字段
                for (String key : map.keySet()) {
                    System.out.println(key + "--->" + map.get(key));
                }
                // 定义 BufferedReader输入流来读取URL的响应
                in = new BufferedReader(new InputStreamReader(
                        connection.getInputStream()));
                String line;
                while ((line = in.readLine()) != null) {
                    result += line;
                }
            } catch (Exception e) {
                System.out.println("发送GET请求出现异常!" + e);
                e.printStackTrace();
            }
            // 使用finally块来关闭输入流
            finally {
                try {
                    if (in != null) {
                        in.close();
                    }
                } catch (Exception e2) {
                    e2.printStackTrace();
                }
            }
            return result;
        }

    7.关注博主公众号(内含更多技巧科技),一起学习更多知识.

  • 相关阅读:
    [Vue] Create Filters in Vue.js
    [Vue] Import component into page
    [Angular Form] ngModel and ngModelChange
    [Ramda] Convert a QueryString to an Object using Function Composition in Ramda
    [Vue] Use basic event handling in Vue
    [Ramda] Declaratively Map Data Transformations to Object Properties Using Ramda evolve
    Linux2.6内核--VFS层中和进程相关的数据结构
    [置顶] Firefox OS 学习——Gaia 编译分析
    ORACLE 索引概述
    【笔试&面试】C#的托管代码与非托管代码
  • 原文地址:https://www.cnblogs.com/chenqiwei/p/RunWsh_20210807_WXshipinDownload.html
Copyright © 2020-2023  润新知