• Handling HTML entities in URLs with Java


    <!-- https://mvnrepository.com/artifact/org.apache.commons/commons-lang3 
            <dependency>
                <groupId>org.apache.commons</groupId>
                <artifactId>commons-lang3</artifactId>
                <version>3.4</version>
            </dependency>
    -->
    
    
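    import java.net.URI;
    import java.net.URLDecoder;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import org.apache.commons.lang3.StringEscapeUtils;
    import org.springframework.http.client.HttpComponentsClientHttpRequestFactory;
    import org.springframework.web.client.RestTemplate;

    public static String getNextPage(String web) throws Exception {
        HttpComponentsClientHttpRequestFactory factory = new HttpComponentsClientHttpRequestFactory();
        // factory.setConnectTimeout(60000);
        // Match the end of the "上一页" (previous page) link plus the anchor right after it, taken to be the next-page link.
        String regx = "(上一页</a>)(<a.*?href=[\"']?(((http|https)?://)?/?[^\"']+)[\"']?.*?>(.+)</a>)";
        RestTemplate template = new RestTemplate(factory);
        URI uri = new URI(URLDecoder.decode(web, "utf-8"));
        String stri = template.getForObject(uri, String.class);
        Matcher matcher = Pattern.compile(regx).matcher(stri);
        if (!matcher.find()) {
            throw new IllegalStateException("no next-page link found in " + web);
        }
        String group = matcher.group();
        // Keep only the path between href="/ and the title attribute.
        group = group.substring(group.indexOf("href=\"/") + 7, group.indexOf("\" title=\""));
        group = "http://www.youbianku.com/" + group;
        // Turn HTML entities in the href (e.g. &amp;) back into literal characters.
        return StringEscapeUtils.unescapeHtml4(group);
    }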
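
The call that matters above is `StringEscapeUtils.unescapeHtml4`: an `href` copied out of HTML source usually carries `&amp;` where the real URL has `&`. To show what that step does without pulling in commons-lang3, here is a rough stdlib-only sketch. It is not how the library is implemented and it only covers the five basic named entities plus numeric references. The class name `UrlEntityDemo`, the method `unescapeBasicEntities`, and the sample query string are all made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original post or of commons-lang3.
class UrlEntityDemo {
    // &amp; &lt; &gt; &quot; &apos; plus numeric references such as &#38; or &#x26;
    private static final Pattern ENTITY =
            Pattern.compile("&(amp|lt|gt|quot|apos|#[0-9]{1,6}|#[xX][0-9a-fA-F]{1,5});");

    static String unescapeBasicEntities(String s) {
        Matcher m = ENTITY.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps '$' and '\' in the decoded text literal
            m.appendReplacement(out, Matcher.quoteReplacement(decode(m.group(1))));
        }
        m.appendTail(out);
        return out.toString();
    }

    private static String decode(String name) {
        switch (name) {
            case "amp":  return "&";
            case "lt":   return "<";
            case "gt":   return ">";
            case "quot": return "\"";
            case "apos": return "'";
            default:
                // Numeric reference: "#38" (decimal) or "#x26" (hex)
                boolean hex = name.charAt(1) == 'x' || name.charAt(1) == 'X';
                int codePoint = hex
                        ? Integer.parseInt(name.substring(2), 16)
                        : Integer.parseInt(name.substring(1), 10);
                return new String(Character.toChars(codePoint));
        }
    }

    public static void main(String[] args) {
        // Sample href as it might appear in page source (the path is an assumption)
        String href = "http://www.youbianku.com/list?page=2&amp;size=20";
        System.out.println(unescapeBasicEntities(href));
        // prints http://www.youbianku.com/list?page=2&size=20
    }
}
```

The sketch decodes in a single pass, so `&amp;amp;` comes out as `&amp;` rather than `&`. For real pages, the library call is the safer choice since it knows the full HTML 4 entity table.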
  • Original article: https://www.cnblogs.com/wangyang108/p/6010145.html