Java native crawler mechanism: source code

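This is a basic web search program. You enter the search criteria on the command line (the starting URL, the maximum number of URLs to process, and the string to search for). It then searches URLs on the Internet one by one in real time, finding and printing the pages that match the criteria. The program is adapted from one in 《java编程艺术》 (The Art of Java). To make it easier to analyze, the author removed its GUI and modified it slightly to run on JDK 1.5. Starting from this program, you can write "crawlers" that search the Internet for things such as images and email addresses, or that download web pages.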
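The search loop described above (start URL, maximum number of URLs, search string) can be sketched as a breadth-first traversal: a FIFO queue of URLs to visit, a set of URLs already seen so that no page is queued twice, and a counter that stops the crawl after the maximum number of URLs. This is a minimal sketch assuming Java 8+, not the book's SearchCrawler code. The class name `SearchCrawlerSketch` and the `fetchPage` parameter are hypothetical: `fetchPage` stands in for a real HTTP GET, and an in-memory map is used so the example runs offline. Link extraction uses a simplified regular expression that only follows double-quoted, absolute http(s) links, which a real crawler would need to replace with proper HTML parsing and relative-URL resolution.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SearchCrawlerSketch {

    // Simplified link extraction (assumption): only double-quoted, absolute http(s) hrefs are followed.
    private static final Pattern LINK =
            Pattern.compile("href=\"(https?://[^\"#]+)\"", Pattern.CASE_INSENSITIVE);

    // Breadth-first crawl from startUrl: processes at most maxUrls pages and returns,
    // in visiting order, the URLs whose page text contains searchString (case-insensitive).
    // fetchPage is a hypothetical loader that returns a page's HTML, or null on failure.
    public static List<String> crawl(String startUrl, int maxUrls, String searchString,
                                     Function<String, String> fetchPage) {
        List<String> matches = new ArrayList<>();
        Set<String> seen = new HashSet<>();         // every URL ever queued, to avoid revisits
        Deque<String> toVisit = new ArrayDeque<>(); // FIFO queue gives breadth-first order
        seen.add(startUrl);
        toVisit.add(startUrl);
        String needle = searchString.toLowerCase(Locale.ROOT);
        int processed = 0;

        while (!toVisit.isEmpty() && processed < maxUrls) {
            String url = toVisit.poll();
            processed++;
            String html = fetchPage.apply(url);
            if (html == null) {
                continue; // page could not be fetched: skip it
            }
            if (html.toLowerCase(Locale.ROOT).contains(needle)) {
                matches.add(url);
            }
            Matcher m = LINK.matcher(html);
            while (m.find()) {
                String link = m.group(1);
                if (seen.add(link)) { // add() returns false if the URL was already seen
                    toVisit.add(link);
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the web, so the sketch runs without network access.
        Map<String, String> web = new HashMap<>();
        web.put("http://example.com/",
                "<a href=\"http://example.com/a\">A</a> <a href=\"http://example.com/b\">B</a> Welcome");
        web.put("http://example.com/a", "Java crawler notes <a href=\"http://example.com/\">home</a>");
        web.put("http://example.com/b", "Nothing here <a href=\"http://example.com/c\">C</a>");
        web.put("http://example.com/c", "More about JAVA");

        List<String> found = crawl("http://example.com/", 10, "java", web::get);
        System.out.println(found); // [http://example.com/a, http://example.com/c]
    }
}
```

Lowering `maxUrls` to 3 in the same example stops the crawl after the start page, `/a` and `/b`, so only `/a` is reported. This shows how the "maximum number of URLs" criterion bounds the crawl even when more links remain in the queue.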
First, let's look at how the program works:

package com.utils;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import org.apache.commons.httpclient.DefaultHttpMethodRetryHandler;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.methods.GetMethod;
import org.apache.commons.httpclient.params.HttpMethodParams;

public class UtilIO {

    /* Download the page that url points to and save it as path/(name + type) */
    public static void downloadFile(final String url, final String name, final String type, final String path) {
        /* 1. Create the HttpClient object and set its parameters */
        HttpClient httpClient = new HttpClient();
        // Set the HTTP connection timeout to 5 s
        httpClient.getHttpConnectionManager().getParams().setConnectionTimeout(5000);
        /* 2. Create the GetMethod object and set its parameters */
        GetMethod getMethod = new GetMethod(url);
        // Set the socket (read) timeout of the GET request to 5 s
        getMethod.getParams().setParameter(HttpMethodParams.SO_TIMEOUT, 5000);
        // Retry handling: the default handler retries a failed request up to 3 times
        getMethod.getParams().setParameter(HttpMethodParams.RETRY_HANDLER, new DefaultHttpMethodRetryHandler());
        /* 3. Execute the HTTP GET request */
        try {
            int statusCode = httpClient.executeMethod(getMethod);
            // Check the status code; an error page is reported, not saved
            if (statusCode != HttpStatus.SC_OK) {
                System.err.println("Method failed: " + getMethod.getStatusLine());
                return;
            }
            /* 4. Handle the HTTP response body */
            byte[] responseBody = getMethod.getResponseBody(); // read as a byte array
            if (responseBody == null) {
                System.err.println("No response body: " + url);
                return;
            }
            // The saved file name is name + type, so type should include the dot, e.g. ".html"
            saveToLocalNewFile(responseBody, path, name + type);
        } catch (HttpException e) {
            // Fatal error: the protocol may be wrong or the response content is invalid
            System.out.println("Please check your provided http address!");
            e.printStackTrace();
        } catch (IOException e) {
            // Network error
            e.printStackTrace();
        } finally {
            // Release the connection (also runs after the early returns above)
            getMethod.releaseConnection();
        }
    }

    private static void saveToLocalNewFile(byte[] data, String fileDir, String fileName) {
        File fileNew = new File(fileDir, fileName);
        System.out.println(fileNew.getPath());
        File rootFile = fileNew.getParentFile(); // parent directory
        if (rootFile != null && !rootFile.exists()) {
            rootFile.mkdirs();
        }
        OutputStream out = null;
        try {
            // FileOutputStream creates the file if needed and overwrites an existing one
            out = new FileOutputStream(fileNew);
            out.write(data); // write the whole array at once instead of byte by byte
            out.flush();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Close the stream even if writing failed (JDK 1.5 has no try-with-resources)
            if (out != null) {
                try {
                    out.close();
                } catch (IOException ignored) {
                    // nothing more can be done here
                }
            }
        }
    }
}
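On Java 7 or later, the saving step in `saveToLocalNewFile` can be written more compactly with `java.nio.file`. The following is a minimal sketch under that assumption (the article itself targets JDK 1.5, which lacks `java.nio.file`). The class name `SaveSketch` and method name `saveBytes` are hypothetical. Unlike the original, errors are thrown to the caller as `IOException` instead of being printed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SaveSketch {

    // Writes data to fileDir/fileName, creating any missing parent directories.
    // Files.write with no options uses CREATE, TRUNCATE_EXISTING and WRITE,
    // so an existing file is overwritten, as in saveToLocalNewFile.
    public static Path saveBytes(byte[] data, String fileDir, String fileName) throws IOException {
        Path target = Paths.get(fileDir, fileName);
        Path parent = target.getParent();
        if (parent != null) {
            Files.createDirectories(parent); // does nothing if the directories already exist
        }
        return Files.write(target, data);
    }

    public static void main(String[] args) throws IOException {
        // Save a small page into a fresh temporary directory
        Path dir = Files.createTempDirectory("crawler");
        byte[] page = "<html><body>hello</body></html>".getBytes(StandardCharsets.UTF_8);
        Path saved = saveBytes(page, dir.resolve("pages").toString(), "index.html");
        System.out.println(saved + " (" + Files.size(saved) + " bytes)");
    }
}
```

In `downloadFile`, such a helper would be called with the `responseBody` bytes in place of `saveToLocalNewFile`, with the `IOException` handled by the existing `catch (IOException e)` block.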

Original article: https://www.cnblogs.com/hwaggLee/p/4904937.html