• Java CAPTCHA recognition: training samples with jTessBoxEditorFX and Tesseract-OCR



    Tools:

    jTessBoxEditorFX download: https://github.com/nguyenq/jTessBoxEditorFX

    Tesseract-OCR download: https://sourceforge.net/projects/tesseract-ocr/

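    Main steps:

    1. Download jTessBoxEditorFX and Tesseract-OCR (and configure the environment variables), and prepare the JAR dependencies (via Maven; see the pom file below).
    2. Download the CAPTCHA images to a local folder (code).
    3. Convert the CAPTCHA images to another format.
    4. Denoise and binarize the converted images, then crop the edges (code).
    5. Use jTessBoxEditorFX to proofread the .box file (correcting misrecognized characters): https://www.cnblogs.com/zhongtang/p/5555950.html
    6. Generate the .traineddata file with the tesseract command line, then call it from Java: https://www.cnblogs.com/zhongtang/p/5555950.html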
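
    Steps 5 and 6 are only linked, not spelled out. As a rough sketch, the command sequence commonly used with the legacy Tesseract 3.x training tools can be assembled as below. The language name fontyp matches the one passed to setLanguage in the code; the font name "font", the page suffix exp0, and the class name TrainingCommands are hypothetical, and the available tools depend on the installed Tesseract version, so check the list against the linked article before running anything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TrainingCommands {

    /** Builds the command lines for a single training image named lang.font.exp0.tif. */
    static List<List<String>> build(String lang, String font) {
        String base = lang + "." + font + ".exp0";
        List<List<String>> cmds = new ArrayList<>();
        // Generate the initial .box file; it is then corrected in jTessBoxEditorFX.
        cmds.add(Arrays.asList("tesseract", base + ".tif", base, "batch.nochop", "makebox"));
        // Produce the .tr feature file from the image and the corrected .box file.
        cmds.add(Arrays.asList("tesseract", base + ".tif", base, "nobatch", "box.train"));
        // Extract the character set into a file named unicharset.
        cmds.add(Arrays.asList("unicharset_extractor", base + ".box"));
        // Cluster the features; font_properties must contain a line for the font.
        cmds.add(Arrays.asList("mftraining", "-F", "font_properties", "-U", "unicharset",
                "-O", lang + ".unicharset", base + ".tr"));
        cmds.add(Arrays.asList("cntraining", base + ".tr"));
        // Run this only after renaming inttemp, pffmtable, normproto and shapetable
        // so that each file name starts with "lang.".
        cmds.add(Arrays.asList("combine_tessdata", lang + "."));
        return cmds;
    }

    public static void main(String[] args) {
        for (List<String> cmd : build("fontyp", "font")) {
            System.out.println(String.join(" ", cmd));
        }
    }
}
```

    The sketch only prints the commands. Running them by hand makes it easier to pause after the first one and fix the .box file in jTessBoxEditorFX.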

    The complete code is as follows:

     

    package yanZhengMaTest.pikachu;
    
    import java.awt.image.BufferedImage;
    import java.io.BufferedInputStream;
    import java.io.File;
    import java.io.FileNotFoundException;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.net.MalformedURLException;
    import java.net.URL;
    
    import javax.imageio.ImageIO;
    import javax.net.ssl.HttpsURLConnection;
    
    import org.opencv.core.Core;
    import org.opencv.core.CvType;
    import org.opencv.core.Mat;
    import org.opencv.core.Rect;
    import org.opencv.core.Size;
    import org.opencv.imgcodecs.Imgcodecs;
    import org.opencv.imgproc.Imgproc;
    
    import net.sourceforge.tess4j.Tesseract;
    import net.sourceforge.tess4j.TesseractException;
    
    public class Test {
    
        static {
            // Load the OpenCV native library; this is required before any OpenCV call.
            System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
        }
    
        // Note: every folder referenced below must already exist.
        public static void main(String[] args) throws FileNotFoundException, IOException, InterruptedException {

            // Folder that holds the downloaded CAPTCHA images
            File imgFile = new File("C:\\Users\\pc\\Desktop\\formPic\\unFormPic");
            // Path prefix used when saving each CAPTCHA
            String downAddress = "C:\\Users\\pc\\Desktop\\formPic\\unFormPic\\";
            // URL the CAPTCHA is downloaded from
            String downURL = "https://www.qichamao.com/usercenter/varifyimage?t=0.6488481170232967";
            if (imgFile.listFiles().length < 400) {
                for (int i = 1; i <= 400; i++) {
                    downloadPic(downURL, downAddress + i + ".gif");
                    Thread.sleep(10 + (i % 100));
                }
            }

            // Load the saved CAPTCHAs and convert them to TIFF (Tesseract does not support GIF images)
            File imgFile0 = new File("C:\\Users\\pc\\Desktop\\formPic\\unFormPic");
            for (File image : imgFile0.listFiles()) {
                changePicFormat("tif", image, "C:\\Users\\pc\\Desktop\\formPic\\formedPic\\");
            }
            System.out.println("Image format conversion finished");

            // Load the TIFF images and preprocess them (denoise, threshold, crop) to improve recognition
            int picNum = 1;
            File imageFile1 = new File("C:\\Users\\pc\\Desktop\\formPic\\formedPic");
            for (File image : imageFile1.listFiles()) {
                filterPic(image.getName(), picNum + ".tif");
                picNum++;
            }

            // Load the preprocessed images and run OCR on each one
            File resultImgs = new File("C:\\Users\\pc\\Desktop\\result_cut");
            for (File link : resultImgs.listFiles()) {
                String result = getResult(link);
                System.out.println(link.getName() + " recognition result: " + result);
            }

        }
    
        // Preprocess one image; each intermediate result is written to disk
        public static void filterPic(String imgName, String fileName) throws FileNotFoundException, IOException {
            // Denoise
            Mat src = Imgcodecs.imread("C:\\Users\\pc\\Desktop\\formPic\\formedPic\\" + imgName, Imgcodecs.IMREAD_UNCHANGED);
            if (src.empty()) {
                System.out.println("Image not found: " + imgName);
                return;
            }
            Mat dst = new Mat(); // OpenCV allocates the output to match the input

            Imgproc.boxFilter(src, dst, src.depth(), new Size(3, 3)); // 3x3 averaging kernel
            Imgcodecs.imwrite("C:\\Users\\pc\\Desktop\\filter\\" + fileName, dst);

            // Thresholding. Note that THRESH_TRUNC only caps values above 165 at 165 (the maxval
            // argument, 200, is ignored); use THRESH_BINARY for a strictly two-level image.
            Mat src1 = Imgcodecs.imread("C:\\Users\\pc\\Desktop\\filter\\" + fileName, Imgcodecs.IMREAD_UNCHANGED);
            Mat dst1 = new Mat();

            Imgproc.threshold(src1, dst1, 165, 200, Imgproc.THRESH_TRUNC);
            Imgcodecs.imwrite("C:\\Users\\pc\\Desktop\\process\\" + fileName, dst1);

            // Crop the border
            Mat src2 = Imgcodecs.imread("C:\\Users\\pc\\Desktop\\process\\" + fileName, Imgcodecs.IMREAD_UNCHANGED);
            Rect roi = new Rect(4, 2, src2.cols() - 7, src2.rows() - 4); // arguments: x, y, width, height
            Mat dst2 = new Mat(src2, roi);

            Imgcodecs.imwrite("C:\\Users\\pc\\Desktop\\result_cut\\" + fileName, dst2);
            System.out.println("Image processed: " + fileName);

        }
    
        // Run OCR on one CAPTCHA image
        public static String getResult(File imageFile) {
            if (!imageFile.exists()) {
                System.out.println("Image does not exist: " + imageFile);
                return null;
            }
            Tesseract tesseract = new Tesseract();
            tesseract.setDatapath("F:\\Program Files (x86)\\Tesseract-OCR\\tessdata");
            tesseract.setLanguage("fontyp");    // use the custom-trained data instead of the default language

            try {
                return tesseract.doOCR(imageFile);
            } catch (TesseractException e) {
                e.printStackTrace();
                return null;
            }
        }
    
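        /**
         * Converts an image to another format.
         * Source: http://www.open-open.com/code/view/1453300186683
         *
         * @param outputFormat
         *            target format, e.g. "tif"
         * @param image
         *            image to convert
         * @param downAddress
         *            folder (with trailing separator) the converted image is saved to
         */
        public static void changePicFormat(String outputFormat, File image, String downAddress) {

            try {
                BufferedImage bim = ImageIO.read(image);
                if (bim == null) {
                    // ImageIO.read returns null when no reader can decode the file
                    System.out.println("Cannot decode image: " + image.getName());
                    return;
                }
                File output = new File(
                        downAddress + image.getName().substring(0, image.getName().lastIndexOf(".") + 1) + outputFormat);
                // ImageIO.write returns false when no writer for the format is available
                if (!ImageIO.write(bim, outputFormat, output)) {
                    System.out.println("No ImageIO writer for format: " + outputFormat);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }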
    
        /**
         * Downloads one CAPTCHA image.
         *
         * @param picUrl
         *            URL the CAPTCHA is fetched from
         * @param imgAddress
         *            path the image is saved to
         */
        public static void downloadPic(String picUrl, String imgAddress) {
            try {
                URL url = new URL(picUrl);
                HttpsURLConnection conn = (HttpsURLConnection) url.openConnection();
                // Send a browser User-Agent; otherwise the request is treated as a bot and no CAPTCHA image is returned
                conn.setRequestProperty("User-Agent",
                        "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.75 Safari/537.36");
                conn.connect();

                int len;
                byte[] buf = new byte[1024];
                // try-with-resources closes both streams even if the copy fails
                try (BufferedInputStream bis = new BufferedInputStream(conn.getInputStream());
                        FileOutputStream fos = new FileOutputStream(imgAddress)) {
                    while ((len = bis.read(buf)) != -1) {
                        fos.write(buf, 0, len); // write only the bytes actually read
                    }
                }
                System.out.println("Image downloaded");
            } catch (MalformedURLException e) {
                System.out.println("Failed to read image");
                e.printStackTrace();
            } catch (IOException e) {
                System.out.println("Image download failed");
                e.printStackTrace();
            }
        }
    
    }
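
    The preprocessing in filterPic depends on the native OpenCV library. To illustrate the idea behind step 4 without that dependency, here is a small sketch using only java.awt.image.BufferedImage: it converts each pixel to a gray level, maps it to pure black or white around a fixed threshold, and trims a margin. This is not the author's OpenCV pipeline (which uses a box filter and THRESH_TRUNC); the class name CaptchaPreprocess and its method names are hypothetical, and the threshold 165 and the margins are simply borrowed from the code above.

```java
import java.awt.image.BufferedImage;

final class CaptchaPreprocess {

    /** Returns a two-level copy: pixels brighter than the threshold become white, the rest black. */
    static BufferedImage binarize(BufferedImage src, int threshold) {
        BufferedImage dst = new BufferedImage(src.getWidth(), src.getHeight(),
                BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of the usual luminance weights 0.299 / 0.587 / 0.114
                int gray = (r * 299 + g * 587 + b * 114) / 1000;
                dst.setRGB(x, y, gray > threshold ? 0xFFFFFFFF : 0xFF000000);
            }
        }
        return dst;
    }

    /** Cuts the given number of pixels off each side. */
    static BufferedImage crop(BufferedImage src, int left, int top, int right, int bottom) {
        return src.getSubimage(left, top,
                src.getWidth() - left - right, src.getHeight() - top - bottom);
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(60, 20, BufferedImage.TYPE_INT_RGB);
        BufferedImage out = crop(binarize(img, 165), 4, 2, 3, 2);
        System.out.println(out.getWidth() + "x" + out.getHeight());
    }
}
```

    The margins 4, 2, 3, 2 correspond to the Rect(4, 2, cols - 7, rows - 4) used in filterPic. Whether a hard black/white threshold recognizes better than the truncating one depends on the CAPTCHA style, so it is worth comparing both on a few samples.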

    pom file:

            <dependency>
                <groupId>net.sourceforge.tess4j</groupId>
                <artifactId>tess4j</artifactId>
                <version>4.1.1</version>
                <exclusions>
                    <exclusion>
                        <groupId>com.sun.jna</groupId>
                        <artifactId>jna</artifactId>
                    </exclusion>
                </exclusions>
            </dependency>
            <dependency>
                <groupId>org.openpnp</groupId>
                <artifactId>opencv</artifactId>
                <version>3.2.0-0</version>
            </dependency>

    References:

    Using OpenCV: https://blog.csdn.net/u012706811/article/details/52779271
    OpenCV tutorial: https://www.w3cschool.cn/opencv/opencv-me9i28vh.html
    OpenCV binarization: https://blog.csdn.net/liyuqian199695/article/details/53925046
    OpenCV Maven coordinates: https://mvnrepository.com/artifact/org.openpnp/opencv/3.4.2-0
    OpenCV image filtering: https://blog.csdn.net/u012393192/article/details/78528550
    OpenCV image cropping: https://blog.csdn.net/sileixinhua/article/details/72811093
    OpenCV example including the tesseract commands: https://www.cnblogs.com/zhongtang/p/5555950.html

    Also worth reading: https://blog.csdn.net/lmj623565791/article/details/23960391

    Troubleshooting:

    1. The native library fails to load

    Exception in thread "main" java.lang.UnsatisfiedLinkError: no opencv_java320 in java.library.path
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
        at java.lang.Runtime.loadLibrary0(Runtime.java:870)
        at java.lang.System.loadLibrary(System.java:1122)
        at yanZhengMaTest.pikachu.Test.<clinit>(Test.java:38)

    Fix:

    Set the native library path (the setting shown in the original post's screenshot) to the directory below, adjusted to where the opencv package sits in your own Maven repository:
    G:\Program Files (x86)\apache-maven\repo\org\openpnp\opencv\3.2.0-0\opencv-3.2.0-0\nu\pattern\opencv\windows\x86_64
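
    When this error appears, it can help to confirm whether the library file is actually present in any directory on java.library.path before changing settings. The sketch below does that with the JDK alone. The class name NativeLibCheck and its find method are hypothetical helpers; the library name opencv_java320 is taken from the error message above.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

final class NativeLibCheck {

    /** Looks in each directory of libraryPath for the platform-specific file name of libName. */
    static Optional<Path> find(String libName, String libraryPath) {
        // Maps e.g. "opencv_java320" to "opencv_java320.dll" on Windows
        String fileName = System.mapLibraryName(libName);
        for (String dir : libraryPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            try {
                Path candidate = Paths.get(dir, fileName);
                if (Files.isRegularFile(candidate)) {
                    return Optional.of(candidate);
                }
            } catch (InvalidPathException e) {
                // Skip malformed entries instead of aborting the search
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String libPath = System.getProperty("java.library.path", "");
        Optional<Path> hit = find("opencv_java320", libPath);
        System.out.println(hit.map(p -> "Found: " + p)
                .orElse("opencv_java320 not found on java.library.path: " + libPath));
    }
}
```

    If nothing is found, passing the directory explicitly with the JVM option -Djava.library.path=... when launching the program is one way to test the fix.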

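    2. The JDK and OpenCV versions do not match (Exception in thread "main" java.lang.UnsatisfiedLinkError: no jniopencv_highgui in java.library.path)

    Fix: switch to a different OpenCV version.

    3. An error occurs when generating the .tr file from the command line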

    Page 406
    Warning. Invalid resolution 1 dpi. Using 70 instead.
    Estimating resolution as 269
    Error during processing.

    Fix: the image was probably corrupted during download or format conversion; replacing that image resolves the error.
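
    To find such images before starting a long training run, one option is to try decoding every file first. The sketch below assumes that a file Java's ImageIO cannot decode is also unusable for training. That is only a heuristic, since Tesseract uses its own image loader. The class name ImageSanityCheck and its methods are hypothetical.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;

final class ImageSanityCheck {

    /** Returns true if ImageIO can decode the file into an image with a non-zero size. */
    static boolean isReadable(File file) {
        try {
            // ImageIO.read returns null when no installed reader recognizes the data
            BufferedImage img = ImageIO.read(file);
            return img != null && img.getWidth() > 0 && img.getHeight() > 0;
        } catch (IOException | RuntimeException e) {
            // Truncated or corrupt files may fail with either kind of exception
            return false;
        }
    }

    /** Lists the regular files in dir that cannot be decoded. */
    static List<File> findBroken(File dir) {
        List<File> broken = new ArrayList<>();
        File[] files = dir.listFiles();
        if (files == null) {
            return broken; // dir does not exist or is not a directory
        }
        for (File f : files) {
            if (f.isFile() && !isReadable(f)) {
                broken.add(f);
            }
        }
        return broken;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (File f : findBroken(dir)) {
            System.out.println("Cannot decode: " + f.getName());
        }
    }
}
```

    Running it against the folder of processed images prints the files worth downloading or converting again.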

  • Original article: https://www.cnblogs.com/zengbojia/p/9470214.html