• Converting PDF files to high-resolution images with the Java library PDFBox


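    I recently needed to convert PDF files to high-resolution images, using the pdfbox and fontbox libraries. The renderImageWithDPI method lets you specify the rendering resolution. The higher the DPI, the longer the conversion takes and the larger and sharper the resulting image.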
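
    To make the DPI trade-off concrete, here is a minimal sketch that estimates the pixel size of a rendered page. It relies on the fact that PDF user space defaults to 72 units per inch, so the scale factor is dpi / 72. The class name, the A4-like page size, the round-down behaviour, and the 4-bytes-per-pixel memory estimate are assumptions for illustration, and the exact rounding inside PDFBox may differ between versions.

```java
/** Estimates the size of a page rendered at a given DPI (no PDFBox needed). */
class DpiSizeEstimate {

    /** PDF user space uses 72 units (points) per inch by default. */
    static final double POINTS_PER_INCH = 72.0;

    /** Converts a length in points to pixels at the given DPI, rounding down (at least 1). */
    static int pixelsAt(double points, double dpi) {
        return (int) Math.max(Math.floor(points * dpi / POINTS_PER_INCH), 1);
    }

    /** Rough in-memory size, assuming 4 bytes per pixel for the decoded image. */
    static long approxBytes(int widthPx, int heightPx) {
        return 4L * widthPx * heightPx;
    }

    public static void main(String[] args) {
        double widthPt = 595, heightPt = 842; // roughly an A4 page (assumed example size)
        for (double dpi : new double[] {72, 90, 200, 300}) {
            int w = pixelsAt(widthPt, dpi);
            int h = pixelsAt(heightPt, dpi);
            System.out.printf("%.0f DPI -> %d x %d px, about %.1f MiB in memory%n",
                    dpi, w, h, approxBytes(w, h) / (1024.0 * 1024.0));
        }
    }
}
```

    Because both width and height scale with the DPI, doubling the DPI roughly quadruples the pixel count, which is why rendering time and file size grow quickly.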

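    Note: as Adobe's software grows more capable and PDF files use more and more formats and features, some Java libraries cannot convert every file. PDFs that use newer features may therefore fail to convert.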

    1 Add the dependencies

    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox</artifactId>
        <version>2.0.16</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.pdfbox/fontbox -->
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>fontbox</artifactId>
        <version>2.0.16</version>
    </dependency>

    2 The code

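    // Imports needed by the methods below. Assumptions: log is an SLF4J logger and ArrayUtils
    // comes from Apache Commons Lang; FileUtil and IOUtil are the author's own helper classes.
    import org.apache.commons.lang3.ArrayUtils;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.rendering.ImageType;
    import org.apache.pdfbox.rendering.PDFRenderer;

    import javax.imageio.ImageIO;
    import java.awt.image.BufferedImage;
    import java.io.BufferedOutputStream;
    import java.io.Closeable;
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;

    public static void convertPdf2Image(String pdfPath, String imageDirPath) {
        log.info("start convert pdf file:[{}] to image path:[{}]", pdfPath, imageDirPath);
        // Both the source PDF and the target directory must already exist.
        if (!new File(pdfPath).exists()) {
            log.info("pdfFilename:[{}] not exist", pdfPath);
            return;
        }
        if (!new File(imageDirPath).exists()) {
            log.info("imageDir:[{}] not exist", imageDirPath);
            return;
        }
        byte[] pdfContent = FileUtil.getFileContentByte(pdfPath);
        String filename = FileUtil.getFilename(pdfPath);
        float dpi = 200;
        convertPdf2Image(pdfContent, filename, imageDirPath, dpi);
        log.info("convert pdf file:[{}] to image success", filename);
    }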
    
    private static void convertPdf2Image(byte[] pdfContent, String pdfFilename, String imageDirPath, float dpi) {
        log.info("convert pdfFilename:[{}] to imageDir:[{}] with dpi:[{}]", pdfFilename, imageDirPath, dpi);
        if (ArrayUtils.isEmpty(pdfContent)) {
            return;
        }
        // Use at least 90 DPI so the rendered pages stay legible.
        if (dpi < 90) {
            dpi = 90;
        }
        // Common prefix of every output file: <imageDirPath>/<pdfFilename>_
        String basePath = imageDirPath;
        if (basePath.endsWith("/") || basePath.endsWith("\\")) {
            basePath += pdfFilename + "_";
        } else {
            basePath += File.separator + pdfFilename + "_";
        }
        PDDocument document = null;
        BufferedOutputStream outputStream = null;
        try {
            document = PDDocument.load(pdfContent);
            int pageCount = document.getNumberOfPages();
            PDFRenderer pdfRenderer = new PDFRenderer(document);
            String imgPath;
            // Render each page (0-based index) to its own PNG file.
            for (int i = 0; i < pageCount; i++) {
                imgPath = basePath + i + ".png";
                outputStream = new BufferedOutputStream(new FileOutputStream(imgPath));
                BufferedImage image = pdfRenderer.renderImageWithDPI(i, dpi, ImageType.RGB);
                ImageIO.write(image, "png", outputStream);
                outputStream.close();
                log.info("convert to png, total[{}], now[{}], ori:[{}], des[{}]", pageCount, i + 1, pdfFilename, imgPath);
            }
        } catch (IOException e) {
            log.error("convert pdf to image error, pdfFilename:" + pdfFilename, e);
        } finally {
            // Release the last stream (if rendering failed mid-page) and the document.
            IOUtil.closeSilently(outputStream);
            IOUtil.closeSilently(document);
        }
    }
    
    // Source of IOUtil.closeSilently: closes a resource and swallows any IOException
    public static void closeSilently(Closeable io) {
        if (io != null) {
            try {
                io.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
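
    As an aside, the manual check for a trailing "/" or "\" in convertPdf2Image could also be left to java.nio.file.Path. The following is only a sketch of that alternative, not the code the article uses. The class name, method name, and example directory are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Builds per-page output paths without manual separator handling. */
class PageImagePaths {

    /** Returns <imageDir>/<pdfFilename>_<pageIndex>.png. */
    static Path pageImagePath(Path imageDir, String pdfFilename, int pageIndex) {
        return imageDir.resolve(pdfFilename + "_" + pageIndex + ".png");
    }

    public static void main(String[] args) {
        Path dir = Paths.get("output", "images"); // hypothetical target directory
        for (int i = 0; i < 3; i++) {
            // Prints one path per page, using the platform's separator.
            System.out.println(pageImagePath(dir, "report", i));
        }
    }
}
```

    Path.resolve inserts the platform separator where needed, so a directory given with or without a trailing separator yields the same result.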

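    Problems encountered in practice

    1) ERROR o.a.p.contentstream.PDFStreamEngine 911 - Cannot read JBIG2 image: jbig2-imageio is not installed

    2) Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image I/O Tools are not installed

    3) java.lang.IllegalArgumentException: Numbers of source Raster bands and source color space components do not match at java.awt.image.ColorConvertOp.filter

    The first two problems need the JBIG2 and JAI ImageIO plugins, which are added through jbig2-imageio, jai-imageio-core, and jai-imageio-jpeg2000. The dependency list below also includes the TwelveMonkeys imageio-jpeg plugin, which is commonly used to work around the third problem (it typically shows up with JPEG images in unusual color spaces such as CMYK).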

    <dependency>
        <groupId>com.twelvemonkeys.imageio</groupId>
        <artifactId>imageio-jpeg</artifactId>
        <version>3.4.2</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.github.jai-imageio/jai-imageio-core -->
    <dependency>
        <groupId>com.github.jai-imageio</groupId>
        <artifactId>jai-imageio-core</artifactId>
        <version>1.4.0</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.github.jai-imageio/jai-imageio-jpeg2000 -->
    <dependency>
        <groupId>com.github.jai-imageio</groupId>
        <artifactId>jai-imageio-jpeg2000</artifactId>
        <version>1.3.0</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.pdfbox/jbig2-imageio -->
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>jbig2-imageio</artifactId>
        <version>3.0.2</version>
    </dependency>
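
    These plugins register themselves with javax.imageio when they are on the classpath, so one way to confirm they were picked up is to ask ImageIO for a reader by format name. The sketch below uses only the JDK. The class name is hypothetical, and the format names "JBIG2" and "JPEG2000" are assumptions based on the error messages above, so adjust them if your plugin versions register different names.

```java
import javax.imageio.ImageIO;

/** Reports whether ImageIO can find readers for the image formats listed. */
class ImageReaderCheck {

    /** True if at least one ImageIO reader is registered for the format name. */
    static boolean hasReader(String formatName) {
        return ImageIO.getImageReadersByFormatName(formatName).hasNext();
    }

    public static void main(String[] args) {
        // Re-scan the classpath so newly added plugins are registered.
        ImageIO.scanForPlugins();
        // "JBIG2" and "JPEG2000" are assumed names; "JPEG" and "PNG" ship with the JDK.
        for (String format : new String[] {"JBIG2", "JPEG2000", "JPEG", "PNG"}) {
            System.out.println(format + " reader available: " + hasReader(format));
        }
    }
}
```

    If a format is still reported as unavailable after adding the dependency, the plugin jar is probably missing from the runtime classpath.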

    Sample files that trigger these problems

    https://github.com/crazyCodeLove/studentservice/blob/master/sys/src/main/resources/pdffile/000208-p1.pdf

    https://github.com/crazyCodeLove/studentservice/blob/master/sys/src/main/resources/pdffile/001659-p14.pdf

    https://github.com/crazyCodeLove/studentservice/blob/master/sys/src/main/resources/pdffile/main%20doc.pdf

    https://github.com/crazyCodeLove/studentservice/blob/master/sys/src/main/resources/pdffile/573636.pdf

    References

    https://stackoverflow.com/questions/42169154/pdfbox1-8-12-convert-pdf-to-white-page-image

    https://stackoverflow.com/questions/20424796/pdf-box-generating-blank-images-due-to-jbig2-images-in-it

    https://blog.csdn.net/qq_15801963/article/details/80746830

    https://my.oschina.net/u/2345654/blog/1058192

    https://stackoverflow.com/questions/18351583/illegalargumentexception-numbers-of-source-raster-bands-and-source-color-space

    https://stackoverflow.com/questions/10416378/imageio-read-illegal-argument-exception-raster-bands-colour-space-components

  • Original article: https://www.cnblogs.com/zhaopengcheng/p/11377458.html