Html to PDF
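Update: 2020-05-19 (print CSS)

Reference: https://www.html.cn/archives/4731

To print a web page you usually write a separate print stylesheet, converting px values into cm or inches.

Generating a PDF works on the same principle.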
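
The px to cm/inch conversion can be sketched as below. It relies on the fixed ratio CSS defines between absolute units (1in = 96px = 2.54cm); the function names are my own, purely for illustration.

```typescript
// CSS fixes the ratio between absolute units: 1in = 96px = 2.54cm.
const PX_PER_INCH = 96;
const CM_PER_INCH = 2.54;

// Convert a screen length in CSS px to inches.
function pxToInch(px: number): number {
  return px / PX_PER_INCH;
}

// Convert a screen length in CSS px to centimetres.
function pxToCm(px: number): number {
  return (px / PX_PER_INCH) * CM_PER_INCH;
}

// Format a px length as a CSS value for a print stylesheet, rounded to 2 decimals.
function toCssLength(px: number, unit: "cm" | "in"): string {
  const value = unit === "cm" ? pxToCm(px) : pxToInch(px);
  return `${value.toFixed(2)}${unit}`;
}
```

For example, `toCssLength(500, "cm")` gives `"13.23cm"`, which is the value you would put in the print stylesheet for an element that is 500px wide on screen.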

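Update: 2020-02-09

After upgrading to 3.1 I found out that ASP.NET Core is deprecating NodeServices.

https://github.com/dotnet/aspnetcore/issues/12890

Microsoft's reasoning is that NodeServices was originally designed to solve the server-side rendering problem,

and they now have other solutions for that. Ugh...

So they want the community to maintain it... dropped just like that.

Luckily... someone had already written a community version two years earlier. I really don't know what he was thinking...

Skipping the official one and writing his own... did he see this coming?

https://github.com/JeringTech/Javascript.NodeJS

With that, the migration went smoothly.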

Update: 2019-06-27

Once the backend has produced the PDF, the next job is to let the frontend download or print it.

For downloading, you can use <a href="123.pdf" download="abc.pdf">.

For printing, you can use an iframe.
```
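// Download: a same-origin link with the download attribute.
const a = document.createElement('a');
a.href = '123.pdf';
a.download = 'abc.pdf'; // file name offered to the user
a.click();

// Print: load the same PDF into an iframe and print it once it has loaded.
const iframe = document.createElement('iframe');
document.body.appendChild(iframe);
iframe.src = '123.pdf';
iframe.onload = () => {
  iframe.focus();
  iframe.contentWindow!.print();
};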
```
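The problem you hit most often is cross-origin access.

An iframe does not honor Access-Control-Allow-Origin, so we cannot handle cross-origin the way we do with ajax.

<a> raises no error, but instead of downloading it just opens the file in a new tab.

A general workaround is to fetch the PDF first with ajax + Access-Control-Allow-Origin, which gives you a Blob,

and then use URL.createObjectURL.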
```
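// Fetch the cross-origin PDF as a Blob (the server must send Access-Control-Allow-Origin).
const result = await this.httpClient.get('http://192.168.1.152:61547/123.pdf', {
  observe: 'response',
  responseType: 'blob'
}).toPromise();

// The blob: URL belongs to our own origin, so both download and print work.
// Call URL.revokeObjectURL(url) once the link and iframe are no longer needed.
const url = window.URL.createObjectURL(result.body!);
const a = document.createElement('a');
a.href = url;
a.download = '123.pdf';
a.click();

const iframe = document.createElement('iframe');
document.body.appendChild(iframe);
iframe.src = url;
iframe.onload = () => {
  iframe.focus();
  iframe.contentWindow!.print();
};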
```
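Now it is a local resource.

For <a> there is one more option: don't link to the static file path. Point it at something like /api/download-file instead, have the backend return the file, and it will download.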
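
My understanding of why the /api/download-file route works is that the backend response carries a `Content-Disposition: attachment` header, which tells the browser to save the response instead of displaying it. That is an assumption about the endpoint, not something verified here. A minimal sketch of building such a header value follows; the helper name is hypothetical.

```typescript
// Builds a Content-Disposition value that asks the browser to save the
// response as a file. `filename` is the name the user should see.
function attachmentDisposition(filename: string): string {
  // ASCII fallback for older clients: replace non-printable-ASCII
  // characters, double quotes and backslashes with "_".
  const fallback = filename.replace(/[^\x20-\x7e]|["\\]/g, "_");

  // Extended form (filename*=) carries the real name as percent-encoded UTF-8.
  // encodeURIComponent leaves ' ( ) * untouched, so escape them by hand.
  const encoded = encodeURIComponent(filename).replace(
    /['()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase()
  );

  return `attachment; filename="${fallback}"; filename*=UTF-8''${encoded}`;
}
```

The server would send the returned string as the `Content-Disposition` response header alongside `Content-Type: application/pdf`.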

References:

https://stackoverflow.com/questions/28318017/html5-download-attribute-not-working-when-downloading-from-another-server-even

https://github.com/crabbly/Print.js/issues/95

https://segmentfault.com/a/1190000015597029

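Enterprise projects often need to save or output documents as PDF.

For example invoices, purchase orders, and so on.

PDF was designed by Adobe in the early 1990s and became an open standard (ISO 32000) in 2008.

Producing a PDF is not hard, but in the internet era most documents are written in HTML.

Building an identical PDF by hand would be double work.

Hence the need for HTML to PDF.

Plenty of companies already sell products that do this.

For example: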

https://www.nrecosite.com/pdf_generator_net.aspx#

https://www.paperplane.app/pricing

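None of them are cheap.

HTML appeared in the early 1990s, and for a long time browser rendering engines were mostly closed source. That made HTML to PDF hard work, because you had to parse HTML and CSS yourself.

That changed in 2005, when Apple open-sourced WebKit, the engine behind Safari.

Someone then built wkhtmltopdf (WebKit HTML to PDF) on top of it.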

https://wkhtmltopdf.org/

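Just install wkhtmltopdf and call it from the command line to turn HTML into a PDF file. And it is open source.

For .NET, https://github.com/rdvojmoc/DinkToPdf is a wrapper around wkhtmltopdf.

Following that precedent, PhantomJS later offered HTML to PDF as well.

jsreport originally implemented its HTML to PDF feature on top of PhantomJS.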

https://github.com/jsreport/jsreport

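https://github.com/jsreport/jsreport-dotnet (.NET version)

Then in 2017 Chromium shipped a headless mode.

Chromium is Google's open-source browser project, the one Chrome is built on. It originally used WebKit; since 2013 it has used Blink, Google's fork of WebKit.

Headless Chromium is impressive, and it comes from Google. The PhantomJS maintainer soon announced he would stop maintaining it, making way for Chromium.

Chromium can of course do HTML to PDF. Later versions of jsreport switched to Chromium as well.

Chromium's low-level interface is not very friendly, so Google also released Puppeteer.

It is a JS library that runs on Node.js and provides a friendly API over Chromium. A few lines of code can drive all kinds of browser behavior, HTML to PDF included.

.NET Core and Node.js are good friends; calling a Node.js package from Core is easy.

There is also https://github.com/kblok/puppeteer-sharp, a .NET port of Puppeteer.

I use ASP.NET Core NodeServices + Puppeteer to implement HTML to PDF.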

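Everything above is the server-side approach.

There is also a client-side approach.

It uses a library called jsPDF, which creates PDFs; it is not an HTML to PDF converter.

It works by having the browser open a data:application/pdf URL that carries the output as base64.

Browsers can open files directly from this kind of data URL; not only PDFs, images can be opened this way too.

For HTML to PDF you can pair it with an HTML-to-canvas library such as html2canvas.

Render the HTML to a canvas, turn that into an image, then put the image into the PDF (which of course means the text is no longer searchable).

At the time of writing this approach still has quite a few problems; for example html2canvas is still at an unstable version and is weakly maintained.

Some notes on Puppeteer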
```
const puppeteer = require('puppeteer');

// Called from ASP.NET Core through NodeServices; callback(error) reports completion.
module.exports = function (callback, bodyTemplate, format, headerTemplate, footerTemplate) {
    (async () => {
        const browser = await puppeteer.launch();
        try {
            const page = await browser.newPage();

            // Alternatives: navigate to a URL, or take a screenshot.
            //await page.goto('https://www.stooges.com.my');
            //await page.screenshot({ path: 'example.png' });
            //await page.goto('https://www.stooges.com.my', { waitUntil: 'networkidle2' });

            await page.setContent(bodyTemplate, { waitUntil: 'networkidle2' });
            await page.pdf({
                //printBackground: true,
                path: 'demo.pdf',
                format: format, // use format, or set width 792 & height 1121; got a footer gap... bug
                landscape: format === 'A5',
                // left/right margins scale the content down, top/bottom do not;
                // no scaling if the content width does not reach the paper width;
                // top/bottom need at least 21 for header/footer elements to show
                margin: { top: 125, right: 16, bottom: 60, left: 32 },
                displayHeaderFooter: true,
                headerTemplate: headerTemplate, // header gets no background color
                footerTemplate: footerTemplate
            });
        } finally {
            await browser.close(); // always release the browser, even on failure
        }
        //console.log(process.version); // check node version
    })().then(() => callback(null), (err) => callback(err));
}
```
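The flow is roughly: launch a browser, open a page, then either navigate to the URL you want or set the HTML directly,

and then call the pdf method.

When you set format A4, the width and height are set for you; it should be around 792x1121px (A4 is 210mm x 297mm, about 794x1123px at 96px per inch).

The header starts at a margin of 21. In other words, if your header content is 50 high, you have to set the top margin to 71.

A more serious issue is that fonts and widths in the header and footer come out larger than in the body.

For example, a div set to 500px in both the header and the body renders larger in the header.

How much larger? About 4/3. There are GitHub issues discussing this too.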
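
The two adjustments above are simple arithmetic, sketched here. The constants are the values observed in this post (21 and 4/3), not documented Puppeteer behavior, and the function names are made up for illustration.

```typescript
// Observed in this post, not official values:
const HEADER_OFFSET_PX = 21; // space Chromium keeps before header content starts

// Top margin needed so a header of the given height fits above the body.
function topMarginFor(headerHeightPx: number): number {
  return HEADER_OFFSET_PX + headerHeightPx;
}

// Header/footer content renders about 4/3 larger than the body, so to make
// an element look `targetPx` wide, declare it at 3/4 of that size.
function headerCssPx(targetPx: number): number {
  return (targetPx * 3) / 4;
}
```

With these, a 50px header needs `topMarginFor(50)`, which is 71, and a div that should look 500px wide in the header is declared as `headerCssPx(500)`, which is 375.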

Update: 2020-05-19. Last time I forgot to record the GitHub issues I am following.

References:

https://github.com/puppeteer/puppeteer/issues/4132 (header default margin)

https://github.com/puppeteer/puppeteer/issues/3672 (header default margin)

https://github.com/puppeteer/puppeteer/issues/1853 (header renders larger)

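Finally, a few words about node_modules.

I also do frontend work, and frontend development inevitably means dealing with node_modules.

Here are a few small differences between frontend and backend.

1. The backend does not need bundling

Frontend dependencies also live in node_modules, and are then bundled with webpack.

On the backend there is no need to bundle. The point of bundling is to reduce downloads, and on the server requiring another module is fast.

2. node_modules has to be on the server

Once the frontend is bundled, the server does not need the node_modules folder, but a Node.js app does.

Our impression of node_modules is that it is huge, with a great many files.

But much of it is only needed for development, so when you run npm install on the server remember to add --production.

With only puppeteer installed, it is just thirty-something files.

3. The Chromium bundled with puppeteer

Installing puppeteer also downloads a Chromium and puts it inside node_modules.

If you have several projects, you can set up one shared Chromium.

Just pass its path when launching. These things can also be configured through environment variables (for example whether to skip the Chromium download, the Chromium path, and so on).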
```
const browser = await puppeteer.launch({
    executablePath: 'C:/keatkeat/my projects/asp.net core/2.2/html-to-pdf/Project/.local-chromium/win64-669486/chrome-win/chrome.exe'
});
```
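
To avoid hardcoding the path, the launch options can be derived from the environment. The sketch below assumes the variable name `PUPPETEER_EXECUTABLE_PATH`, which Puppeteer's documentation listed around the time of this post (together with `PUPPETEER_SKIP_CHROMIUM_DOWNLOAD` for skipping the download at install time); newer releases may use different names, so check the docs for your version. The helper itself is hypothetical.

```typescript
// Subset of Puppeteer's launch options that this sketch cares about.
interface LaunchOptions {
  executablePath?: string;
}

// Build launch options from an environment map (pass process.env in Node.js).
// Falls back to Puppeteer's bundled Chromium when the variable is unset or blank.
function launchOptionsFromEnv(env: Record<string, string | undefined>): LaunchOptions {
  const path = env["PUPPETEER_EXECUTABLE_PATH"];
  if (path === undefined || path.trim() === "") {
    return {};
  }
  return { executablePath: path };
}

// Usage (inside the Node.js module shown earlier):
//   const browser = await puppeteer.launch(launchOptionsFromEnv(process.env));
```

Keeping the lookup in a pure function makes it easy to share one Chromium across projects: each project sets the same variable and no path is committed to source control.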
References:

https://wkhtmltopdf.org/

https://www.nrecosite.com/pdf_generator_net.aspx#

https://github.com/wkhtmltopdf/wkhtmltopdf

https://github.com/rdvojmoc/DinkToPdf

https://github.com/jsreport/jsreport

https://github.com/jsreport/jsreport-dotnet

https://github.com/kblok/puppeteer-sharp

https://www.paperplane.app/blog/modern-html-to-pdf-conversion-2019?utm_source=medium&utm_campaign=redirect

https://www.paperplane.app/pricing

https://juejin.im/entry/5ac1e7c05188257ddb0fc853

https://blog.risingstack.com/pdf-from-html-node-js-puppeteer/

https://zhaoqize.github.io/puppeteer-api-zh_CN/#/

https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pagepdfoptions

http://www.rotisedapsales.com/snr/cloud2/website/jsPDF-master/docs/jsPDF.html

https://github.com/niklasvh/html2canvas

https://dev.to/damcosset/generate-a-pdf-from-html-and-back-on-the-front-5ff5

https://developers.google.com/web/updates/2017/04/headless-chrome

https://zhuanlan.zhihu.com/p/33015883

https://www.jianshu.com/p/db1b230e3415

https://stackoverflow.com/questions/564650/convert-html-to-pdf-in-net

https://www.quora.com/What-is-the-best-HTML-to-PDF-converter-tool

https://www.reddit.com/r/dotnet/comments/7i6ljt/what_is_currently_the_best_way_to_convert_html_to/
Original post: https://www.cnblogs.com/keatkeat/p/11078480.html