Exporting WeChat Official Account Articles with Python, wechatsogou, and wkhtmltopdf

1. Install wkhtmltopdf

Download page: https://wkhtmltopdf.org/downloads.html

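I tested on Windows. After downloading and installing, the executable is in the bin folder of the install directory (on my machine, E:/softwareAPP/wkhtmltopdf/bin/wkhtmltopdf.exe).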
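A quick way to confirm the install from Python is to look for the binary on PATH and fall back to an explicit location. This is only a minimal sketch: `find_wkhtmltopdf` and its `candidates` parameter are hypothetical names of my own, not part of wkhtmltopdf or pdfkit.

```python
import os
import shutil


def find_wkhtmltopdf(candidates=()):
    """Return a path to the wkhtmltopdf executable, or None if not found.

    Checks PATH first, then each explicit path in `candidates`.
    """
    found = shutil.which("wkhtmltopdf")
    if found:
        return found
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    # Fallback: the install location used in this post; adjust to your own.
    print(find_wkhtmltopdf(["E:/softwareAPP/wkhtmltopdf/bin/wkhtmltopdf.exe"]))
```

If this prints None, the binary is neither on PATH nor at the fallback path, and its location has to be passed to pdfkit explicitly.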

2. Write Python code to export a WeChat Official Account article

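Running wkhtmltopdf directly on a WeChat Official Account article does not work well: the exported PDF is missing its images. Instead, use wechatsogou to fetch the article page first, then convert the resulting HTML text to PDF.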
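A likely reason for the missing images (my assumption; the original does not state the cause) is that WeChat article pages lazy-load them: the real URL sits in a `data-src` attribute and `src` is only filled in by JavaScript. The sketch below shows the idea of promoting `data-src` to `src` on a made-up HTML fragment. `promote_lazy_images` is a hypothetical helper for illustration, not wechatsogou's API.

```python
import re


def promote_lazy_images(html):
    """Rewrite <img ... data-src="URL"> so that the URL ends up in src."""
    def fix(match):
        tag = match.group(0)
        if re.search(r'\sdata-src\s*=', tag) is None:
            return tag  # no lazy-loaded URL in this tag, leave it alone
        # Drop any placeholder src, then rename data-src to src
        tag = re.sub(r'\ssrc\s*=\s*"[^"]*"', '', tag)
        return re.sub(r'\sdata-src\s*=', ' src=', tag)

    return re.sub(r'<img\b[^>]*>', fix, html)


if __name__ == "__main__":
    sample = '<p>text</p><img class="pic" data-src="https://example.com/a.png">'
    print(promote_lazy_images(sample))
```

A regular expression is enough for a demonstration; for real pages an HTML parser would be the more robust choice.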

pip install wechatsogou --upgrade

pip install pdfkit

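A pitfall! I looked at many people's code. It all follows one template that everyone copies from everyone else, and it still would not run for me. The cause may be that the dependency packages have since been updated, or that I had not configured the wkhtmltopdf environment variable locally. The code below passes the path of the executable to pdfkit explicitly.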
import os
import datetime

import pdfkit
import wechatsogou

# Initialize the API
ws_api = wechatsogou.WechatSogouAPI(captcha_break_time=3)

# Path to the wkhtmltopdf executable; change this to your own install location
path_wk = "E:/softwareAPP/wkhtmltopdf/bin/wkhtmltopdf.exe"


def url2pdf(url, title, targetPath):
    '''
    Generate a PDF file with pdfkit
    :param url: article URL
    :param title: article title
    :param targetPath: path of the PDF file to write
    '''
    try:
        content_info = ws_api.get_article_content(url)
    except Exception:
        return False
    # HTML after processing
    html = f'''
    <!DOCTYPE html>
    <html lang="en">
    <head>
    <meta charset="UTF-8">
    <title>{title}</title>
    </head>
    <body>
    <h2 style="text-align: center;font-weight: 400;">{title}</h2>
    {content_info['content_html']}
    </body>
    </html>
    '''
    config = pdfkit.configuration(wkhtmltopdf=path_wk)
    try:
        pdfkit.from_string(input=html, output_path=targetPath, configuration=config)
    except Exception:
        # Some article titles contain special characters and cannot be used
        # as a file name, so fall back to a timestamp name in the same folder
        filename = datetime.datetime.now().strftime('%Y%m%d%H%M%S') + '.pdf'
        fallback_path = os.path.join(os.path.dirname(targetPath), filename)
        pdfkit.from_string(html, fallback_path, configuration=config)
    return True


if __name__ == '__main__':
    # Export one article: URL, title, and output file path
    url2pdf("https://mp.weixin.qq.com/s/wwT5n2JwEEAkrrmOhedziw", "HBase的系统架构全视角解读", "G:/test/hbase文档.pdf")
    # To export an account's recent articles instead
    # (here targetPath is the output folder):
    # gzh_name = ''  # name of the Official Account to crawl
    # # Create the target folder if it does not exist
    # if not os.path.exists(targetPath):
    #     os.makedirs(targetPath)
    # # Returns information on the account's 10 most recent articles as a dict
    # data = ws_api.get_gzh_article_by_history(gzh_name)
    # article_list = data['article']
    # for article in article_list:
    #     url = article['content_url']
    #     title = article['title']
    #     url2pdf(url, title, os.path.join(targetPath, title + '.pdf'))
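The code above falls back to a timestamp because some titles contain characters that are not valid in file names. An alternative, sketched here under my own assumptions, is to sanitize the title before building the output path. `safe_filename` is a hypothetical helper, and the character set it replaces is the one Windows forbids in file names.

```python
import re


def safe_filename(title, default="untitled"):
    """Turn an article title into a name usable as a file name on Windows."""
    # Replace the characters Windows forbids in file names, plus control characters
    name = re.sub(r'[\\/:*?"<>|\x00-\x1f]', "_", title)
    # Windows also rejects names that end in a dot or a space
    name = name.strip().rstrip(". ")
    return name or default


if __name__ == "__main__":
    print(safe_filename('What is "HBase"? A/B overview') + ".pdf")
```

The result could then be joined with the output folder, for example `os.path.join(folder, safe_filename(title) + '.pdf')`, which keeps the title visible in the file name instead of replacing it with a timestamp.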
  
This article was first published on python黑洞网 and is updated in sync on cnblogs (博客园).
Original article: https://www.cnblogs.com/pythonzhilian/p/13584391.html
