Reading docx content with Python

Environment: PyCharm, Python 3.7

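Install the python-docx package (in PyCharm: Settings -> Project Interpreter -> +, search for python-docx, and click Install). The module is imported as docx. The separate PyPI package named docx is an older, unrelated project and is not needed; installing it alongside python-docx can break the import on Python 3.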

Source: https://blog.csdn.net/xtfge0915/article/details/83479922

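# Load the whole document
import docx

# Use a raw string so that backslashes in the Windows path (such as \U) are not treated as escape sequences
doc = docx.Document(r'D:\Users\Administrator\PycharmProjects\BigData\Detail\a.docx')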

# Level-1 headings
for p in doc.paragraphs:
    if p.style.name == 'Heading 1':
        print(p.text)

# Level-2 headings
for p in doc.paragraphs:
    if p.style.name == 'Heading 2':
        print(p.text)

# All headings (any style named "Heading" followed by a number)
import re

for p in doc.paragraphs:
    if re.match(r"^Heading \d+$", p.style.name):
        print(p.text)
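
The heading loop above can also record each heading's level, which gives a simple outline of the document. The following is a minimal sketch, not part of the original post: heading_outline and fake_paragraph are hypothetical names, and the sample paragraphs are stand-in objects exposing only .text and .style.name so that the sketch runs without a .docx file. With a real document, passing doc.paragraphs should work the same way, assuming the document uses the default English heading style names.

```python
import re
from types import SimpleNamespace

# Built-in heading styles are named "Heading 1", "Heading 2", ...
HEADING_RE = re.compile(r"^Heading (\d+)$")


def heading_outline(paragraphs):
    """Return (level, text) pairs for paragraphs styled 'Heading N'."""
    outline = []
    for p in paragraphs:
        m = HEADING_RE.match(p.style.name)
        if m:
            outline.append((int(m.group(1)), p.text))
    return outline


def fake_paragraph(style_name, text):
    """Hypothetical stand-in for a python-docx paragraph (illustration only)."""
    return SimpleNamespace(text=text, style=SimpleNamespace(name=style_name))


sample = [
    fake_paragraph("Heading 1", "Introduction"),
    fake_paragraph("Normal", "Some body text."),
    fake_paragraph("Heading 2", "Background"),
]

# Indent each heading by its level to print an outline
for level, text in heading_outline(sample):
    print("  " * (level - 1) + text)

# With a real file: heading_outline(docx.Document(path).paragraphs)
```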

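# Body text (paragraphs that use the default 'Normal' style; headings and other styled paragraphs are skipped)
for p in doc.paragraphs:
    if p.style.name == 'Normal':
        print(p.text)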
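
The loops so far differ only in the style name they compare against, so they can be folded into one helper. This is a sketch under assumptions rather than the original author's code: texts_with_style is a hypothetical name, and the sample list uses stand-in objects with only .text and .style.name so the example runs without a .docx file.

```python
from types import SimpleNamespace


def texts_with_style(paragraphs, style_name):
    """Return the text of every paragraph whose style name equals style_name."""
    return [p.text for p in paragraphs if p.style.name == style_name]


def fake_paragraph(style_name, text):
    """Hypothetical stand-in for a python-docx paragraph (illustration only)."""
    return SimpleNamespace(text=text, style=SimpleNamespace(name=style_name))


sample = [
    fake_paragraph("Heading 1", "Chapter one"),
    fake_paragraph("Normal", "First body paragraph."),
    fake_paragraph("Normal", "Second body paragraph."),
]

for text in texts_with_style(sample, "Normal"):
    print(text)

# With a real file: texts_with_style(doc.paragraphs, "Heading 1")
```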

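# As the examples above show, once you know the style.name used for each kind of content, reading that content is straightforward.
# To find out which style names a document uses, print them inside the loop:
# print(p.style.name)

# The style name below is a custom style taken from the example document
for p in doc.paragraphs:
    if p.style.name == '级别3：黑体 13磅 20行距段落前后20 左对齐':
        print(p.text)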

# Prints the paragraphs that use that style
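
Printing p.style.name for every paragraph can produce a long list in a large document. One possible alternative is to count how often each style name occurs. The sketch below is an illustration only: style_name_counts is a hypothetical helper, and the sample data is made of stand-in objects rather than real python-docx paragraphs.

```python
from collections import Counter
from types import SimpleNamespace


def style_name_counts(paragraphs):
    """Count how many paragraphs use each style name."""
    return Counter(p.style.name for p in paragraphs)


# Stand-in paragraphs (illustration only); a real call would pass doc.paragraphs
sample = [
    SimpleNamespace(text="Title", style=SimpleNamespace(name="Heading 1")),
    SimpleNamespace(text="Body A", style=SimpleNamespace(name="Normal")),
    SimpleNamespace(text="Body B", style=SimpleNamespace(name="Normal")),
]

# most_common() lists the styles from most to least frequent
for name, count in style_name_counts(sample).most_common():
    print(name, count)
```

Any name that appears in the counts can then be used in the p.style.name comparison.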
Original post: https://www.cnblogs.com/watm/p/10570158.html