`from docx import Document`: the usual import for reading and writing `.docx` files in Python with the `python-docx` library

Originally published 2026-06-20 00:00:00 · 220 views

This content is licensed under CC BY-SA 4.0.

from docx import Document is the usual import statement for reading and writing .docx files in Python with the python-docx library. The statement itself is correct. What trips people up is the package behind it: the module is imported as docx, but the PyPI package that provides it is python-docx.

from docx import Document  # ✅ works once python-docx is installed (not a similarly named package such as python-docx2 or docx)

⚠️ Common pitfalls:

python-docx is the current mainstream, well-maintained library (PyPI package name: python-docx). Install it with:
```
pip install python-docx
```
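docx (the standalone docx package on PyPI) is an outdated, unmaintained library written for Python 2 that does not work under Python 3. Do not use it.
If from docx import Document fails, the cause is usually one of the following:
- python-docx is not installed in the active environment. This is what ModuleNotFoundError: No module named 'docx' indicates.
- The abandoned docx package is installed instead. In that case the import typically fails with a different error, such as a missing exceptions module. Remove it with pip uninstall docx, then install python-docx.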
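
To tell the two cases above apart, it can help to ask the interpreter which distributions are actually installed. The following is a minimal sketch using the standard library's importlib.metadata (Python 3.8+); the helper name installed_version is made up for illustration and is not part of python-docx.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Check the maintained package and the abandoned look-alike side by side.
for name in ("python-docx", "docx"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If docx shows a version and python-docx does not, the environment has the legacy package. Run the check with the same interpreter that raises the import error, since each virtual environment has its own set of installed packages.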

✅ A complete working example:

from docx import Document

# Create a new document
doc = Document()
doc.add_heading('Hello World', level=1)
doc.add_paragraph('This is a paragraph.')
doc.save('demo.docx')

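To read the text of every paragraph in an existing Word document (.docx) with python-docx, load the document and iterate over its paragraphs property. Each paragraph object's .text attribute holds its plain text, without formatting. Text inside tables, headers, and footers is not included.

✅ Example: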

from docx import Document

def read_all_paragraphs(docx_path):
    try:
        doc = Document(docx_path)
        paragraphs_text = []
        for para in doc.paragraphs:
            paragraphs_text.append(para.text)  # plain text of the paragraph
        return paragraphs_text
    except Exception as e:
        # Report the failure and fall back to an empty result
        print(f"Failed to read document: {e}")
        return []

# Usage
texts = read_all_paragraphs("example.docx")
for i, text in enumerate(texts, 1):
    print(f"Paragraph {i}: {text}")

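⚠️ Important notes:

✅ doc.paragraphs contains only the paragraphs in the main body. It excludes text in headers, footers, text boxes, footnotes, endnotes, and tables.
❌ Table text has to be traversed separately: for table in doc.tables: → for row in table.rows: → for cell in row.cells: → for paragraph in cell.paragraphs: → paragraph.text
🔁 To extract body and table text together, combine the paragraph loop with the table loop (see the extended version below):

🔧 Extended version: body paragraphs plus the paragraph text of every table cell: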

from docx import Document

def read_full_text(docx_path):
    doc = Document(docx_path)
    # Body paragraphs come first; table text is appended after them
    full_texts = [p.text for p in doc.paragraphs]

    # Walk every table, row, and cell
    for table in doc.tables:
        for row in table.rows:
            for cell in row.cells:
                for paragraph in cell.paragraphs:
                    full_texts.append(paragraph.text)

    return [t.strip() for t in full_texts if t.strip()]  # drop empty lines

# Usage
all_texts = read_full_text("example.docx")
print("\n".join(all_texts))