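Working with PDF files using PyPDF2
PyPDF2 is one of the most commonly used Python modules for reading and writing PDF files.
PyPDF2 splits reading and writing into two classes (named PdfFileReader and PdfFileWriter in the 1.x releases used here; PyPDF2 3.0 renamed them to PdfReader and PdfWriter):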
from PyPDF2 import PdfFileWriter, PdfFileReader
writer = PdfFileWriter()
reader = PdfFileReader(open("document1.pdf", "rb"))
To modify an existing PDF file, append the reader's pages to the writer:
writer.appendPagesFromReader(reader)
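One detail that is easy to miss: as far as the 1.x releases behave, the writer pulls page data from the reader's stream only when write() is called, so the source file has to stay open until then. A minimal sketch, assuming the PyPDF2 1.x names used in this post (copy_pdf is a hypothetical helper name, not part of the library):

```python
def copy_pdf(src_path, dst_path):
    """Copy every page of src_path into a new PDF at dst_path."""
    # Imported inside the function so this sketch can be loaded
    # even where PyPDF2 is not installed.
    from PyPDF2 import PdfFileReader, PdfFileWriter

    # Keep the source file open for the whole operation: the page
    # objects are resolved from this stream when write() runs.
    with open(src_path, "rb") as src:
        reader = PdfFileReader(src)
        writer = PdfFileWriter()
        writer.appendPagesFromReader(reader)
        with open(dst_path, "wb") as dst:
            writer.write(dst)
```

Calling copy_pdf("document1.pdf", "document1_copy.pdf") would then produce a page-for-page copy; both file handles are closed when the function returns.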
Adding a bookmark:
writer.addBookmark(title, pagenum, parent=parent)
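The parent argument is what makes nested bookmarks possible: addBookmark returns a reference to the bookmark it created, and passing that reference as parent places the next bookmark underneath it. The sketch below walks a small outline recursively. The add_outline helper and the OUTLINE data are hypothetical names for illustration, and the page numbers are assumed to be zero-based, as in getPage().

```python
def add_outline(writer, entries, parent=None):
    """Recursively add (title, pagenum, children) entries as bookmarks."""
    for title, pagenum, children in entries:
        # The returned reference becomes the parent of the child entries.
        ref = writer.addBookmark(title, pagenum, parent=parent)
        add_outline(writer, children, parent=ref)


# Hypothetical outline; page numbers are zero-based.
OUTLINE = [
    ("Chapter 1", 0, [("Section 1.1", 1, []), ("Section 1.2", 3, [])]),
    ("Chapter 2", 5, []),
]
```

With a writer that already holds the pages, add_outline(writer, OUTLINE) would be called before writer.write().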
A class that wraps the add-bookmark method:
# -*- coding: utf-8 -*-
import os
from PyPDF2 import PdfFileWriter, PdfFileReader


class Pdf(object):
    def __init__(self, path):
        self.path = path
        # The source file is left open: the writer reads page data on write().
        reader = PdfFileReader(open(path, "rb"))
        self.writer = PdfFileWriter()
        self.writer.appendPagesFromReader(reader)
        # Copy the document info; getDocumentInfo() may return None.
        info = reader.getDocumentInfo()
        if info:
            self.writer.addMetadata(info)

    @property
    def new_path(self):
        # "book.pdf" -> "book_new.pdf"
        name, ext = os.path.splitext(self.path)
        return name + '_new' + ext

    def add_bookmark(self, title, pagenum, parent=None):
        # pagenum is zero-based; the return value can be passed as parent.
        return self.writer.addBookmark(title, pagenum, parent=parent)

    def save_pdf(self):
        with open(self.new_path, 'wb') as out:
            self.writer.write(out)
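The new_path property relies on os.path.splitext, which splits only at the last dot, so the suffix is inserted directly before the extension and the original file is never overwritten. A standalone sketch of the same logic (the suffix parameter is a hypothetical addition for illustration):

```python
import os


def new_path(path, suffix="_new"):
    """Insert suffix between the file name and its extension."""
    name, ext = os.path.splitext(path)
    return name + suffix + ext


print(new_path("docs/report.pdf"))   # docs/report_new.pdf
print(new_path("archive.v2.pdf"))    # archive.v2_new.pdf
```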
The official example:
from PyPDF2 import PdfFileWriter, PdfFileReader
output = PdfFileWriter()
input1 = PdfFileReader(open("document1.pdf", "rb"))
# print how many pages input1 has:
print("document1.pdf has %d pages." % input1.getNumPages())
# add page 1 from input1 to output document, unchanged
output.addPage(input1.getPage(0))
# add page 2 from input1, but rotated clockwise 90 degrees
output.addPage(input1.getPage(1).rotateClockwise(90))
# add page 3 from input1, rotated the other way:
output.addPage(input1.getPage(2).rotateCounterClockwise(90))
# alt: output.addPage(input1.getPage(2).rotateClockwise(270))
# add page 4 from input1, but first add a watermark from another PDF:
page4 = input1.getPage(3)
watermark = PdfFileReader(open("watermark.pdf", "rb"))
page4.mergePage(watermark.getPage(0))
output.addPage(page4)
# add page 5 from input1, but crop it to half size:
page5 = input1.getPage(4)
page5.mediaBox.upperRight = (
    page5.mediaBox.getUpperRight_x() / 2,
    page5.mediaBox.getUpperRight_y() / 2
)
output.addPage(page5)
# add some JavaScript to launch the print window on opening this PDF.
# the password dialog may prevent the print dialog from being shown;
# if that's the case, comment out the encryption lines to try this out
output.addJS("this.print({bUI:true,bSilent:false,bShrinkToFit:true});")
# encrypt your new PDF and add a password
password = "secret"
output.encrypt(password)
# finally, write "output" to PyPDF2-output.pdf
with open("PyPDF2-output.pdf", "wb") as outputStream:
    output.write(outputStream)
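Two pieces of arithmetic sit behind the rotation and cropping steps in the example. PDF page rotation is expressed in multiples of 90 degrees, which is why rotating 90 counterclockwise matches rotating 270 clockwise. Halving the upper-right coordinates halves the width and height only when the box's lower-left corner is at the origin, and it leaves a quarter of the page area. The following sketch works through both in plain Python; the helper names are hypothetical and nothing here calls PyPDF2.

```python
def normalize_clockwise(angle):
    """Map a clockwise rotation in degrees onto 0, 90, 180 or 270."""
    if angle % 90 != 0:
        raise ValueError("rotation must be a multiple of 90 degrees")
    return angle % 360


def half_size_upper_right(lower_left, upper_right):
    """Return the upper-right corner that halves a box's width and height."""
    (x0, y0), (x1, y1) = lower_left, upper_right
    # Measure from the lower-left corner so boxes not anchored at (0, 0) work.
    return (x0 + (x1 - x0) / 2.0, y0 + (y1 - y0) / 2.0)


# 90 degrees counterclockwise is -90 clockwise.
print(normalize_clockwise(-90))                    # 270
# US Letter page in points, lower-left corner at the origin.
print(half_size_upper_right((0, 0), (612, 792)))   # (306.0, 396.0)
```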
For the other interfaces, see the official documentation.
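This post shows how to read and write PDF files with Python's PyPDF2 module, including reading an existing file, modifying pages, and adding bookmarks, with example code that demonstrates the basic usage.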
