pikepdf is a Python library for reading and writing PDF files.

Last update: Jan 03, 2023

Overview

pikepdf

pikepdf is based on QPDF, a powerful PDF manipulation and repair library.

Python + QPDF = "py" + "qpdf" = "pyqpdf", which looks like a dyslexia test. Say it out loud, and it sounds like "pikepdf".

# Elegant, Pythonic API
import pikepdf

with pikepdf.open('input.pdf') as pdf:
    num_pages = len(pdf.pages)
    del pdf.pages[-1]  # remove the last page
    pdf.save('output.pdf')

To install:

pip install pikepdf

For users who want to build from source, see installation.

pikepdf is documented and actively maintained. Commercial support is available. We support nearly every x86-64 platform, including PyPy, and Apple Silicon on a best-effort basis.

Features

This library is similar to PyPDF2 and pdfrw: it provides low-level access to PDF features and allows editing and content transformation of existing PDFs. Some knowledge of the PDF specification may be helpful. It cannot render a PDF to an image.

Feature	pikepdf	PyPDF2	pdfrw
Editing, manipulation and transformation of existing PDFs	✔	✔	✔
Based on an existing, mature PDF library	QPDF	✘	✘
Implementation	C++ and Python	Python	Python
PDF versions supported	1.1 to 1.7	1.3?	1.7
Python versions supported	3.7-3.10 ¹	2.6-3.6	2.6-3.6
Save and load password protected (encrypted) PDFs	✔ (except public key)	✘ (Only obsolete RC4)	✘ (not at all)
Save and load PDF compressed object streams (PDF 1.5)	✔	✘	✘
Creates linearized ("fast web view") PDFs	✔	✘	✘
Actively maintained
Test suite coverage		very low	unknown
Creates PDFs that pass PDF validation tests	✔	✘	?
Modifies PDF/A without breaking PDF/A compliance	✔	✘	?
Automatically repairs PDFs with internal errors	✔	✘	✘
PDF XMP metadata editing	✔	read-only	✘
Documentation	✔	basic	✔
Integrates with Jupyter and IPython notebooks for rapid development	✔	✘	✘

Testimonials

I decided to try writing a quick Python program with pikepdf to automate [something] and it "just worked". –Jay Berkenbilt, creator of QPDF

"Thanks for creating a great pdf library, I tested out several and this is the one that was best able to work with whatever I threw at it." –@cfcurtis

In Production

OCRmyPDF uses pikepdf to graft OCR text layers onto existing PDFs, to examine the contents of input PDFs, and to optimize PDFs.
pdfarranger is a small Python application that provides a graphical user interface to rotate, crop and rearrange PDFs.
PDFStitcher is a utility for stitching PDF pages into a single document (i.e. N-up or page imposition).

License

pikepdf is provided under the Mozilla Public License 2.0 (MPL), which can be found in the LICENSE file. By using, distributing, or contributing to this project, you agree to the terms and conditions of this license.

Informally, MPL 2.0 is not a "viral" license. It may be combined with other work, including commercial software. However, you must disclose your modifications to pikepdf in source code form. In other words, fork this repository on GitHub or elsewhere and commit your contributions there, and you've satisfied your obligations. MPL 2.0 is compatible with the GPL and LGPL; see the guidelines for notes on use in GPL.

The debian/copyright file describes licensing terms for the test suite and the provenance of test resources.

¹ pikepdf 3.x and older support Python 3.6.
