DietPDF aims to reduce PDF file size without degrading quality or losing metadata

Last update: Jul 27, 2022

Overview

dietpdf

Description

DietPDF aims to reduce PDF file size without degrading quality.

Here are some tricks used to achieve this goal:

Use Zopfli instead of Zlib to get a better compression ratio while remaining compatible with Zlib.
Use jpegtran to optimize embedded JPEGs and remove unnecessary data from them.
Use Run-Length Encoding to help Zopfli achieve better compression.
Use Zopfli on embedded JPEGs, which sometimes helps.
Remove unnecessary spaces in the PDF.
Convert end-of-line characters to spaces in Form Objects or Contents (this helps compression).
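
To make the Run-Length Encoding step more concrete, here is a minimal sketch of an encoder and decoder. It assumes the byte layout of the RunLengthDecode stream filter from the PDF specification (a length byte of 0 to 127 introduces a literal chunk, 129 to 255 introduces a repeated byte, 128 marks end of data). The function names `rle_encode` and `rle_decode` are illustrative and are not taken from the DietPDF source.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode bytes using the PDF RunLengthDecode layout (illustrative sketch)."""
    out = bytearray()
    literal = bytearray()  # pending bytes that are not part of a run

    def flush_literal() -> None:
        # A literal chunk holds 1..128 bytes and is prefixed with (length - 1).
        for start in range(0, len(literal), 128):
            chunk = literal[start:start + 128]
            out.append(len(chunk) - 1)
            out.extend(chunk)
        literal.clear()

    i = 0
    while i < len(data):
        # Measure the run of identical bytes starting at i (capped at 128).
        run = 1
        while i + run < len(data) and run < 128 and data[i + run] == data[i]:
            run += 1
        if run >= 2:
            flush_literal()
            out.append(257 - run)  # 129..255 means "repeat next byte 257 - n times"
            out.append(data[i])
        else:
            literal.append(data[i])
        i += run

    flush_literal()
    out.append(128)  # end-of-data marker
    return bytes(out)


def rle_decode(data: bytes) -> bytes:
    """Decode bytes produced by rle_encode."""
    out = bytearray()
    i = 0
    while i < len(data):
        length = data[i]
        i += 1
        if length == 128:  # end-of-data marker
            break
        if length < 128:  # literal chunk of length + 1 bytes
            out.extend(data[i:i + length + 1])
            i += length + 1
        else:  # one byte repeated 257 - length times
            out.extend(data[i:i + 1] * (257 - length))
            i += 1
    return bytes(out)
```

Run-length encoding on its own only shrinks data that contains long runs of identical bytes; the list above describes it as a pre-processing step before Zopfli, not as a replacement for it.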

It also comes with extractpdf, which extracts all the streams contained in a PDF file.
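
As a rough illustration of what stream extraction involves, the sketch below scans a PDF for `stream ... endstream` pairs and tries to inflate each one with the standard `zlib` module. This is not how extractpdf is implemented: the `extract_streams` helper and its regular expression are assumptions made for this example, and a real parser would rely on each stream's /Length and /Filter entries rather than on pattern matching.

```python
import re
import zlib
from typing import List

# Naive pattern: everything between "stream<EOL>" and the next "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def extract_streams(pdf_bytes: bytes) -> List[bytes]:
    """Return the content of every stream found, inflated when possible."""
    streams = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates the end-of-line left before "endstream":
            # bytes after the compressed data end up in unused_data.
            streams.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            # Not Flate-compressed: keep the raw bytes, minus the trailing EOL.
            streams.append(raw.rstrip(b"\r\n"))
    return streams
```

A call such as `extract_streams(open("input.pdf", "rb").read())` would return the streams in file order (the file name is a placeholder). Streams using other filters, such as embedded JPEGs, are returned undecoded.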

Notes

This program is not ready for production!

It does not support cross-reference objects for the moment.

This project has been set up using PyScaffold 3.3.1. For details and usage information on PyScaffold see https://pyscaffold.org/.

Requirements

This is plain Python 3 that relies almost exclusively on the standard library.

It uses the following external programs:

zopfli (apt install zopfli)
jpegtran (apt install libjpeg-turbo-progs)
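
Before running the tool, it can be useful to confirm that both external programs are on the PATH. The snippet below is a small sketch using `shutil.which` from the standard library; the `missing_tools` helper is hypothetical and is not part of DietPDF.

```python
import shutil

# External programs listed in the requirements above.
REQUIRED_TOOLS = ("zopfli", "jpegtran")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of the tools that cannot be found on the PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing external programs:", ", ".join(missing))
    else:
        print("All external programs found.")
```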

Installation

In the dietpdf directory:

pip3 install .

python3 setup.py install --home=~

Owner

Frédéric BISSON
