Cleaner script to normalize knock's output EPUBs

Overview

clean-epub

The excellent knock application by Benton Edmondson outputs EPUBs that seem to be DRM-free. However, if you run the application twice on the same ACSM file, the hashes do not match.

This script normalizes EPUB files, and it is specifically written to normalize the output files of knock. It strips away all the differences between different EPUB files for the same book.

Usage

./clean-epub.py -i input.epub -o output.epub

Details

In essence, it does this:

  • Create a temporary directory, and unzip the input EPUB into it
  • Set all the access and modification times for files and directories in the temporary directory to a fixed date
  • Zip the contents of the temporary directory into a new EPUB, without adding extra file attributes

I tested it on Ubuntu 20.04, with EPUBs bought from Bol(dot)com.

There are three reasons why you might want to test it more extensively before depending on it:

  • Ebooks from other sellers might contain more identifying details than just the timestamps and order of the files in the zip.
  • The zip implementation of another OS might not deterministically order the input files of the new zip file.
  • Your OS might track more file metadata fields than just access and modification times (not sure if this exists)
Some of the best ways and practices of doing code in Python!

Pythonicness ❤ This repository contains some of the best ways and practices of doing code in Python! Features Properly formatted codes (PEP 8) for bet

Samyak Jain 2 Jan 15, 2022
Tips for Writing a Research Paper using LaTeX

Tips for Writing a Research Paper using LaTeX

Guanying Chen 727 Dec 26, 2022
A course-planning, course-map rendering and GPA-calculation web service, designed for the SFU (Simon Fraser University) student.

SFU Course Planner What is the overall goal of the project (i.e. what does it do, or what problem is it solving)? As the title suggests, this project

Ash Peng 1 Oct 21, 2021
Rust Markdown Parsing Benchmarks

Rust Markdown Parsing Benchmarks This repo tries to assess Rust markdown parsing

Ed Page 1 Aug 24, 2022
A simple XLSX/CSV reader - to dictionary converter

sheet2dict A simple XLSX/CSV reader - to dictionary converter Installing To install the package from pip, first run: python3 -m pip install --no-cache

Tomas Pytel 216 Nov 25, 2022
Course materials and handouts for #100DaysOfCode in Python course

#100DaysOfCode with Python course Course details page: talkpython.fm/100days Course Summary #100DaysOfCode in Python is your perfect companion to take

Talk Python 1.9k Dec 31, 2022
Explicit, strict and automatic project version management based on semantic versioning.

Explicit, strict and automatic project version management based on semantic versioning. Getting started End users Semantic versioning Project version

Dmytro Striletskyi 6 Jan 25, 2022
Coursera learning course Python the basics. Programming exercises and tasks

HSE_Python_the_basics Welcome to BAsics programming Python! You’re joining thousands of learners currently enrolled in the course. I'm excited to have

PavelRyzhkov 0 Jan 05, 2022
A tool that allows for versioning sites built with mkdocs

mkdocs-versioning mkdocs-versioning is a plugin for mkdocs, a tool designed to create static websites usually for generating project documentation. mk

Zayd Patel 38 Feb 26, 2022
A markdown wiki and dashboarding system for Datasette

datasette-notebook A markdown wiki and dashboarding system for Datasette This is an experimental alpha and everything about it is likely to change. In

Simon Willison 19 Apr 20, 2022
Collection of Summer 2022 tech internships!

Collection of Summer 2022 tech internships!

Pitt Computer Science Club (CSC) 15.6k Jan 03, 2023
This repo provides a package to automatically select a random seed based on ancient Chinese Xuanxue

🤞 Random Luck Deep learning is acturally the alchemy. This repo provides a package to automatically select a random seed based on ancient Chinese Xua

Tong Zhu(朱桐) 33 Jan 03, 2023
Clases y ejercicios del curso de python diactodo por la UNSAM

Programación en Python En el marco del proyecto de Inteligencia Artificial Interdisciplinaria, la Escuela de Ciencia y Tecnología de la UNSAM vuelve a

Maximiliano Villalva 3 Jan 06, 2022
ReStructuredText and Sphinx bridge to Doxygen

Breathe Packagers: PGP signing key changes for Breathe = v4.23.0. https://github.com/michaeljones/breathe/issues/591 This is an extension to reStruct

Michael Jones 643 Dec 31, 2022
Essential Document Generator

Essential Document Generator Dead Simple Document Generation Whether it's testing database performance or a new web interface, we've all needed a dead

Shane C Mason 59 Nov 11, 2022
Searches a document for hash tags. Support multiple natural languages. Works in various contexts.

ht-getter Searches a document for hash tags. Supports multiple natural languages. Works in various contexts. This package uses a non-regex approach an

Rairye 1 Mar 01, 2022
level2-data-annotation_cv-level2-cv-15 created by GitHub Classroom

[AI Tech 3기 Level2 P Stage] 글자 검출 대회 팀원 소개 김규리_T3016 박정현_T3094 석진혁_T3109 손정균_T3111 이현진_T3174 임종현_T3182 Overview OCR (Optimal Character Recognition) 기술

6 Jun 10, 2022
k3heap is a binary min heap implemented with reference

k3heap k3heap is a binary min heap implemented with reference k3heap is a component of pykit3 project: a python3 toolkit set. In this module RefHeap i

pykit3 1 Nov 13, 2021
Always know what to expect from your data.

Great Expectations Always know what to expect from your data. Introduction Great Expectations helps data teams eliminate pipeline debt, through data t

Great Expectations 7.8k Jan 05, 2023
Demonstration that AWS IAM policy evaluation docs are incorrect

The flowchart from the AWS IAM policy evaluation documentation page, as of 2021-09-12, and dating back to at least 2018-12-27, is the following: The f

Ben Kehoe 15 Oct 21, 2022