A Python module and command line utility for working with web archive data using the WACZ format specification

Overview

py-wacz

The py-wacz repository contains a Python module and command line utility for working with web archive data using the WACZ format specification. Web Archive Collection Zipped (WACZ) allows web archives to be shared and distributed by providing a predictable way of packaging up web archive data and metadata as a ZIP file. The wacz command line utility supports converting any WARC files into WACZ files, and optionally generating full-text search indices of pages.

Install

Use pip to install the module and a command line utility:

pip install wacz

Once installed you can use the wacz command line utility to create and validate WACZ files.

Create

To create a WACZ package you can point wacz at a WARC file and tell it where to write the WACZ with the -o option:

wacz create -o myfile.wacz 
   

   

The resulting myfile.wacz should be loadable via ReplayWeb.page.

wacz accepts the following options for customizing how the WACZ file is assembled.

-f --file

Explicitly declare the file being passed to the create function.

wacz create -f tests/fixtures/example-collection.warc

-o --output

Explicitly declare the name of the wacz being created

wacz create tests/fixtures/example-collection.warc -o mywacz.wacz

-t --text

Generates pages.jsonl page index with a full-text index, must be run in conjunction with --detect-pages. Will have no effect if run alone

wacz create tests/fixtures/example-collection.warc -t

--detect-pages

Generates pages.jsonl page index without a full-text index

wacz create tests/fixtures/example-collection.warc --detect-pages

-p --pages

Overrides the pages index generation with the passed jsonl pages.

wacz create tests/fixtures/example-collection.warc -p passed_pages.jsonl

-t --text

You can add a full text index by including the --text tag

wacz create tests/fixtures/example-collection.warc -p passed_pages.jsonl --text

--ts

Overrides the ts metadata value in the datapackage.json file

wacz create tests/fixtures/example-collection.warc --ts TIMESTAMP

--url

Overrides the url metadata value in the datapackage.json file

wacz create tests/fixtures/example-collection.warc --url URL

--title

Overrides the titles metadata value in the datapackage.json file

wacz create tests/fixtures/example-collection.warc --title TITLE

--desc

Overrides the desc metadata value in the datapackage.json file

wacz create tests/fixtures/example-collection.warc --desc DESC

--hash-type

Allows the user to specify the hash type used: (sha256 or md5):

wacz create tests/fixtures/example-collection.warc --hash-type md5

Validate

You can also validate an existing WACZ file by running:

wacz validate myfile.wacz

-f --file

Explicitly declare the file being passed to the validate function.

wacz validate -f tests/fixtures/example-collection.warc

Testing

If you are developing wacz you can run the unit tests with pytest:

pytest tests
Owner
Webrecorder
Webrecorder Project
Webrecorder
A simple CLI tool for getting region-specific status of Logz.io components.

About A simple CLI tool for checking the current status of Logz.io components per region. Built With Python 3 The following packeges (see requirements

Yotam Bernaz 1 Dec 11, 2021
Python commandline tool for remembering linux/terminal commands

ehh Remember linux commands Commandline tool for remembering linux/terminal commands. It stores your favorite commands in ~/ehh.json in your homedir a

56 Nov 10, 2022
A lightweight Python module and command-line tool for generating NATO APP-6(D) compliant military symbols from both ID codes and natural language names

Python military symbols This is a lightweight Python module, including a command-line script, to generate NATO APP-6(D) compliant military symbol icon

Nick Royer 5 Dec 27, 2022
Access hacksec.in from your command-line

Access hacksec.in from your command-line

hacksec.in 3 Oct 26, 2022
A command line tool made in Python for the popular rhythm game

osr!name A command line tool made in Python for the popular rhythm game "osu!" that changes the player name of a .osr file (replay file). Example: Not

2 Dec 28, 2021
Shazam is a Command Line Application that checks the integrity of the file by comparing it with a given hash.

SHAZAM - Check the file's integrity Shazam is a Command Line Application that checks the integrity of the file by comparing it with a given hash. Crea

Anaxímeno Brito 1 Aug 21, 2022
A CLI minesweeper application written in 60 LoC python

This is a CLI minesweeper application written in 60 LoC python. You can use d row,column to dig and f row,column to flag/unflag

1 Dec 21, 2021
Project scoped command execution to just do your work

Judoka is a command line utility that lets you define project scoped commands and call them through their alias. It lets you just do (= judo) your work.

Eelke van den Bos 2 Dec 17, 2021
A python-based terminal application that displays current cryptocurrency prices

CryptoAssetPrices A python-based terminal application that displays current cryptocurrency prices. Covered Cryptocurrencies Bitcoin (BTC) Ethereum (ET

3 Apr 21, 2022
A simple CLI to convert snapshots into EAVT log, and EAVT log into SCD.

EAVT helper CLI Simple CLI to convert snapshots into eavt log, and eavt log into slowly changing dimensions Usage Installation Snapshot to EAVT log EA

2 Apr 07, 2022
xonsh is a Python-powered, cross-platform, Unix-gazing shell language and command prompt.

xonsh xonsh is a Python-powered, cross-platform, Unix-gazing shell language and command prompt. The language is a superset of Python 3.6+ with additio

xonsh 6.7k Jan 08, 2023
Python-based implementation and comparison of strategies to guess words at Wordle

Solver and comparison of strategies for Wordle Motivation The goal of this repository is to compare, in terms of performance, strategies that minimize

Ignacio L. Ibarra 4 Feb 16, 2022
🎮 An easy to use tool to change the mapping of your input device buttons.

Input Remapper Formerly Key Mapper An easy to use tool to change the mapping of your input device buttons. Supports mice, keyboards, gamepads, X11, Wa

Tobi 1.9k Jan 05, 2023
PyDropper - pick colors everywhere

PyDropper - pick colors everywhere Downloads Settings PyDropper is an eyedropper

Herman Brunberg 2 Jan 04, 2022
This is a CLI program which can help you generate your own QR Code.

Python-QR-code-generator This is a CLI program which can help you generate your own QR Code. Single.py This will allow you only to input a single mess

1 Dec 24, 2021
A cd command that learns - easily navigate directories from the command line

NAME autojump - a faster way to navigate your filesystem DESCRIPTION autojump is a faster way to navigate your filesystem. It works by maintaining a d

William Ting 14.5k Jan 03, 2023
An awesome Python wrapper for an awesome Docker CLI!

An awesome Python wrapper for an awesome Docker CLI!

Gabriel de Marmiesse 303 Jan 03, 2023
EODAG is a command line tool and a plugin-oriented Python framework for searching, aggregating results and downloading remote sensed images while offering a unified API for data access regardless of the data provider

EODAG (Earth Observation Data Access Gateway) is a command line tool and a plugin-oriented Python framework for searching, aggregating results and downloading remote sensed images while offering a un

CS GROUP 205 Jan 03, 2023
A terminal slots programme in PY

PYSlots PyPI and Test PyPI External Links PyPI Test PyPI Install Look directly at the bugs! Version pip install pyslots "Don't look directly at the bu

Luke Batema 4 Nov 30, 2022
This is an app for creating your own color scheme for Termux!

Termux Terminal Theme Creator [WIP] If you need help on how to use the program, you can either create a GitHub issue or join this temporary Discord se

asxlvm 3 Dec 31, 2022