A Python module and command line utility for working with web archive data using the WACZ format specification

Overview

py-wacz

The py-wacz repository contains a Python module and command line utility for working with web archive data using the WACZ format specification. Web Archive Collection Zipped (WACZ) allows web archives to be shared and distributed by providing a predictable way of packaging up web archive data and metadata as a ZIP file. The wacz command line utility supports converting any WARC files into WACZ files, and optionally generating full-text search indices of pages.

Install

Use pip to install the module and a command line utility:

pip install wacz

Once installed you can use the wacz command line utility to create and validate WACZ files.

Create

To create a WACZ package you can point wacz at a WARC file and tell it where to write the WACZ with the -o option:

wacz create -o myfile.wacz 
   

   

The resulting myfile.wacz should be loadable via ReplayWeb.page.

wacz accepts the following options for customizing how the WACZ file is assembled.

-f --file

Explicitly declare the file being passed to the create function.

wacz create -f tests/fixtures/example-collection.warc

-o --output

Explicitly declare the name of the wacz being created

wacz create tests/fixtures/example-collection.warc -o mywacz.wacz

-t --text

Generates pages.jsonl page index with a full-text index, must be run in conjunction with --detect-pages. Will have no effect if run alone

wacz create tests/fixtures/example-collection.warc -t

--detect-pages

Generates pages.jsonl page index without a full-text index

wacz create tests/fixtures/example-collection.warc --detect-pages

-p --pages

Overrides the pages index generation with the passed jsonl pages.

wacz create tests/fixtures/example-collection.warc -p passed_pages.jsonl

-t --text

You can add a full text index by including the --text tag

wacz create tests/fixtures/example-collection.warc -p passed_pages.jsonl --text

--ts

Overrides the ts metadata value in the datapackage.json file

wacz create tests/fixtures/example-collection.warc --ts TIMESTAMP

--url

Overrides the url metadata value in the datapackage.json file

wacz create tests/fixtures/example-collection.warc --url URL

--title

Overrides the titles metadata value in the datapackage.json file

wacz create tests/fixtures/example-collection.warc --title TITLE

--desc

Overrides the desc metadata value in the datapackage.json file

wacz create tests/fixtures/example-collection.warc --desc DESC

--hash-type

Allows the user to specify the hash type used: (sha256 or md5):

wacz create tests/fixtures/example-collection.warc --hash-type md5

Validate

You can also validate an existing WACZ file by running:

wacz validate myfile.wacz

-f --file

Explicitly declare the file being passed to the validate function.

wacz validate -f tests/fixtures/example-collection.warc

Testing

If you are developing wacz you can run the unit tests with pytest:

pytest tests
Owner
Webrecorder
Webrecorder Project
Webrecorder
Python3 command-line tool for the inference of Boolean rules and pathway analysis on omics data

BONITA-Python3 BONITA was originally written in Python 2 and tested with Python 2-compatible packages. This version of the packages ports BONITA to Py

1 Dec 22, 2021
A simple and easy-to-use CLI parse tool.

A simple and easy-to-use CLI parse tool.

AbsentM 1 Mar 04, 2022
Plumbum: Shell Combinators

Plumbum: Shell Combinators Ever wished the compactness of shell scripts be put into a real programming language? Say hello to Plumbum Shell Combinator

Tomer Filiba 2.5k Dec 28, 2022
Another (unofficial) Qt CLI Installer on multi-platforms

Another Qt installer(aqt) Release: Documentation: Test status: and Coverage: This is a utility alternative to the official graphical Qt installer, for

Hiroshi Miura 528 Jan 02, 2023
Dart Version Manager CLI implemented with Python and Typer.

Dart Version Manager CLI implemented with Python and Typer.

EducUp 6 Jun 26, 2022
Double Pendulum visualised with fetching system information in Python.

Show off your terminal, in style. A nice relaxing double pendulum simulation using ASCII, able to simulate multiple pendulums at once, and provide tra

Nekurone 62 Dec 14, 2022
Commandline Python app to Autodownload mediafire folders and files.

Commandline Python app to Autodownload mediafire folders and files.

Tharuk Renuja 3 May 12, 2022
Command-line script to upload videos to Youtube using theYoutube APIv3.

Introduction Command-line script to upload videos to Youtube using theYoutube APIv3. It should work on any platform (GNU/Linux, BSD, OS X, Windows, ..

Arnau Sanchez 1.9k Jan 09, 2023
Baseline is a cross-platform library and command-line utility that creates file-oriented baselines of your systems.

Baselining, on steroids! Baseline is a cross-platform library and command-line utility that creates file-oriented baselines of your systems. The proje

Nelson 4 Dec 09, 2022
Python-based implementation and comparison of strategies to guess words at Wordle

Solver and comparison of strategies for Wordle Motivation The goal of this repository is to compare, in terms of performance, strategies that minimize

Ignacio L. Ibarra 4 Feb 16, 2022
This is a Command Line program to interact with your NFTs, Cryptocurrencies etc

This is a Command Line program to interact with your NFTs, Cryptocurrencies etc. via the ThirdWeb Platform. This is just a fun little project that I made to be able to connect to blockchains and Web3

Arpan Pandey 5 Oct 02, 2022
Vsm - A manager for the under-utilized mksession command in vim

Vim Session Manager A manager for the under-utilized `mksession` command in vim

Matt Williams 3 Oct 12, 2022
Pyreadline3 - Windows implementation of the GNU readline library

pyreadline3 The pyreadline3 package is based on the stale package pyreadline loc

32 Jan 06, 2023
Openstack bucket retention cli

Openstack bucket retention cli

Fatih Sarhan 3 Apr 03, 2022
liquidctl – liquid cooler control Cross-platform tool and drivers for liquid coolers and other devices

Cross-platform CLI and Python drivers for AIO liquid coolers and other devices

1.7k Jan 08, 2023
A simple command line virtual operating system, written in python

Virtual operating system A simple virtual operating system written in python. (Under development). Currently, the following commands are supported: Co

B.Jothin kumar 7 Nov 15, 2022
f90nml - A Fortran namelist parser, generator, and editor

f90nml - A Fortran namelist parser, generator, and editor A Python module and command line tool for parsing Fortran namelist files Documentation The c

Marshall Ward 110 Dec 14, 2022
Wordle-textual - Play Wordle from the CLI, using Textual

Wordle, playable from the CLI This project seeks to emulate Wordle in your shell

PhenoM4n4n 3 Mar 29, 2022
xonsh is a Python-powered, cross-platform, Unix-gazing shell language and command prompt.

xonsh xonsh is a Python-powered, cross-platform, Unix-gazing shell language and command prompt. The language is a superset of Python 3.6+ with additio

xonsh 6.7k Jan 08, 2023
A command line tool (and Python library) for archiving Twitter JSON

A command line tool (and Python library) for archiving Twitter JSON

Documenting the Now 1.3k Dec 28, 2022