A Python module and command line utility for working with web archive data using the WACZ format specification

Overview

py-wacz

The py-wacz repository contains a Python module and command line utility for working with web archive data using the WACZ format specification. Web Archive Collection Zipped (WACZ) allows web archives to be shared and distributed by providing a predictable way of packaging up web archive data and metadata as a ZIP file. The wacz command line utility supports converting any WARC files into WACZ files, and optionally generating full-text search indices of pages.

Install

Use pip to install the module and a command line utility:

pip install wacz

Once installed you can use the wacz command line utility to create and validate WACZ files.

Create

To create a WACZ package you can point wacz at a WARC file and tell it where to write the WACZ with the -o option:

wacz create -o myfile.wacz 
   

   

The resulting myfile.wacz should be loadable via ReplayWeb.page.

wacz accepts the following options for customizing how the WACZ file is assembled.

-f --file

Explicitly declare the file being passed to the create function.

wacz create -f tests/fixtures/example-collection.warc

-o --output

Explicitly declare the name of the wacz being created

wacz create tests/fixtures/example-collection.warc -o mywacz.wacz

-t --text

Generates pages.jsonl page index with a full-text index, must be run in conjunction with --detect-pages. Will have no effect if run alone

wacz create tests/fixtures/example-collection.warc -t

--detect-pages

Generates pages.jsonl page index without a full-text index

wacz create tests/fixtures/example-collection.warc --detect-pages

-p --pages

Overrides the pages index generation with the passed jsonl pages.

wacz create tests/fixtures/example-collection.warc -p passed_pages.jsonl

-t --text

You can add a full text index by including the --text tag

wacz create tests/fixtures/example-collection.warc -p passed_pages.jsonl --text

--ts

Overrides the ts metadata value in the datapackage.json file

wacz create tests/fixtures/example-collection.warc --ts TIMESTAMP

--url

Overrides the url metadata value in the datapackage.json file

wacz create tests/fixtures/example-collection.warc --url URL

--title

Overrides the titles metadata value in the datapackage.json file

wacz create tests/fixtures/example-collection.warc --title TITLE

--desc

Overrides the desc metadata value in the datapackage.json file

wacz create tests/fixtures/example-collection.warc --desc DESC

--hash-type

Allows the user to specify the hash type used: (sha256 or md5):

wacz create tests/fixtures/example-collection.warc --hash-type md5

Validate

You can also validate an existing WACZ file by running:

wacz validate myfile.wacz

-f --file

Explicitly declare the file being passed to the validate function.

wacz validate -f tests/fixtures/example-collection.warc

Testing

If you are developing wacz you can run the unit tests with pytest:

pytest tests
Owner
Webrecorder
Webrecorder Project
Webrecorder
A cli tool , which shows you all the next possible words you can guess from in the game of Wordle.

wordle-helper A cli tool , which shows you all the next possible words you can guess from the Game Wordle. This repo has the code discussed in the You

1 Jan 17, 2022
A simple CLI tool for tracking Pikud Ha'oref alarms.

Pikud Ha'oref Alarm Tracking A simple CLI tool for tracking Pikud Ha'oref alarms. Polls the unofficial API endpoint every second for incoming alarms.

Yuval Adam 24 Oct 10, 2022
Simple and convenient console ToDo list app

How do you handle remembering all that loads of plans you are going to realize everyday? Producing tons of paper notes, plastered all over the house?

3 Aug 03, 2022
A command-line based, minimal torrent streaming client made using Python and Webtorrent-cli. Stream your favorite shows straight from the command line.

A command-line based, minimal torrent streaming client made using Python and Webtorrent-cli. Installation pip install -r requirements.txt It use

Jonardon Hazarika 17 Dec 11, 2022
Get COVID-19 vaccination schedules from booking.moh.gov.ge in the CLI

vaccination.py Get COVID-19 vaccination schedules from booking.moh.gov.ge in the CLI. Installation $ pip install vaccination Usage Make sure the Pytho

Temuri Takalandze 11 Dec 08, 2021
A simple file transfer tools, similar to rz / sz but compatible with tmux (control mode), which works with iTerm2 and has a nice progress bar

trzsz A simple file transfer tools, similar to rz/sz but compatible with tmux (control mode), which works with iTerm2 and has a nice progress bar. Why

561 Jan 05, 2023
Runs a command in P4wnP1 and displays the output on OLED screen (SH1106)

p4wnp1-oled-terminal Runs a command in P4wnP1 and displays the output on OLED screen (SH1106) Works on Raspberry Pi Zero 2 W Tested successfully on RP

PawnSolo 1 Dec 14, 2021
Wordle breaker: A CLI tool to help you solve Wordle

Wordle Breaker A CLI tool to help you solve Wordle I decided to code a solution

Alex 4 Apr 27, 2022
doq (python docstring generator) extension for coc.nvim

coc-pydocstring doq (python docstring generator) extension for coc.nvim Install CocInstall: :CocInstall coc-pydocstring vim-plug: Plug 'yaegassy/coc-p

yaegassy 27 Jan 04, 2023
Python3 library for multimedia functions at the command terminal

TERMINEDIA This is a Python library allowing using a text-terminal as a low-resolution graphics output, along with keyboard realtime reading, and a co

Joao S. O. Bueno 89 Dec 17, 2022
gcp-doctor - Diagnostics for Google Cloud Platform

gcp-doctor is a command-line diagnostics tool for GCP customers. It finds and helps to fix common issues in Google Cloud Platform projects. It is used to test projects against a wide range of best-pr

Google Cloud Platform 185 Dec 20, 2022
Password manager for the CLI simps.

CLI Password Manager Password manager for the CLI simps. Free software: MIT license

1 Dec 30, 2021
command line interface to manage VALORANT skins

A PROPER RELEASE IS COMING SOON, IF YOU KNOW HOW TO USE PYTHON YOU CAN USE IT NOW! valorant skin manager command line interface simple command line in

colinh 131 Dec 25, 2022
Chopper: An Automated Security Headers Analyzer

____ _ _ / ___| |__ ___ _ __ _ __ ___ _ __| | | | | '_ \ / _ \| '_ \| '_ \ / _ \ '__| | | |___| | | | (_) |

Kamran Saifullah (Frog Man) 2 Nov 27, 2022
Color preview command-line tool written in python

Color preview command-line tool written in python

Arnau 1 Dec 27, 2021
🐍The nx-python plugin allows users to create a basic python application using nx commands.

🐍 NxPy: Nx Python plugin This project was generated using Nx. The nx-python plugin allows users to create a basic python application using nx command

StandUP Communications 74 Aug 31, 2022
A CLI Password Manager made using Python and Postgresql database.

ManageMyPasswords (CLI Edition) A CLI Password Manager made using Python and Postgresql database. Quick Start Guide First Clone The Project git clone

Imira Randeniya 1 Sep 11, 2022
Linux commands Interpreter for Windows and Mac based systems using Python

DBHTermEcIbP Linux commands Interpreter for Windows and Mac based systems using Python Basic Linux commands supported viewing current working director

Vraj Patel 1 Dec 26, 2021
Bonjour Software pypahe is a Python Package Helper command-line tool.

pypahe Bonjour Software pypahe is a Python Package Helper command-line tool. Requirements Docker runtime Usage print the latest available version of a

Bonjour Software 0 Aug 10, 2021
Wordle helper: help you print posible 5-character words based on you input

Wordle Helper This program help you print posible 5-character words based on you

Gwan Thanakrit Juthamongkhon 4 Jan 19, 2022