Python function to stream unzip all the files in a ZIP archive: without loading the entire ZIP file or any of its files into memory at once

Overview

stream-unzip CircleCI Test Coverage

Python function to stream unzip all the files in a ZIP archive, without loading the entire ZIP file into memory or any of its uncompressed files.

While the ZIP format does have its main directory at the end, each compressed file in the archive can be prefixed with a header that contains its name, compressed size, and uncompressed size: this is what makes streaming decompression of ZIP files possible.

Unfortunately not all ZIP files have this: some have their compressed and uncompressed sizes after the file data in the stream. In this case a ValueError will be raised.

Installation

pip install stream-unzip

Usage

A single function is exposed, stream_unzip, that takes a single argument: an iterable that should yield the bytes of a ZIP file. It returns an iterable, where each yielded item is a tuple of the file name, file size, and another iterable itself yielding the unzipped bytes of that file.

from stream_unzip import stream_unzip
import httpx

def zipped_chunks():
    # Any iterable that yields a zip file
    with httpx.stream('GET', 'https://www.example.com/my.zip') as r:
        yield from r.iter_bytes()

for file_name, file_size, unzipped_chunks in stream_unzip(zipped_chunks()):
    for chunk in unzipped_chunks:
        print(chunk)

The file name and file size are extracted as reported from the file. If you don't trust the creator of the ZIP file, these should be treated as untrusted input.

Owner
Department for International Trade
Department for International Trade
OneDriveExplorer - A command line and GUI based application for reconstructing the folder strucure of OneDrive from the UserCid.dat file

OneDriveExplorer - A command line and GUI based application for reconstructing the folder strucure of OneDrive from the UserCid.dat file

Brian Maloney 100 Dec 13, 2022
Kartothek - a Python library to manage large amounts of tabular data in a blob store

Kartothek - a Python library to manage (create, read, update, delete) large amounts of tabular data in a blob store

15 Dec 25, 2022
pytiff is a lightweight library for reading chunks from a tiff file

pytiff is a lightweight library for reading chunks from a tiff file. While it supports other formats to some extend, it is focused on reading tiled greyscale/rgb images, that can also be bigtiffs. Wr

Big Data Analytics group 9 Mar 21, 2022
A simple tool to find and replace all the matches of a regular expression in file(s).

FindREp A simple tool to find and replace all the matches of a regular expression in file(s). You can either select the file(s) directly or select a f

Biraj 5 Oct 18, 2022
Python's Filesystem abstraction layer

PyFilesystem2 Python's Filesystem abstraction layer. Documentation Wiki API Documentation GitHub Repository Blog Introduction Think of PyFilesystem's

pyFilesystem 1.8k Jan 02, 2023
A tiny Configuration File Parser for Python Projects

A tiny Configuration File Parser for Python Projects. Currently working on JSON Config Files only.

Tanmoy Sen Gupta 1 Feb 12, 2022
Swiss army knife for Apple's .tbd file manipulation

Description Inspired by tbdswizzler, this simple python tool for manipulating Apple's .tbd format. Installation python3 -m pip install --user -U pytbd

10 Aug 31, 2022
Uncompress DEFLATE streams in pure Python

stream-inflate Uncompress DEFLATE streams in pure Python. Installation pip install stream-inflate Usage from stream_inflate import stream_inflate impo

Michal Charemza 7 Oct 13, 2022
MetaMove is written in Python3 and aims at easing batch renaming operations based on file meta data.

MetaMove MetaMove is written in Python3 and aims at easing batch renaming operations based on file meta data. MetaMove abuses eval combined with f-str

Jan Philippi 2 Dec 28, 2021
CleverCSV is a Python package for handling messy CSV files.

CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line applicati

The Alan Turing Institute 1k Dec 19, 2022
Import Python modules from any file system path

pathimp Import Python modules from any file system path. Installation pip3 install pathimp Usage import pathimp

Danijar Hafner 2 Nov 29, 2021
๐Ÿงน Create symlinks for .m2ts files and classify them into directories in yyyy-mm format.

๐Ÿงน Create symlinks for .m2ts files and classify them into directories in yyyy-mm format.

Nep 2 Feb 07, 2022
Python script for converting figma produced SVG files into C++ JUCE framework source code

AutoJucer Python script for converting figma produced SVG files into C++ JUCE framework source code Watch the tutorial here! Getting Started Make some

SuperConductor 1 Nov 26, 2021
Maltego transforms to pivot between PE files based on their VirusTotal codeblocks

VirusTotal Codeblocks Maltego Transforms Introduction These Maltego transforms allow you to pivot between different PE files based on codeblocks they

Ariel Jungheit 18 Feb 03, 2022
Generates a clean .txt file of contents of a 3 lined csv file

Generates a clean .txt file of contents of a 3 lined csv file. File contents is the .gml file of some function which stores the contents of the csv as a map.

Alex Eckardt 1 Jan 09, 2022
Automatically generates a TypeQL script for doing entity and relationship insertions from a .csv file, so you don't have to mess with writing TypeQL.

Automatically generates a TypeQL script for doing entity and relationship insertions from a .csv file, so you don't have to mess with writing TypeQL.

3 Feb 09, 2022
File storage with API access. Used as a part of the Swipio project

API File storage File storage with API access. Used as a part of the Swipio project ๐Ÿ“ About The Project File storage allows you to upload and downloa

25 Sep 17, 2022
PaddingZip - a tool that you can craft a zip file that contains the padding characters between the file content.

PaddingZip - a tool that you can craft a zip file that contains the padding characters between the file content.

phithon 53 Nov 07, 2022
Python Fstab Generator is a small Python script to write and generate /etc/fstab files based on yaml file on Unix-like systems.

PyFstab Generator PyFstab Generator is a small Python script to write and generate /etc/fstab files based on yaml file on Unix-like systems. NOTE : Th

Mahdi 2 Nov 09, 2021
CSV-Handler written in Python3

CSVHandler This code allows you to work intelligently with CSV files. A file in CSV syntax is converted into several lists, which are combined in a to

Max Tischberger 1 Jan 13, 2022