Extract an archive file (zip file or tar file) stored on AWS S3

Overview

S3 Extract

Extract an archive file (zip file or tar file) stored on AWS S3.

Details

Downloads the archive from S3 into memory, then extracts it and re-uploads the extracted files to the given destination.
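The in-memory flow described above can be sketched roughly as follows. This is a minimal illustration and not the tool's actual code: the function names `iter_archive_members` and `extract_and_upload` and the `upload` callback are hypothetical, and the S3 transfer itself is left to the caller.

```python
import io
import tarfile
import zipfile
from typing import Callable, Iterator, Tuple


def iter_archive_members(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (member_name, content) for every regular file in a zip or tar held in memory."""
    buffer = io.BytesIO(data)
    if zipfile.is_zipfile(buffer):
        buffer.seek(0)
        with zipfile.ZipFile(buffer) as archive:
            for info in archive.infolist():
                if not info.is_dir():  # skip directory entries
                    yield info.filename, archive.read(info)
        return
    buffer.seek(0)
    # Mode "r:*" lets tarfile detect plain or compressed tar transparently.
    with tarfile.open(fileobj=buffer, mode="r:*") as archive:
        for member in archive:
            if member.isfile():  # skip directories, links and devices
                yield member.name, archive.extractfile(member).read()


def extract_and_upload(
    data: bytes, dst_prefix: str, upload: Callable[[str, bytes], None]
) -> int:
    """Extract every file from the archive bytes and hand it to `upload`.

    `upload` is a hypothetical callback taking (destination_key, content).
    Returns the number of files uploaded.
    """
    count = 0
    for name, content in iter_archive_members(data):
        upload(dst_prefix.rstrip("/") + "/" + name, content)
        count += 1
    return count
```

Assuming boto3 is the S3 client in use (the text above does not say so), `data` could be filled with `client.download_fileobj(bucket, key, buffer)` and `upload` could wrap `client.put_object(Bucket=..., Key=..., Body=...)`. Because the whole archive is held in memory, very large archives may need a different approach.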

The following S3 settings are expected to be provided as environment variables:

  • AWS_ENDPOINT_URL
  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • CERT_PATH (optional)
  • CERT_DL_URL (optional)
  • SIGNATURE_VERSION (optional, defaults to "s3v4")
  • REGION_NAME (optional, defaults to "us-east-1")
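As a rough sketch of how the variables listed above might be read, the helper below collects them into a dictionary and applies the documented defaults. The function name `s3_settings_from_env` and the dictionary keys are assumptions made for illustration; the tool's real code may differ.

```python
import os
from typing import Dict, Mapping, Optional

REQUIRED_VARS = ("AWS_ENDPOINT_URL", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def s3_settings_from_env(
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Optional[str]]:
    """Collect the S3 settings from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("Missing required environment variables: " + ", ".join(missing))
    return {
        "endpoint_url": env["AWS_ENDPOINT_URL"],
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
        # Optional values: None when unset, or the documented default.
        "cert_path": env.get("CERT_PATH"),
        "cert_dl_url": env.get("CERT_DL_URL"),
        "signature_version": env.get("SIGNATURE_VERSION", "s3v4"),
        "region_name": env.get("REGION_NAME", "us-east-1"),
    }
```

If boto3 is used (an assumption), these values map onto `boto3.client("s3", endpoint_url=..., aws_access_key_id=..., aws_secret_access_key=..., region_name=..., verify=cert_path, config=botocore.config.Config(signature_version=...))`.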

Additionally, the following settings for ClearML remote execution can be provided as environment variables (optional; command-line arguments override the environment variables):

  • DEFAULT_DOCKER_IMG
  • DEFAULT_QUEUE
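One common way to get the "arguments override environment variables" behaviour is to use the environment value as the argparse default. The sketch below assumes that approach; `parse_remote_options` is a hypothetical name, and only the two flags related to these variables are shown.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence


def parse_remote_options(
    argv: Sequence[str], env: Optional[Mapping[str, str]] = None
) -> argparse.Namespace:
    """Parse --docker-img and --queue, falling back to the environment when a flag is omitted."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    # The environment value is only the default, so an explicit flag always wins.
    parser.add_argument("--docker-img", default=env.get("DEFAULT_DOCKER_IMG"))
    parser.add_argument("--queue", default=env.get("DEFAULT_QUEUE"))
    return parser.parse_args(argv)
```

With `DEFAULT_QUEUE` set in the environment, passing `--queue 1xGPU` still selects `1xGPU`, while omitting the flag falls back to the environment value.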

For those not familiar, environment variables can be set in various ways, for example:

  • run export NAME=value in the terminal (applies to the current session only)
  • add the same export line to ~/.bashrc to make it persist across sessions

Extracting every zip/tar file in a folder is also supported, through the --src-is-dir flag.
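The selection step for a source directory might look like the sketch below, which keeps only keys under the given prefix that end with .zip or .tar. The name `select_archives` is hypothetical, and whether the real tool also descends into nested prefixes is not stated here; this sketch includes them.

```python
from typing import Iterable, List

ARCHIVE_SUFFIXES = (".zip", ".tar")


def select_archives(keys: Iterable[str], src_dir: str) -> List[str]:
    """Return the object keys under `src_dir` that end with .zip or .tar."""
    prefix = src_dir.rstrip("/") + "/"
    return [
        key
        for key in keys
        if key.startswith(prefix) and key.endswith(ARCHIVE_SUFFIXES)
    ]
```

The list of keys would come from listing the source bucket, for instance with a boto3 `list_objects_v2` paginator if boto3 is the client in use (an assumption). Each selected key can then be extracted in turn.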

Usage

usage: run.py [-h] [--src-is-dir] [--dst-bucket DST_BUCKET] [--verbose] [--remote] [--clml-proj CLML_PROJ] [--clml-task-name CLML_TASK_NAME] [--clml-task-type CLML_TASK_TYPE]
              [--docker-img DOCKER_IMG] [--queue QUEUE]
              src_bucket src_path dst_path

positional arguments:
  src_bucket            Source bucket
  src_path              Source path
  dst_path              Destination path

optional arguments:
  -h, --help            show this help message and exit
  --src-is-dir          Flag to indicate that given src path is a directory. Will iteratively extract any files in it ending with .zip or .tar.
  --dst-bucket DST_BUCKET
                        Destination bucket (optional), will default to Source bucket.
  --verbose             print out current upload filename as it progresses
  --remote              use clearml to remotely run job
  --clml-proj CLML_PROJ
                        ClearML Project Name
  --clml-task-name CLML_TASK_NAME
                        ClearML Task Name
  --clml-task-type CLML_TASK_TYPE
                        ClearML Task Type, e.g. training, testing, inference, etc
  --docker-img DOCKER_IMG
                        Base docker image to pull for ClearML remote execution
  --queue QUEUE         ClearML remote execution queue

Example usage:

python run.py my-bucket dataset/coco/images.tar dataset/coco/ --verbose --remote --clml-proj coco --clml-task-name coco_extraction --docker-img ubuntu/20.04 --queue 1xGPU

Owner

Evan