Storing, versioning, and downloading files from S3 made as easy as using open() in Python. Caching included.

Overview

open(LARGE)


Motivation

Oftentimes, especially when working with data-heavy applications, large files can proliferate in a repository. Version controlling them is the obvious next step. However, GitHub's Git LFS implementation doesn't support deleting large files, so they can easily eat up the LFS quota and bloat your repositories.

Solution

pip install open-large

Simple example

from open_large import LargeFile

LargeFile.configure_credentials({
    "aws_region_name": "your_region_like_eu-west-2",
    "aws_access_key_id": "YOUR_ACCESS_KEY_ID",
    "aws_secret_access_key": "YOUR_VERY_SECRET_ACCESS_KEY",
    "large_files_bucket_name": "create_a_bucket_and_put_its_name_here",
})

# Creates a new version and deletes older versions, leaving the 3 most recently used ones intact
with LargeFile("test.txt", "w", keep_last_n=3) as f:
    for i in range(100000):
        f.write('test\n')

# By default the latest version is returned
# but an optional `version` keyword argument can be provided as well
with LargeFile("test.txt", "r") as f:
    print(f.readlines()[0])

This example creates a file, writes to it, uploads it to S3, and then reads back its most recent version. Since the latest version is already in the local cache, no download is required.
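The cache hit described above can be pictured as a simple lookup: if the requested version already exists locally, the download is skipped. The following is a minimal sketch of that idea, not open_large's code; the cache layout (cache_path/name/version), the ensure_cached name and the download callback are all hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable


def ensure_cached(cache_path: Path, name: str, version: int, download: Callable[[Path], None]) -> Path:
    """Return the local path for (name, version), downloading only on a cache miss.

    Hypothetical layout: each version lives at cache_path / name / str(version).
    """
    local = cache_path / name / str(version)
    if not local.exists():
        local.parent.mkdir(parents=True, exist_ok=True)
        download(local)  # caller-supplied fetch, e.g. from S3
    return local


calls = []


def fake_download(target: Path) -> None:
    # Stand-in for an S3 download: record the call and write a dummy file.
    calls.append(target.name)
    target.write_text("test\n")


with tempfile.TemporaryDirectory() as tmp:
    first = ensure_cached(Path(tmp), "test.txt", 0, fake_download)   # miss: downloads
    second = ensure_cached(Path(tmp), "test.txt", 0, fake_download)  # hit: no download
    print(first == second, calls)  # True ['0']
```

Keying the cache on both name and version means an older version requested via the `version` keyword never shadows the latest one.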

More details

LargeFile behaves like an open file object (behind the scenes, it is a temporary file after all). Binary reading and writing are supported, as are the other keyword arguments that open() accepts.
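Since LargeFile is described as a temporary file behind an open()-like interface, the general pattern can be sketched with the standard library: write to a temp file, then hand the finished file to an upload step when the `with` block exits. This is an illustrative sketch only; TempBackedFile and its on_close callback are hypothetical names, and the real class may work differently.

```python
import os
import tempfile
from typing import Callable


class TempBackedFile:
    """Hypothetical open()-like wrapper backed by a temporary file.

    If the `with` block exits without an error, on_close(path) is called with
    the path of the finished temp file (e.g. to upload it). The temp file is
    removed in either case.
    """

    def __init__(self, on_close: Callable[[str], None], mode: str = "w", **open_kwargs):
        self._on_close = on_close
        fd, self._path = tempfile.mkstemp()
        # Reuse mkstemp's descriptor; mode and keyword arguments go straight to open().
        self._file = open(fd, mode, **open_kwargs)

    def __enter__(self):
        return self._file

    def __exit__(self, exc_type, exc, tb):
        self._file.close()
        try:
            if exc_type is None:
                self._on_close(self._path)
        finally:
            os.remove(self._path)
        return False  # never swallow exceptions


uploaded = []


def fake_upload(path: str) -> None:
    # Stand-in for an S3 upload: just record the file's contents.
    with open(path) as f:
        uploaded.append(f.read())


with TempBackedFile(fake_upload, "w") as f:
    for _ in range(3):
        f.write("test\n")

print(uploaded)  # ['test\ntest\ntest\n']
```

Because mode and keyword arguments are forwarded unchanged, binary mode ("wb") works the same way as text mode.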

The local cache can be configured with these properties:

from pathlib import Path

LargeFile.cache_path = Path('.cache')
LargeFile.max_cache_size = "30 GB"
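A size limit given as a string such as "30 GB" has to be parsed and enforced somehow. A minimal sketch of one plausible approach follows. It assumes decimal units (1 GB = 10^9 bytes) and least-recently-accessed eviction, both of which are assumptions rather than documented behaviour; parse_size and files_to_evict are hypothetical helpers, not part of open_large.

```python
import re
from typing import List, Tuple

# Assumption: decimal (SI) units; the library may interpret "GB" differently.
_UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def parse_size(text: str) -> int:
    """Convert a string like "30 GB" or "512mb" into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?B)\s*", text.upper())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


def files_to_evict(entries: List[Tuple[str, int, float]], max_size: str) -> List[str]:
    """Pick cache entries to delete, least recently accessed first.

    entries holds (name, size_in_bytes, last_access_timestamp) tuples.
    """
    limit = parse_size(max_size)
    total = sum(size for _, size, _ in entries)
    evicted = []
    for name, size, _ in sorted(entries, key=lambda e: e[2]):  # oldest access first
        if total <= limit:
            break
        evicted.append(name)
        total -= size
    return evicted


cache = [("model-a", 20 * 10**9, 100.0), ("data-b", 8 * 10**9, 300.0), ("model-c", 5 * 10**9, 200.0)]
print(files_to_evict(cache, "30 GB"))  # ['model-a']
```

Evicting by last access keeps frequently used files local, which matches the goal of avoiding repeated downloads.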

I only need a path

If you only need a path to the "remote" file, use this pattern:

path_to_model = LargeFile("folder-of-my-bert-model", version=31).get()

This first downloads the file or folder into your local cache folder, then returns a Path object pointing to the local copy, which can be turned into a string with str(path_to_model).

The same approach works for uploads:

LargeFile("folder-of-my-bert-model").push('path_to_local/folder_or_file')

Both regular files and folders can be uploaded this way. The uploaded object is named folder-of-my-bert-model; the local name is ignored.
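Because push() accepts either a file or a folder and stores it under the remote name, local paths have to be mapped to object keys somehow. One plausible mapping is sketched below, assuming a folder is uploaded as one object per file under a remote_name/ prefix; the key layout and the object_keys helper are hypothetical, and the library may, for example, archive folders instead.

```python
import tempfile
from pathlib import Path
from typing import Dict


def object_keys(local: Path, remote_name: str) -> Dict[str, Path]:
    """Map object keys to local files; the local file or folder name is ignored."""
    if local.is_file():
        return {remote_name: local}
    keys = {}
    for path in sorted(local.rglob("*")):
        if path.is_file():
            # Keep the structure inside the folder, but replace the folder's own name.
            relative = path.relative_to(local).as_posix()
            keys[f"{remote_name}/{relative}"] = path
    return keys


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "path_to_local" / "folder_or_file"
    (root / "sub").mkdir(parents=True)
    (root / "config.json").write_text("{}")
    (root / "sub" / "weights.bin").write_bytes(b"\x00")
    keys = sorted(object_keys(root, "folder-of-my-bert-model"))
    print(keys)
```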

Lastly, all versions of the remote object can be deleted by calling LargeFile("my-file").delete(). The object will still reside in your local cache afterwards; it is removed the next time your local cache has to be pruned.
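Deleting every version is the blunt option; the keep_last_n argument from the first example prunes selectively instead. A minimal sketch of such a retention rule follows. It is not open_large's implementation: it assumes integer versions that increase with each write, so "most recent" means the highest numbers, whereas the example's comment speaks of the most recently used versions, which the library may track differently. versions_to_delete is a hypothetical name.

```python
from typing import Iterable, List


def versions_to_delete(versions: Iterable[int], keep_last_n: int) -> List[int]:
    """Return the versions a keep_last_n policy would remove.

    Hypothetical sketch: assumes integer version numbers that increase
    with every write, so the newest versions are the largest numbers.
    """
    if keep_last_n < 0:
        raise ValueError("keep_last_n must be non-negative")
    ordered = sorted(set(versions))  # oldest first, duplicates dropped
    # Everything except the keep_last_n newest entries is pruned.
    cutoff = max(len(ordered) - keep_last_n, 0)
    return ordered[:cutoff]


print(versions_to_delete([0, 1, 2, 3, 4], keep_last_n=3))  # [0, 1]
```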

Command-line example

The package can also be run as a module from the command line, which gives you more flexibility.

Setup

Create an .ini file (or use ~/.aws/credentials). It may look like this:

[DEFAULT]
aws_region_name = your_region_like_eu-west-2
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_VERY_SECRET_ACCESS_KEY
large_files_bucket_name = my_large_files

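This is the same format as the example secrets file; the keys match those passed to LargeFile.configure_credentials() in the Python example.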
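The same settings can also be loaded from such an .ini file in Python. Below is a minimal sketch using the standard library's configparser, assuming the keys live in the [DEFAULT] section as shown above (configparser treats section names case-sensitively, so an AWS-style [default] section would need section="default"). load_credentials is a hypothetical helper, and passing its result to LargeFile.configure_credentials() is an assumption based on the matching key names.

```python
import configparser
import tempfile
from pathlib import Path
from typing import Dict

REQUIRED_KEYS = (
    "aws_region_name",
    "aws_access_key_id",
    "aws_secret_access_key",
    "large_files_bucket_name",
)


def load_credentials(ini_path: Path, section: str = "DEFAULT") -> Dict[str, str]:
    """Read the four settings from an .ini file into a plain dict."""
    parser = configparser.ConfigParser()
    if not parser.read(ini_path):  # read() returns the files it parsed
        raise FileNotFoundError(ini_path)
    values = dict(parser[section])
    missing = [key for key in REQUIRED_KEYS if key not in values]
    if missing:
        raise KeyError(f"missing keys: {missing}")
    return {key: values[key] for key in REQUIRED_KEYS}


with tempfile.TemporaryDirectory() as tmp:
    ini = Path(tmp) / "secrets.ini"
    ini.write_text(
        "[DEFAULT]\n"
        "aws_region_name = eu-west-2\n"
        "aws_access_key_id = YOUR_ACCESS_KEY_ID\n"
        "aws_secret_access_key = YOUR_VERY_SECRET_ACCESS_KEY\n"
        "large_files_bucket_name = my_large_files\n"
    )
    creds = load_credentials(ini)

# Assumption: this dict has the shape LargeFile.configure_credentials() expects.
print(creds["large_files_bucket_name"])  # my_large_files
```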

Print the expected options

python3 -m open_large --help

Upload some files

python3 -m open_large --secrets secrets.ini --push my_first_file.json folder/my_second_file my_folder

Only the file name is used as the S3 name; the rest of the path is ignored.
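In other words, the S3 name is the last component of each path. Here is a short sketch with pathlib, using the paths from the command above. s3_names is a hypothetical helper, and the clash check is a design choice of this sketch: two paths with the same final component would otherwise map to the same S3 name, and how the tool handles that case is not documented here.

```python
from collections import Counter
from pathlib import PurePath
from typing import Dict, List


def s3_names(paths: List[str]) -> Dict[str, str]:
    """Map each local path to the name it would get in S3 (its final component)."""
    names = {p: PurePath(p).name for p in paths}
    # Design choice for this sketch: refuse paths whose names would clash.
    clashes = [n for n, count in Counter(names.values()).items() if count > 1]
    if clashes:
        raise ValueError(f"paths share an S3 name: {clashes}")
    return names


print(s3_names(["my_first_file.json", "folder/my_second_file", "my_folder"]))
```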

Download some files to the local cache

This can be useful when building a Docker image, for example: the files then already reside inside the container and need not be downloaded later.

python3 -m open_large --secrets ~/.aws/credentials --cache my_first_file.json:3 my_second_file my_folder:0

Versions can be specified by appending a colon and a version number to the name, e.g. my_first_file.json:3.
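A sketch of how such name:version arguments could be split. It assumes the version is a non-negative integer after the last colon and represents a missing suffix as None (presumably meaning "latest", mirroring the Python API's default, though that is an assumption for the CLI). parse_spec is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Assumption: the version is an integer after the last colon.
_SPEC = re.compile(r"(?P<name>.+):(?P<version>[0-9]+)")


def parse_spec(spec: str) -> Tuple[str, Optional[int]]:
    """Split "name:version" into (name, version); None means no version was given."""
    match = _SPEC.fullmatch(spec)
    if match is None:
        return spec, None
    return match["name"], int(match["version"])


for spec in ["my_first_file.json:3", "my_second_file", "my_folder:0"]:
    print(parse_spec(spec))
```

The greedy `.+` means only the last colon is treated as the separator, so names that themselves contain colons still parse.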

Delete remote files

python3 -m open_large --secrets ~/.aws/credentials --delete my_first_file.json