A repository to run GPT-J-6B on low-VRAM machines (4.2 GB minimum VRAM for a 2000-token context, 3.5 GB for a 1000-token context). Loading the model takes 12 GB of free RAM.

Last update: Dec 25, 2022

Overview

Basic-UI-for-GPT-J-6B-with-low-vram

A repository to run GPT-J-6B on low-VRAM systems by using RAM, VRAM, and pinned memory together.

There seem to be some issues with the weights in the drive link: some performance loss, most likely caused by a poor 16-bit conversion.
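To make the 16-bit remark above concrete, here is a minimal sketch of what a round trip through IEEE 754 half precision does to individual values. It uses only the Python standard library; the helper names `to_fp16` and `fp16_error` are hypothetical and are not part of this repository. It is a generic illustration of float16 rounding and does not diagnose what actually happened to the weights in the drive link.

```python
import struct


def to_fp16(value: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def fp16_error(value: float) -> float:
    """Absolute error introduced by storing `value` as a 16-bit float."""
    return abs(to_fp16(value) - value)


if __name__ == "__main__":
    # Values that are exactly representable survive, others are rounded,
    # and very small magnitudes can be flushed to zero.
    for sample in (1.0, 0.1, 1e-8):
        print(sample, to_fp16(sample), fp16_error(sample))
```

Weights with very small magnitudes, or values that depend on many mantissa bits, are the ones most affected by this kind of rounding.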

How to run:

Install the dependency with: pip install git+https://github.com/finetuneanon/[email protected]
Download the model from https://drive.google.com/file/d/1tboTvohQifN6f1JiSV8hnciyNKvj9pvm/view?usp=sharing. It was saved as described here: https://github.com/arrmansa/saving-and-loading-large-models-pytorch

Timing (2000 token context)

1

system -

16 GB DDR4 RAM, GTX 1070 8 GB GPU.
23 blocks on RAM (ram_blocks = 23), of which 18 are in shared/pinned memory (max_shared_ram_blocks = 18).
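The two settings above can be read as a split of the model's transformer blocks across three places. The sketch below works out that split. GPT-J-6B has 28 transformer blocks; the assumption that every block not counted in ram_blocks stays in VRAM, and the names `Placement` and `plan_placement`, are mine and are not taken from this repository's code.

```python
from dataclasses import dataclass

GPTJ_NUM_BLOCKS = 28  # number of transformer blocks in GPT-J-6B


@dataclass
class Placement:
    gpu_blocks: int           # blocks assumed to stay resident in VRAM
    pinned_ram_blocks: int    # blocks held in shared/pinned host memory
    pageable_ram_blocks: int  # blocks held in ordinary host memory


def plan_placement(ram_blocks: int, max_shared_ram_blocks: int,
                   total_blocks: int = GPTJ_NUM_BLOCKS) -> Placement:
    """Split total_blocks into GPU, pinned-RAM and ordinary-RAM counts."""
    if not 0 <= ram_blocks <= total_blocks:
        raise ValueError("ram_blocks must be between 0 and total_blocks")
    if max_shared_ram_blocks < 0:
        raise ValueError("max_shared_ram_blocks must be non-negative")
    pinned = min(ram_blocks, max_shared_ram_blocks)
    return Placement(
        gpu_blocks=total_blocks - ram_blocks,
        pinned_ram_blocks=pinned,
        pageable_ram_blocks=ram_blocks - pinned,
    )


if __name__ == "__main__":
    print(plan_placement(ram_blocks=23, max_shared_ram_blocks=18))
```

Under these assumptions, ram_blocks = 23 with max_shared_ram_blocks = 18 leaves 5 blocks on the GPU, 18 in pinned memory, and 5 in ordinary RAM.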

timing -

A single run of model(inputs) takes 6.5 seconds.
35 seconds to generate 25 tokens at 2000 context (1.4 seconds/token).

2

system -

16 GB DDR4 RAM, GTX 1060 6 GB GPU.
26 blocks on RAM (ram_blocks = 26), of which 18 are in shared/pinned memory (max_shared_ram_blocks = 18).

timing -

40 seconds to generate 25 tokens at 2000 context (1.6 seconds/token).
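The per-token figures follow directly from the measured totals. A small sketch of that arithmetic is below; the function names are hypothetical, and the linear extrapolation to other token counts is an assumption that ignores any fixed startup cost.

```python
def seconds_per_token(total_seconds: float, tokens: int) -> float:
    """Average generation time per token for one measured run."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return total_seconds / tokens


def estimate_generation_seconds(tokens: int, per_token: float) -> float:
    """Linear extrapolation of generation time (assumes a constant per-token cost)."""
    return tokens * per_token


if __name__ == "__main__":
    runs = {"system 1": 35.0, "system 2": 40.0}  # seconds for 25 tokens
    for name, total in runs.items():
        rate = seconds_per_token(total, 25)
        print(name, rate, estimate_generation_seconds(100, rate))
```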
