Code Generation using a large neural network called GPT-J

Last update: Dec 31, 2022

Overview

CodeGenX

CodeGenX is an AI-powered code generation system. It is delivered as a Visual Studio Code extension and is free and open source.

Installation

You can find installation instructions and additional information about CodeGenX in the documentation here.

About CodeGenX

1. Languages Supported

CodeGenX currently supports only Python. We plan to add more languages in future releases.

2. Modules Trained On

CodeGenX was trained on Python code covering many common uses of the language. Libraries it was specifically trained on include:

TensorFlow
PyTorch
scikit-learn
Pandas
NumPy
OpenCV
Django
Flask
PyGame

3. How CodeGenX Works

At the core of CodeGenX lies a large neural network called GPT-J, a 6-billion-parameter transformer model trained on hundreds of gigabytes of text from the internet. We fine-tuned this model on a dataset of open-source Python code. The fine-tuned model generates code when given an input containing the right instructions.
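
The flow described above can be sketched in Python as follows. This is a minimal sketch, not CodeGenX's actual implementation: the comment-style prompt format, the helper names `build_prompt` and `make_gptj_completer`, and the generation settings are assumptions made for illustration. The checkpoint loaded is the public base GPT-J model (`EleutherAI/gpt-j-6B`) from Hugging Face, not the fine-tuned CodeGenX model.

```python
from typing import Callable


def build_prompt(instruction: str, context: str = "") -> str:
    """Build a prompt: existing code first, then the instruction as comments.

    The comment-style format is an assumption, not CodeGenX's documented format.
    """
    comment = "\n".join(f"# {line}" for line in instruction.strip().splitlines())
    parts = [context.rstrip(), comment] if context.strip() else [comment]
    return "\n".join(parts) + "\n"


def make_gptj_completer(model_name: str = "EleutherAI/gpt-j-6B",
                        max_new_tokens: int = 64) -> Callable[[str], str]:
    """Return a function that completes a prompt with a causal language model.

    The import is local so build_prompt works without `transformers` installed.
    Loading GPT-J requires `transformers`, `torch`, and a large amount of memory.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    def complete(prompt: str) -> str:
        inputs = tokenizer(prompt, return_tensors="pt")
        output = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            pad_token_id=tokenizer.eos_token_id,  # GPT-J defines no pad token
        )
        # Drop the prompt tokens so only the newly generated code is returned.
        new_tokens = output[0][inputs["input_ids"].shape[1]:]
        return tokenizer.decode(new_tokens, skip_special_tokens=True)

    return complete
```

A call might look like `complete = make_gptj_completer()` followed by `complete(build_prompt("Load a CSV file with pandas", "import pandas as pd"))`. Keeping prompt construction separate from model loading means the prompt format can be inspected and tested without downloading the model.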

Contributors ✨

This project would not have been possible without the help of these wonderful people:

Arya Manjaramkar
Matthias Wijnsma
Thomas Houtrique
Dominic Rampas
Bilel Medimegh
Josh Hills
Alex
Tiimo

Acknowledgements

Many thanks to the Google TPU Research Cloud for providing the compute needed for this project.

Owner

DeepGenX
