Galois is an auto code completer for code editors (or any text editor) based on OpenAI GPT-2.

Overview

Galois Autocompleter

Galois Autocompleter

Version Twitter: iedmrc

An autocompleter for code editors based on OpenAI GPT-2.

🏠 Homepage

Galois is an auto code completer for code editors (or any text editor) based on OpenAI GPT-2. It is trained (finetuned) on a curated list of approximately 45K Python (~470MB) files gathered from the Github. Currently, it just works properly on Python but not bad at other languages (thanks to GPT-2's power).

This repository now contains the very first release of the Galois Project. With this project, I aim to create a Deep Learning Based Autocompleter such that anyone can run it on their own computer easily. Thus, coding will be more easier and fun!

Galois demo GIF

Installation

With Docker

Either clone the repository and build the image from docker file or directly run the following command:

docker run --rm -dit -p 3030:3030 iedmrc/galois-autocompleter:latest-gpu

P.S: CPU image is not available on the Docker Hub at the moment so if you want to run it on CPU rather than GPU, clone the repository and build the image as follows:

docker build --build-arg TENSORFLOW_VERSION=1.14.0-py3 -t iedmrc/galois-autocompleter:latest .

Without Docker

Clone the repository:

git clone https://github.com/iedmrc/galois-autocompleter

Download the latest model from releases and uncompress it into the directory:

curl -SL https://github.com/iedmrc/galois-autocompleter/releases/latest/download/model.tar.xz | tar -xJC ./galois-autocompleter

Install dependencies:

pip3 install -r requirements.txt

P.S.: Be sure that you have tensorflow version >= 1.13

Run the autocompleter:

python3 main.py

Usage

Currently, there are no extensions for code editors. You can use it through HTTP. When you run the main.py, it will serve an HTTP (flask) server. Then you can easily make a POST request to the http://localhost:3030/ with the some JSON body like the following:

{text: "your python code goes here"}

An example curl command:

curl -X POST \
  http://localhost:3030/autocomplete \
  -H 'Content-Type: application/json' \
  -d '{"text":"import os\nimport sys\n# Count lines of codes in the given directory, separated by file extension.\ndef main(directory):\n  line_count = {}\n  for filename in os.listdir(directory):\n    _, ext = os.path.splitext(filename)\n    if ext not"}'

Check out the gist here for a docker-compose file.

Finetuning The Model

Even you can finetune (re-train over) the model with/for your code files. Just follow the Max Woolf's gpt-2-simple or Neil Shepperd's gpt-2 repositories with 345M version. But don't forget to replace checkpoint (model) with the one in this repository.

You can train it on the Google Colaboratory for free. But if you need a production-grade (i.e. more accurate) one then you may need to train it for more longer time. In my case, it took ~48 hours on a P100 GPU.

Planned Works

  • Train the model to predict in most common programming languages.
  • Create extensions for most common code editors to use galois as an autocompleter.
  • Create a new, more lightweight but powerful model such that anyone can run it in their computer easily.

Contribution

Contributions are welcome. Feel free to create an issue or a pull request.

Author

👤 Ibrahim Ethem DEMIRCI

Twitter: @iedmrc | Github: @iedmrc | Patreon: @iedmrc

Ibrahim's open-source projects are supported by his Patreon. If you found this project helpful, any monetary contributions to the Patreon are appreciated and will be put to good creative use.

License

It is licensed under MIT License as found in the LICENSE file.

Disclaimer

This repo has no affiliation or relationship with OpenAI.

You might also like...
Text editor on python tkinter to convert english text to other languages with the help of ployglot.
Text editor on python tkinter to convert english text to other languages with the help of ployglot.

Transliterator Text Editor This is a simple transliteration program which is used to convert english word to phonetically matching word in another lan

This repository serves as a place to document a toy attempt on how to create a generative text model in Catalan, based on GPT-2

GPT-2 Catalan playground and scripts to train a GPT-2 model either from scrath or from another pretrained model.

An easy to use, user-friendly and efficient code for extracting OpenAI CLIP (Global/Grid) features from image and text respectively.

Extracting OpenAI CLIP (Global/Grid) Features from Image and Text This repo aims at providing an easy to use and efficient code for extracting image &

Shirt Bot is a discord bot which uses GPT-3 to generate text
Shirt Bot is a discord bot which uses GPT-3 to generate text

SHIRT BOT · Shirt Bot is a discord bot which uses GPT-3 to generate text. Made by Cyclcrclicly#3420 (474183744685604865) on Discord. Support Server EX

Neural text generators like the GPT models promise a general-purpose means of manipulating texts.

Boolean Prompting for Neural Text Generators Neural text generators like the GPT models promise a general-purpose means of manipulating texts. These m

A simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN
A simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN

artificial intelligence cosmic love and attention fire in the sky a pyramid made of ice a lonely house in the woods marriage in the mountains lantern

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

OpenAI CLIP text encoders for multiple languages!
OpenAI CLIP text encoders for multiple languages!

Multilingual-CLIP OpenAI CLIP text encoders for any language Colab Notebook · Pre-trained Models · Report Bug Overview OpenAI recently released the pa

An implementation of model parallel GPT-3-like models on GPUs, based on the DeepSpeed library. Designed to be able to train models in the hundreds of billions of parameters or larger.

GPT-NeoX An implementation of model parallel GPT-3-like models on GPUs, based on the DeepSpeed library. Designed to be able to train models in the hun

Comments
  • train code complete from zero

    train code complete from zero

    Hi, I try to train code complete from zero, below is two settings, and the performance is bad: 1. train from utf-8 encode, the default BPE encoding, 1 batch_size, 3 GPU card, 500K iteration 2. train from ascii encode, 1 batch_size, 3 GPU card, 500K iteration and method 2 is just map ascii from 1 to N, the size is much less, and the performance of the two settings ard bad. do you train by finetune the release model of gpt2? can you share your training settings?

    opened by yuandaxing 2
  • About training ..

    About training ..

    When you specify the training directory to the model, did you extract all .py files and delete the other types or the model will parse all the projects directories that you cloned from Github one by one ?

    opened by dimwael 1
  • Text size limit for the Galois Autocompleter API

    Text size limit for the Galois Autocompleter API

    Hi!

    While testing Galois, I discovered something that seems to be a limit at the size of the text that you can send to the Autocompleter API without getting a 500 error. I could not find the exactly number, but around 2180 characters of code it starts crashing.

    He are the error logs I got:

    Traceback (most recent call last): File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1356, in _do_call return fn(*args) File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1341, in _run_fn options, feed_dict, fetch_list, target_list, run_metadata) File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun run_metadata) tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0,0] = 1024 is not in [0, 1024) [[{{node sample_sequence/while/model/GatherV2_1}}]]

    bug 
    opened by GabrielTamujo 1
  • Finetuning the Galois' model

    Finetuning the Galois' model

    Hi, @iedmrc I'm finetuning the Galois' model with the gpt-2-simple command aiming at featuring it with our team programming standards. (Well, actually, we hope so!) I'm running the finetune with "steps=-1" (what's, endless run). I'd like to hear from you when should I stop the process. This is the last 4 lines of the current history of the process:

    [310 | 23899.61] loss=0.09 avg=0.36 [320 | 24645.14] loss=0.06 avg=0.35 [330 | 25398.12] loss=0.09 avg=0.34 [340 | 26155.55] loss=0.05 avg=0.33

    Best regard!

    question 
    opened by DenisAraujo68 1
Releases(v0.1.0)
Owner
Galois Autocompleter
Auto code completer for code editors (or any text editor) based on OpenAI GPT-2.
Galois Autocompleter
Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP

Pretrain and Fine-tune a T5 model with Flax on GCP This tutorial details how pretrain and fine-tune a FlaxT5 model from HuggingFace using a TPU VM ava

Gabriele Sarti 41 Nov 18, 2022
This is a simple item2vec implementation using gensim for recbole

recbole-item2vec-model This is a simple item2vec implementation using gensim for recbole( https://recbole.io ) Usage When you want to run experiment f

Yusuke Fukasawa 2 Oct 06, 2022
Python Implementation of ``Modeling the Influence of Verb Aspect on the Activation of Typical Event Locations with BERT'' (Findings of ACL: ACL 2021)

BERT-for-Surprisal Python Implementation of ``Modeling the Influence of Verb Aspect on the Activation of Typical Event Locations with BERT'' (Findings

7 Dec 05, 2022
Python library for processing Chinese text

SnowNLP: Simplified Chinese Text Processing SnowNLP是一个python写的类库,可以方便的处理中文文本内容,是受到了TextBlob的启发而写的,由于现在大部分的自然语言处理库基本都是针对英文的,于是写了一个方便处理中文的类库,并且和TextBlob

Rui Wang 6k Jan 02, 2023
[EMNLP 2021] Mirror-BERT: Converting Pretrained Language Models to universal text encoders without labels.

[EMNLP 2021] Mirror-BERT: Converting Pretrained Language Models to universal text encoders without labels.

Cambridge Language Technology Lab 61 Dec 10, 2022
Idea is to build a model which will take keywords as inputs and generate sentences as outputs.

keytotext Idea is to build a model which will take keywords as inputs and generate sentences as outputs. Potential use case can include: Marketing Sea

Gagan Bhatia 364 Jan 03, 2023
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context This repository contains the code in both PyTorch and TensorFlow for our paper

Zhilin Yang 3.3k Dec 28, 2022
ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.

Antlr Project 13.6k Jan 05, 2023
Multi-Task Pre-Training for Plug-and-Play Task-Oriented Dialogue System

Multi-Task Pre-Training for Plug-and-Play Task-Oriented Dialogue System Authors: Yixuan Su, Lei Shu, Elman Mansimov, Arshit Gupta, Deng Cai, Yi-An Lai

Amazon Web Services - Labs 124 Jan 03, 2023
Code for paper "Which Training Methods for GANs do actually Converge? (ICML 2018)"

GAN stability This repository contains the experiments in the supplementary material for the paper Which Training Methods for GANs do actually Converg

Lars Mescheder 884 Nov 11, 2022
Différents programmes créant une interface graphique a l'aide de Tkinter pour simplifier la vie des étudiants.

GP211-Grand-Projet Ce repertoire contient tout les programmes nécessaires au bon fonctionnement de notre projet-logiciel. Cette interface graphique es

1 Dec 21, 2021
State-of-the-art NLP through transformer models in a modular design and consistent APIs.

Trapper (Transformers wRAPPER) Trapper is an NLP library that aims to make it easier to train transformer based models on downstream tasks. It wraps h

Open Business Software Solutions 42 Sep 21, 2022
使用pytorch+transformers复现了SimCSE论文中的有监督训练和无监督训练方法

SimCSE复现 项目描述 SimCSE是一种简单但是很巧妙的NLP对比学习方法,创新性地引入Dropout的方式,对样本添加噪声,从而达到对正样本增强的目的。 该框架的训练目的为:对于batch中的每个样本,拉近其与正样本之间的距离,拉远其与负样本之间的距离,使得模型能够在大规模无监督语料(也可以

58 Dec 20, 2022
Subtitle Workshop (subshop): tools to download and synchronize subtitles

SUBSHOP Tools to download, remove ads, and synchronize subtitles. SUBSHOP Purpose Limitations Required Web Credentials Installation, Configuration, an

Joe D 4 Feb 13, 2022
Various Algorithms for Short Text Mining

Short Text Mining in Python Introduction This package shorttext is a Python package that facilitates supervised and unsupervised learning for short te

Kwan-Yuet 466 Dec 06, 2022
Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge

Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge This is an implementation of the paper,

Mutian He 19 Oct 14, 2022
KoBART model on huggingface transformers

KoBART-Transformers SKT에서 공개한 KoBART를 편리하게 사용할 수 있게 transformers로 포팅하였습니다. Install (Optional) BartModel과 PreTrainedTokenizerFast를 이용하면 설치하실 필요 없습니다. p

Hyunwoong Ko 58 Dec 07, 2022
Python3 to Crystal Translation using Python AST Walker

py2cr.py A code translator using AST from Python to Crystal. This is basically a NodeVisitor with Crystal output. See AST documentation (https://docs.

66 Jul 25, 2022
Practical Natural Language Processing Tools for Humans is build on the top of Senna Natural Language Processing (NLP)

Practical Natural Language Processing Tools for Humans is build on the top of Senna Natural Language Processing (NLP) predictions: part-of-speech (POS) tags, chunking (CHK), name entity recognition (

jawahar 20 Apr 30, 2022
Nmt - TensorFlow Neural Machine Translation Tutorial

Neural Machine Translation (seq2seq) Tutorial Authors: Thang Luong, Eugene Brevdo, Rui Zhao (Google Research Blogpost, Github) This version of the tut

6.1k Dec 29, 2022