Auto_code_complete is a auto word-completetion program which allows you to customize it on your needs

Overview

auto_code_complete v1.3

purpose and usage

auto_code_complete is a auto word-completetion program which allows you to customize it on your needs. the model for this program is a combined model of a deep-learning NLP(Natural Language Process) model structure called 'GRU(gated recurrent unit)' and 'LSTM(Long Short Term Memory)'.

the model for this program is one of the deep-learning NLP(Natural Language Process) model structure called 'GRU(gated recurrent unit)'.

how to use (terminal)

  • first, download the repository on your local environment.
  • install the neccessary libraries on your dependent environment.

pip install -r requirements.txt

  • change your working directory to auto-complete/ and execute the line below

python -m auto_complete_model

  • it will require for you to enter the data you want to train with the model
ENTER THE CODE YOU WANT TO TRAIN IN YOUR MODEL : tensorflow tf.keras tf.keras.layers LSTM
==== TRAINING START ====
2022-01-08 18:24:14.308919: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
Epoch 1/100
3/3 [==============================] - 1s 59ms/step - loss: 4.7865 - acc: 0.0532
Epoch 2/100
3/3 [==============================] - 0s 62ms/step - loss: 3.9297 - acc: 0.2872
Epoch 3/100
3/3 [==============================] - 0s 58ms/step - loss: 2.9941 - acc: 0.5532
...
Epoch 31/100
3/3 [==============================] - 0s 75ms/step - loss: 0.2747 - acc: 0.8617
Epoch 32/100
3/3 [==============================] - 0s 65ms/step - loss: 0.2700 - acc: 0.8298
==== TRAINING DONE ====
Now, Load the best weights on your model.
  • if you input your dataset successfully, it will ask for any uncompleted word to be entered.
ENTER THE UNCOMPLETED CODE YOU WANT TO COMPLETE : t tf te l la li k ke tf.kera tf.keras.l
t  - best recommendation : tensorflow
		 - all recommendations :  ['tensorflow']
tf  - best recommendation : tf.keras
		 - all recommendations :  ['tfkeras', 'tf.keras']
te  - best recommendation : tensorflow
		 - all recommendations :  ['tensorflow']
l  - best recommendation : list
		 - all recommendations :  ['list', 'layers']
la  - best recommendation : lange
		 - all recommendations :  ['layers', 'lange']
li  - best recommendation : list
		 - all recommendations :  ['list']
k  - best recommendation : keras
		 - all recommendations :  ['keras']
ke  - best recommendation : keras
		 - all recommendations :  ['keras']
tf.kera  - best recommendation : tf.keras
		 - all recommendations :  []
tf.keras.l  - best recommendation : tf.keras.layers
		 - all recommendations :  ['tf.keras.layers']
  • it will return the best matched word to complete and other recommendations
Do you want to check only the recommendations? (y/n) : y
['tensorflow'], 
['tfkeras', 'tf.keras'], 
['tensorflow'], 
['list', 'layers'], 
['layers', 'lange'], 
['list'], 
['keras'], 
['keras'], 
[], 
['tf.keras.layers']

version update & issues

v1.2 update

2022.01.08

  • change deep-learning model from GRU to GRU+LSTM to improve the performance

By adding the same structrue of new LSTM layers to concatenate before the output layer to an existing model, it shows faster learning and better accuracies in predicting matched recommendations for given incomplete words.

v1.3.1 update

2022.01.09

  • fix the glitches in data preprocessing

We solved the problem that it wouldn't add a new dataset on an existing dataset.

  • add plot_history function in a model class

v1.3.2 update

2022.01.09

  • add model_save,model_load mode in order that users can save and load their model while training a customized model
# Load text data
tf_filepath = "../data/text_data/tf_all_symbols.txt"
with open(tf_filepath, 'r') as f:
    tf_code_text = f.read()

# split the data into 10 parts
total_length = len(tf_code_text)
tf_code_ls = []
for i in range(10):
    globals()[f'tf_code_text_{i}'] = tf_code_text[int(total_length*0.1)*i:int(total_length*0.1)]
    tf_code_ls.append(globals()[f'tf_code_text_{i}'])

# train each dataset with a model setting up arguments 'model_save=True, model_name='mymodel', model_load=True' 
for tf_code in tf_code_ls:
    my_model = auto_coding(new_code=tf_code,
                          # verbose=0,
                           batch_size=100,
                           epochs=200,
                           patience=12,
                           model_summary=True,
                           model_save=True,
                           model_name='tf_model', # 'tf_model/tf_model.h5'
                           model_load=True
                          )
Contains the code and data for our #ICSE2022 paper titled as "CodeFill: Multi-token Code Completion by Jointly Learning from Structure and Naming Sequences"

CodeFill This repository contains the code for our paper titled as "CodeFill: Multi-token Code Completion by Jointly Learning from Structure and Namin

Software Analytics Lab 11 Oct 31, 2022
To create a deep learning model which can explain the content of an image in the form of speech through caption generation with attention mechanism on Flickr8K dataset.

To create a deep learning model which can explain the content of an image in the form of speech through caption generation with attention mechanism on Flickr8K dataset.

Ragesh Hajela 0 Feb 08, 2022
Wikipedia-Utils: Preprocessing Wikipedia Texts for NLP

Wikipedia-Utils: Preprocessing Wikipedia Texts for NLP This repository maintains some utility scripts for retrieving and preprocessing Wikipedia text

Masatoshi Suzuki 44 Oct 19, 2022
Sequence Modeling with Structured State Spaces

Structured State Spaces for Sequence Modeling This repository provides implementations and experiments for the following papers. S4 Efficiently Modeli

HazyResearch 902 Jan 06, 2023
NLP tool to extract emotional phrase from tweets 🤩

Emotional phrase extractor Extract phrase in the given text that is used to express the sentiment. Capturing sentiment in language is important in the

Shahul ES 38 Oct 17, 2022
Rethinking the Truly Unsupervised Image-to-Image Translation - Official PyTorch Implementation (ICCV 2021)

Rethinking the Truly Unsupervised Image-to-Image Translation (ICCV 2021) Each image is generated with the source image in the left and the average sty

Clova AI Research 436 Dec 27, 2022
End-to-End Speech Processing Toolkit

ESPnet: end-to-end speech processing toolkit system/pytorch ver. 1.0.1 1.1.0 1.2.0 1.3.1 1.4.0 1.5.1 1.6.0 1.7.1 1.8.1 ubuntu18/python3.8/pip ubuntu18

ESPnet 5.9k Jan 03, 2023
🐍 A hyper-fast Python module for reading/writing JSON data using Rust's serde-json.

A hyper-fast, safe Python module to read and write JSON data. Works as a drop-in replacement for Python's built-in json module. This is alpha software

Matthias 479 Jan 01, 2023
txtai: Build AI-powered semantic search applications in Go

txtai: Build AI-powered semantic search applications in Go txtai executes machine-learning workflows to transform data and build AI-powered semantic s

NeuML 49 Dec 06, 2022
a chinese segment base on crf

Genius Genius是一个开源的python中文分词组件,采用 CRF(Conditional Random Field)条件随机场算法。 Feature 支持python2.x、python3.x以及pypy2.x。 支持简单的pinyin分词 支持用户自定义break 支持用户自定义合并词

duanhongyi 237 Nov 04, 2022
Beautiful visualizations of how language differs among document types.

Scattertext 0.1.0.0 A tool for finding distinguishing terms in corpora and displaying them in an interactive HTML scatter plot. Points corresponding t

Jason S. Kessler 2k Dec 27, 2022
Share constant definitions between programming languages and make your constants constant again

Introduction Reconstant lets you share constant and enum definitions between programming languages. Constants are defined in a yaml file and converted

Natan Yellin 47 Sep 10, 2022
A versatile token stream for handwritten parsers.

Writing recursive-descent parsers by hand can be quite elegant but it's often a bit more verbose than expected, especially when it comes to handling indentation and reporting proper syntax errors. Th

Valentin Berlier 8 Nov 30, 2022
[WWW 2021 GLB] New Benchmarks for Learning on Non-Homophilous Graphs

New Benchmarks for Learning on Non-Homophilous Graphs Here are the codes and datasets accompanying the paper: New Benchmarks for Learning on Non-Homop

94 Dec 21, 2022
Train 🤗-transformers model with Poutyne.

poutyne-transformers Train 🤗 -transformers models with Poutyne. Installation pip install poutyne-transformers Example import torch from transformers

Lennart Keller 2 Dec 18, 2022
天池中药说明书实体识别挑战冠军方案;中文命名实体识别;NER; BERT-CRF & BERT-SPAN & BERT-MRC;Pytorch

天池中药说明书实体识别挑战冠军方案;中文命名实体识别;NER; BERT-CRF & BERT-SPAN & BERT-MRC;Pytorch

zxx飞翔的鱼 751 Dec 30, 2022
Sploitus - Command line search tool for sploitus.com. Think searchsploit, but with more POCs

Sploitus Command line search tool for sploitus.com. Think searchsploit, but with

watchdog2000 5 Mar 07, 2022
Takes a string and puts it through different languages in Google Translate a requested amount of times, returning nonsense.

PythonTextObfuscator Takes a string and puts it through different languages in Google Translate a requested amount of times, returning nonsense. Requi

2 Aug 29, 2022
Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate nearest neighbors, in Pytorch

Memorizing Transformers - Pytorch Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memori

Phil Wang 364 Jan 06, 2023
WikiPron - a command-line tool and Python API for mining multilingual pronunciation data from Wiktionary

WikiPron WikiPron is a command-line tool and Python API for mining multilingual pronunciation data from Wiktionary, as well as a database of pronuncia

213 Jan 01, 2023