A proof-of-concept jupyter extension which converts english queries into relevant python code

Overview

Text2Code for Jupyter notebook

A proof-of-concept jupyter extension which converts english queries into relevant python code.

Blog post with more details:

Data analysis made easy: Text2Code for Jupyter notebook

Demo Video:

Text2Code for Jupyter notebook

Supported Operating Systems:

  • Ubuntu
  • macOS

Installation

NOTE: We have renamed the plugin from mopp to jupyter-text2code. Uninstall mopp before installing new jupyter-text2code version.

pip uninstall mopp

CPU-only install:

For Mac and other Ubuntu installations not having a nvidia GPU, we need to explicitly set an environment variable at time of install.

export JUPYTER_TEXT2CODE_MODE="cpu"

GPU install dependencies:

sudo apt-get install libopenblas-dev libomp-dev

Installation commands:

git clone https://github.com/deepklarity/jupyter-text2code.git
cd jupyter-text2code
pip install .
jupyter nbextension enable jupyter-text2code/main

Uninstallation:

pip uninstall jupyter-text2code

Usage Instructions:

  • Start Jupyter notebook server by running the following command: jupyter notebook
  • If you don't see Nbextensions tab in Jupyter notebook run the following command:jupyter contrib nbextension install --user
  • You can open the sample notebooks/ctds.ipynb notebook for testing
  • If installation happened successfully, then for the first time, Universal Sentence Encoder model will be downloaded from tensorflow_hub
  • Click on the Terminal Icon which appears on the menu (to activate the extension)
  • Type "help" to see a list of currently supported commands in the repo
  • Watch Demo video for some examples

Docker containers for jupyter-text2code

We have published CPU and GPU images to docker hub with all dependencies pre-installed.

Visit https://hub.docker.com/r/deepklarity/jupyter-text2code/ to download the images and usage instructions.
CPU image size: 1.51 GB
GPU image size: 2.56 GB

Model training:

Generate training data:

From a list of templates present at jupyter_text2code/jupyter_text2code_serverextension/data/ner_templates.csv, generate training data by running the following command:

cd scripts && python generate_training_data.py

This command will generate data for intent matching and NER(Named Entity Recognition).

Create intent index faiss

Use the generated data to create a intent-matcher using faiss.

cd scripts && python create_intent_index.py

Train NER model

cd scripts && python train_spacy_ner.py

Steps to add more intents:

  • Add more templates in ner_templates with a new intent_id
  • Generate training data. Modify generate_training_data.py if different generation techniques are needed or if introducing a new entity.
  • Train intent index
  • Train NER model
  • modify jupyter_text2code/jupyter_text2code_serverextension/__init__.py with new intent's condition and add actual code for the intent
  • Reinstall plugin by running: pip install .

TODO:

  • Publish Docker image
  • Refactor code and make it mode modular, remove duplicate code, etc
  • Add support for Windows
  • Add support for more commands
  • Improve intent detection and NER
  • Explore sentence Paraphrasing to generate higher-quality training data
  • Gather real-world variable names, library names as opposed to randomly generating them
  • Try NER with a transformer-based model
  • With enough data, train a language model to directly do English->code like GPT-3 does, instead of having separate stages in the pipeline
  • Create a survey to collect linguistic data
  • Add Speech2Code support

Authored By:

Owner
DeepKlarity
DeepKlarity
List of Land Cover datasets in the GEE Catalog

List of Land Cover datasets in the GEE Catalog A list of all the Land Cover (or discrete) datasets in Google Earth Engine. Values, Colors and Descript

David Montero Loaiza 5 Aug 24, 2022
QLUSTER is a relative orbit design tool for formation flying satellite missions and space rendezvous scenarios

QLUSTER is a relative orbit design tool for formation flying satellite missions and space rendezvous scenarios, that I wrote in Python 3 for my own research and visualisation. It is currently unfinis

Samuel Low 9 Aug 23, 2022
WhiteboxTools Python Frontend

whitebox-python Important Note This repository is related to the WhiteboxTools Python Frontend only. You can report issues to this repo if you have pr

Qiusheng Wu 304 Dec 15, 2022
A ninja python package that unifies the Google Earth Engine ecosystem.

A Python package that unifies the Google Earth Engine ecosystem. EarthEngine.jl | rgee | rgee+ | eemont GitHub: https://github.com/r-earthengine/ee_ex

47 Dec 27, 2022
Tool to display your current position and angle above your radar

๐Ÿ›  Tool to display your current position and angle above your radar. As a response to the CS:GO Update on 1.2.2022, which makes cl_showpos a cheat-pro

Miko 6 Jan 04, 2023
Focal Statistics

Focal-Statistics The Focal statistics tool in many GIS applications like ArcGIS, QGIS and GRASS GIS is a standard method to gain a local overview of r

Ifeanyi Nwasolu 1 Oct 21, 2021
Processing and interpolating spatial data with a twist of machine learning

Documentation | Documentation (dev version) | Contact | Part of the Fatiando a Terra project About Verde is a Python library for processing spatial da

Fatiando a Terra 468 Dec 20, 2022
:earth_asia: Python Geocoder

Python Geocoder Simple and consistent geocoding library written in Python. Table of content Overview A glimpse at the API Forward Multiple results Rev

Denis 1.5k Jan 02, 2023
LicenseLocation - License Location With Python

LicenseLocation Hi,everyone! โค ๐Ÿงก ๐Ÿ’› ๐Ÿ’š ๐Ÿ’™ ๐Ÿ’œ This is my first project! โœ” Actual

The Bin 1 Jan 25, 2022
Digital Earth Australia notebooks and tools repository

Repository for Digital Earth Australia Jupyter Notebooks: tools and workflows for geospatial analysis with Open Data Cube and xarray

Geoscience Australia 335 Dec 24, 2022
Python project to generate Kerala's distrcit level panchayath map.

Kerala-Panchayath-Maps Python project to generate Kerala's distrcit level panchayath map. As of now, geojson files of Kollam and Kozhikode are added t

Athul R T 2 Jan 10, 2022
Logging the position of the car on an sdcard

audi-mmi-3g-gps-logging Logging the position of the car on an sdcard, startup script origin not clear to me, logging setup and time change is what I d

2 May 31, 2022
Starlite-tile38 - Showcase using Tile38 via pyle38 in a Starlite application

Starlite-Tile38 Showcase using Tile38 via pyle38 in a Starlite application. Repo

Ben 8 Aug 07, 2022
Location field and widget for Django. It supports Google Maps, OpenStreetMap and Mapbox

django-location-field Let users pick locations using a map widget and store its latitude and longitude. Stable version: django-location-field==2.1.0 D

Caio Ariede 481 Dec 29, 2022
A NASA MEaSUREs project to provide automated, low latency, global glacier flow and elevation change datasets

Notebooks A NASA MEaSUREs project to provide automated, low latency, global glacier flow and elevation change datasets This repository provides tools

NASA Jet Propulsion Laboratory 27 Oct 25, 2022
Tool to suck data from ArcGIS Server and spit it into PostgreSQL

chupaESRI About ChupaESRI is a Python module/command line tool to extract features from ArcGIS Server map services. Name? Think "chupacabra" or "Chupa

John Reiser 34 Dec 04, 2022
Python tools for geographic data

GeoPandas Python tools for geographic data Introduction GeoPandas is a project to add support for geographic data to pandas objects. It currently impl

GeoPandas 3.5k Jan 03, 2023
framework for large-scale SAR satellite data processing

pyroSAR A Python Framework for Large-Scale SAR Satellite Data Processing The pyroSAR package aims at providing a complete solution for the scalable or

John Truckenbrodt 389 Dec 21, 2022
Minimum Bounding Box of Geospatial data

BBOX Problem definition: The spatial data users often are required to obtain the coordinates of the minimum bounding box of vector and raster data in

Ali Khosravi Kazazi 1 Sep 08, 2022
A public data repository for datasets created from TransLink GTFS data.

TransLink Spatial Data What: TransLink is the statutory public transit authority for the Metro Vancouver region. This GitHub repository is a collectio

Henry Tang 3 Jan 14, 2022