Python3 to Crystal Translation using Python AST Walker

py2cr.py

A code translator from Python to Crystal that works on the Python AST. It is essentially an ast.NodeVisitor that emits Crystal source. See the AST documentation (https://docs.python.org/3/library/ast.html) for more information.
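To give a feel for the NodeVisitor approach, here is a minimal sketch. It is not py2cr's actual code: the class name TinyCrystalEmitter and the helper translate are hypothetical, it assumes Python 3.8+ (where literals parse as ast.Constant), and it only handles integer literals, names, addition, single-target assignment and print calls.

```python
import ast


class TinyCrystalEmitter(ast.NodeVisitor):
    """Hypothetical, heavily simplified emitter; not py2cr's real class."""

    def __init__(self):
        self.lines = []

    def visit_Assign(self, node):
        # Only single-target assignments to a plain name, e.g. `x = 1 + 2`.
        target = node.targets[0].id
        self.lines.append(f"{target} = {self.expr(node.value)}")

    def visit_Expr(self, node):
        # Map a bare print(...) call onto Crystal's puts.
        call = node.value
        if isinstance(call, ast.Call) and getattr(call.func, "id", None) == "print":
            args = ", ".join(self.expr(a) for a in call.args)
            self.lines.append(f"puts {args}")

    def expr(self, node):
        # Render the small expression subset this sketch supports.
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return str(node.value)
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
            return f"{self.expr(node.left)} + {self.expr(node.right)}"
        raise NotImplementedError(type(node).__name__)


def translate(source):
    """Parse Python source and return the emitted Crystal-like text."""
    emitter = TinyCrystalEmitter()
    emitter.visit(ast.parse(source))
    return "\n".join(emitter.lines)


print(translate("x = 1 + 2\nprint(x)"))
```

Anything outside that subset raises NotImplementedError here; a real translator needs a visit method for every node type it intends to support.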

Status

Currently more than 80% of the relevant tests are passing. See more information below.

Installation

Execute the following:

pip install py2cr

or

git clone https://github.com/nanobowers/py2cr.git

Versions

  • Python 3.6 .. 3.9
  • Crystal 1.1+

Dependencies

Python

pip install pyyaml

# Probably not needed for much longer since py2 support is going to be removed.
pip install six 

# Probably not really needed since there is no crystal equivalent
pip install numpy

Crystal

Currently there are no external dependencies.

Methodology

In addition to walking the AST and writing Crystal output, this tool does one of the following:

  • Monkey-patches some common Crystal stdlib Structs/Classes to emulate the equivalent Python functionality.
  • Calls Crystal methods that are direct equivalents of the Python ones.
  • Calls wrapped Crystal methods that provide Python-equivalent functionality.

Usage

Generally, py2cr.py somefile.py > somefile.cr

There is a Crystal shim/wrapper library in src/py2cr (and linked into lib/py2cr) that is also referenced in the generated script. You may need to copy it alongside your output; eventually it may make sense to convert it to a shard.

Example

TODO

Tests

$ ./run_tests.py

Will run all tests that are supposed to work. If any test fails, it's a bug. (Currently there are a lot of failing tests!)

$ ./run_tests.py -a

Will run all tests, including those that are currently known to fail. The output should make clear which is which.

$ ./run_tests.py basic

Will run all tests matching basic. Useful because running the entire test-suite can take a while.

$ ./run_tests.py -x or $ ./run_tests.py --no-error

Will run tests but ignore any error raised by a test. This does not affect the errors generated by the test files in the tests directory.

For additional information on flags, run:

./run_tests.py -h

Writing new tests

Adding tests for most new or existing functionality involves adding Python files at tests/ .py .

The test-runner scripts will automatically run py2cr to produce a Crystal script, then run both the Python and Crystal scripts, then compare stdout/stderr and check return codes.
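The comparison step can be pictured roughly as follows. This is only a sketch of the idea, not the code in run_tests.py: the function names run and same_behaviour are hypothetical, and a real run would pass the Python and Crystal commands for the same test (for example a `crystal run` invocation on the generated file, which is an assumption here).

```python
import subprocess
import sys


def run(cmd):
    """Run a command and return (exit status, stdout, stderr)."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout, proc.stderr


def same_behaviour(python_cmd, crystal_cmd):
    """True when both commands agree on exit status, stdout and stderr."""
    return run(python_cmd) == run(crystal_cmd)


# Stand-in demo: compare two Python one-liners instead of a real .py/.cr pair.
hello = [sys.executable, "-c", "print('hello')"]
other = [sys.executable, "-c", "print('goodbye')"]
print(same_behaviour(hello, hello))
print(same_behaviour(hello, other))
```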

For special test cases, it is possible to provide a per-test configuration YAML file named tests/ / .config.yaml which overrides the testing defaults. The following keys/values are supported:

min_python_version: [int, int] # minimum major/minor version
max_python_version: [int, int] # maximum major/minor version
expected_exit_status: int      # exit status for py/cr test script
argument_list: [str, ... str]  # list of strings as extra args for argv
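As an illustration of how such overrides might be applied once the YAML has been loaded into a dict, consider the sketch below. The default values, and the names DEFAULTS, effective_config and version_allowed, are assumptions made for this example and are not taken from the test runner.

```python
# Assumed defaults for illustration only; the real runner may differ.
DEFAULTS = {
    "min_python_version": None,   # no lower bound
    "max_python_version": None,   # no upper bound
    "expected_exit_status": 0,
    "argument_list": [],
}


def effective_config(overrides):
    """Merge per-test overrides over the defaults, rejecting unknown keys."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unsupported config keys: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


def version_allowed(config, version):
    """Check a (major, minor, ...) version against the configured bounds."""
    major_minor = tuple(version[:2])
    low = config["min_python_version"]
    high = config["max_python_version"]
    if low is not None and major_minor < tuple(low):
        return False
    if high is not None and major_minor > tuple(high):
        return False
    return True


config = effective_config({"min_python_version": [3, 8]})
print(version_allowed(config, (3, 9, 1)))
```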

Typing

Some amount of typing support in Python is translated to Crystal. Completely untyped Python code will in many cases not translate to compilable Crystal. Rudimentary uses of Python's Optional and Union should convert appropriately to Crystal typing.

Some inference of bare list/dict types can now convert to [] of X and {} of X, however set and tuple may not work properly.
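For reference, this is the kind of annotated Python input the paragraph above has in mind. It is an illustrative sketch written for this README, not a file from the test-suite, and the exact Crystal that py2cr emits for it is not shown. In Crystal terms, Optional corresponds to a nilable type and Union to a union type such as Int32 | String.

```python
from typing import Dict, List, Optional, Union


def find_index(items: List[str], wanted: str) -> Optional[int]:
    """Return the position of wanted, or None when it is absent."""
    for i, item in enumerate(items):
        if item == wanted:
            return i
    return None


def describe(value: Union[int, str]) -> str:
    """Branch on the runtime type of a union-typed argument."""
    return "number" if isinstance(value, int) else "text"


# Annotated containers give the translator an element type to work with.
names: List[str] = ["ada", "grace"]
ages: Dict[str, int] = {"ada": 36}

print(find_index(names, "grace"), describe(ages["ada"]))
```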

Status

This is incomplete and many of the tests brought forward from py2rb do not pass. Some of them may never pass as-is due to significant language and compilation differences (even more so than between Python and Ruby).

To some extent, it will always be incomplete. The goal is to cover common cases and reduce the additional work needed to reach a minimum viable program.

Limitations

  • Many Python run-time exceptions are not translatable into Crystal, because the same issues manifest in Crystal as compile-time errors.
  • A significant portion of Python code is untyped and may not translate properly in places where Crystal demands type information.
    • e.g. Crystal lambda (Proc) parameters require types, which is very uncommon in Python, though it may be possible with Callable[] on the Python side.
  • Python importing is significantly different from Crystal's and thus may never map well.
  • NumPy and unittest, which are common in Python, have no equivalents in Crystal. With significant additional work, converting tests into Spec format may be possible, using https://github.com/jaredbeck/minitest_to_rspec as a guide.
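As a small illustration of the Callable[] point in the list above, a function-typed parameter can be annotated on the Python side like this. The function apply_twice is a made-up example, and whether py2cr currently translates this annotation is not claimed here.

```python
from typing import Callable


def apply_twice(fn: Callable[[int], int], x: int) -> int:
    """Apply a one-argument integer function two times."""
    return fn(fn(x))


# The lambda itself stays untyped; the parameter annotation carries the types.
print(apply_twice(lambda n: n + 1, 3))
```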

To-do

  • Remove python2/six dependencies to reduce clutter. Py2 has been end-of-lifed for a while now.
  • Remove numpy dependencies unless/until a suitable target for Crystal can be identified
  • Add additional Crystal shim methods to translate common python3 stdlib methods. Consider a mode that just maps to a close Crystal method rather than using a shim-method to reduce the python-ness.
  • Refactor the code-base. Most of it is in the __init__.py
  • Add additional unit-tests
  • Multi-thread the test-suite so it can run faster.

Contribute

Feel free to submit an issue. This is very much a work in progress; contributions and constructive feedback are welcome.

If you'd like to hack on py2cr, start by forking the repo on GitHub:

https://github.com/nanobowers/py2cr

Contributing

The best way to get your changes merged back into core is as follows:

  1. Fork it (https://github.com/nanobowers/py2cr/fork)
  2. Create a thoughtfully named topic branch to contain your change (git checkout -b my-new-feature)
  3. Hack away
  4. Add tests and make sure everything still passes by running crystal spec
  5. If you are adding new functionality, document it in the README
  6. If necessary, rebase your commits into logical chunks, without errors
  7. Commit your changes (git commit -am 'Add some feature')
  8. Push to the branch (git push origin my-new-feature)
  9. Create a new Pull Request

License

MIT, see the LICENSE file for exact details.
