Lingtrain Alignment Studio is an ML-based app for aligning texts in different languages.

Last update: Jan 03, 2023

Overview

Lingtrain Alignment Studio

Intro

Lingtrain Alignment Studio is an ML-based app for accurately aligning texts in different languages.

- Extracts parallel corpora from two texts.
- Builds a formatted parallel book from them, with sentence highlighting.

Models

The automated alignment process relies on sentence embedding models. Embeddings are multidimensional vectors that are used to calculate the distance between sentences. You can also plug in your own model using the interface described in the models directory. The list of supported languages depends on the selected backend model.

distiluse-base-multilingual-cased-v2
- more reliable and fast
- moderate weights size — 500MB
- supports 50+ languages
- full list of supported languages can be found in this paper
LaBSE (Language-agnostic BERT Sentence Embedding)
- can be used for rare languages
- pretty heavy weights — 1.8GB
- supports 100+ languages
- full list of supported languages can be found here
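
To illustrate how embedding vectors can be used to measure distance between sentences, here is a minimal sketch in plain Python. It is not the app's actual alignment code: the vectors are made-up 3-dimensional stand-ins for real embeddings, and the helper names cosine_similarity and best_matches are hypothetical. Real models such as the two listed above produce vectors with hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_matches(src_vecs, tgt_vecs):
    """For each source vector, return the index of the most similar target vector."""
    matches = []
    for s in src_vecs:
        scores = [cosine_similarity(s, t) for t in tgt_vecs]
        matches.append(max(range(len(scores)), key=scores.__getitem__))
    return matches


# Toy "embeddings" (assumed values, for illustration only).
src = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9]]  # two sentences in language A
tgt = [[0.1, 0.1, 0.8], [0.8, 0.2, 0.1]]  # two sentences in language B

print(best_matches(src, tgt))  # [1, 0]
```

Here the first source sentence is closest to the second target sentence and vice versa. A real aligner would likely also need to handle sentence order, merged or split sentences, and ties, which this sketch ignores.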

Running on local machine

You can run the application on your computer using docker.

Make sure that docker is installed by typing the docker version command in your console.
Images configured to run locally are available on Docker Hub.
Run the following commands in your console:
- docker pull lingtrain/aligner:v6
- docker run -v C:\app\data:/app/data -v C:\app\img:/app/static/img -p 80:80 lingtrain/aligner:v6
- Use lingtrain/aligner:v6-labse for the LaBSE version (109 languages).
The app will be available in your browser at the localhost address.
If you need to run the container on another port (e.g. localhost:8081):
- Change the API_URL parameter in config.js
- Rebuild the docker container
- Start it with changed -p parameter (e.g. -p 8081:80)
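
If you start the container from a script, the run command above can be assembled programmatically. The following is a small sketch, not part of the project: build_run_command is a hypothetical helper, and it only uses the -v and -p flags already shown. Note that changing the host port alone is not enough; as described above, API_URL in config.js must also be changed and the container rebuilt, so the image argument would then be the name of your own rebuilt image.

```python
def build_run_command(data_dir, img_dir, host_port=80, image="lingtrain/aligner:v6"):
    """Return the `docker run` argument list for the aligner image.

    data_dir and img_dir are host directories mounted into the container;
    host_port is the port exposed on the local machine (container port stays 80).
    """
    return [
        "docker", "run",
        "-v", f"{data_dir}:/app/data",
        "-v", f"{img_dir}:/app/static/img",
        "-p", f"{host_port}:80",
        image,
    ]


cmd = build_run_command(r"C:\app\data", r"C:\app\img", host_port=8081)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) to actually start the container.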

Running in development mode

Clone this repo on your machine.

Backend

Flask/uwsgi backend REST API service. It's pretty simple and contains all the alignment logic.

cd /be
python main.py

Frontend

A single-page application built with Vue, Vuex, and Vuetify. It provides a UI for managing the alignment process through the backend and a tool for translators to edit the documents being processed.

cd /fe

Setup

npm install

Compile and run with hot-reloads for development

npm run serve

Feedback

You can create an issue or send me a message on Telegram: @averkij

License

This work is licensed under an Attribution-NonCommercial-NoDerivatives 4.0 International license. See LICENSE.

Owner

Sergei Averkiev
