Text language identification using Wikipedia data

The aim of this project is to provide high-quality language detection for all of the web's languages, using Wikipedia as a proxy for them. Currently, we support 156 languages that have their own Wikipedia editions.

Usage

The main function is text-langs, which returns two values:

an alist mapping languages to probabilities (languages are represented by their ISO 639-1 codes)
a vector of tokens, each with its inferred language

WILD> (text-langs "це тест")
((:UK . 0.5000003) (:RU . 0.4999998))
#(<це - UK:1.00> <тест - RU:1.00>)
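Because the first value is an ordinary alist, it can be post-processed with standard list functions. The following is a minimal, self-contained sketch: the helpers TOP-LANG and LANGS-ABOVE are hypothetical illustrations, not part of this project. They pick the most probable language and keep only languages above a probability threshold. The sample data is copied from the REPL output above.

```lisp
;; Hypothetical helpers (not part of wiki-lang-detect) for post-processing
;; the (LANG . PROBABILITY) alist that TEXT-LANGS returns as its first value.

(defun top-lang (lang-probs)
  "Return the (LANG . PROBABILITY) pair with the highest probability,
or NIL if LANG-PROBS is empty."
  (when lang-probs
    (reduce (lambda (best pair)
              (if (> (cdr pair) (cdr best)) pair best))
            lang-probs)))

(defun langs-above (lang-probs threshold)
  "Return the languages whose probability is at least THRESHOLD."
  (loop for (lang . prob) in lang-probs
        when (>= prob threshold)
          collect lang))

;; Sample input copied from the REPL example above.
(defparameter *sample* '((:uk . 0.5000003) (:ru . 0.4999998)))

(print (top-lang *sample*))         ; prints (:UK . 0.5000003)
(print (langs-above *sample* 0.5))  ; prints (:UK)
```

With the project loaded, and assuming TEXT-LANGS is accessible from the current package as in the WILD> REPL above, both return values can be captured at once with MULTIPLE-VALUE-BIND, e.g. (multiple-value-bind (langs tokens) (text-langs "це тест") (top-lang langs)).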

Running as a service

Installation

Install SBCL
Install Quicklisp
Clone the project with git, then start the service:
$ cd wiki-lang-detect; sbcl --load run.lisp

Running with Docker

docker build -t wiki-lang-detect:latest .
docker run -it -p 5000:5000 wiki-lang-detect:latest

curl -X POST -H "Content-Type: application/json" -d '{"text": "Несе Галя"}' http://localhost:5000/detect | jq '.'
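The request body is JSON with a text field, and JSON only allows double-quoted strings. When the body is generated programmatically rather than typed by hand, the input text must also be JSON-escaped. Below is a minimal sketch in standard Common Lisp. The function names JSON-ESCAPE and DETECT-PAYLOAD are hypothetical, and the body shape {"text": ...} is assumed from the curl example; the Swagger definition is the authoritative schema.

```lisp
;; Hypothetical helpers: build the JSON body {"text": "..."} for POST /detect,
;; escaping characters that JSON does not allow raw inside a string.

(defun json-escape (string)
  "Return STRING with quotes, backslashes and control characters escaped."
  (with-output-to-string (out)
    (loop for ch across string
          for code = (char-code ch)
          do (case ch
               (#\" (write-string "\\\"" out))
               (#\\ (write-string "\\\\" out))
               (#\Newline (write-string "\\n" out))
               (#\Return (write-string "\\r" out))
               (#\Tab (write-string "\\t" out))
               (t (if (< code 32)
                      ;; Remaining control characters use the \uXXXX form.
                      (format out "\\u~4,'0X" code)
                      (write-char ch out)))))))

(defun detect-payload (text)
  "Return a JSON string of the form {\"text\": \"...\"}."
  (format nil "{\"text\": \"~A\"}" (json-escape text)))

(write-line (detect-payload "say \"hi\""))  ; prints {"text": "say \"hi\""}
```

Non-ASCII text such as "Несе Галя" passes through unchanged, since JSON permits raw Unicode inside strings; only the encoding of the HTTP body (normally UTF-8) matters.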

Alternatively, you can use a prebuilt Docker image maintained outside of this repository:

docker run -it -p 5000:5000 chaliy/wiki-lang-detect:latest

API

See the Swagger definition.

Owner

Vsevolod Dyomkin