Easy to use, state-of-the-art Neural Machine Translation for 100+ languages

Overview

EasyNMT - Easy to use, state-of-the-art Neural Machine Translation

This package provides easy-to-use, state-of-the-art machine translation for more than 100 languages. The highlights of this package are:

  • Easy installation and usage: Use state-of-the-art machine translation with 3 lines of code
  • Automatic download of pre-trained machine translation models
  • Translation between 150+ languages
  • Automatic language detection for 170+ languages
  • Sentence and document translation
  • Multi-GPU and multi-process translation

At the moment, we provide the models listed under Available Models below.

Installation

You can install the package via:

pip install -U easynmt

The models are based on PyTorch. If you have a GPU available, see how to install PyTorch with GPU support. If you use Windows and have issues with the installation, see this issue for how to solve it.
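
If you are unsure whether your PyTorch installation can actually use the GPU, a quick check (plain PyTorch, nothing EasyNMT-specific) is:

import torch

# True only if a CUDA-capable GPU and a matching PyTorch build are available
print(torch.cuda.is_available())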

Usage

The usage is simple:

from easynmt import EasyNMT
model = EasyNMT('opus-mt')

#Translate a single sentence to German
print(model.translate('This is a sentence we want to translate to German', target_lang='de'))

#Translate several sentences to German
sentences = ['You can define a list with sentences.',
             'All sentences are translated to your target language.',
             'Note, you could also mix the languages of the sentences.']
print(model.translate(sentences, target_lang='de'))

Document Translation

The available models are based on the Transformer architecture, which provides state-of-the-art translation quality. However, the input length is limited to 512 word pieces for the opus-mt model and to 1024 word pieces for the M2M models.

The translate() method performs automatic sentence splitting so that longer documents can also be translated:

from easynmt import EasyNMT
model = EasyNMT('opus-mt')

document = """Berlin is the capital and largest city of Germany by both area and population.[6][7] Its 3,769,495 inhabitants as of 31 December 2019[2] make it the most-populous city of the European Union, according to population within city limits.[8] The city is also one of Germany's 16 federal states. It is surrounded by the state of Brandenburg, and contiguous with Potsdam, Brandenburg's capital. The two cities are at the center of the Berlin-Brandenburg capital region, which is, with about six million inhabitants and an area of more than 30,000 km2,[9] Germany's third-largest metropolitan region after the Rhine-Ruhr and Rhine-Main regions. Berlin straddles the banks of the River Spree, which flows into the River Havel (a tributary of the River Elbe) in the western borough of Spandau. Among the city's main topographical features are the many lakes in the western and southeastern boroughs formed by the Spree, Havel, and Dahme rivers (the largest of which is Lake Müggelsee). Due to its location in the European Plain, Berlin is influenced by a temperate seasonal climate. About one-third of the city's area is composed of forests, parks, gardens, rivers, canals and lakes.[10] The city lies in the Central German dialect area, the Berlin dialect being a variant of the Lusatian-New Marchian dialects.

First documented in the 13th century and at the crossing of two important historic trade routes,[11] Berlin became the capital of the Margraviate of Brandenburg (1417–1701), the Kingdom of Prussia (1701–1918), the German Empire (1871–1918), the Weimar Republic (1919–1933), and the Third Reich (1933–1945).[12] Berlin in the 1920s was the third-largest municipality in the world.[13] After World War II and its subsequent occupation by the victorious countries, the city was divided; West Berlin became a de facto West German exclave, surrounded by the Berlin Wall (1961–1989) and East German territory.[14] East Berlin was declared capital of East Germany, while Bonn became the West German capital. Following German reunification in 1990, Berlin once again became the capital of all of Germany.

Berlin is a world city of culture, politics, media and science.[15][16][17][18] Its economy is based on high-tech firms and the service sector, encompassing a diverse range of creative industries, research facilities, media corporations and convention venues.[19][20] Berlin serves as a continental hub for air and rail traffic and has a highly complex public transportation network. The metropolis is a popular tourist destination.[21] Significant industries also include IT, pharmaceuticals, biomedical engineering, clean tech, biotechnology, construction and electronics."""

#Translate the document to German
print(model.translate(document, target_lang='de'))

The function breaks down the document into sentences and then translates the sentences individually using the specified model.
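
If you prefer to control the splitting yourself, you can split the document up front and call translate_sentences directly. This is only a minimal sketch: it assumes NLTK's punkt tokenizer (NLTK is an EasyNMT dependency), assumes translate_sentences accepts source_lang, target_lang and batch_size, and reuses the model and document variables from the example above.

import nltk

nltk.download('punkt')  # one-time download of the sentence tokenizer data

# 'model' and 'document' as defined in the example above
sentences = nltk.sent_tokenize(document)
translated = model.translate_sentences(sentences, source_lang='en', target_lang='de', batch_size=16)
print(' '.join(translated))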

Automatic Language Detection

You can set source_lang for the translate method to define the source language. If source_lang is not set, fastText is used to detect the source language automatically. This also allows you to pass a list of sentences / documents in different languages:

from easynmt import EasyNMT
model = EasyNMT('opus-mt')

#Translate several sentences to English
sentences = ['Dies ist ein Satz in Deutsch.',   #This is a German sentence
             '这是一个中文句子',    #This is a Chinese sentence
             'Esta es una oración en español.'] #This is a Spanish sentence
print(model.translate(sentences, target_lang='en'))
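
If you already know the language, you can skip detection entirely by fixing source_lang; this is also useful when detection struggles, e.g. on very short snippets:

print(model.translate('Dies ist ein Satz in Deutsch.', source_lang='de', target_lang='en'))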

Available Models

The following models are currently available. They provide translations between 150+ languages.

Model        | Reference         | #Languages | Size   | Speed GPU (sentences/sec on V100) | Speed CPU (sentences/sec) | Comment
opus-mt      | Helsinki-NLP      | 186        | 300 MB | 53                                | 6                         | Individual models (~300 MB) per translation direction
mbart50_m2m  | Facebook Research | 52         | 1.2 GB | 35                                | 0.9                       |
m2m_100_418M | Facebook Research | 100        | 0.9 GB | 39                                | 1.1                       |
m2m_100_1.2B | Facebook Research | 100        | 2.4 GB | 23                                | 0.5                       |

Translation Quality

A comparison of the models' translation quality will be added here soon. So far, my personal, subjective impression is that opus-mt and m2m_100_1.2B yield the best translations.

Opus-MT

We provide a wrapper for the pre-trained models from Opus-MT.

Opus-MT provides 1200+ different translation models, each capable of translating in one direction (e.g. from German to English). Each model is about 300 MB in size.

Supported languages: aav, aed, af, alv, am, ar, art, ase, az, bat, bcl, be, bem, ber, bg, bi, bn, bnt, bzs, ca, cau, ccs, ceb, cel, chk, cpf, crs, cs, csg, csn, cus, cy, da, de, dra, ee, efi, el, en, eo, es, et, eu, euq, fi, fj, fr, fse, ga, gaa, gil, gl, grk, guw, gv, ha, he, hi, hil, ho, hr, ht, hu, hy, id, ig, ilo, is, iso, it, ja, jap, ka, kab, kg, kj, kl, ko, kqn, kwn, kwy, lg, ln, loz, lt, lu, lua, lue, lun, luo, lus, lv, map, mfe, mfs, mg, mh, mk, mkh, ml, mos, mr, ms, mt, mul, ng, nic, niu, nl, no, nso, ny, nyk, om, pa, pag, pap, phi, pis, pl, pon, poz, pqe, pqw, prl, pt, rn, rnd, ro, roa, ru, run, rw, sal, sg, sh, sit, sk, sl, sm, sn, sq, srn, ss, ssp, st, sv, sw, swc, taw, tdt, th, ti, tiv, tl, tll, tn, to, toi, tpi, tr, trk, ts, tum, tut, tvl, tw, ty, tzo, uk, umb, ur, ve, vi, vsl, wa, wal, war, wls, xh, yap, yo, yua, zai, zh, zne

Usage:

from easynmt import EasyNMT
model = EasyNMT('opus-mt', max_loaded_models=10)

The system will automatically detect the suitable Opus-MT model and load it. With the optional parameter max_loaded_models you can specify the maximum number of models that are loaded simultaneously. If you then translate with an unseen language direction, the oldest model is unloaded and the new model is loaded.
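
For illustration, translating three different source languages in a row with max_loaded_models=2 loads one Opus-MT model per direction and evicts the oldest one when the third direction is requested (a usage sketch following the description above):

from easynmt import EasyNMT

model = EasyNMT('opus-mt', max_loaded_models=2)

print(model.translate('Hallo Welt', source_lang='de', target_lang='en'))        # loads opus-mt-de-en
print(model.translate('Bonjour le monde', source_lang='fr', target_lang='en'))  # loads opus-mt-fr-en
print(model.translate('Hola mundo', source_lang='es', target_lang='en'))        # loads opus-mt-es-en, oldest model is unloaded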

mBART_50

We provide a wrapper for the mBART50 model from Facebook, which can translate between any pair of 50+ languages.

Usage:

from easynmt import EasyNMT
model = EasyNMT('mbart50_m2m')

Supported languages: af, ar, az, bn, cs, de, en, es, et, fa, fi, fr, gl, gu, he, hi, hr, id, it, ja, ka, kk, km, ko, lt, lv, mk, ml, mn, mr, my, ne, nl, pl, ps, pt, ro, ru, si, sl, sv, sw, ta, te, th, tl, tr, uk, ur, vi, xh, zh
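
A short usage example with an explicit source language (both language codes are in the list above):

from easynmt import EasyNMT

model = EasyNMT('mbart50_m2m')

# Japanese -> English with the single many-to-many mBART50 model
print(model.translate('これは日本語の文です。', source_lang='ja', target_lang='en'))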

M2M_100

We provide a wrapper for the M2M 100 model from Facebook, which can translate between any pair of 100 languages.

Supported languages: af, am, ar, ast, az, ba, be, bg, bn, br, bs, ca, ceb, cs, cy, da, de, el, en, es, et, fa, ff, fi, fr, fy, ga, gd, gl, gu, ha, he, hi, hr, ht, hu, hy, id, ig, ilo, is, it, ja, jv, ka, kk, km, kn, ko, lb, lg, ln, lo, lt, lv, mg, mk, ml, mn, mr, ms, my, ne, nl, no, ns, oc, or, pa, pl, ps, pt, ro, ru, sd, si, sk, sl, so, sq, sr, ss, su, sv, sw, ta, th, tl, tn, tr, uk, ur, uz, vi, wo, xh, yi, yo, zh, zu

At the moment, we provide wrappers for two M2M 100 models:

  • m2m_100_418M: M2M model with 418 million parameters (0.9 GB)
  • m2m_100_1.2B: M2M model with 1.2 billion parameters (2.4 GB)

Usage:

from easynmt import EasyNMT
model = EasyNMT('m2m_100_418M')   #or: EasyNMT('m2m_100_1.2B') 

You can find more information here. Note: the 12-billion-parameter M2M model is currently not supported.

As soon as you call EasyNMT('m2m_100_418M') / EasyNMT('m2m_100_1.2B'), the respective model is downloaded and cached locally.
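
Since M2M 100 is a single many-to-many model, it can also translate directly between non-English pairs without pivoting through English. A minimal sketch:

from easynmt import EasyNMT

model = EasyNMT('m2m_100_418M')   # downloaded and cached locally on first use

# Direct German -> French translation with the shared many-to-many model
print(model.translate('Maschinelle Übersetzung ist faszinierend.', source_lang='de', target_lang='fr'))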

Author

Contact person: Nils Reimers; [email protected]

https://www.ukp.tu-darmstadt.de/

Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.

This repository contains experimental software to encourage future research.

Comments
  • Missing supported translate pair with M2M_100 model

    Hi, I found that M2M_100 supports direct translation between any pair of its 100 languages (9,900 pairs). But when I use EasyNMT with the M2M_100 model, it doesn't support all of these pairs.

    Example: EasyNMT can't translate directly from 'th' (Thai) to 'en' (English) while M2M_100 model does support this pair.

    And when I tried to use HuggingFace to translate directly between Thai and English, it worked perfectly.

    Can you please solve the problem? By the way, thank you for creating EasyNMT.

    opened by nguyenhuuthuat09 12
  • Can't access other models in docker image

    Hi,

    I'm sorry for this noobish question/issue and maybe it is easy to resolve (I'm not experienced with Docker). I've built a web app which uses EasyNMT in the back via the Docker images and REST. When translating from Romanian to German, I noticed that the Docker image is only using the opus model, which does not provide this language direction. But when executing the "/model_name" request, it shows me only "opus" as part of the Docker image.

    So how can I get the other models? I have 3 Docker images of EasyNMT (one 7.7 GB, one 6.02 GB and one 3.8 GB in size) but it seems none of them contains the other models. Am I doing something wrong here? And also, when they are part of the image, is there some kind of auto-selection if a language is not available in one of the packages?

    I installed the docker images via the "build-docker-hub.sh" file.

    Best regards, André

    opened by 4quen 5
  • Library not translating, just returning input

    Hello I am running the following code

    from easynmt import EasyNMT
    
    model = EasyNMT('opus-mt')
    
    print(model.translate("停", target_lang='en'))
    

    The result of the code is just "停", which is exactly the same as the input. How can I fix this?

    opened by geekjr 5
  • Can this project support num-beams in opus-mt model ?

    I found a similar project called ktrain that supports this, located at https://github.com/amaiya/ktrain/blob/5c9c6b333115be44433639c4bc4c091bd79ab65c/ktrain/text/translation/core.py. Having some accuracy measurements to summarize the comparison would be interesting. Could multilingual sentence embeddings be of some help?

    opened by svjack 5
  • AttributeError: 'float' object has no attribute 'split'

    Hi Team, I have a question. I am trying to translate a column which has blanks in between. I am using EasyNMT and it's giving an error. Won't it work if there are blanks or missing values between the rows of a column?

    Thanks Srinivas

    opened by sriprad 4
  • Sending large documents for translation with GET endpoint can sometimes result in URL parser error

    With large documents (several thousand characters), I see a crash in the URL parser. I tried with different source and target languages. Also, for the same source text, the translation may sometimes succeed, but then fail again. For example, for the sample request URL given below, the URL parser may sometimes succeed and sometimes fail with an exception.

    I'm using the last 1.x commit of EasyNMT (specifically commit 61fcf7154f01f56c02be6d30b1c5d0921b91aa2e) as it has better benchmarks than 2.x for fairseq models, but I believe the same issue should be there for the latest version too as I don't think the URL parser would have changed. I'm using the m2m_100_418M model with a T4 GPU if that matters at all.

    EasyNMT error logs:

    [2021-05-25 14:58:29 +0000] [16] [WARNING] Invalid HTTP request received.
    Traceback (most recent call last):
      File "httptools/parser/parser.pyx", line 245, in httptools.parser.parser.cb_on_url
      File "/opt/conda/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 216, in on_url
        parsed_url = httptools.parse_url(url)
      File "httptools/parser/parser.pyx", line 468, in httptools.parser.parser.parse_url
    httptools.parser.errors.HttpParserInvalidURLError: invalid url b'ividual+level+we+need+to+take+up+responsibility+to+curb+the+spread+of+this+virus%2C%22+said+Dr+Rao%0D%0A%0D%0APromotedListen+to+the+latest+songs%2C+only+on+JioSaavn.com%0D%0A%0D%0AWhen+it+comes+to+Bengaluru%2C+Dr+Rao+said+the+lockdown+had+reduced+the+number+of+emergency+oxygen+requirements+and+the+panic.+%22That+is+because+the+virus+has+stopped+moving+because+we+have+stopped+moving%2C%22+he+said.+%22Generally%2C+as+a+rule%2C+a+health+care+system+will+not+be+able+to+cope+with+a+sudden+rise+in+numbers%2C+emergency+oxygen+requirements+or+health+care.+The+other+big+concern+is+trained+manpower.%22%0D%0A%0D%0AComments%0D%0AMucormycosis%2C+commonly+known+as+Black+Fungus%2C+is+also+on+the+rise+in+the+state.+Dr+Rao+said%3A+%22At+HCG+we+are+treating+30+cases+and+the+number+is+on+the+rise.+In+Karnataka%2C+currently%2C+it+must+be+about+700+cases.+It+looks+like+an+epidemic+within+a+pandemic+at+this+juncture.+We+need+to+understand+the+source+of+this+infection%2C+have+early+detection+and+treatment.+A+committee+will+give+a+clear+strategy+for+the+state.+We+don%27t+need+to+scare+people+about+black+fungus%2C+we+need+to+create+awareness.+What+we+have+seen+in+the+patients+-+they+have+all+been+Covid+positive%2C+most+have+been+given+steroids%2C+majority+had+high+sugar.+30+to+40+per+cent+had+been+given+oxygen+and+most+important+-+none+of+them+had+been+vaccinated.%22%0D%0A%0D%0A'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 167, in data_received
        self.parser.feed_data(data)
      File "httptools/parser/parser.pyx", line 193, in httptools.parser.parser.HttpParser.feed_data
    httptools.parser.errors.HttpParserCallbackError: the on_url callback failed
    [2021-05-25 14:58:29 +0000] [18] [WARNING] Invalid HTTP request received.
    Traceback (most recent call last):
      File "httptools/parser/parser.pyx", line 245, in httptools.parser.parser.cb_on_url
      File "/opt/conda/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 216, in on_url
        parsed_url = httptools.parse_url(url)
      File "httptools/parser/parser.pyx", line 468, in httptools.parser.parser.parse_url
    httptools.parser.errors.HttpParserInvalidURLError: invalid url b'irus+is+now+being+reported+more+from+rural+Karnataka+with+often+a+weak+health+infrastructure.%0D%0A%0D%0ADr+Vishal+Rao+of+the+HCG+hospitals+and+a+member+of+the+Karnataka+Covid+task+force+said%2C+%22It+is+going+to+be+an+uphill+task+as+we+move+towards+the+districts+as+the+health+care+systems+get+overburdened+there.+Even+the+oxygen+management.+In+cities%2C+we+have+the+privilege+that+oxygen+comes+to+the+doorstep+of+the+hospital.+Whereas+in+villages+and+districts%2C+hospitals+have+to+carry+their+cylinders+to+refill+them.+Public+health+experts+and+virologists+are+repeatedly+trying+to+enhance+the+surveillance+in+villages+to+ensure+we+are+better+prepared+in+villages.+This+is+the+time+to+ramp+up+the+preparation+for+villages.%22%0D%0A%0D%0AHe+also+said+that+the+lockdown+%22definitely+had+a+very+significant+impact%22+on+the+daily+infections.+%22From+50%2C000+cases+everyday%2C+today+we+are+at+around+20%2C000+odd+cases.It+is+not+a+reassurance+that+once+the+lockdown+is+lifted%2C+we+will+continue+to+have+these+low+numbers.+But+what+is+of+concern+is+that+the+positivity+rate+still+sticks+at+around+20+per+cent+and+the+mortality+has+jumped+to+about+2+per+cent.+We+need+to+understand+that+when+the+waves+flatten%2C+it+is+not+that+the+virus+is+taking+rest.+It+is+a+socio-economic+virus+and+the+more+we+improve+interactions+without+safety%2C+we+are+going+to+explode+and+expand+the+spread+of+this+virus.+At+an+ind'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 167, in data_received
        self.parser.feed_data(data)
      File "httptools/parser/parser.pyx", line 193, in httptools.parser.parser.HttpParser.feed_data
    httptools.parser.errors.HttpParserCallbackError: the on_url callback failed
    

    Full request URL path + query params:

    /translate?beam_size=2&source_lang=en&target_lang=de&text=Bengaluru%3A+Karnataka+-+one+of+the+worst+hit+states+in+the+country+in+the+second+wave+of+COVID-19%2C+has+been+witnessing+a+slump+in+the+new+case+numbers+over+the+last+few+weeks.+The+authorities%2C+however%2C+are+of+the+view+that+It+is+far+too+early+to+relax.%0D%0AThe+number+of+COVID-19+outnumbered+fresh+infections+in+Karnataka+yet+again+on+Tuesday%2C+as+the+state+reported+38%2C224+discharges+and+22%2C758+new+cases.+Of+the+new+cases+reported+today%2C+6%2C243+were+from+Bengaluru.%0D%0A%0D%0A%22If+you+look+at+the+numbers%2C+it+has+been+reducing+very+drastically.+Except+for+a+few+districts+where+the+numbers+are+not+coming+down.+In+most+of+the+districts+and+Bengaluru%2C+the+numbers+have+come+down.+The+number+should+come+down+drastically+so+that+we+can+unlock+from+the+lockdown%2C%22+Deputy+Chief+Minister+Dr+Ashwath+Narayan+told+NDTV.%0D%0A%0D%0AThe+state+is+in+the+middle+of+a+strict+shutdown.+But+that+doesn%27t+mean+any+major+reduction+in+the+demand+for+oxygen+in+Bengaluru+as+ICU+beds+still+remain+full.%0D%0A%0D%0A%22Since+the+number+has+come+down+very+drastically+-+now+it+is+5%2C000+odd+cases+in+Bengaluru+%28daily+infections%29+-+from+when+it+had+almost+reached+25%2C000%2C+it+is+a+great+relief.+When+it+comes+to+ICU+or+ventilator%2C+however%2C+there+is+still+a+lot+of+demand%2C%22+he+said.%0D%0A%0D%0AAn+extra+concern+is+that+the+virus+is+now+being+reported+more+from+rural+Karnataka+with+often+a+weak+health+infrastructure.%0D%0A%0D%0ADr+Vishal+Rao+of+the+HCG+hospitals+and+a+member+of+the+Karnataka+Covid+task+force+said%2C+%22It+is+going+to+be+an+uphill+task+as+we+move+towards+the+districts+as+the+health+care+systems+get+overburdened+there.+Even+the+oxygen+management.+In+cities%2C+we+have+the+privilege+that+oxygen+comes+to+the+doorstep+of+the+hospital.+Whereas+in+villages+and+districts%2C+hospitals+have+to+carry+their+cylinders+to+refill+them.+Public+health+experts+and+virologists+are+repeatedly+trying+to+enhance+the+surveillance+in+villages+to+ensure+we+are+better+prepared+in+villages.+This+is+the+time+to+ramp+up+the+preparation+for+villages.%22%0D%0A%0D%0AHe+also+said+that+the+lockdown+%22definitely+had+a+very+significant+impact%22+on+the+daily+infections.+%22From+50%2C000+cases+everyday%2C+today+we+are+at+around+20%2C000+odd+cases.It+is+not+a+reassurance+that+once+the+lockdown+is+lifted%2C+we+will+continue+to+have+these+low+numbers.+But+what+is+of+concern+is+that+the+positivity+rate+still+sticks+at+around+20+per+cent+and+the+mortality+has+jumped+to+about+2+per+cent.+We+need+to+understand+that+when+the+waves+flatten%2C+it+is+not+that+the+virus+is+taking+rest.+It+is+a+socio-economic+virus+and+the+more+we+improve+interactions+without+safety%2C+we+are+going+to+explode+and+expand+the+spread+of+this+virus.+At+an+individual+level+we+need+to+take+up+responsibility+to+curb+the+spread+of+this+virus%2C%22+said+Dr+Rao%0D%0A%0D%0APromotedListen+to+the+latest+songs%2C+only+on+JioSaavn.com%0D%0A%0D%0AWhen+it+comes+to+Bengaluru%2C+Dr+Rao+said+the+lockdown+had+reduced+the+number+of+emergency+oxygen+requirements+and+the+panic.+%22That+is+because+the+virus+has+stopped+moving+because+we+have+stopped+moving%2C%22+he+said.+%22Generally%2C+as+a+rule%2C+a+health+care+system+will+not+be+able+to+cope+with+a+sudden+rise+in+numbers%2C+emergency+oxygen+requirements+or+health+care.+The+other+big+concern+is+trained+manpower.%22%0D%0A%0D%0AComments%0D%0AMucormycosis%2C+commonly+known+as+Black+Fungus%2C+is+also+on+the+rise+in+the+state.+Dr+Rao+said%3A+%22At+HCG+we+ar
e+treating+30+cases+and+the+number+is+on+the+rise.+In+Karnataka%2C+currently%2C+it+must+be+about+700+cases.+It+looks+like+an+epidemic+within+a+pandemic+at+this+juncture.+We+need+to+understand+the+source+of+this+infection%2C+have+early+detection+and+treatment.+A+committee+will+give+a+clear+strategy+for+the+state.+We+don%27t+need+to+scare+people+about+black+fungus%2C+we+need+to+create+awareness.+What+we+have+seen+in+the+patients+-+they+have+all+been+Covid+positive%2C+most+have+been+given+steroids%2C+majority+had+high+sugar.+30+to+40+per+cent+had+been+given+oxygen+and+most+important+-+none+of+them+had+been+vaccinated.%22%0D%0A%0D%0ADelhi+received+144.8+mm+rainfall+in+May+this+year%2C+the+highest+for+the+month+in+13+years%2C+according+to+the+India+Meteorological+Department+%28IMD%29.%0D%0A%22No+rain+is+predicted+in+the+next+four+to+five+days.+So%2C+this+is+the+highest+rainfall+in+May+since+2008%2C%22+Kuldeep+Srivastava%2C+the+head+of+the+IMD%27s+regional+forecasting+centre%2C+said+today.%0D%0A%0D%0AThe+Safdarjung+Observatory%2C+considered+the+official+marker+for+the+city%2C+had+recorded+21.1+mm+rainfall+last+year%2C+26.9+mm+in+2019+and+24.2+mm+in+2018.%0D%0A%0D%0AIt+had+gauged+40.5+mm+precipitation+in+2017%3B+24.3+mm+in+2016%3B+3.1+mm+in+2015+and+100.2+mm+in+2014%2C+according+to+IMD+data.%0D%0A%0D%0A
    
    opened by AgrimPrasad 3
  • Docker image easynmt/api:2.0-cpu crashes when trying to run on mac

    Running this on a 2017 Macbook. Docker image easynmt/api:2.0-cpu fails to start with exceptions, while easynmt/api:1.1-cpu was running fine with the same docker run command previously.

    docker run -p 24081:80 -v /Users/agrim/Downloads/easynmt-models:/cache easynmt/api:2.0-cpu
    
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/usr/local/lib/python3.8/site-packages/easynmt/EasyNMT.py", line 92, in __init__
        self.translator = module_class(easynmt_path=model_path, **self.config['model_args'])
    KeyError: 'model_args'
    Checking for script in /app/prestart.sh
    There is no script /app/prestart.sh
    [2021-04-27 14:38:22 +0000] [13] [INFO] Starting gunicorn 20.1.0
    [2021-04-27 14:38:22 +0000] [12] [INFO] Starting gunicorn 20.1.0
    [2021-04-27 14:38:22 +0000] [12] [INFO] Listening at: http://0.0.0.0:8080 (12)
    [2021-04-27 14:38:22 +0000] [13] [INFO] Listening at: http://0.0.0.0:80 (13)
    [2021-04-27 14:38:22 +0000] [13] [INFO] Using worker: uvicorn.workers.UvicornWorker
    [2021-04-27 14:38:22 +0000] [12] [INFO] Using worker: uvicorn.workers.UvicornWorker
    [2021-04-27 14:38:22 +0000] [17] [INFO] Booting worker with pid: 17
    [2021-04-27 14:38:22 +0000] [18] [INFO] Booting worker with pid: 18
    [2021-04-27 14:38:22 +0000] [19] [INFO] Booting worker with pid: 19
    [2021-04-27 14:38:24 +0000] [19] [INFO] Started server process [19]
    [2021-04-27 14:38:24 +0000] [17] [INFO] Started server process [17]
    [2021-04-27 14:38:24 +0000] [17] [INFO] Waiting for application startup.
    [2021-04-27 14:38:24 +0000] [19] [INFO] Waiting for application startup.
    [2021-04-27 14:38:24 +0000] [17] [INFO] Application startup complete.
    [2021-04-27 14:38:24 +0000] [19] [INFO] Application startup complete.
    {"loglevel": "info", "workers": "1", "bind": "0.0.0.0:8080", "graceful_timeout": 120, "timeout": 120, "keepalive": 5, "errorlog": "-", "accesslog": "-", "host": "0.0.0.0", "port": "8080"}
    Booted as backend: True
    Load model: opus-mt
    [2021-04-27 14:38:25 +0000] [18] [ERROR] Exception in worker process
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
        worker.init_process()
      File "/usr/local/lib/python3.8/site-packages/uvicorn/workers.py", line 63, in init_process
        super(UvicornWorker, self).init_process()
      File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 134, in init_process
        self.load_wsgi()
      File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
        self.wsgi = self.app.wsgi()
      File "/usr/local/lib/python3.8/site-packages/gunicorn/app/base.py", line 67, in wsgi
        self.callable = self.load()
      File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
        return self.load_wsgiapp()
      File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
        return util.import_app(self.app_uri)
      File "/usr/local/lib/python3.8/site-packages/gunicorn/util.py", line 359, in import_app
        mod = importlib.import_module(module)
      File "/usr/local/lib/python3.8/importlib/__init__.py", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
      File "<frozen importlib._bootstrap>", line 991, in _find_and_load
      File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
      File "<frozen importlib._bootstrap_external>", line 783, in exec_module
      File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
      File "/app/main.py", line 36, in <module>
        model = EasyNMT(model_name, load_translator=IS_BACKEND, **model_args)
      File "/usr/local/lib/python3.8/site-packages/easynmt/EasyNMT.py", line 92, in __init__
        self.translator = module_class(easynmt_path=model_path, **self.config['model_args'])
    KeyError: 'model_args'
    [2021-04-27 14:38:25 +0000] [18] [INFO] Worker exiting (pid: 18)
    [2021-04-27 14:38:25 +0000] [12] [INFO] Shutting down: Master
    [2021-04-27 14:38:25 +0000] [12] [INFO] Reason: Worker failed to boot.
    {"loglevel": "info", "workers": "1", "bind": "0.0.0.0:8080", "graceful_timeout": 120, "timeout": 120, "keepalive": 5, "errorlog": "-", "accesslog": "-", "host": "0.0.0.0", "port": "8080"}
    One of the processes has already exited.
    
    opened by AgrimPrasad 3
  • No module named 'easynmt.models.OpusMT' in PyCharm

    Hello, I'm running this simple code in PyCharm:

    from easynmt import EasyNMT
    model = EasyNMT("opus-mt")

    print(model.translate("Hi", target_lang="fr"))

    and it gives me this error

    Traceback (most recent call last):
      File "H:/Documents/Python/Random.py", line 2, in <module>
        model = EasyNMT("opus-mt")
      File "C:\Users\Jerz King\AppData\Local\Programs\Python\Python37\lib\site-packages\easynmt\EasyNMT.py", line 69, in __init__
        module_class = import_from_string(self.config['model_class'])
      File "C:\Users\Jerz King\AppData\Local\Programs\Python\Python37\lib\site-packages\easynmt\util.py", line 56, in import_from_string
        module = importlib.import_module(module_path)
      File "C:\Users\Jerz King\AppData\Local\Programs\Python\Python37\lib\importlib\__init__.py", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
      File "<frozen importlib._bootstrap>", line 983, in _find_and_load
      File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
    ModuleNotFoundError: No module named 'easynmt.models.OpusMT'

    I had the same error addressed here when installing easynmt, and followed the steps, but nothing happened... How do I fix this?

    opened by Lorddickenstein 3
  • OSError after a few translations

    Hi and thanks for the cool library!

    I want to include the translation function in one of my data pipelines that loops over thousands of text snippets. Without GPU support and on Windows, I followed the instructions in the other issue and successfully added the function.

    from easynmt import EasyNMT
    model = EasyNMT('opus-mt')
    

    and I translate with:

    from langdetect import detect_langs  # detect_langs comes from the langdetect package

    language = detect_langs(text)
    for each_lang in language:
       if (each_lang.lang != "en"):
          translated_text = model.translate(text, target_lang='en')
    

    where text is a string. However, after a few translations (2-3) I always run into this error:

    OSError: Can't load tokenizer for 'Helsinki-NLP/opus-mt-ia-en'. Make sure that:
    - 'Helsinki-NLP/opus-mt-ia-en' is a correct model identifier listed on 'https://huggingface.co/models'
    

    Any idea what the problem could be?

    opened by jonas-nothnagel 3
  • MBart50Converter requires the protobuf library but it was not found in your environment.

    Trying to use the Docker image with the model mbart50_m2m. Command: docker run --env EASYNMT_MODEL=mbart50_m2m --env TIMEOUT=600 --env MAX_WORKERS_FRONTEND=1 -p 24080:80 easynmt/api:2.0-cpu. And it exited with this trace:

    [2022-05-26 12:48:19 +0000] [36] [ERROR] Exception in worker process
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
        worker.init_process()
      File "/usr/local/lib/python3.8/site-packages/uvicorn/workers.py", line 63, in init_process
        super(UvicornWorker, self).init_process()
      File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 134, in init_process
        self.load_wsgi()
      File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
        self.wsgi = self.app.wsgi()
      File "/usr/local/lib/python3.8/site-packages/gunicorn/app/base.py", line 67, in wsgi
        self.callable = self.load()
      File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
        return self.load_wsgiapp()
      File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
        return util.import_app(self.app_uri)
      File "/usr/local/lib/python3.8/site-packages/gunicorn/util.py", line 359, in import_app
        mod = importlib.import_module(module)
      File "/usr/local/lib/python3.8/importlib/__init__.py", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
      File "<frozen importlib._bootstrap>", line 991, in _find_and_load
      File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
      File "<frozen importlib._bootstrap_external>", line 783, in exec_module
      File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
      File "/app/main.py", line 36, in <module>
        model = EasyNMT(model_name, load_translator=IS_BACKEND, **model_args)
      File "/usr/local/lib/python3.8/site-packages/easynmt/EasyNMT.py", line 92, in __init__
        self.translator = module_class(easynmt_path=model_path, **self.config['model_args'])
      File "/usr/local/lib/python3.8/site-packages/easynmt/models/AutoModel.py", line 32, in __init__
        self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_name, **self.tokenizer_args)
      File "/usr/local/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 407, in from_pretrained
        return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
      File "/usr/local/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1709, in from_pretrained
        return cls._from_pretrained(
      File "/usr/local/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1781, in _from_pretrained
        tokenizer = cls(*init_inputs, **init_kwargs)
      File "/usr/local/lib/python3.8/site-packages/transformers/models/mbart/tokenization_mbart50_fast.py", line 128, in __init__
        super().__init__(
      File "/usr/local/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py", line 99, in __init__
        fast_tokenizer = convert_slow_tokenizer(slow_tokenizer)
      File "/usr/local/lib/python3.8/site-packages/transformers/convert_slow_tokenizer.py", line 708, in convert_slow_tokenizer
        return converter_class(transformer_tokenizer).converted()
      File "/usr/local/lib/python3.8/site-packages/transformers/convert_slow_tokenizer.py", line 301, in __init__
        requires_protobuf(self)
      File "/usr/local/lib/python3.8/site-packages/transformers/file_utils.py", line 574, in requires_protobuf
        raise ImportError(PROTOBUF_IMPORT_ERROR.format(name))
    ImportError: 
    MBart50Converter requires the protobuf library but it was not found in your environment. Checkout the instructions on the
    installation page of its repo: https://github.com/protocolbuffers/protobuf/tree/master/python#installation and follow the ones
    that match your environment.
    

    Does anybody work on the Docker images and can fix this?

    opened by TheMY3 2
  • Issues installing sentencepiece and fasttext dependencies on Windows and Mac

    Trying to install EasyNMT 2.0.1 on Windows 10 with a normal Python 3.10.0 installation (not Anaconda).

    A colleague said he had the same issue on his Mac.

    Building wheels for collected packages: fasttext, sentencepiece
      Building wheel for fasttext (setup.py) ... error
      error: subprocess-exited-with-error
    
      × python setup.py bdist_wheel did not run successfully.
      │ exit code: 1
      ╰─> [52 lines of output]
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\setuptools\dist.py:738: UserWarning: Usage of dash-separated 'description-file' will not be supported in future versions. Please use the underscore name 'description_file' instead
            warnings.warn(
          running bdist_wheel
          running build
          running build_py
          creating build
          creating build\lib.win-amd64-3.10
          creating build\lib.win-amd64-3.10\fasttext
          copying python\fasttext_module\fasttext\FastText.py -> build\lib.win-amd64-3.10\fasttext
          copying python\fasttext_module\fasttext\__init__.py -> build\lib.win-amd64-3.10\fasttext
          creating build\lib.win-amd64-3.10\fasttext\util
          copying python\fasttext_module\fasttext\util\util.py -> build\lib.win-amd64-3.10\fasttext\util
          copying python\fasttext_module\fasttext\util\__init__.py -> build\lib.win-amd64-3.10\fasttext\util
          creating build\lib.win-amd64-3.10\fasttext\tests
          copying python\fasttext_module\fasttext\tests\test_configurations.py -> build\lib.win-amd64-3.10\fasttext\tests
          copying python\fasttext_module\fasttext\tests\test_script.py -> build\lib.win-amd64-3.10\fasttext\tests
          copying python\fasttext_module\fasttext\tests\__init__.py -> build\lib.win-amd64-3.10\fasttext\tests
          running build_ext
          building 'fasttext_pybind' extension
          creating build\temp.win-amd64-3.10
          creating build\temp.win-amd64-3.10\Release
          creating build\temp.win-amd64-3.10\Release\python
          creating build\temp.win-amd64-3.10\Release\python\fasttext_module
          creating build\temp.win-amd64-3.10\Release\python\fasttext_module\fasttext
          creating build\temp.win-amd64-3.10\Release\python\fasttext_module\fasttext\pybind
          creating build\temp.win-amd64-3.10\Release\src
          "C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.28.29910\bin\HostX86\x64\cl.exe" /c /nologo /O2 /W3 /GL /DNDEBUG /MD -IC:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\pybind11\include -IC:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\pybind11\include -Isrc -IC:\Users\zackp\.virtualenvs\autosa-AbZZnTol\include -IC:\Users\zackp\AppData\Local\Programs\Python\Python310\include -IC:\Users\zackp\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.28.29910\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /EHsc /Tppython/fasttext_module/fasttext/pybind/fasttext_pybind.cc /Fobuild\temp.win-amd64-3.10\Release\python/fasttext_module/fasttext/pybind/fasttext_pybind.obj /EHsc /DVERSION_INFO=\\\"0.9.2\\\"
          fasttext_pybind.cc
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(171): error C2065: 'ssize_t': undeclared identifier
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(171): error C2672: 'pybind11::init': no matching overloaded function found
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(170): error C2974: 'pybind11::init': invalid template argument for 'CFunc', type expected
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\pybind11\include\pybind11\pybind11.h(1702): note: see declaration of 'pybind11::init'
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(170): error C2974: 'pybind11::init': invalid template argument for 'Func', type expected
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\pybind11\include\pybind11\pybind11.h(1697): note: see declaration of 'pybind11::init'
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(170): error C2974: 'pybind11::init': invalid template argument for 'Args', type expected
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\pybind11\include\pybind11\pybind11.h(1690): note: see declaration of 'pybind11::init'
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(171): error C2672: 'pybind11::class_<fasttext::Vector>::def': no matching overloaded function found
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(170): error C2780: 'pybind11::class_<fasttext::Vector> &pybind11::class_<fasttext::Vector>::def(const char *,Func &&,const Extra &...)': expects 3 arguments - 1 provided
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\pybind11\include\pybind11\pybind11.h(1416): note: see declaration of 'pybind11::class_<fasttext::Vector>::def'
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(185): error C2065: 'ssize_t': undeclared identifier
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(185): error C2065: 'ssize_t': undeclared identifier
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(185): error C2672: 'pybind11::init': no matching overloaded function found
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(182): error C2974: 'pybind11::init': invalid template argument for 'CFunc', type expected
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\pybind11\include\pybind11\pybind11.h(1702): note: see declaration of 'pybind11::init'
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(182): error C2974: 'pybind11::init': invalid template argument for 'Func', type expected
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\pybind11\include\pybind11\pybind11.h(1697): note: see declaration of 'pybind11::init'
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(182): error C2974: 'pybind11::init': invalid template argument for 'Args', type expected
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\pybind11\include\pybind11\pybind11.h(1690): note: see declaration of 'pybind11::init'
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(185): error C2672: 'pybind11::class_<fasttext::DenseMatrix>::def': no matching overloaded function found
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(182): error C2780: 'pybind11::class_<fasttext::DenseMatrix> &pybind11::class_<fasttext::DenseMatrix>::def(const char *,Func &&,const Extra &...)': expects 3 arguments - 1 provided
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\pybind11\include\pybind11\pybind11.h(1416): note: see declaration of 'pybind11::class_<fasttext::DenseMatrix>::def'
          error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\BuildTools\\VC\\Tools\\MSVC\\14.28.29910\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2
          [end of output]
    
      note: This error originates from a subprocess, and is likely not a problem with pip.
      ERROR: Failed building wheel for fasttext
      Running setup.py clean for fasttext
      Building wheel for sentencepiece (setup.py) ... error
      error: subprocess-exited-with-error
    
      × python setup.py bdist_wheel did not run successfully.
      │ exit code: 1
      ╰─> [22 lines of output]
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\setuptools\dist.py:738: UserWarning: Usage of dash-separated 'description-file' will not be supported in future versions. Please use the underscore name 'description_file' instead
            warnings.warn(
          running bdist_wheel
          running build
          running build_py
          creating build
          creating build\lib.win-amd64-3.10
          creating build\lib.win-amd64-3.10\sentencepiece
          copying src\sentencepiece/__init__.py -> build\lib.win-amd64-3.10\sentencepiece
          copying src\sentencepiece/sentencepiece_model_pb2.py -> build\lib.win-amd64-3.10\sentencepiece
          copying src\sentencepiece/sentencepiece_pb2.py -> build\lib.win-amd64-3.10\sentencepiece
          running build_ext
          building 'sentencepiece._sentencepiece' extension
          creating build\temp.win-amd64-3.10
          creating build\temp.win-amd64-3.10\Release
          creating build\temp.win-amd64-3.10\Release\src
          creating build\temp.win-amd64-3.10\Release\src\sentencepiece
          "C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.28.29910\bin\HostX86\x64\cl.exe" /c /nologo /O2 /W3 /GL /DNDEBUG /MD -IC:\Users\zackp\.virtualenvs\autosa-AbZZnTol\include -IC:\Users\zackp\AppData\Local\Programs\Python\Python310\include -IC:\Users\zackp\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.28.29910\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /EHsc /Tpsrc/sentencepiece/sentencepiece_wrap.cxx /Fobuild\temp.win-amd64-3.10\Release\src/sentencepiece/sentencepiece_wrap.obj /MT /I..\build\root\include
          cl : Command line warning D9025 : overriding '/MD' with '/MT'
          sentencepiece_wrap.cxx
          src/sentencepiece/sentencepiece_wrap.cxx(2809): fatal error C1083: Cannot open include file: 'sentencepiece_processor.h': No such file or directory
          error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\BuildTools\\VC\\Tools\\MSVC\\14.28.29910\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2
          [end of output]
    
      note: This error originates from a subprocess, and is likely not a problem with pip.
      ERROR: Failed building wheel for sentencepiece
      Running setup.py clean for sentencepiece
    Failed to build fasttext sentencepiece
    Installing collected packages: sentencepiece, fasttext, EasyNMT
      Running setup.py install for sentencepiece ... error
      error: subprocess-exited-with-error
    
      × Running setup.py install for sentencepiece did not run successfully.
      │ exit code: 1
      ╰─> [24 lines of output]
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\setuptools\dist.py:738: UserWarning: Usage of dash-separated 'description-file' will not be supported in future versions. Please use the underscore name 'description_file' instead
            warnings.warn(
          running install
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\setuptools\command\install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
            warnings.warn(
          running build
          running build_py
          creating build
          creating build\lib.win-amd64-3.10
          creating build\lib.win-amd64-3.10\sentencepiece
          copying src\sentencepiece/__init__.py -> build\lib.win-amd64-3.10\sentencepiece
          copying src\sentencepiece/sentencepiece_model_pb2.py -> build\lib.win-amd64-3.10\sentencepiece
          copying src\sentencepiece/sentencepiece_pb2.py -> build\lib.win-amd64-3.10\sentencepiece
          running build_ext
          building 'sentencepiece._sentencepiece' extension
          creating build\temp.win-amd64-3.10
          creating build\temp.win-amd64-3.10\Release
          creating build\temp.win-amd64-3.10\Release\src
          creating build\temp.win-amd64-3.10\Release\src\sentencepiece
          "C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.28.29910\bin\HostX86\x64\cl.exe" /c /nologo /O2 /W3 /GL /DNDEBUG /MD -IC:\Users\zackp\.virtualenvs\autosa-AbZZnTol\include -IC:\Users\zackp\AppData\Local\Programs\Python\Python310\include -IC:\Users\zackp\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.28.29910\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /EHsc /Tpsrc/sentencepiece/sentencepiece_wrap.cxx /Fobuild\temp.win-amd64-3.10\Release\src/sentencepiece/sentencepiece_wrap.obj /MT /I..\build\root\include
          cl : Command line warning D9025 : overriding '/MD' with '/MT'
          sentencepiece_wrap.cxx
          src/sentencepiece/sentencepiece_wrap.cxx(2809): fatal error C1083: Cannot open include file: 'sentencepiece_processor.h': No such file or directory
          error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\BuildTools\\VC\\Tools\\MSVC\\14.28.29910\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2
          [end of output]
    
      note: This error originates from a subprocess, and is likely not a problem with pip.
    error: legacy-install-failure
    
    × Encountered error while trying to install package.
    ╰─> sentencepiece
    
    note: This is an issue with the package mentioned above, not pip.
    hint: See above for output from the failure.
    
    opened by ZackPlauche 2
  • Is there randomness in translation or does every translation lead to the exact same output?

    Thanks a lot for creating this great package! Question: will every translation with the same input always lead to the exact same output, or is there some randomness involved (e.g. through beam search), which would require setting a seed for full reproducibility? I've discussed this with colleagues and there seem to be some beam search algorithms that are stochastic (i.e. introduce randomness) and others that are not. Which one is used here? If a stochastic algorithm is used, how would we set a seed to ensure reproducibility?

    opened by MoritzLaurer 0
  • EasyNMT

    Hi, thank you for this useful library. I tried to install it on my machine and it gave me this error. Any help please?

    print(model.translate('This is a sentence we want to translate to German', target_lang='de'))

      File "/home/karima/.local/lib/python3.8/site-packages/easynmt/EasyNMT.py", line 154, in translate
        raise e
      File "/home/karima/.local/lib/python3.8/site-packages/easynmt/EasyNMT.py", line 149, in translate
        translated = self.translate(**method_args)
      File "/home/karima/.local/lib/python3.8/site-packages/easynmt/EasyNMT.py", line 181, in translate
        translated_sentences = self.translate_sentences(splitted_sentences, target_lang=target_lang, source_lang=source_lang, show_progress_bar=show_progress_bar, beam_size=beam_size, batch_size=batch_size, **kwargs)
      File "/home/karima/.local/lib/python3.8/site-packages/easynmt/EasyNMT.py", line 278, in translate_sentences
        output.extend(self.translator.translate_sentences(sentences_sorted[start_idx:start_idx+batch_size], source_lang=source_lang, target_lang=target_lang, beam_size=beam_size, device=self.device, **kwargs))
      File "/home/karima/.local/lib/python3.8/site-packages/easynmt/models/OpusMT.py", line 49, in translate_sentences
        translated = model.generate(**inputs, num_beams=beam_size, **kwargs)
      File "/home/karima/.local/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
        return func(*args, **kwargs)
      File "/home/karima/.local/lib/python3.8/site-packages/transformers/generation_utils.py", line 1182, in generate
        model_kwargs = self._prepare_encoder_decoder_kwargs_for_generation(
      File "/home/karima/.local/lib/python3.8/site-packages/transformers/generation_utils.py", line 525, in _prepare_encoder_decoder_kwargs_for_generation
        model_kwargs["encoder_outputs"]: ModelOutput = encoder(**encoder_kwargs)
      File "/home/karima/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/karima/.local/lib/python3.8/site-packages/transformers/models/marian/modeling_marian.py", line 749, in forward
        inputs_embeds = self.embed_tokens(input_ids) * self.embed_scale
      File "/home/karima/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/karima/.local/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 158, in forward
        return F.embedding(
      File "/home/karima/.local/lib/python3.8/site-packages/torch/nn/functional.py", line 2199, in embedding
        return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
    RuntimeError: CUDA error: no kernel image is available for execution on the device
    CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

    opened by abidikarima 0
  • How to run test_translation_speed.py

    The easynmt Docker install is working fine through HTTP requests. Now I'd like to run some benchmarks. How do you run /examples/test_translation_speed.py?

    opened by JohnWinner 2
  • Workflow for large datasets

    Hi! I was wondering if there is a workflow available for large datasets. I am trying to translate a large number of tweets using Pandas and Python.

    Best, Daniel

    opened by viajerus 0
  • Enable manually specifying the desired OPUS model?

    I really like the library, great work! Is there a way to manually specify a specific OPUS model? For example, EasyNMT with OPUS currently does not support English as the source and Portuguese as the target language, because it tries to download 'opus-mt-en-pt' by default, which does not exist. There is, however, an en2pt model on the hub now (https://huggingface.co/Helsinki-NLP/opus-mt-tc-big-en-pt) with a slightly different name. I don't know how to tell EasyNMT to take this specific model instead of throwing the following error:

    OSError: Helsinki-NLP/opus-mt-en-pt is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'

    opened by MoritzLaurer 2
Releases
  • v2.0.0 (Apr 26, 2021)

    mbart50 & m2m models now use huggingface transformers

    In version 1, the mbart50 & m2m models required the fairseq library. This caused several issues: fairseq cannot be used on Windows, multi-processing did not work with fairseq models, and loading and using the models was quite complicated.

    With this release, the fairseq dependency is removed and the mbart50 / m2m models are loaded with huggingface transformers version >= 4.4.0.

    From a user perspective, no changes should be visible. But from a developer perspective, this simplifies the architecture of EasyNMT and allows new features to be integrated more easily.

    Saving models

    Models can now be saved to disc by calling:

    model.save(output_path)
    

    Models can be loaded from disc by calling:

    model = EasyNMT(output_path)
    

    Loading models from the huggingface model hub

    Loading any Huggingface translation model is now simple: just pass the model name or the model path as in the following code:

    from easynmt import EasyNMT, models
    article = """EasyNMT is an open source library for state-of-the-art neural machine translation. Installation is simple using
    pip or pre-build docker images. EasyNMT provides access to various neural machine translation models. It can translate 
    sentences and documents of any length. Further, it includes code to automatically detect the language of a text."""
    
    model = EasyNMT(translator=models.AutoModel('facebook/mbart-large-en-ro')) 
    print(model.translate(article, source_lang='en_XX', target_lang='ro_RO'))
    

    This loads the facebook/mbart-large-en-ro model from the model hub.

    Note: Models might use different language codes, e.g. the mbart model uses 'en_XX' instead of 'en' and 'ro_RO' instead of 'ro'. To keep the language codes consistent, you can pass a lang_map:

    from easynmt import EasyNMT, models
    
    article = """EasyNMT is an open source library for state-of-the-art neural machine translation. Installation is simple using
    pip or pre-build docker images. EasyNMT provides access to various neural machine translation models. It can translate 
    sentences and documents of any length. Further, it includes code to automatically detect the language of a text."""
    
    output_path = 'output/mbart-large-en-ro'
    model = EasyNMT(translator=models.AutoModel('facebook/mbart-large-en-ro', lang_map={'en': 'en_XX', 'ro': 'ro_RO'}))
    
    #Save the model to disc
    model.save(output_path)
    
    # Load the model from disc
    model = EasyNMT(output_path)
    print(model.translate(article,  target_lang='ro'))
    
  • v1.1.0 (Mar 15, 2021)

    This release brings several improvements and is the first step towards the release of a Docker Image + REST API.

    Improvements:

    • Docker REST API: We have published Docker images for a REST API that allow easy usage of EasyNMT. Just run the Docker image and start translating using REST API calls (a small client sketch follows this list): more info
    • Google Colab REST API Hosting: We have published a Colab notebook that shows how to wrap EasyNMT in a REST API and host it on Google Colab with a free GPU. Useful if you need to translate large amounts of text.
    • Long sentences are translated first: Sentences are sorted by length before translation so that minimal time is wasted on padding tokens. In the previous version, the shortest sentences were translated first and the longer ones later; now the order is reversed. This has several advantages: if an OOM error happens, it happens at the start of the translation process and not at the end. Also, the progress bar estimate is more accurate, as the longest and slowest sentences are translated first.
    • Improved language detection: Automatic language detection is still an issue, especially for mixed-language texts. Language detection is now performed on document level rather than on sentence level. If you need language detection on sentence level, you can set document_language_detection=False for the translate method. Also, text is now lower-cased before the language is detected (the language detection scripts had issues with all-uppercase text).
    • Max length parameter: When you create your model like this: model = EasyNMT(model_name, max_length=100), all sentences with more than 100 word pieces will be truncated to at most 100 word pieces. This can prevent out-of-memory errors with overly long sentences.
    • Load model without translator: If you just want to use the language detection methods, you can now load your model like model = EasyNMT(model_name, load_translator=False). This prevents the loading of the translation engine (see the language-detection sketch after this list).
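
    As a rough illustration of the REST API workflow mentioned in the Docker REST API item above, the following sketch calls a locally running EasyNMT API container with Python's requests library. The image tag, port mapping, endpoint path, and parameter names are assumptions based on the project's Docker documentation and may differ between versions:

    import requests

    # Assumed container start (tag and port mapping are illustrative):
    #   docker run -p 24080:80 easynmt/api:2.0-cpu
    response = requests.get(
        'http://localhost:24080/translate',
        params={'target_lang': 'de', 'text': 'This is a test sentence.'},
    )
    print(response.json())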

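    Building on the language detection items above, here is a minimal sketch of loading a model without the translation engine and of document- vs. sentence-level detection. The parameters load_translator and document_language_detection are described above; the method name language_detection is an assumption and may differ:

    from easynmt import EasyNMT

    # Load without the translation engine if you only need language detection
    model = EasyNMT('opus-mt', load_translator=False)

    # Detect the language of a text; lower-casing before detection is handled
    # internally. The method name language_detection is an assumption.
    print(model.language_detection('Dies ist ein deutscher Beispielsatz.'))

    # For mixed-language input, per-sentence detection can be kept by disabling
    # document-level detection on a model loaded with its translator:
    # model.translate(sentences, target_lang='en', document_language_detection=False)
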
    Roadmap

    • As soon as Hugging Face transformers v4.4.0 is released, the dependency on fairseq can be removed, as the mBART50 and m2m models will be available in HF transformers. This will make installation on Windows machines possible.
    Source code(tar.gz)
    Source code(zip)
  • v1.0.2(Jan 29, 2021)

    fastText is used for automatic language detection, as it provides the highest speed and best accuracy.

    However, it can be complicated to install on Windows, as it requires a C/C++ compiler.

    This release adds two alternative language identifiers:

    • langid (https://github.com/saffsd/langid.py) - Can be installed via pip install langid
    • langdetect - Can be installed via pip install langdetect

    If fastText is not available, langid / langdetect will be used as alternative language detection methods.
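
    For reference, the two fallback identifiers can also be called directly. EasyNMT selects between them automatically, so this minimal sketch only shows the underlying library calls:

    import langid                      # pip install langid
    from langdetect import detect      # pip install langdetect

    text = 'Berlin ist die Hauptstadt von Deutschland.'

    print(langid.classify(text))       # e.g. ('de', score)
    print(detect(text))                # e.g. 'de'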

    For installation on Windows, you can run the following commands:

    pip install --no-deps easynmt
    pip install tqdm transformers numpy nltk sentencepiece langid 
    

    Further, you have to install PyTorch as described here: https://pytorch.org/get-started/locally/

    If you want to install fastText on Windows, I can recommend this link: https://anaconda.org/conda-forge/fasttext

    Source code(tar.gz)
    Source code(zip)
  • v1.0.1(Jan 27, 2021)

    fastText language detection did not work well if the text was in UPPERCASE.

    Adding lower() to the string before the language identification step significantly improved the performance.
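
    For illustration, here is a minimal sketch of the fix using fastText's published language-identification model (the model file lid.176.ftz must be downloaded separately; the file name and path are assumptions for this sketch):

    import fasttext

    # Lower-case the text before prediction, as uppercase input degraded accuracy
    lid_model = fasttext.load_model('lid.176.ftz')

    text = 'THIS IS AN ALL-UPPERCASE SENTENCE.'
    labels, scores = lid_model.predict(text.lower())
    print(labels[0].replace('__label__', ''), float(scores[0]))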

    Source code(tar.gz)
    Source code(zip)
  • v1.0.0(Jan 27, 2021)

Owner
Ubiquitous Knowledge Processing Lab
Journalism AI – Quotes extraction for modular journalism

Quote extraction for modular journalism (JournalismAI collab 2021)

Journalism AI collab 2021 207 Dec 25, 2022
Code for the Python code smells video on the ArjanCodes channel.

7 Python code smells This repository contains the code for the Python code smells video on the ArjanCodes channel (watch the video here). The example

55 Dec 29, 2022
Comprehensive-E2E-TTS - PyTorch Implementation

A Non-Autoregressive End-to-End Text-to-Speech (text-to-wav), supporting a family of SOTA unsupervised duration modelings. This project grows with the research community, aiming to achieve the ultima

Keon Lee 114 Nov 13, 2022
A PyTorch implementation of the WaveGlow: A Flow-based Generative Network for Speech Synthesis

WaveGlow A PyTorch implementation of the WaveGlow: A Flow-based Generative Network for Speech Synthesis Quick Start: Install requirements: pip install

Yuchao Zhang 204 Jul 14, 2022
Simple Text-To-Speech Bot For Discord

Simple Text-To-Speech Bot For Discord This is a very simple TTS bot for discord made with python. For this bot you need FFMPEG, see installation to se

1 Sep 26, 2022
precise iris segmentation

PI-DECODER Introduction PI-DECODER, a decoder structure designed for Precise Iris Segmentation and Location. The decoder structure is shown below: Ple

8 Aug 08, 2022
Simple and efficient RevNet-Library with DeepSpeed support

RevLib Simple and efficient RevNet-Library with DeepSpeed support Features Half the constant memory usage and faster than RevNet libraries Less memory

Lucas Nestler 112 Dec 05, 2022
Pre-training with Extracted Gap-sentences for Abstractive SUmmarization Sequence-to-sequence models

PEGASUS library Pre-training with Extracted Gap-sentences for Abstractive SUmmarization Sequence-to-sequence models, or PEGASUS, uses self-supervised

Google Research 1.4k Dec 22, 2022
Arabic speech recognition, classification and text-to-speech.

klaam Arabic speech recognition, classification and text-to-speech using many advanced models like wave2vec and fastspeech2. This repository allows tr

ARBML 177 Dec 27, 2022
Code for EMNLP20 paper: "ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training"

ProphetNet-X This repo provides the code for reproducing the experiments in ProphetNet. In the paper, we propose a new pre-trained language model call

Microsoft 394 Dec 17, 2022
Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.

OpenSpeech provides reference implementations of various ASR modeling papers and three languages recipe to perform tasks on automatic speech recogniti

Soohwan Kim 26 Dec 14, 2022
Every Google, Azure & IBM text to speech voice for free

TTS-Grabber Quick thing i made about a year ago to download any text with any tts voice, over 630 voices to choose from currently. It will split the i

16 Dec 07, 2022
Tools for curating biomedical training data for large-scale language modeling

Tools for curating biomedical training data for large-scale language modeling

BigScience Workshop 242 Dec 25, 2022
Client library to download and publish models and other files on the huggingface.co hub

huggingface_hub Client library to download and publish models and other files on the huggingface.co hub Do you have an open source ML library? We're l

Hugging Face 644 Jan 01, 2023
SentAugment is a data augmentation technique for semi-supervised learning in NLP.

SentAugment SentAugment is a data augmentation technique for semi-supervised learning in NLP. It uses state-of-the-art sentence embeddings to structur

Meta Research 363 Dec 30, 2022
Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": UNITER adversarial training part

VILLA: Vision-and-Language Adversarial Training This is the official repository of VILLA (NeurIPS 2020 Spotlight). This repository currently supports

Zhe Gan 109 Dec 31, 2022
Linear programming solver for paper-reviewer matching and mind-matching

Paper-Reviewer Matcher A python package for paper-reviewer matching algorithm based on topic modeling and linear programming. The algorithm is impleme

Titipat Achakulvisut 66 Jul 05, 2022
Code to reproduce the results of the paper 'Towards Realistic Few-Shot Relation Extraction' (EMNLP 2021)

Realistic Few-Shot Relation Extraction This repository contains code to reproduce the results in the paper "Towards Realistic Few-Shot Relation Extrac

Bloomberg 8 Nov 09, 2022
LOT: A Benchmark for Evaluating Chinese Long Text Understanding and Generation

LOT: A Benchmark for Evaluating Chinese Long Text Understanding and Generation Tasks | Datasets | LongLM | Baselines | Paper Introduction LOT is a ben

46 Dec 28, 2022
Precision Medicine Knowledge Graph (PrimeKG)

PrimeKG Website | bioRxiv Paper | Harvard Dataverse Precision Medicine Knowledge Graph (PrimeKG) presents a holistic view of diseases. PrimeKG integra

Machine Learning for Medicine and Science @ Harvard 103 Dec 10, 2022