Easy to use, state-of-the-art Neural Machine Translation for 100+ languages

Overview

EasyNMT - Easy to use, state-of-the-art Neural Machine Translation

This package provides easy-to-use, state-of-the-art machine translation for more than 100 languages. The highlights of this package are:

  • Easy installation and usage: Use state-of-the-art machine translation with 3 lines of code
  • Automatic download of pre-trained machine translation models
  • Translation between 150+ languages
  • Automatic language detection for 170+ languages
  • Sentence and document translation
  • Multi-GPU and multi-process translation

At the moment, we provide the models listed under Available Models below.

Installation

You can install the package via:

pip install -U easynmt

The models are based on PyTorch. If you have a GPU available, see how to install PyTorch with GPU support. If you use Windows and have issues with the installation, see this issue for how to solve it.
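
If you are unsure whether your PyTorch installation can actually use the GPU, a quick check (plain PyTorch, nothing EasyNMT-specific) is:

import torch

# True only if a CUDA-capable GPU and a matching PyTorch build are available
print(torch.cuda.is_available())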

Usage

The usage is simple:

from easynmt import EasyNMT
model = EasyNMT('opus-mt')

#Translate a single sentence to German
print(model.translate('This is a sentence we want to translate to German', target_lang='de'))

#Translate several sentences to German
sentences = ['You can define a list with sentences.',
             'All sentences are translated to your target language.',
             'Note, you could also mix the languages of the sentences.']
print(model.translate(sentences, target_lang='de'))

Document Translation

The available models are based on the Transformer architecture, which provides state-of-the-art translation quality. However, the input length is limited to 512 word pieces for the opus-mt model and to 1024 word pieces for the M2M models.

The translate() method performs automatic sentence splitting so that longer documents can also be translated:

from easynmt import EasyNMT
model = EasyNMT('opus-mt')

document = """Berlin is the capital and largest city of Germany by both area and population.[6][7] Its 3,769,495 inhabitants as of 31 December 2019[2] make it the most-populous city of the European Union, according to population within city limits.[8] The city is also one of Germany's 16 federal states. It is surrounded by the state of Brandenburg, and contiguous with Potsdam, Brandenburg's capital. The two cities are at the center of the Berlin-Brandenburg capital region, which is, with about six million inhabitants and an area of more than 30,000 km2,[9] Germany's third-largest metropolitan region after the Rhine-Ruhr and Rhine-Main regions. Berlin straddles the banks of the River Spree, which flows into the River Havel (a tributary of the River Elbe) in the western borough of Spandau. Among the city's main topographical features are the many lakes in the western and southeastern boroughs formed by the Spree, Havel, and Dahme rivers (the largest of which is Lake Müggelsee). Due to its location in the European Plain, Berlin is influenced by a temperate seasonal climate. About one-third of the city's area is composed of forests, parks, gardens, rivers, canals and lakes.[10] The city lies in the Central German dialect area, the Berlin dialect being a variant of the Lusatian-New Marchian dialects.

First documented in the 13th century and at the crossing of two important historic trade routes,[11] Berlin became the capital of the Margraviate of Brandenburg (1417–1701), the Kingdom of Prussia (1701–1918), the German Empire (1871–1918), the Weimar Republic (1919–1933), and the Third Reich (1933–1945).[12] Berlin in the 1920s was the third-largest municipality in the world.[13] After World War II and its subsequent occupation by the victorious countries, the city was divided; West Berlin became a de facto West German exclave, surrounded by the Berlin Wall (1961–1989) and East German territory.[14] East Berlin was declared capital of East Germany, while Bonn became the West German capital. Following German reunification in 1990, Berlin once again became the capital of all of Germany.

Berlin is a world city of culture, politics, media and science.[15][16][17][18] Its economy is based on high-tech firms and the service sector, encompassing a diverse range of creative industries, research facilities, media corporations and convention venues.[19][20] Berlin serves as a continental hub for air and rail traffic and has a highly complex public transportation network. The metropolis is a popular tourist destination.[21] Significant industries also include IT, pharmaceuticals, biomedical engineering, clean tech, biotechnology, construction and electronics."""

#Translate the document to German
print(model.translate(document, target_lang='de'))

The function breaks down the document into sentences and then translates the sentences individually using the specified model.
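
If you prefer to control the splitting yourself, you can split the document up front and call translate_sentences directly. This is only a minimal sketch: it assumes NLTK's punkt tokenizer (NLTK is an EasyNMT dependency), assumes translate_sentences accepts source_lang, target_lang and batch_size, and reuses the model and document variables from the example above.

import nltk

nltk.download('punkt')  # one-time download of the sentence tokenizer data

# 'model' and 'document' as defined in the example above
sentences = nltk.sent_tokenize(document)
translated = model.translate_sentences(sentences, source_lang='en', target_lang='de', batch_size=16)
print(' '.join(translated))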

Automatic Language Detection

You can set source_lang for the translate method to define the source language. If source_lang is not set, fastText is used to detect the source language automatically. This also allows you to pass a list of sentences / documents in different languages:

from easynmt import EasyNMT
model = EasyNMT('opus-mt')

#Translate several sentences to English
sentences = ['Dies ist ein Satz in Deutsch.',   #This is a German sentence
             '这是一个中文句子',    #This is a Chinese sentence
             'Esta es una oración en español.'] #This is a Spanish sentence
print(model.translate(sentences, target_lang='en'))
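
If you already know the language, you can skip detection entirely by fixing source_lang; this is also useful when detection struggles, e.g. on very short snippets:

print(model.translate('Dies ist ein Satz in Deutsch.', source_lang='de', target_lang='en'))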

Available Models

The following models are currently available. They provide translations between 150+ languages.

Model        | Reference         | #Languages | Size   | Speed GPU (sentences/sec on V100) | Speed CPU (sentences/sec) | Comment
opus-mt      | Helsinki-NLP      | 186        | 300 MB | 53                                | 6                         | Individual models (~300 MB) per translation direction
mbart50_m2m  | Facebook Research | 52         | 1.2 GB | 35                                | 0.9                       |
m2m_100_418M | Facebook Research | 100        | 0.9 GB | 39                                | 1.1                       |
m2m_100_1.2B | Facebook Research | 100        | 2.4 GB | 23                                | 0.5                       |

Translation Quality

A comparison of the models' translation quality will be added here soon. So far, my personal, subjective impression is that opus-mt and m2m_100_1.2B yield the best translations.

Opus-MT

We provide a wrapper for the pre-trained models from Opus-MT.

Opus-MT provides 1200+ different translation models, each capable of translating in one direction (e.g. from German to English). Each model is about 300 MB in size.

Supported languages: aav, aed, af, alv, am, ar, art, ase, az, bat, bcl, be, bem, ber, bg, bi, bn, bnt, bzs, ca, cau, ccs, ceb, cel, chk, cpf, crs, cs, csg, csn, cus, cy, da, de, dra, ee, efi, el, en, eo, es, et, eu, euq, fi, fj, fr, fse, ga, gaa, gil, gl, grk, guw, gv, ha, he, hi, hil, ho, hr, ht, hu, hy, id, ig, ilo, is, iso, it, ja, jap, ka, kab, kg, kj, kl, ko, kqn, kwn, kwy, lg, ln, loz, lt, lu, lua, lue, lun, luo, lus, lv, map, mfe, mfs, mg, mh, mk, mkh, ml, mos, mr, ms, mt, mul, ng, nic, niu, nl, no, nso, ny, nyk, om, pa, pag, pap, phi, pis, pl, pon, poz, pqe, pqw, prl, pt, rn, rnd, ro, roa, ru, run, rw, sal, sg, sh, sit, sk, sl, sm, sn, sq, srn, ss, ssp, st, sv, sw, swc, taw, tdt, th, ti, tiv, tl, tll, tn, to, toi, tpi, tr, trk, ts, tum, tut, tvl, tw, ty, tzo, uk, umb, ur, ve, vi, vsl, wa, wal, war, wls, xh, yap, yo, yua, zai, zh, zne

Usage:

from easynmt import EasyNMT
model = EasyNMT('opus-mt', max_loaded_models=10)

The system will automatically detect the suitable Opus-MT model and load it. With the optional parameter max_loaded_models you can specify the maximum number of models that are loaded simultaneously. If you then translate with an unseen language direction, the oldest model is unloaded and the new model is loaded.
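
For illustration, translating three different source languages in a row with max_loaded_models=2 loads one Opus-MT model per direction and evicts the oldest one when the third direction is requested (a usage sketch following the description above):

from easynmt import EasyNMT

model = EasyNMT('opus-mt', max_loaded_models=2)

print(model.translate('Hallo Welt', source_lang='de', target_lang='en'))        # loads opus-mt-de-en
print(model.translate('Bonjour le monde', source_lang='fr', target_lang='en'))  # loads opus-mt-fr-en
print(model.translate('Hola mundo', source_lang='es', target_lang='en'))        # loads opus-mt-es-en, oldest model is unloaded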

mBART_50

We provide a wrapper for the mBART50 model from Facebook, which can translate between any pair of 50+ languages.

Usage:

from easynmt import EasyNMT
model = EasyNMT('mbart50_m2m')

Supported languages: af, ar, az, bn, cs, de, en, es, et, fa, fi, fr, gl, gu, he, hi, hr, id, it, ja, ka, kk, km, ko, lt, lv, mk, ml, mn, mr, my, ne, nl, pl, ps, pt, ro, ru, si, sl, sv, sw, ta, te, th, tl, tr, uk, ur, vi, xh, zh
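
A short usage example with an explicit source language (both language codes are in the list above):

from easynmt import EasyNMT

model = EasyNMT('mbart50_m2m')

# Japanese -> English with the single many-to-many mBART50 model
print(model.translate('これは日本語の文です。', source_lang='ja', target_lang='en'))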

M2M_100

We provide a wrapper for the M2M 100 model from Facebook, which can translate between any pair of 100 languages.

Supported languages: af, am, ar, ast, az, ba, be, bg, bn, br, bs, ca, ceb, cs, cy, da, de, el, en, es, et, fa, ff, fi, fr, fy, ga, gd, gl, gu, ha, he, hi, hr, ht, hu, hy, id, ig, ilo, is, it, ja, jv, ka, kk, km, kn, ko, lb, lg, ln, lo, lt, lv, mg, mk, ml, mn, mr, ms, my, ne, nl, no, ns, oc, or, pa, pl, ps, pt, ro, ru, sd, si, sk, sl, so, sq, sr, ss, su, sv, sw, ta, th, tl, tn, tr, uk, ur, uz, vi, wo, xh, yi, yo, zh, zu

At the moment, we provide wrappers for two M2M 100 models:

  • m2m_100_418M: M2M model with 418 million parameters (0.9 GB)
  • m2m_100_1.2B: M2M model with 1.2 billion parameters (2.4 GB)

Usage:

from easynmt import EasyNMT
model = EasyNMT('m2m_100_418M')   #or: EasyNMT('m2m_100_1.2B') 

You can find more information here. Note: the 12-billion-parameter M2M model is currently not supported.

As soon as you call EasyNMT('m2m_100_418M') / EasyNMT('m2m_100_1.2B'), the respective model is downloaded and cached locally.
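
Since M2M 100 is a single many-to-many model, it can also translate directly between non-English pairs without pivoting through English. A minimal sketch:

from easynmt import EasyNMT

model = EasyNMT('m2m_100_418M')   # downloaded and cached locally on first use

# Direct German -> French translation with the shared many-to-many model
print(model.translate('Maschinelle Übersetzung ist faszinierend.', source_lang='de', target_lang='fr'))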

Author

Contact person: Nils Reimers; [email protected]

https://www.ukp.tu-darmstadt.de/

Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.

This repository contains experimental software to encourage future research.

Comments
  • Missing supported translate pair with M2M_100 model

    Hi, I found that M2M_100 supports direct translation between any pair of its 100 languages (9,900 pairs). But when I use EasyNMT with the M2M_100 model, it doesn't support all of these pairs.

    Example: EasyNMT can't translate directly from 'th' (Thai) to 'en' (English) while M2M_100 model does support this pair.

    And when I tried to use HuggingFace to translate directly between Thai and English, it worked perfectly.

    Can you please solve the problem? By the way, thank you for creating EasyNMT.

    opened by nguyenhuuthuat09 12
  • Can't access other models in docker image

    Hi,

    I'm sorry for this noobish question/issue and maybe it is easy to resolve (I'm not experienced with Docker). I've built a web app which uses EasyNMT in the back via the Docker images and REST. When translating from Romanian to German, I noticed that the Docker image is only using the opus model, which does not provide this language direction. But when executing the "/model_name" request, it shows me only "opus" as part of the Docker image.

    So how can I get the other models? I have 3 Docker images of EasyNMT (one 7.7 GB, one 6.02 GB and one 3.8 GB in size) but it seems none of them contains the other models. Am I doing something wrong here? And also, when they are part of the image, is there some kind of auto-selection if a language is not available in one of the packages?

    I installed the docker images via the "build-docker-hub.sh" file.

    Best regards, André

    opened by 4quen 5
  • Library not translating, just returning input

    Hello I am running the following code

    from easynmt import EasyNMT
    
    model = EasyNMT('opus-mt')
    
    print(model.translate("停", target_lang='en'))
    

    The result of the code is just "停", which is exactly the same as the input. How can I fix this?

    opened by geekjr 5
  • Can this project support num-beams in opus-mt model ?

    I found a similar project called ktrain that supports this, located at https://github.com/amaiya/ktrain/blob/5c9c6b333115be44433639c4bc4c091bd79ab65c/ktrain/text/translation/core.py. Having some accuracy measurements to summarize the comparison would be interesting. Could multilingual sentence embeddings be of some help?

    opened by svjack 5
  • AttributeError: 'float' object has no attribute 'split'

    Hi Team, I have a question. I am trying to translate a column which has blanks in between. I am using EasyNMT and it's giving an error. Won't it work if there are blanks or missing values between the rows of a column?

    Thanks Srinivas

    opened by sriprad 4
  • Sending large documents for translation with GET endpoint can sometimes result in URL parser error

    With large documents (several thousand characters), I see a crash in the URL parser. I tried with different source and target languages. Also, for the same source text, the translation may sometimes succeed, but then fail again. For example, for the sample request URL given below, the URL parser may sometimes succeed and sometimes fail with an exception.

    I'm using the last 1.x commit of EasyNMT (specifically commit 61fcf7154f01f56c02be6d30b1c5d0921b91aa2e) as it has better benchmarks than 2.x for fairseq models, but I believe the same issue should be there for the latest version too as I don't think the URL parser would have changed. I'm using the m2m_100_418M model with a T4 GPU if that matters at all.

    EasyNMT error logs:

    [2021-05-25 14:58:29 +0000] [16] [WARNING] Invalid HTTP request received.
    Traceback (most recent call last):
      File "httptools/parser/parser.pyx", line 245, in httptools.parser.parser.cb_on_url
      File "/opt/conda/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 216, in on_url
        parsed_url = httptools.parse_url(url)
      File "httptools/parser/parser.pyx", line 468, in httptools.parser.parser.parse_url
    httptools.parser.errors.HttpParserInvalidURLError: invalid url b'ividual+level+we+need+to+take+up+responsibility+to+curb+the+spread+of+this+virus%2C%22+said+Dr+Rao%0D%0A%0D%0APromotedListen+to+the+latest+songs%2C+only+on+JioSaavn.com%0D%0A%0D%0AWhen+it+comes+to+Bengaluru%2C+Dr+Rao+said+the+lockdown+had+reduced+the+number+of+emergency+oxygen+requirements+and+the+panic.+%22That+is+because+the+virus+has+stopped+moving+because+we+have+stopped+moving%2C%22+he+said.+%22Generally%2C+as+a+rule%2C+a+health+care+system+will+not+be+able+to+cope+with+a+sudden+rise+in+numbers%2C+emergency+oxygen+requirements+or+health+care.+The+other+big+concern+is+trained+manpower.%22%0D%0A%0D%0AComments%0D%0AMucormycosis%2C+commonly+known+as+Black+Fungus%2C+is+also+on+the+rise+in+the+state.+Dr+Rao+said%3A+%22At+HCG+we+are+treating+30+cases+and+the+number+is+on+the+rise.+In+Karnataka%2C+currently%2C+it+must+be+about+700+cases.+It+looks+like+an+epidemic+within+a+pandemic+at+this+juncture.+We+need+to+understand+the+source+of+this+infection%2C+have+early+detection+and+treatment.+A+committee+will+give+a+clear+strategy+for+the+state.+We+don%27t+need+to+scare+people+about+black+fungus%2C+we+need+to+create+awareness.+What+we+have+seen+in+the+patients+-+they+have+all+been+Covid+positive%2C+most+have+been+given+steroids%2C+majority+had+high+sugar.+30+to+40+per+cent+had+been+given+oxygen+and+most+important+-+none+of+them+had+been+vaccinated.%22%0D%0A%0D%0A'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 167, in data_received
        self.parser.feed_data(data)
      File "httptools/parser/parser.pyx", line 193, in httptools.parser.parser.HttpParser.feed_data
    httptools.parser.errors.HttpParserCallbackError: the on_url callback failed
    [2021-05-25 14:58:29 +0000] [18] [WARNING] Invalid HTTP request received.
    Traceback (most recent call last):
      File "httptools/parser/parser.pyx", line 245, in httptools.parser.parser.cb_on_url
      File "/opt/conda/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 216, in on_url
        parsed_url = httptools.parse_url(url)
      File "httptools/parser/parser.pyx", line 468, in httptools.parser.parser.parse_url
    httptools.parser.errors.HttpParserInvalidURLError: invalid url b'irus+is+now+being+reported+more+from+rural+Karnataka+with+often+a+weak+health+infrastructure.%0D%0A%0D%0ADr+Vishal+Rao+of+the+HCG+hospitals+and+a+member+of+the+Karnataka+Covid+task+force+said%2C+%22It+is+going+to+be+an+uphill+task+as+we+move+towards+the+districts+as+the+health+care+systems+get+overburdened+there.+Even+the+oxygen+management.+In+cities%2C+we+have+the+privilege+that+oxygen+comes+to+the+doorstep+of+the+hospital.+Whereas+in+villages+and+districts%2C+hospitals+have+to+carry+their+cylinders+to+refill+them.+Public+health+experts+and+virologists+are+repeatedly+trying+to+enhance+the+surveillance+in+villages+to+ensure+we+are+better+prepared+in+villages.+This+is+the+time+to+ramp+up+the+preparation+for+villages.%22%0D%0A%0D%0AHe+also+said+that+the+lockdown+%22definitely+had+a+very+significant+impact%22+on+the+daily+infections.+%22From+50%2C000+cases+everyday%2C+today+we+are+at+around+20%2C000+odd+cases.It+is+not+a+reassurance+that+once+the+lockdown+is+lifted%2C+we+will+continue+to+have+these+low+numbers.+But+what+is+of+concern+is+that+the+positivity+rate+still+sticks+at+around+20+per+cent+and+the+mortality+has+jumped+to+about+2+per+cent.+We+need+to+understand+that+when+the+waves+flatten%2C+it+is+not+that+the+virus+is+taking+rest.+It+is+a+socio-economic+virus+and+the+more+we+improve+interactions+without+safety%2C+we+are+going+to+explode+and+expand+the+spread+of+this+virus.+At+an+ind'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 167, in data_received
        self.parser.feed_data(data)
      File "httptools/parser/parser.pyx", line 193, in httptools.parser.parser.HttpParser.feed_data
    httptools.parser.errors.HttpParserCallbackError: the on_url callback failed
    

    Full request URL path + query params:

    /translate?beam_size=2&source_lang=en&target_lang=de&text=Bengaluru%3A+Karnataka+-+one+of+the+worst+hit+states+in+the+country+in+the+second+wave+of+COVID-19%2C+has+been+witnessing+a+slump+in+the+new+case+numbers+over+the+last+few+weeks.+The+authorities%2C+however%2C+are+of+the+view+that+It+is+far+too+early+to+relax.%0D%0AThe+number+of+COVID-19+outnumbered+fresh+infections+in+Karnataka+yet+again+on+Tuesday%2C+as+the+state+reported+38%2C224+discharges+and+22%2C758+new+cases.+Of+the+new+cases+reported+today%2C+6%2C243+were+from+Bengaluru.%0D%0A%0D%0A%22If+you+look+at+the+numbers%2C+it+has+been+reducing+very+drastically.+Except+for+a+few+districts+where+the+numbers+are+not+coming+down.+In+most+of+the+districts+and+Bengaluru%2C+the+numbers+have+come+down.+The+number+should+come+down+drastically+so+that+we+can+unlock+from+the+lockdown%2C%22+Deputy+Chief+Minister+Dr+Ashwath+Narayan+told+NDTV.%0D%0A%0D%0AThe+state+is+in+the+middle+of+a+strict+shutdown.+But+that+doesn%27t+mean+any+major+reduction+in+the+demand+for+oxygen+in+Bengaluru+as+ICU+beds+still+remain+full.%0D%0A%0D%0A%22Since+the+number+has+come+down+very+drastically+-+now+it+is+5%2C000+odd+cases+in+Bengaluru+%28daily+infections%29+-+from+when+it+had+almost+reached+25%2C000%2C+it+is+a+great+relief.+When+it+comes+to+ICU+or+ventilator%2C+however%2C+there+is+still+a+lot+of+demand%2C%22+he+said.%0D%0A%0D%0AAn+extra+concern+is+that+the+virus+is+now+being+reported+more+from+rural+Karnataka+with+often+a+weak+health+infrastructure.%0D%0A%0D%0ADr+Vishal+Rao+of+the+HCG+hospitals+and+a+member+of+the+Karnataka+Covid+task+force+said%2C+%22It+is+going+to+be+an+uphill+task+as+we+move+towards+the+districts+as+the+health+care+systems+get+overburdened+there.+Even+the+oxygen+management.+In+cities%2C+we+have+the+privilege+that+oxygen+comes+to+the+doorstep+of+the+hospital.+Whereas+in+villages+and+districts%2C+hospitals+have+to+carry+their+cylinders+to+refill+them.+Public+health+experts+and+virologists+are+repeatedly+trying+to+enhance+the+surveillance+in+villages+to+ensure+we+are+better+prepared+in+villages.+This+is+the+time+to+ramp+up+the+preparation+for+villages.%22%0D%0A%0D%0AHe+also+said+that+the+lockdown+%22definitely+had+a+very+significant+impact%22+on+the+daily+infections.+%22From+50%2C000+cases+everyday%2C+today+we+are+at+around+20%2C000+odd+cases.It+is+not+a+reassurance+that+once+the+lockdown+is+lifted%2C+we+will+continue+to+have+these+low+numbers.+But+what+is+of+concern+is+that+the+positivity+rate+still+sticks+at+around+20+per+cent+and+the+mortality+has+jumped+to+about+2+per+cent.+We+need+to+understand+that+when+the+waves+flatten%2C+it+is+not+that+the+virus+is+taking+rest.+It+is+a+socio-economic+virus+and+the+more+we+improve+interactions+without+safety%2C+we+are+going+to+explode+and+expand+the+spread+of+this+virus.+At+an+individual+level+we+need+to+take+up+responsibility+to+curb+the+spread+of+this+virus%2C%22+said+Dr+Rao%0D%0A%0D%0APromotedListen+to+the+latest+songs%2C+only+on+JioSaavn.com%0D%0A%0D%0AWhen+it+comes+to+Bengaluru%2C+Dr+Rao+said+the+lockdown+had+reduced+the+number+of+emergency+oxygen+requirements+and+the+panic.+%22That+is+because+the+virus+has+stopped+moving+because+we+have+stopped+moving%2C%22+he+said.+%22Generally%2C+as+a+rule%2C+a+health+care+system+will+not+be+able+to+cope+with+a+sudden+rise+in+numbers%2C+emergency+oxygen+requirements+or+health+care.+The+other+big+concern+is+trained+manpower.%22%0D%0A%0D%0AComments%0D%0AMucormycosis%2C+commonly+known+as+Black+Fungus%2C+is+also+on+the+rise+in+the+state.+Dr+Rao+said%3A+%22At+HCG+we+ar
e+treating+30+cases+and+the+number+is+on+the+rise.+In+Karnataka%2C+currently%2C+it+must+be+about+700+cases.+It+looks+like+an+epidemic+within+a+pandemic+at+this+juncture.+We+need+to+understand+the+source+of+this+infection%2C+have+early+detection+and+treatment.+A+committee+will+give+a+clear+strategy+for+the+state.+We+don%27t+need+to+scare+people+about+black+fungus%2C+we+need+to+create+awareness.+What+we+have+seen+in+the+patients+-+they+have+all+been+Covid+positive%2C+most+have+been+given+steroids%2C+majority+had+high+sugar.+30+to+40+per+cent+had+been+given+oxygen+and+most+important+-+none+of+them+had+been+vaccinated.%22%0D%0A%0D%0ADelhi+received+144.8+mm+rainfall+in+May+this+year%2C+the+highest+for+the+month+in+13+years%2C+according+to+the+India+Meteorological+Department+%28IMD%29.%0D%0A%22No+rain+is+predicted+in+the+next+four+to+five+days.+So%2C+this+is+the+highest+rainfall+in+May+since+2008%2C%22+Kuldeep+Srivastava%2C+the+head+of+the+IMD%27s+regional+forecasting+centre%2C+said+today.%0D%0A%0D%0AThe+Safdarjung+Observatory%2C+considered+the+official+marker+for+the+city%2C+had+recorded+21.1+mm+rainfall+last+year%2C+26.9+mm+in+2019+and+24.2+mm+in+2018.%0D%0A%0D%0AIt+had+gauged+40.5+mm+precipitation+in+2017%3B+24.3+mm+in+2016%3B+3.1+mm+in+2015+and+100.2+mm+in+2014%2C+according+to+IMD+data.%0D%0A%0D%0A
    
    opened by AgrimPrasad 3
  • Docker image easynmt/api:2.0-cpu crashes when trying to run on mac

    Running this on a 2017 Macbook. Docker image easynmt/api:2.0-cpu fails to start with exceptions, while easynmt/api:1.1-cpu was running fine with the same docker run command previously.

    docker run -p 24081:80 -v /Users/agrim/Downloads/easynmt-models:/cache easynmt/api:2.0-cpu
    
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/usr/local/lib/python3.8/site-packages/easynmt/EasyNMT.py", line 92, in __init__
        self.translator = module_class(easynmt_path=model_path, **self.config['model_args'])
    KeyError: 'model_args'
    Checking for script in /app/prestart.sh
    There is no script /app/prestart.sh
    [2021-04-27 14:38:22 +0000] [13] [INFO] Starting gunicorn 20.1.0
    [2021-04-27 14:38:22 +0000] [12] [INFO] Starting gunicorn 20.1.0
    [2021-04-27 14:38:22 +0000] [12] [INFO] Listening at: http://0.0.0.0:8080 (12)
    [2021-04-27 14:38:22 +0000] [13] [INFO] Listening at: http://0.0.0.0:80 (13)
    [2021-04-27 14:38:22 +0000] [13] [INFO] Using worker: uvicorn.workers.UvicornWorker
    [2021-04-27 14:38:22 +0000] [12] [INFO] Using worker: uvicorn.workers.UvicornWorker
    [2021-04-27 14:38:22 +0000] [17] [INFO] Booting worker with pid: 17
    [2021-04-27 14:38:22 +0000] [18] [INFO] Booting worker with pid: 18
    [2021-04-27 14:38:22 +0000] [19] [INFO] Booting worker with pid: 19
    [2021-04-27 14:38:24 +0000] [19] [INFO] Started server process [19]
    [2021-04-27 14:38:24 +0000] [17] [INFO] Started server process [17]
    [2021-04-27 14:38:24 +0000] [17] [INFO] Waiting for application startup.
    [2021-04-27 14:38:24 +0000] [19] [INFO] Waiting for application startup.
    [2021-04-27 14:38:24 +0000] [17] [INFO] Application startup complete.
    [2021-04-27 14:38:24 +0000] [19] [INFO] Application startup complete.
    {"loglevel": "info", "workers": "1", "bind": "0.0.0.0:8080", "graceful_timeout": 120, "timeout": 120, "keepalive": 5, "errorlog": "-", "accesslog": "-", "host": "0.0.0.0", "port": "8080"}
    Booted as backend: True
    Load model: opus-mt
    [2021-04-27 14:38:25 +0000] [18] [ERROR] Exception in worker process
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
        worker.init_process()
      File "/usr/local/lib/python3.8/site-packages/uvicorn/workers.py", line 63, in init_process
        super(UvicornWorker, self).init_process()
      File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 134, in init_process
        self.load_wsgi()
      File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
        self.wsgi = self.app.wsgi()
      File "/usr/local/lib/python3.8/site-packages/gunicorn/app/base.py", line 67, in wsgi
        self.callable = self.load()
      File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
        return self.load_wsgiapp()
      File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
        return util.import_app(self.app_uri)
      File "/usr/local/lib/python3.8/site-packages/gunicorn/util.py", line 359, in import_app
        mod = importlib.import_module(module)
      File "/usr/local/lib/python3.8/importlib/__init__.py", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
      File "<frozen importlib._bootstrap>", line 991, in _find_and_load
      File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
      File "<frozen importlib._bootstrap_external>", line 783, in exec_module
      File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
      File "/app/main.py", line 36, in <module>
        model = EasyNMT(model_name, load_translator=IS_BACKEND, **model_args)
      File "/usr/local/lib/python3.8/site-packages/easynmt/EasyNMT.py", line 92, in __init__
        self.translator = module_class(easynmt_path=model_path, **self.config['model_args'])
    KeyError: 'model_args'
    [2021-04-27 14:38:25 +0000] [18] [INFO] Worker exiting (pid: 18)
    [2021-04-27 14:38:25 +0000] [12] [INFO] Shutting down: Master
    [2021-04-27 14:38:25 +0000] [12] [INFO] Reason: Worker failed to boot.
    {"loglevel": "info", "workers": "1", "bind": "0.0.0.0:8080", "graceful_timeout": 120, "timeout": 120, "keepalive": 5, "errorlog": "-", "accesslog": "-", "host": "0.0.0.0", "port": "8080"}
    One of the processes has already exited.
    
    opened by AgrimPrasad 3
  • No module named 'easynmt.models.OpusMT' in PyCharm

    Hello, I'm running this simple code in PyCharm:

    from easynmt import EasyNMT
    model = EasyNMT("opus-mt")

    print(model.translate("Hi", target_lang="fr"))

    and it gives me this error

    Traceback (most recent call last):
      File "H:/Documents/Python/Random.py", line 2, in <module>
        model = EasyNMT("opus-mt")
      File "C:\Users\Jerz King\AppData\Local\Programs\Python\Python37\lib\site-packages\easynmt\EasyNMT.py", line 69, in __init__
        module_class = import_from_string(self.config['model_class'])
      File "C:\Users\Jerz King\AppData\Local\Programs\Python\Python37\lib\site-packages\easynmt\util.py", line 56, in import_from_string
        module = importlib.import_module(module_path)
      File "C:\Users\Jerz King\AppData\Local\Programs\Python\Python37\lib\importlib\__init__.py", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
      File "<frozen importlib._bootstrap>", line 983, in _find_and_load
      File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
    ModuleNotFoundError: No module named 'easynmt.models.OpusMT'

    I had the same error addressed here when installing easynmt, and followed the steps, but nothing happened... How do I fix this?

    opened by Lorddickenstein 3
  • OSError after a few translations

    Hi and thanks for the cool library!

    I want to include the translation function in one of my data pipelines that loops over thousands of text snippets. Without GPU support and on Windows, I followed the instructions in the other issue and successfully added the function.

    from easynmt import EasyNMT
    model = EasyNMT('opus-mt')
    

    and I translate with:

    from langdetect import detect_langs  # detect_langs comes from the langdetect package

    language = detect_langs(text)
    for each_lang in language:
       if (each_lang.lang != "en"):
          translated_text = model.translate(text, target_lang='en')
    

    where text is a string. However, after a few translations (2-3) I always run into this error:

    OSError: Can't load tokenizer for 'Helsinki-NLP/opus-mt-ia-en'. Make sure that:
    - 'Helsinki-NLP/opus-mt-ia-en' is a correct model identifier listed on 'https://huggingface.co/models'
    

    Any idea what the problem could be?

    opened by jonas-nothnagel 3
  • MBart50Converter requires the protobuf library but it was not found in your environment.

    Trying to use the Docker image with the model mbart50_m2m. Command: docker run --env EASYNMT_MODEL=mbart50_m2m --env TIMEOUT=600 --env MAX_WORKERS_FRONTEND=1 -p 24080:80 easynmt/api:2.0-cpu. And it exited with this trace:

    [2022-05-26 12:48:19 +0000] [36] [ERROR] Exception in worker process
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
        worker.init_process()
      File "/usr/local/lib/python3.8/site-packages/uvicorn/workers.py", line 63, in init_process
        super(UvicornWorker, self).init_process()
      File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 134, in init_process
        self.load_wsgi()
      File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
        self.wsgi = self.app.wsgi()
      File "/usr/local/lib/python3.8/site-packages/gunicorn/app/base.py", line 67, in wsgi
        self.callable = self.load()
      File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
        return self.load_wsgiapp()
      File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
        return util.import_app(self.app_uri)
      File "/usr/local/lib/python3.8/site-packages/gunicorn/util.py", line 359, in import_app
        mod = importlib.import_module(module)
      File "/usr/local/lib/python3.8/importlib/__init__.py", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
      File "<frozen importlib._bootstrap>", line 991, in _find_and_load
      File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
      File "<frozen importlib._bootstrap_external>", line 783, in exec_module
      File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
      File "/app/main.py", line 36, in <module>
        model = EasyNMT(model_name, load_translator=IS_BACKEND, **model_args)
      File "/usr/local/lib/python3.8/site-packages/easynmt/EasyNMT.py", line 92, in __init__
        self.translator = module_class(easynmt_path=model_path, **self.config['model_args'])
      File "/usr/local/lib/python3.8/site-packages/easynmt/models/AutoModel.py", line 32, in __init__
        self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_name, **self.tokenizer_args)
      File "/usr/local/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 407, in from_pretrained
        return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
      File "/usr/local/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1709, in from_pretrained
        return cls._from_pretrained(
      File "/usr/local/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1781, in _from_pretrained
        tokenizer = cls(*init_inputs, **init_kwargs)
      File "/usr/local/lib/python3.8/site-packages/transformers/models/mbart/tokenization_mbart50_fast.py", line 128, in __init__
        super().__init__(
      File "/usr/local/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py", line 99, in __init__
        fast_tokenizer = convert_slow_tokenizer(slow_tokenizer)
      File "/usr/local/lib/python3.8/site-packages/transformers/convert_slow_tokenizer.py", line 708, in convert_slow_tokenizer
        return converter_class(transformer_tokenizer).converted()
      File "/usr/local/lib/python3.8/site-packages/transformers/convert_slow_tokenizer.py", line 301, in __init__
        requires_protobuf(self)
      File "/usr/local/lib/python3.8/site-packages/transformers/file_utils.py", line 574, in requires_protobuf
        raise ImportError(PROTOBUF_IMPORT_ERROR.format(name))
    ImportError: 
    MBart50Converter requires the protobuf library but it was not found in your environment. Checkout the instructions on the
    installation page of its repo: https://github.com/protocolbuffers/protobuf/tree/master/python#installation and follow the ones
    that match your environment.
    

    Does anybody work on the Docker images and can fix this?

    opened by TheMY3 2
  • Issues installing sentencepiece and fasttext dependencies on Windows and Mac

    Trying to install EasyNMT 2.0.1 on Windows 10 with a normal Python 3.10.0 installation (not Anaconda).

    A colleague said he had the same issue on his Mac.

    Building wheels for collected packages: fasttext, sentencepiece
      Building wheel for fasttext (setup.py) ... error
      error: subprocess-exited-with-error
    
      × python setup.py bdist_wheel did not run successfully.
      │ exit code: 1
      ╰─> [52 lines of output]
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\setuptools\dist.py:738: UserWarning: Usage of dash-separated 'description-file' will not be supported in future versions. Please use the underscore name 'description_file' instead
            warnings.warn(
          running bdist_wheel
          running build
          running build_py
          creating build
          creating build\lib.win-amd64-3.10
          creating build\lib.win-amd64-3.10\fasttext
          copying python\fasttext_module\fasttext\FastText.py -> build\lib.win-amd64-3.10\fasttext
          copying python\fasttext_module\fasttext\__init__.py -> build\lib.win-amd64-3.10\fasttext
          creating build\lib.win-amd64-3.10\fasttext\util
          copying python\fasttext_module\fasttext\util\util.py -> build\lib.win-amd64-3.10\fasttext\util
          copying python\fasttext_module\fasttext\util\__init__.py -> build\lib.win-amd64-3.10\fasttext\util
          creating build\lib.win-amd64-3.10\fasttext\tests
          copying python\fasttext_module\fasttext\tests\test_configurations.py -> build\lib.win-amd64-3.10\fasttext\tests
          copying python\fasttext_module\fasttext\tests\test_script.py -> build\lib.win-amd64-3.10\fasttext\tests
          copying python\fasttext_module\fasttext\tests\__init__.py -> build\lib.win-amd64-3.10\fasttext\tests
          running build_ext
          building 'fasttext_pybind' extension
          creating build\temp.win-amd64-3.10
          creating build\temp.win-amd64-3.10\Release
          creating build\temp.win-amd64-3.10\Release\python
          creating build\temp.win-amd64-3.10\Release\python\fasttext_module
          creating build\temp.win-amd64-3.10\Release\python\fasttext_module\fasttext
          creating build\temp.win-amd64-3.10\Release\python\fasttext_module\fasttext\pybind
          creating build\temp.win-amd64-3.10\Release\src
          "C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.28.29910\bin\HostX86\x64\cl.exe" /c /nologo /O2 /W3 /GL /DNDEBUG /MD -IC:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\pybind11\include -IC:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\pybind11\include -Isrc -IC:\Users\zackp\.virtualenvs\autosa-AbZZnTol\include -IC:\Users\zackp\AppData\Local\Programs\Python\Python310\include -IC:\Users\zackp\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.28.29910\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /EHsc /Tppython/fasttext_module/fasttext/pybind/fasttext_pybind.cc /Fobuild\temp.win-amd64-3.10\Release\python/fasttext_module/fasttext/pybind/fasttext_pybind.obj /EHsc /DVERSION_INFO=\\\"0.9.2\\\"
          fasttext_pybind.cc
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(171): error C2065: 'ssize_t': undeclared identifier
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(171): error C2672: 'pybind11::init': no matching overloaded function found
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(170): error C2974: 'pybind11::init': invalid template argument for 'CFunc', type expected
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\pybind11\include\pybind11\pybind11.h(1702): note: see declaration of 'pybind11::init'
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(170): error C2974: 'pybind11::init': invalid template argument for 'Func', type expected
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\pybind11\include\pybind11\pybind11.h(1697): note: see declaration of 'pybind11::init'
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(170): error C2974: 'pybind11::init': invalid template argument for 'Args', type expected
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\pybind11\include\pybind11\pybind11.h(1690): note: see declaration of 'pybind11::init'
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(171): error C2672: 'pybind11::class_<fasttext::Vector>::def': no matching overloaded function found
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(170): error C2780: 'pybind11::class_<fasttext::Vector> &pybind11::class_<fasttext::Vector>::def(const char *,Func &&,const Extra &...)': expects 3 arguments - 1 provided
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\pybind11\include\pybind11\pybind11.h(1416): note: see declaration of 'pybind11::class_<fasttext::Vector>::def'
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(185): error C2065: 'ssize_t': undeclared identifier
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(185): error C2065: 'ssize_t': undeclared identifier
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(185): error C2672: 'pybind11::init': no matching overloaded function found
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(182): error C2974: 'pybind11::init': invalid template argument for 'CFunc', type expected
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\pybind11\include\pybind11\pybind11.h(1702): note: see declaration of 'pybind11::init'
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(182): error C2974: 'pybind11::init': invalid template argument for 'Func', type expected
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\pybind11\include\pybind11\pybind11.h(1697): note: see declaration of 'pybind11::init'
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(182): error C2974: 'pybind11::init': invalid template argument for 'Args', type expected
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\pybind11\include\pybind11\pybind11.h(1690): note: see declaration of 'pybind11::init'
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(185): error C2672: 'pybind11::class_<fasttext::DenseMatrix>::def': no matching overloaded function found
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(182): error C2780: 'pybind11::class_<fasttext::DenseMatrix> &pybind11::class_<fasttext::DenseMatrix>::def(const char *,Func &&,const Extra &...)': expects 3 arguments - 1 provided
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\pybind11\include\pybind11\pybind11.h(1416): note: see declaration of 'pybind11::class_<fasttext::DenseMatrix>::def'
          error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\BuildTools\\VC\\Tools\\MSVC\\14.28.29910\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2
          [end of output]
    
      note: This error originates from a subprocess, and is likely not a problem with pip.
      ERROR: Failed building wheel for fasttext
      Running setup.py clean for fasttext
      Building wheel for sentencepiece (setup.py) ... error
      error: subprocess-exited-with-error
    
      × python setup.py bdist_wheel did not run successfully.
      │ exit code: 1
      ╰─> [22 lines of output]
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\setuptools\dist.py:738: UserWarning: Usage of dash-separated 'description-file' will not be supported in future versions. Please use the underscore name 'description_file' instead
            warnings.warn(
          running bdist_wheel
          running build
          running build_py
          creating build
          creating build\lib.win-amd64-3.10
          creating build\lib.win-amd64-3.10\sentencepiece
          copying src\sentencepiece/__init__.py -> build\lib.win-amd64-3.10\sentencepiece
          copying src\sentencepiece/sentencepiece_model_pb2.py -> build\lib.win-amd64-3.10\sentencepiece
          copying src\sentencepiece/sentencepiece_pb2.py -> build\lib.win-amd64-3.10\sentencepiece
          running build_ext
          building 'sentencepiece._sentencepiece' extension
          creating build\temp.win-amd64-3.10
          creating build\temp.win-amd64-3.10\Release
          creating build\temp.win-amd64-3.10\Release\src
          creating build\temp.win-amd64-3.10\Release\src\sentencepiece
          "C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.28.29910\bin\HostX86\x64\cl.exe" /c /nologo /O2 /W3 /GL /DNDEBUG /MD -IC:\Users\zackp\.virtualenvs\autosa-AbZZnTol\include -IC:\Users\zackp\AppData\Local\Programs\Python\Python310\include -IC:\Users\zackp\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.28.29910\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /EHsc /Tpsrc/sentencepiece/sentencepiece_wrap.cxx /Fobuild\temp.win-amd64-3.10\Release\src/sentencepiece/sentencepiece_wrap.obj /MT /I..\build\root\include
          cl : Command line warning D9025 : overriding '/MD' with '/MT'
          sentencepiece_wrap.cxx
          src/sentencepiece/sentencepiece_wrap.cxx(2809): fatal error C1083: Cannot open include file: 'sentencepiece_processor.h': No such file or directory
          error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\BuildTools\\VC\\Tools\\MSVC\\14.28.29910\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2
          [end of output]
    
      note: This error originates from a subprocess, and is likely not a problem with pip.
      ERROR: Failed building wheel for sentencepiece
      Running setup.py clean for sentencepiece
    Failed to build fasttext sentencepiece
    Installing collected packages: sentencepiece, fasttext, EasyNMT
      Running setup.py install for sentencepiece ... error
      error: subprocess-exited-with-error
    
      × Running setup.py install for sentencepiece did not run successfully.
      │ exit code: 1
      ╰─> [24 lines of output]
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\setuptools\dist.py:738: UserWarning: Usage of dash-separated 'description-file' will not be supported in future versions. Please use the underscore name 'description_file' instead
            warnings.warn(
          running install
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\setuptools\command\install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
            warnings.warn(
          running build
          running build_py
          creating build
          creating build\lib.win-amd64-3.10
          creating build\lib.win-amd64-3.10\sentencepiece
          copying src\sentencepiece/__init__.py -> build\lib.win-amd64-3.10\sentencepiece
          copying src\sentencepiece/sentencepiece_model_pb2.py -> build\lib.win-amd64-3.10\sentencepiece
          copying src\sentencepiece/sentencepiece_pb2.py -> build\lib.win-amd64-3.10\sentencepiece
          running build_ext
          building 'sentencepiece._sentencepiece' extension
          creating build\temp.win-amd64-3.10
          creating build\temp.win-amd64-3.10\Release
          creating build\temp.win-amd64-3.10\Release\src
          creating build\temp.win-amd64-3.10\Release\src\sentencepiece
          "C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.28.29910\bin\HostX86\x64\cl.exe" /c /nologo /O2 /W3 /GL /DNDEBUG /MD -IC:\Users\zackp\.virtualenvs\autosa-AbZZnTol\include -IC:\Users\zackp\AppData\Local\Programs\Python\Python310\include -IC:\Users\zackp\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.28.29910\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /EHsc /Tpsrc/sentencepiece/sentencepiece_wrap.cxx /Fobuild\temp.win-amd64-3.10\Release\src/sentencepiece/sentencepiece_wrap.obj /MT /I..\build\root\include
          cl : Command line warning D9025 : overriding '/MD' with '/MT'
          sentencepiece_wrap.cxx
          src/sentencepiece/sentencepiece_wrap.cxx(2809): fatal error C1083: Cannot open include file: 'sentencepiece_processor.h': No such file or directory
          error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\BuildTools\\VC\\Tools\\MSVC\\14.28.29910\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2
          [end of output]
    
      note: This error originates from a subprocess, and is likely not a problem with pip.
    error: legacy-install-failure
    
    × Encountered error while trying to install package.
    ╰─> sentencepiece
    
    note: This is an issue with the package mentioned above, not pip.
    hint: See above for output from the failure.
    
    opened by ZackPlauche 2
  • Is there randomness in translation or does every translation lead to the exact same output?

    Thanks a lot for creating this great package! Question: will every translation with the same input always lead to the exact same output, or is there some randomness involved (e.g. through beam search), which would require setting a seed for full reproducibility? I've discussed this with colleagues and there seem to be some beam search algorithms that are stochastic (i.e. introduce randomness) and others that are not. Which one is used here? If a stochastic algorithm is used, how would we set a seed to ensure reproducibility?

    opened by MoritzLaurer 0
  • EasyNMT

    Hi, thank you for this useful library. I tried to install it on my machine and it gave me this error. Any help please?

    print(model.translate('This is a sentence we want to translate to German', target_lang='de'))

      File "/home/karima/.local/lib/python3.8/site-packages/easynmt/EasyNMT.py", line 154, in translate
        raise e
      File "/home/karima/.local/lib/python3.8/site-packages/easynmt/EasyNMT.py", line 149, in translate
        translated = self.translate(**method_args)
      File "/home/karima/.local/lib/python3.8/site-packages/easynmt/EasyNMT.py", line 181, in translate
        translated_sentences = self.translate_sentences(splitted_sentences, target_lang=target_lang, source_lang=source_lang, show_progress_bar=show_progress_bar, beam_size=beam_size, batch_size=batch_size, **kwargs)
      File "/home/karima/.local/lib/python3.8/site-packages/easynmt/EasyNMT.py", line 278, in translate_sentences
        output.extend(self.translator.translate_sentences(sentences_sorted[start_idx:start_idx+batch_size], source_lang=source_lang, target_lang=target_lang, beam_size=beam_size, device=self.device, **kwargs))
      File "/home/karima/.local/lib/python3.8/site-packages/easynmt/models/OpusMT.py", line 49, in translate_sentences
        translated = model.generate(**inputs, num_beams=beam_size, **kwargs)
      File "/home/karima/.local/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
        return func(*args, **kwargs)
      File "/home/karima/.local/lib/python3.8/site-packages/transformers/generation_utils.py", line 1182, in generate
        model_kwargs = self._prepare_encoder_decoder_kwargs_for_generation(
      File "/home/karima/.local/lib/python3.8/site-packages/transformers/generation_utils.py", line 525, in _prepare_encoder_decoder_kwargs_for_generation
        model_kwargs["encoder_outputs"]: ModelOutput = encoder(**encoder_kwargs)
      File "/home/karima/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/karima/.local/lib/python3.8/site-packages/transformers/models/marian/modeling_marian.py", line 749, in forward
        inputs_embeds = self.embed_tokens(input_ids) * self.embed_scale
      File "/home/karima/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/karima/.local/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 158, in forward
        return F.embedding(
      File "/home/karima/.local/lib/python3.8/site-packages/torch/nn/functional.py", line 2199, in embedding
        return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
    RuntimeError: CUDA error: no kernel image is available for execution on the device
    CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

    opened by abidikarima 0
  • How to run test_translation_speed.py

    The easynmt Docker install is working fine through HTTP requests. Now I'd like to run some benchmarks. How do you run /examples/test_translation_speed.py?

    opened by JohnWinner 2
  • Workflow for large datasets

    Hi! I was wondering if there is a workflow available for large datasets. I am trying to translate a large number of tweets using Pandas and Python.

    Best, Daniel

    opened by viajerus 0
  • Enable manually specifying the desired OPUS model?

    I really like the library, great work! Is there a way to manually specify a specific OPUS model? For example, EasyNMT with OPUS currently does not support English as the source and Portuguese as the target language, because it tries to download 'opus-mt-en-pt' by default, which does not exist. There is, however, an en2pt model on the hub now (https://huggingface.co/Helsinki-NLP/opus-mt-tc-big-en-pt) with a slightly different name. I don't know how to tell EasyNMT to take this specific model instead of throwing the following error:

    OSError: Helsinki-NLP/opus-mt-en-pt is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'

    opened by MoritzLaurer 2
Releases
  • v2.0.0 (Apr 26, 2021)

    mbart50 & m2m models now use huggingface transformers

    In version 1, the mbart50 & m2m models required the fairseq library. This caused several issues: fairseq cannot be used on Windows, multi-processing did not work with fairseq models, and loading and using the models was quite complicated.

    With this release, the fairseq dependency is removed and the mbart50 / m2m models are loaded with huggingface transformers version >= 4.4.0.

    From a user perspective, no changes should be visible. But from a developer perspective, this simplifies the architecture of EasyNMT and allows new features to be integrated more easily.

    Saving models

    Models can now be saved to disc by calling:

    model.save(output_path)
    

    Models can be loaded from disc by calling:

    model = EasyNMT(output_path)
    

    Loading models from the huggingface model hub

    Loading any Huggingface translation model is now simple: just pass the model name or the model path as in the following code:

    from easynmt import EasyNMT, models
    article = """EasyNMT is an open source library for state-of-the-art neural machine translation. Installation is simple using
    pip or pre-build docker images. EasyNMT provides access to various neural machine translation models. It can translate 
    sentences and documents of any length. Further, it includes code to automatically detect the language of a text."""
    
    model = EasyNMT(translator=models.AutoModel('facebook/mbart-large-en-ro')) 
    print(model.translate(article, source_lang='en_XX', target_lang='ro_RO'))
    

    This loads the facebook/mbart-large-en-ro model from the model hub.

    Note: Models might use different language codes, e.g. the mbart model uses 'en_XX' instead of 'en' and 'ro_RO' instead of 'ro'. To keep the language codes consistent, you can pass a lang_map:

    from easynmt import EasyNMT, models
    
    article = """EasyNMT is an open source library for state-of-the-art neural machine translation. Installation is simple using
    pip or pre-build docker images. EasyNMT provides access to various neural machine translation models. It can translate 
    sentences and documents of any length. Further, it includes code to automatically detect the language of a text."""
    
    output_path = 'output/mbart-large-en-ro'
    model = EasyNMT(translator=models.AutoModel('facebook/mbart-large-en-ro', lang_map={'en': 'en_XX', 'ro': 'ro_RO'}))
    
    #Save the model to disc
    model.save(output_path)
    
    # Load the model from disc
    model = EasyNMT(output_path)
    print(model.translate(article,  target_lang='ro'))
    
  • v1.1.0 (Mar 15, 2021)

    This release brings several improvements and is the first step towards the release of a Docker Image + REST API.

    Improvements:

    • Docker REST API: We have published Docker images for a REST API that allow easy usage of EasyNMT. Just run the Docker image and start translating using REST API calls (a small client sketch follows this list): more info
    • Google Colab REST API Hosting: We have published a Colab notebook that shows how to wrap EasyNMT in a REST API and host it on Google Colab with a free GPU. Useful if you need to translate large amounts of text.
    • Long sentences are translated first: Sentences are sorted by length before translation so that minimal time is wasted on padding tokens. In the previous version, the shortest sentences were translated first and the longer ones later; now the order is reversed. This has several advantages: if an OOM error happens, it happens at the start of the translation process and not at the end. Also, the progress bar estimate is more accurate, as the longest and slowest sentences are translated first.
    • Improved language detection: Automatic language detection is still an issue, especially for mixed-language texts. Language detection is now performed on document level rather than on sentence level. If you need language detection on sentence level, you can set document_language_detection=False for the translate method. Also, text is now lower-cased before the language is detected (the language detection scripts had issues with all-uppercase text).
    • Max length parameter: When you create your model like this: model = EasyNMT(model_name, max_length=100), all sentences with more than 100 word pieces will be truncated to at most 100 word pieces. This can prevent out-of-memory errors with overly long sentences.
    • Load model without translator: If you just want to use the language detection methods, you can now load your model like model = EasyNMT(model_name, load_translator=False). This prevents the loading of the translation engine (see the language-detection sketch after this list).
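
    As a rough illustration of the REST API workflow mentioned in the Docker REST API item above, the following sketch calls a locally running EasyNMT API container with Python's requests library. The image tag, port mapping, endpoint path, and parameter names are assumptions based on the project's Docker documentation and may differ between versions:

    import requests

    # Assumed container start (tag and port mapping are illustrative):
    #   docker run -p 24080:80 easynmt/api:2.0-cpu
    response = requests.get(
        'http://localhost:24080/translate',
        params={'target_lang': 'de', 'text': 'This is a test sentence.'},
    )
    print(response.json())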

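    Building on the language detection items above, here is a minimal sketch of loading a model without the translation engine and of document- vs. sentence-level detection. The parameters load_translator and document_language_detection are described above; the method name language_detection is an assumption and may differ:

    from easynmt import EasyNMT

    # Load without the translation engine if you only need language detection
    model = EasyNMT('opus-mt', load_translator=False)

    # Detect the language of a text; lower-casing before detection is handled
    # internally. The method name language_detection is an assumption.
    print(model.language_detection('Dies ist ein deutscher Beispielsatz.'))

    # For mixed-language input, per-sentence detection can be kept by disabling
    # document-level detection on a model loaded with its translator:
    # model.translate(sentences, target_lang='en', document_language_detection=False)
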
    Roadmap

    • As soon as Hugging Face transformers v4.4.0 is released, the dependency on fairseq can be removed, as the mBART50 and m2m models will be available in HF transformers. This will make installation on Windows machines possible.
    Source code(tar.gz)
    Source code(zip)
  • v1.0.2(Jan 29, 2021)

    fastText is used for automatic language detection, as it provides the highest speed and best accuracy.

    However, it can be complicated to install on Windows, as it requires a C/C++ compiler.

    This release adds two alternative language identifiers:

    • langid (https://github.com/saffsd/langid.py) - Can be installed via pip install langid
    • langdetect - Can be installed via pip install langdetect

    If fastText is not available, langid / langdetect will be used as alternative language detection methods.
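
    For reference, the two fallback identifiers can also be called directly. EasyNMT selects between them automatically, so this minimal sketch only shows the underlying library calls:

    import langid                      # pip install langid
    from langdetect import detect      # pip install langdetect

    text = 'Berlin ist die Hauptstadt von Deutschland.'

    print(langid.classify(text))       # e.g. ('de', score)
    print(detect(text))                # e.g. 'de'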

    For installation on Windows, you can run the following commands:

    pip install --no-deps easynmt
    pip install tqdm transformers numpy nltk sentencepiece langid 
    

    Further, you have to install PyTorch as described here: https://pytorch.org/get-started/locally/

    If you want to install fastText on Windows, I can recommend this link: https://anaconda.org/conda-forge/fasttext

    Source code(tar.gz)
    Source code(zip)
  • v1.0.1(Jan 27, 2021)

    fastText language detection did not work well if the text was in UPPERCASE.

    Adding lower() to the string before the language identification step significantly improved the performance.
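
    For illustration, here is a minimal sketch of the fix using fastText's published language-identification model (the model file lid.176.ftz must be downloaded separately; the file name and path are assumptions for this sketch):

    import fasttext

    # Lower-case the text before prediction, as uppercase input degraded accuracy
    lid_model = fasttext.load_model('lid.176.ftz')

    text = 'THIS IS AN ALL-UPPERCASE SENTENCE.'
    labels, scores = lid_model.predict(text.lower())
    print(labels[0].replace('__label__', ''), float(scores[0]))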

    Source code(tar.gz)
    Source code(zip)
  • v1.0.0(Jan 27, 2021)

Owner
Ubiquitous Knowledge Processing Lab
Journalism AI – Quotes extraction for modular journalism

Quote extraction for modular journalism (JournalismAI collab 2021)

Journalism AI collab 2021 207 Dec 25, 2022
Code for the Python code smells video on the ArjanCodes channel.

7 Python code smells This repository contains the code for the Python code smells video on the ArjanCodes channel (watch the video here). The example

55 Dec 29, 2022
Comprehensive-E2E-TTS - PyTorch Implementation

A Non-Autoregressive End-to-End Text-to-Speech (text-to-wav), supporting a family of SOTA unsupervised duration modelings. This project grows with the research community, aiming to achieve the ultima

Keon Lee 114 Nov 13, 2022
A PyTorch implementation of the WaveGlow: A Flow-based Generative Network for Speech Synthesis

WaveGlow A PyTorch implementation of the WaveGlow: A Flow-based Generative Network for Speech Synthesis Quick Start: Install requirements: pip install

Yuchao Zhang 204 Jul 14, 2022
Simple Text-To-Speech Bot For Discord

Simple Text-To-Speech Bot For Discord This is a very simple TTS bot for discord made with python. For this bot you need FFMPEG, see installation to se

1 Sep 26, 2022
precise iris segmentation

PI-DECODER Introduction PI-DECODER, a decoder structure designed for Precise Iris Segmentation and Location. The decoder structure is shown below: Ple

8 Aug 08, 2022
Simple and efficient RevNet-Library with DeepSpeed support

RevLib Simple and efficient RevNet-Library with DeepSpeed support Features Half the constant memory usage and faster than RevNet libraries Less memory

Lucas Nestler 112 Dec 05, 2022
Pre-training with Extracted Gap-sentences for Abstractive SUmmarization Sequence-to-sequence models

PEGASUS library Pre-training with Extracted Gap-sentences for Abstractive SUmmarization Sequence-to-sequence models, or PEGASUS, uses self-supervised

Google Research 1.4k Dec 22, 2022
Arabic speech recognition, classification and text-to-speech.

klaam Arabic speech recognition, classification and text-to-speech using many advanced models like wave2vec and fastspeech2. This repository allows tr

ARBML 177 Dec 27, 2022
Code for EMNLP20 paper: "ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training"

ProphetNet-X This repo provides the code for reproducing the experiments in ProphetNet. In the paper, we propose a new pre-trained language model call

Microsoft 394 Dec 17, 2022
Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.

OpenSpeech provides reference implementations of various ASR modeling papers and three languages recipe to perform tasks on automatic speech recogniti

Soohwan Kim 26 Dec 14, 2022
Every Google, Azure & IBM text to speech voice for free

TTS-Grabber Quick thing i made about a year ago to download any text with any tts voice, over 630 voices to choose from currently. It will split the i

16 Dec 07, 2022
Tools for curating biomedical training data for large-scale language modeling

Tools for curating biomedical training data for large-scale language modeling

BigScience Workshop 242 Dec 25, 2022
Client library to download and publish models and other files on the huggingface.co hub

huggingface_hub Client library to download and publish models and other files on the huggingface.co hub Do you have an open source ML library? We're l

Hugging Face 644 Jan 01, 2023
SentAugment is a data augmentation technique for semi-supervised learning in NLP.

SentAugment SentAugment is a data augmentation technique for semi-supervised learning in NLP. It uses state-of-the-art sentence embeddings to structur

Meta Research 363 Dec 30, 2022
Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": UNITER adversarial training part

VILLA: Vision-and-Language Adversarial Training This is the official repository of VILLA (NeurIPS 2020 Spotlight). This repository currently supports

Zhe Gan 109 Dec 31, 2022
Linear programming solver for paper-reviewer matching and mind-matching

Paper-Reviewer Matcher A python package for paper-reviewer matching algorithm based on topic modeling and linear programming. The algorithm is impleme

Titipat Achakulvisut 66 Jul 05, 2022
Code to reproduce the results of the paper 'Towards Realistic Few-Shot Relation Extraction' (EMNLP 2021)

Realistic Few-Shot Relation Extraction This repository contains code to reproduce the results in the paper "Towards Realistic Few-Shot Relation Extrac

Bloomberg 8 Nov 09, 2022
LOT: A Benchmark for Evaluating Chinese Long Text Understanding and Generation

LOT: A Benchmark for Evaluating Chinese Long Text Understanding and Generation Tasks | Datasets | LongLM | Baselines | Paper Introduction LOT is a ben

46 Dec 28, 2022
Precision Medicine Knowledge Graph (PrimeKG)

PrimeKG Website | bioRxiv Paper | Harvard Dataverse Precision Medicine Knowledge Graph (PrimeKG) presents a holistic view of diseases. PrimeKG integra

Machine Learning for Medicine and Science @ Harvard 103 Dec 10, 2022