Implementation for paper BLEU: a Method for Automatic Evaluation of Machine Translation

Last update: Oct 07, 2021

BLEU Score

Author: Ba Ngoc from ProtonX

The BLEU score is a popular metric for evaluating machine translation. Check out the Transformer project we recently published.

I. Usage

from bleu_score import cal_corpus_bleu_score

candidates = ['eating chicken chicken is a eating a eating chicken',
              'eating chicken chicken is not good']
references_list = [
    ['a chicken is eating chicken', 'there is a chicken eating chicken'],
    ['a chicken is eating chicken', 'there is a chicken eating chicken'],
]

bleu_score = cal_corpus_bleu_score(candidates, references_list,
                      weights=(0.25, 0.25, 0.25, 0.25), N=4)

print('Bleu Score: {}'.format(bleu_score))

II. BLEU Score Formula

1. Precision

For each n-gram order, we count every n-gram in the candidates and how many times it also occurs in the references. The precision is the ratio of the clipped matching count to the total number of n-grams in the candidates.
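The formula image from the original page did not survive here, so the following is a reconstruction of the modified n-gram precision as it is defined in the BLEU paper. The symbol names are taken from the paper, not from this repository's code; Count_clip is the clipped count explained in the note that follows.

```latex
p_n = \frac{\sum_{C \in \{\text{Candidates}\}} \sum_{\text{n-gram} \in C} \text{Count}_{\text{clip}}(\text{n-gram})}
           {\sum_{C' \in \{\text{Candidates}\}} \sum_{\text{n-gram}' \in C'} \text{Count}(\text{n-gram}')}
```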

Important: count clipping means that the count of a given n-gram in a candidate cannot exceed the maximum number of times that n-gram occurs in any single reference.

For example, suppose the bigram ('a', 'a') occurs 3 times in a candidate, but at most 2 times in any single reference. We then use the value 2 in the calculation.
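The clipping rule can be sketched in a few lines of Python. This is only an illustration under simple assumptions (whitespace tokenization, a single candidate); the helper names `ngram_counts` and `clipped_counts` are hypothetical and are not functions from this repository.

```python
from collections import Counter


def ngram_counts(sentence, n):
    """Count the n-grams of a whitespace-tokenized sentence."""
    tokens = sentence.split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_counts(candidate, references, n):
    """Clip each candidate n-gram count by its maximum count in any single reference."""
    cand = ngram_counts(candidate, n)
    max_ref = Counter()
    for ref in references:
        for gram, count in ngram_counts(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    # An n-gram that never occurs in a reference is clipped to 0.
    return {gram: min(count, max_ref[gram]) for gram, count in cand.items()}


candidate = 'a a a a'            # ('a', 'a') occurs 3 times
references = ['a a a', 'a a']    # at most 2 times in a single reference
clipped = clipped_counts(candidate, references, n=2)
print(clipped)  # {('a', 'a'): 2}
```

The candidate count of 3 is clipped to 2 because no single reference contains the bigram more than twice.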

If you have never heard of n-grams: an n-gram is a contiguous sequence of n tokens (here, words) in a string, and we count how many times each one occurs.

Candidate 1: 'eating chicken chicken is a eating a eating chicken'

-------Unigram------


eating	3
chicken	3
is	1
a	2

-------bigrams------


eating chicken	2
chicken chicken	1
chicken is	1
is a	1
a eating	2
eating a	1

We can do the same thing with trigrams and 4-grams.
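The counts above can be reproduced with a short sketch. It assumes whitespace tokenization, and `count_ngrams` is a hypothetical helper written for this example, not the function used inside `bleu_score`.

```python
from collections import Counter


def count_ngrams(sentence, n):
    """Count every contiguous run of n words in a whitespace-tokenized sentence."""
    tokens = sentence.split()
    return Counter(' '.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


candidate = 'eating chicken chicken is a eating a eating chicken'

for n in (1, 2, 3, 4):
    print('------- {}-grams -------'.format(n))
    for gram, count in count_ngrams(candidate, n).items():
        print('{}\t{}'.format(gram, count))
```

A sentence with 9 words has 9 unigrams, 8 bigrams, 7 trigrams and 6 4-grams in total, which is a quick way to sanity-check the output.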

2. Sentence brevity penalty

For each candidate, we pick the reference whose length is closest to the candidate's. That reference's length is the effective reference length.

Check out the function get_eff_ref_length in utils.py.
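As a rough idea of what choosing the closest reference looks like, here is a minimal sketch. It is not the code of get_eff_ref_length; the name `closest_ref_length` is hypothetical, tokenization is by whitespace, and breaking ties in favour of the shorter reference is an assumption of this example.

```python
def closest_ref_length(candidate, references):
    """Return the length of the reference that is closest in length to the candidate."""
    cand_len = len(candidate.split())
    ref_lens = [len(ref.split()) for ref in references]
    # Assumption: on a tie, prefer the shorter reference.
    return min(ref_lens, key=lambda ref_len: (abs(ref_len - cand_len), ref_len))


candidate = 'eating chicken chicken is not good'    # 6 words
references = ['a chicken is eating chicken',         # 5 words
              'there is a chicken eating chicken']   # 6 words
print(closest_ref_length(candidate, references))  # 6
```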

c: the total length of all candidates

r: the sum of the effective reference lengths of all candidates
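With c and r defined as above, the brevity penalty from the BLEU paper can be written as follows. The formula image is missing from this page, so this is a reconstruction from the paper and not a copy of the original figure.

```latex
BP =
\begin{cases}
1 & \text{if } c > r \\
e^{\,1 - r/c} & \text{if } c \le r
\end{cases}
```

Candidates that are longer than their effective references are not penalized here, since precision already accounts for extra words.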

3. BLEU Formula
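The formula image did not survive on this page. Reconstructed from the BLEU paper, the score combines the brevity penalty BP from section 2 with the modified precisions p_n from section 1; N and w_n are described right after it.

```latex
\text{BLEU} = BP \cdot \exp\left( \sum_{n=1}^{N} w_n \log p_n \right)
```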

N: the maximum n-gram order (for example, N=4 uses unigrams up to 4-grams)

w: the list of preset weights, one for each n-gram order
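To show how the pieces fit together, here is a minimal sketch that combines already-computed precisions with the brevity penalty. The function names `brevity_penalty` and `combine_bleu` are hypothetical, the input precisions are illustrative values rather than results from the usage example, and returning 0 when a precision is 0 (no smoothing) is an assumption of this sketch.

```python
import math


def brevity_penalty(c, r):
    """BP = 1 if c > r, else exp(1 - r / c)."""
    if c == 0:
        return 0.0  # assumption: an empty candidate corpus scores 0
    return 1.0 if c > r else math.exp(1.0 - r / c)


def combine_bleu(precisions, weights, c, r):
    """BLEU = BP * exp(sum over n of w_n * log p_n)."""
    if min(precisions) <= 0:
        return 0.0  # log(0) is undefined; without smoothing the score is 0
    log_sum = sum(w * math.log(p) for w, p in zip(weights, precisions))
    return brevity_penalty(c, r) * math.exp(log_sum)


# Illustrative inputs only: every precision is 0.5 and c equals r, so BP is 1.
score = combine_bleu([0.5, 0.5, 0.5, 0.5], (0.25, 0.25, 0.25, 0.25), c=10, r=10)
print('Bleu Score: {}'.format(score))
```

With uniform weights the exponent is the log of the geometric mean of the precisions, so in this example the score is approximately 0.5.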

Owner

Ngoc Nguyen Ba
