This repository contains the data used in the NAACL 2021 paper "Proteno: Text Normalization with Limited Data for Fast Deployment in Text to Speech Systems".

Last update: Dec 04, 2022

Overview

Proteno

This is the data release for the NAACL 2021 paper "Proteno: Text Normalization with Limited Data for Fast Deployment in Text to Speech Systems" (https://arxiv.org/abs/2104.07777).

Security

See CONTRIBUTING for more information.

License

This project is released under CC-BY-NC-4.0 and other licenses:

English: CC-BY-SA
Spanish: CC-BY-SA
Tamil: CC-BY-NC-SA

Citation

If you use our data, please cite the following paper:

@inproceedings{tyagi-etal-2021-proteno,
    title = "Proteno: Text Normalization with Limited Data for Fast Deployment in Text to Speech Systems",
    author = "Tyagi, Shubhi  and
      Bonafonte, Antonio  and
      Lorenzo-Trueba, Jaime  and
      Latorre, Javier",
    booktitle = "Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers",
    month = jun,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2021.naacl-industry.10",
    pages = "72--79",
}
