Malaya-Speech is a Speech-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow.

Last update: Jan 05, 2023

Related tags

Text Data & NLP malaya-speech

Overview

Malaya-Speech is a Speech-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow.

Documentation

Proper documentation is available at https://malaya-speech.readthedocs.io/

Installing from the PyPI

CPU version

$ pip install malaya-speech

GPU version

$ pip install malaya-speech[gpu]

Only Python 3.6.0 and above and Tensorflow 1.15.0 and above are supported.

We recommend to use virtualenv for development. All examples tested on Tensorflow version 1.15.4, 1.15.5, 2.4.1 and 2.5.

Features

Age Detection, detect age in speech using Finetuned Speaker Vector.
Speaker Diarization, diarizing speakers using Pretrained Speaker Vector.
Emotion Detection, detect emotions in speech using Finetuned Speaker Vector.
Force Alignment, generate a time-aligned transcription of an audio file using RNNT.
Gender Detection, detect genders in speech using Finetuned Speaker Vector.
Language Detection, detect hyperlocal languages in speech using Finetuned Speaker Vector.
Multispeaker Separation, Multispeaker separation using FastSep on 8k Wav.
Noise Reduction, reduce multilevel noises using STFT UNET.
Speaker Change, detect changing speakers using Finetuned Speaker Vector.
Speaker overlap, detect overlap speakers using Finetuned Speaker Vector.
Speaker Vector, calculate similarity between speakers using Pretrained Speaker Vector.
Speech Enhancement, enhance voice activities using Waveform UNET.
SpeechSplit Conversion, detailed speaking style conversion by disentangling speech into content, timbre, rhythm and pitch using PyWorld and PySPTK.
Speech-to-Text, End-to-End Speech to Text for Malay, Mixed (Malay, Singlish and Mandarin) and Singlish using RNNT and Wav2Vec2 CTC.
Super Resolution, Super Resolution 4x for Waveform.
Text-to-Speech, Text to Speech for Malay and Singlish using Tacotron2, FastSpeech2 and FastPitch.
Vocoder, convert Mel to Waveform using MelGAN, Multiband MelGAN and Universal MelGAN Vocoder.
Voice Activity Detection, detect voice activities using Finetuned Speaker Vector.
Voice Conversion, Many-to-One, One-to-Many, Many-to-Many, and Zero-shot Voice Conversion.
Hybrid 8-bit Quantization, provide hybrid 8-bit quantization for all models to reduce inference time up to 2x and model size up to 4x.

Pretrained Models

Malaya-Speech also released pretrained models, simply check at malaya-speech/pretrained-model

Wave UNET, Multi-Scale Neural Network for End-to-End Audio Source Separation, https://arxiv.org/abs/1806.03185
Wave ResNet UNET, added ResNet style into Wave UNET, no paper produced.
Wave ResNext UNET, added ResNext style into Wave UNET, no paper produced.
Deep Speaker, An End-to-End Neural Speaker Embedding System, https://arxiv.org/pdf/1705.02304.pdf
SpeakerNet, 1D Depth-wise Separable Convolutional Network for Text-Independent Speaker Recognition and Verification, https://arxiv.org/abs/2010.12653
VGGVox, a large-scale speaker identification dataset, https://arxiv.org/pdf/1706.08612.pdf
GhostVLAD, Utterance-level Aggregation For Speaker Recognition In The Wild, https://arxiv.org/abs/1902.10107
Conformer, Convolution-augmented Transformer for Speech Recognition, https://arxiv.org/abs/2005.08100
ALConformer, A lite Conformer, no paper produced.
Jasper, An End-to-End Convolutional Neural Acoustic Model, https://arxiv.org/abs/1904.03288
Tacotron2, Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions, https://arxiv.org/abs/1712.05884
FastSpeech2, Fast and High-Quality End-to-End Text to Speech, https://arxiv.org/abs/2006.04558
MelGAN, Generative Adversarial Networks for Conditional Waveform Synthesis, https://arxiv.org/abs/1910.06711
Multi-band MelGAN, Faster Waveform Generation for High-Quality Text-to-Speech, https://arxiv.org/abs/2005.05106
SRGAN, Modified version of SRGAN to do 1D Convolution, Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, https://arxiv.org/abs/1609.04802
Speech Enhancement UNET, https://github.com/haoxiangsnr/Wave-U-Net-for-Speech-Enhancement
Speech Enhancement ResNet UNET, Added ResNet style into Speech Enhancement UNET, no paper produced.
Speech Enhancement ResNext UNET, Added ResNext style into Speech Enhancement UNET, no paper produced.
Universal MelGAN, Universal MelGAN: A Robust Neural Vocoder for High-Fidelity Waveform Generation in Multiple Domains, https://arxiv.org/abs/2011.09631
FastVC, Faster and Accurate Voice Conversion using Transformer, no paper produced.
FastSep, Faster and Accurate Speech Separation using Transformer, no paper produced.
wav2vec 2.0, A Framework for Self-Supervised Learning of Speech Representations, https://arxiv.org/abs/2006.11477
FastSpeechSplit, Unsupervised Speech Decomposition Via Triple Information Bottleneck using Transformer, no paper produced.
Sepformer, Attention is All You Need in Speech Separation, https://arxiv.org/abs/2010.13154
FastSpeechSplit, Faster and Accurate Speech Split Conversion using Transformer, no paper produced.
HuBERT, Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units, https://arxiv.org/pdf/2106.07447v1.pdf
FastPitch, Parallel Text-to-speech with Pitch Prediction, https://arxiv.org/abs/2006.06873
GlowTTS, A Generative Flow for Text-to-Speech via Monotonic Alignment Search, https://arxiv.org/abs/2005.11129

References

If you use our software for research, please cite:

@misc{Malaya, Speech-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow,
  author = {Husein, Zolkepli},
  title = {Malaya-Speech},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/huseinzol05/malaya-speech}}
}

Acknowledgement

Thanks to KeyReply for sponsoring private cloud to train Malaya-Speech models, without it, this library will collapse entirely.

ExKaldi-RT: An Online Speech Recognition Extension Toolkit of Kaldi

ExKaldi-RT is an online ASR toolkit for Python language. It reads realtime streaming audio and do online feature extraction, probability computation, and online decoding.

31 Aug 16, 2021

IMS-Toucan is a toolkit to train state-of-the-art Speech Synthesis models

IMS-Toucan is a toolkit to train state-of-the-art Speech Synthesis models. Everything is pure Python and PyTorch based to keep it as simple and beginner-friendly, yet powerful as possible.

Digital Phonetics at the University of Stuttgart

247 Jan 5, 2023

Ptorch NLU, a Chinese text classification and sequence annotation toolkit, supports multi class and multi label classification tasks of Chinese long text and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part of speech tagging and word segmentation.

Pytorch-NLU，一个中文文本分类、序列标注工具包，支持中文长文本、短文本的多类、多标签分类任务，支持中文命名实体识别、词性标注、分词等序列标注任务。 Ptorch NLU, a Chinese text classification and sequence annotation toolkit, supports multi class and multi label classification tasks of Chinese long text and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part of speech tagging and word segmentation.

186 Dec 24, 2022

text to speech toolkit. 好用的中文语音合成工具箱，包含语音编码器、语音合成器、声码器和可视化模块。

ttskit Text To Speech Toolkit: 语音合成工具箱。安装 pip install -U ttskit 注意可能需另外安装的依赖包：torch，版本要求torch=1.6.0,=1.7.1，根据自己的实际环境安装合适cuda或cpu版本的torch。 ttskit的

483 Jan 4, 2023

PyKaldi is a Python scripting layer for the Kaldi speech recognition toolkit.

PyKaldi is a Python scripting layer for the Kaldi speech recognition toolkit. It provides easy-to-use, low-overhead, first-class Python wrappers for t

922 Dec 31, 2022

HuggingSound: A toolkit for speech-related tasks based on HuggingFace's tools

HuggingSound HuggingSound: A toolkit for speech-related tasks based on HuggingFace's tools. I have no intention of building a very complex tool here.

247 Dec 26, 2022

Code for ACL 2022 main conference paper "STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation".

STEMM: Self-learning with Speech-Text Manifold Mixup for Speech Translation This is a PyTorch implementation for the ACL 2022 main conference paper ST

29 Oct 16, 2022

Tensorflow Implementation of A Generative Flow for Text-to-Speech via Monotonic Alignment Search

10 Oct 13, 2022

Tevatron is a simple and efficient toolkit for training and running dense retrievers with deep language models.

Tevatron Tevatron is a simple and efficient toolkit for training and running dense retrievers with deep language models. The toolkit has a modularized

193 Jan 4, 2023

Releases(1.3.0)

1.3.0(Sep 18, 2022)
Added GPT2 LM combined with pyctcdecoder, https://malaya-speech.readthedocs.io/en/latest/gpt2-lm.html

Added Mask LM combined with pyctcdecoder, https://malaya-speech.readthedocs.io/en/latest/masked-lm.html

Added Transducer with GPT2 LM beam decoder, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-lm-gpt2.html

Added Transducer with Mask LM beam decoder, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-lm-gpt2.html

Added GPT2 LM CTC decoder, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model-pyctcdecode-gpt2.html

Added Mask LM CTC decoder, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model-pyctcdecode-mlm.html

Added Squeezeformer transducer models.

Added End-to-End FastSpeech2 STT models, no longer required a vocoder, https://malaya-speech.readthedocs.io/en/latest/tts-e2e-fastspeech2.html

Added End-to-End VITS STT models, no longer required a vocoder, https://malaya-speech.readthedocs.io/en/latest/tts-vits.html

Added Neural Vocoder Super Resolution models, https://malaya-speech.readthedocs.io/en/latest/load-super-resolution-tfgan.html

Added super resolution diffusion models, https://malaya-speech.readthedocs.io/en/latest/load-super-resolution-audio-diffusion.html

Added HMM speaker diarization, https://malaya-speech.readthedocs.io/en/latest/load-diarization-clustering-hmm.html

Source code(tar.gz)
Source code(zip)
1.2.7(Jun 13, 2022)
Added Speech-to-Text HuggingFace using Mesolitica finetuned models, https://huggingface.co/mesolitica, https://malaya-speech.readthedocs.io/en/latest/stt-huggingface.html

Added Force Alignment HuggingFace using Mesolitica finetuned models, https://huggingface.co/mesolitica, https://malaya-speech.readthedocs.io/en/latest/stt-huggingface.html

Added Text-to-Speech LightSpeech, https://arxiv.org/abs/2102.04040, https://malaya-speech.readthedocs.io/en/latest/tts-lightspeech-model.html

Now Transducer LM support multi-languages.

Source code(tar.gz)
Source code(zip)
1.2.6(May 6, 2022)
Use HuggingFace as backend repository.

Added yasmin and osman speakers for TTS Tacotron2, https://malaya-speech.readthedocs.io/en/latest/tts-tacotron2-model.html

Added yasmin and osman speakers for TTS FastSpeech2, https://malaya-speech.readthedocs.io/en/latest/tts-fastspeech2-model.html

Added yasmin and osman speakers for TTS GlowTTS, https://malaya-speech.readthedocs.io/en/latest/tts-glowtts-model.html

Use yasmin and osman speakers for long text TTS, https://malaya-speech.readthedocs.io/en/latest/tts-long-text.html

Source code(tar.gz)
Source code(zip)
1.2.5(Mar 20, 2022)
Use latest SpectralCluster==0.2.4 for diarization.

Added Gradio interface for STT and TTS.

Source code(tar.gz)
Source code(zip)
1.2.4(Mar 1, 2022)
Added malay language pretrained BEST-RQ models, https://github.com/huseinzol05/malaya-speech/tree/master/pretrained-model/stt/best_rq

Added BEST-RQ STT, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model.html#List-available-CTC-model

Source code(tar.gz)
Source code(zip)
1.2.2(Dec 29, 2021)
Added 3 mixed languages for CTC Hubert model, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model-3mixed.html

Source code(tar.gz)
Source code(zip)
1.2.1(Dec 2, 2021)
Added more KenLM models, included Malay + Singlish, https://malaya-speech.readthedocs.io/en/latest/ctc-language-model.html

Improved ASR CTC models, Hubert-Conformer-Large achieved 12.8% WER-LM, 3.8% CER-LM, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model.html

Added CTC Decoders interface for ASR CTC models, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model-ctc-decoders.html

Added pyctcdecode interface for ASR CTC models, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model-pyctcdecode.html

Improved ASR RNNT models, large-conformer achieved 14.8% WER-LM, 5.9% CER-LM, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model.html

Added KenLM support for ASR RNNT models, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-lm.html

Added ASR RNNT for 2 mixed languages, Malay and Singlish, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-lm.html#

Added ASR RNNT for 3 mixed languages, Malay, Singlish and Mandarin, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-3mixed.html

Added GlowTTS Text-to-Speech, https://malaya-speech.readthedocs.io/en/latest/tts-glowtts-model.html

Added GlowTTS Text-to-Speech Multispeakers, https://malaya-speech.readthedocs.io/en/latest/tts-glowtts-multispeaker-model.html

Added HiFiGAN Vocoder, https://malaya-speech.readthedocs.io/en/latest/load-vocoder.html

Added Universal HiFiGAN Vocoder, https://malaya-speech.readthedocs.io/en/latest/load-universal-hifigan.html

Source code(tar.gz)
Source code(zip)
1.2(Oct 2, 2021)
Added HuBERT, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model.html, new SOTA on Malay CER.

Improved Singlish TTS model, now supported Universal MelGAN as vocoder, https://malaya-speech.readthedocs.io/en/latest/tts-singlish.html

Added Force Alignment module, now you can generate a time-aligned for your transcription, https://malaya-speech.readthedocs.io/en/latest/force-alignment.html

Improved Mixed STT Transducer models, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-mixed.html

Add new Mixed STT SOTA models, called conformer-stack-mixed, 2% better than other Mixed STT models, no paper produced, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-mixed.html#List-available-RNNT-model

Add Singlish STT Transducer models, thanks to Singapore National Speech Corpus for the dataset, https://www.imda.gov.sg/programme-listing/digital-services-lab/national-speech-corpus, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-singlish.html

Source code(tar.gz)
Source code(zip)
1.1.1(Jun 29, 2021)
Improved Bahasa Speech-to-Text, Large Conformer beat Google Speech-to-Text accuracy.

Improved Mixed (malay and singlish) Speech-to-Text.

Added real time Mixed (malay and singlish) Speech-to-Text documentation, https://malaya-speech.readthedocs.io/en/latest/realtime-asr-mixed.html

Source code(tar.gz)
Source code(zip)
1.1(Jun 1, 2021)
Added SpeechSplit Conversion.

Added Force Alignment using Transducer model.

Added Optimization docs.

Source code(tar.gz)
Source code(zip)
1.0(Apr 18, 2021)

Released V1.0!
Source code(tar.gz)
Source code(zip)

Owner

HUSEIN ZOLKEPLI

I really love to fart and korek hidung.

GitHub Repository https://malaya-speech.readthedocs.io/

用Resnet101+GPT搭建一个玩王者荣耀的AI

基于pytorch框架用resnet101加GPT搭建AI玩王者荣耀本源码模型主要用了SamLynnEvans Transformer 的源码的解码部分。以及pytorch自带的预训练模型"resnet101-5d3b4d8f.pth"

2.2k Jan 03, 2023

Training RNNs as Fast as CNNs

News SRU++, a new SRU variant, is released. [tech report] [blog] The experimental code and SRU++ implementation are available on the dev branch which

14 Dec 12, 2022

String Gen + Word Checker

Creates random strings and checks if any of them are a real words. Mostly a waste of time ngl but it is cool to see it work and the fact that it can generate a real random word within10sec

1 Jan 06, 2022

Predict an emoji that is associated with a text

Sentiment Analysis Sentiment analysis in computational linguistics is a general term for techniques that quantify sentiment or mood in a text. Can you

30 Sep 07, 2022

Gpt2-WebAPI - The objective of this API is to provide the 3 best possible responses to sentences that the user would input via http GET request as a parameter

This repository is a modification of: https://github.com/openai/gpt-2 for our sp

1 Jan 06, 2022

Malaya-Speech is a Speech-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow.

Related tags

Overview

Documentation

Installing from the PyPI

Features

Pretrained Models

References

Acknowledgement

You might also like...

ExKaldi-RT: An Online Speech Recognition Extension Toolkit of Kaldi

IMS-Toucan is a toolkit to train state-of-the-art Speech Synthesis models

Ptorch NLU, a Chinese text classification and sequence annotation toolkit, supports multi class and multi label classification tasks of Chinese long text and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part of speech tagging and word segmentation.

text to speech toolkit. 好用的中文语音合成工具箱，包含语音编码器、语音合成器、声码器和可视化模块。

PyKaldi is a Python scripting layer for the Kaldi speech recognition toolkit.

HuggingSound: A toolkit for speech-related tasks based on HuggingFace's tools

Code for ACL 2022 main conference paper "STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation".

Tensorflow Implementation of A Generative Flow for Text-to-Speech via Monotonic Alignment Search

Tevatron is a simple and efficient toolkit for training and running dense retrievers with deep language models.

Releases(1.3.0)

1.3.0(Sep 18, 2022)

1.2.7(Jun 13, 2022)

1.2.6(May 6, 2022)

1.2.5(Mar 20, 2022)

1.2.4(Mar 1, 2022)

1.2.2(Dec 29, 2021)

1.2.1(Dec 2, 2021)

1.2(Oct 2, 2021)

1.1.1(Jun 29, 2021)

1.1(Jun 1, 2021)

1.0(Apr 18, 2021)

Owner

HUSEIN ZOLKEPLI

用Resnet101+GPT搭建一个玩王者荣耀的AI

Training RNNs as Fast as CNNs

String Gen + Word Checker

Predict an emoji that is associated with a text

Gpt2-WebAPI - The objective of this API is to provide the 3 best possible responses to sentences that the user would input via http GET request as a parameter

숭실대학교 컴퓨터학부 전공종합설계프로젝트

The (extremely) naive sentiment classification function based on NBSVM trained on wisesight_sentiment

Natural Language Processing

STonKGs is a Sophisticated Transformer that can be jointly trained on biomedical text and knowledge graphs

Various Algorithms for Short Text Mining

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

This is a project of data parallel that running on NLP tasks.

Code for "Finetuning Pretrained Transformers into Variational Autoencoders"

This is Assignment1 code for the Web Data Processing System.

A PyTorch implementation of the Transformer model in "Attention is All You Need".

A relatively simple python program to generate one of those reddit text to speech videos dominating youtube.

A demo of chinese asr

A Japanese tokenizer based on recurrent neural networks

This library is testing the ethics of language models by using natural adversarial texts.

Nested Named Entity Recognition for Chinese Biomedical Text