Asr abc - Automatic speech recognition (ASR) for Chinese speech

Overview

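A simple speech recognition example, intended mainly for classroom demonstrations.

Create a Python virtual environment

Verified on Linux and macOS.

# If you already have a Python 3.6 environment, you can skip this step and use the existing environment.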
virtualenv ~/env/asr_abc --python=python3.8
. ~/env/asr_abc/bin/activate

Install this project

python setup.py install
or 
pip install .

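Recognize a WAV audio file

Note: The input audio must have a 16 kHz sample rate. If the audio to be recognized is not 16 kHz, resample it to 16 kHz with the following command.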

ffmpeg -i ks1_48k.acc -ar 16000 ks1_16k.wav
python decode.py
# or in debug mode
python decode.py -d

# Expected output:
2021-12-24 17:08:31,736 INFO [decode.py:91] All files seem exist.
2021-12-24 17:08:31,736 INFO [decode.py:96] Start loading model.
2021-12-24 17:08:41,321 INFO [decode.py:113] Start loading dict.
2021-12-24 17:08:41,336 INFO [decode.py:119] Start recognize data/wavs/BAC009S0764W0143.wav.
2021-12-24 17:08:41,527 INFO [decode.py:134] Result: 在市场整体从高速增长进入中高速增长区间的同时
2021-12-24 17:08:41,527 INFO [decode.py:135] done.

# You can also specify the input audio
python decode.py --input-wav=data/wavs/ks1_16k.wav
or
python decode.py -i=data/wavs/ks1_16k.wav  # ks is short for "Kantanzhe Song"
# Expected output:
2021-12-27 19:16:02,911 INFO [decode.py:91] All files seem exist.
2021-12-27 19:16:02,911 INFO [decode.py:96] Start loading model.
2021-12-27 19:16:08,405 INFO [decode.py:113] Start loading dict.
2021-12-27 19:16:08,409 INFO [decode.py:119] Start recognize data/wavs/ks1_16k.wav.
2021-12-27 19:16:08,449 INFO [decode.py:137] Result: 我们有火焰般的热情
2021-12-27 19:16:08,450 INFO [decode.py:138] done.
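
Before decoding your own recording, it can help to confirm that the file really has the 16 kHz sample rate required by the note above. The following is a minimal sketch using only Python's standard `wave` module; it is not part of decode.py, and the helper names `wav_sample_rate` and `is_16k` are hypothetical. It assumes an uncompressed PCM WAV file, since that is what the `wave` module reads.

```python
import wave

# 16 kHz, the sample rate this README says the input audio must have.
REQUIRED_SAMPLE_RATE = 16000


def wav_sample_rate(path):
    """Return the sample rate in Hz stored in the header of a PCM WAV file."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getframerate()


def is_16k(path):
    """Return True if the WAV file at `path` has a 16 kHz sample rate."""
    return wav_sample_rate(path) == REQUIRED_SAMPLE_RATE
```

For example, `is_16k("data/wavs/ks1_16k.wav")` should be true for a file produced by the ffmpeg command above; if it is false, resample the file before running decode.py.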

Related project links:

https://github.com/k2-fsa/icefall
https://github.com/k2-fsa/k2
https://github.com/lhotse-speech/lhotse

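Manual model download

If running python decode.py above already produced the expected result, the model was downloaded automatically from download source 1, and you can skip the rest of this section.

Download source 1 (decode.py downloads from this source automatically): https://huggingface.co/GuoLiyong/cn_conformer_encoder_aishell/tree/main/data/lang_char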

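Download source 2: Baidu Netdisk

Link: https://pan.baidu.com/s/17tPOJM_Sm49q1kZrE3jfUQ
Extraction code: qa4p

If download source 1 is hard to reach or too slow for you, download the files manually from Baidu Netdisk. Afterwards, place the two downloaded files, "tokens.txt" and "conformer_encoder.pt", according to the following file structure: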

.
|-- README.md
|-- build
|   `-- bdist.linux-x86_64
|-- conformer.py
|-- data
|   |-- lang_char
|   |   `-- tokens.txt
|   `-- wavs
|       |-- BAC009S0764W0143.wav
|       |-- README.md
|       `-- transcript
|-- decode.py
|-- exp
|   `-- conformer_encoder.pt
|-- requirements.txt
|-- setup.py
`-- utils.py
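
After a manual download, a quick check that both files landed in the right place can save a failed run. The sketch below looks for the two paths shown in the tree above, relative to the project root. It is an illustration only and is not how decode.py itself verifies its inputs; the names `EXPECTED_FILES` and `missing_model_files` are hypothetical.

```python
from pathlib import Path

# Relative locations of the two downloaded files, taken from the tree above.
EXPECTED_FILES = (
    Path("data/lang_char/tokens.txt"),
    Path("exp/conformer_encoder.pt"),
)


def missing_model_files(project_root="."):
    """Return the expected files that do not exist under `project_root`."""
    root = Path(project_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling `missing_model_files()` from the project root returns an empty list when both files are in place; otherwise it lists the relative paths that still need to be copied.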
Owner
LIyong.Guo