Unified Pre-training for Self-Supervised Learning and Supervised Learning for ASR

Last update: Jan 09, 2023

Overview

UniSpeech

The family of UniSpeech:

UniSpeech (ICML 2021): Unified Pre-training for Self-Supervised Learning and Supervised Learning for ASR

UniSpeech-SAT (ICASSP 2022 Submission): Universal Speech Representation Learning with Speaker Aware Pre-Training

Pre-trained models

We strongly recommend the UniSpeech-SAT models for speaker-related tasks, since they perform strongly on a range of speaker-related benchmarks.

Model	Pre-training data	Download
UniSpeech Base	1500 hrs CommonVoice	download
UniSpeech Large	1500 hrs CommonVoice	download
UniSpeech-SAT Base	960 hrs LibriSpeech	download
UniSpeech-SAT Base+	60k hrs Libri-Light + 10k hrs GigaSpeech + 24k hrs VoxPopuli	download
UniSpeech-SAT Large	60k hrs Libri-Light + 10k hrs GigaSpeech + 24k hrs VoxPopuli	download

License

This project is licensed under the license found in the LICENSE file in the root directory of this source tree. Portions of the source code are based on the FAIRSEQ project.

Microsoft Open Source Code of Conduct

Contact Information

For help or issues using UniSpeech models, please submit a GitHub issue.

For other communications related to UniSpeech, please contact Yu Wu ([email protected]).

Owner: Microsoft
