Explore different way to mix speech model(wav2vec2, hubert) and nlp model(BART,T5,GPT) together

Overview

SpeechMix

Explore different way to mix speech model(wav2vec2, hubert) and nlp model(BART,T5,GPT) together.

Introduction

For the same input:

from datasets import load_dataset
import soundfile as sf


# define function to read in sound file
def map_to_array(batch):
    speech, _ = sf.read(batch["file"])
    batch["speech"] = speech
    return batch


# load dummy dataset and read soundfiles
ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
ds = ds.map(map_to_array)

transcript = ds['text'][0]
speech = ds["speech"][0]

Speech encoder NLP decoder

model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large")

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder NLP decoder only fine-tune on cross attention/projection/decoder embedding

model = SpeechMixED("facebook/wav2vec2-base-960h", "facebook/bart-large", ftl=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder NLP encoder decoder

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large")

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder NLP encoder decoder only fine-tune on layer norm and attention

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", lna=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Speech encoder NLP encoder decoder only fine-tune on speech encoder

model = SpeechMixEED("facebook/wav2vec2-base-960h", "facebook/bart-large", fne=True)

transcript_tensor = model.tokenizer(transcript, return_tensors="pt").input_ids
speech_tensor = model.processor(speech, return_tensors="pt").input_values

model(speech_tensor, transcript_tensor)

Installation

pip install

pip install speechmix

Build from source

git clone and cd into this project.

pip install -e .
Owner
Eric Lam
NLP researcher, Data scientist and computer science engineer.
Eric Lam
Universal Adversarial Triggers for Attacking and Analyzing NLP (EMNLP 2019)

Universal Adversarial Triggers for Attacking and Analyzing NLP This is the official code for the EMNLP 2019 paper, Universal Adversarial Triggers for

Eric Wallace 248 Dec 17, 2022
NeMo: a toolkit for conversational AI

NVIDIA NeMo Introduction NeMo is a toolkit for creating Conversational AI applications. NeMo product page. Introductory video. The toolkit comes with

NVIDIA Corporation 5.3k Jan 04, 2023
NLP, before and after spaCy

textacy: NLP, before and after spaCy textacy is a Python library for performing a variety of natural language processing (NLP) tasks, built on the hig

Chartbeat Labs Projects 2k Jan 04, 2023
Treemap visualisation of Maya scene files

Ever wondered which nodes are responsible for that 600 mb+ Maya scene file? Features Fast, resizable UI Parsing at 50 mb/sec Dependency-free, single-f

Marcus Ottosson 76 Nov 12, 2022
Multi-Task Pre-Training for Plug-and-Play Task-Oriented Dialogue System

Multi-Task Pre-Training for Plug-and-Play Task-Oriented Dialogue System Authors: Yixuan Su, Lei Shu, Elman Mansimov, Arshit Gupta, Deng Cai, Yi-An Lai

Amazon Web Services - Labs 124 Jan 03, 2023
ByT5: Towards a token-free future with pre-trained byte-to-byte models

ByT5: Towards a token-free future with pre-trained byte-to-byte models ByT5 is a tokenizer-free extension of the mT5 model. Instead of using a subword

Google Research 409 Jan 06, 2023
Question and answer retrieval in Turkish with BERT

trfaq Google supported this work by providing Google Cloud credit. Thank you Google for supporting the open source! 🎉 What is this? At this repo, I'm

M. Yusuf Sarıgöz 13 Oct 10, 2022
Creating a Feed of MISP Events from ThreatFox (by abuse.ch)

ThreatFox2Misp Creating a Feed of MISP Events from ThreatFox (by abuse.ch) What will it do? This will fetch IOCs from ThreatFox by Abuse.ch, convert t

17 Nov 22, 2022
构建一个多源(公众号、RSS)、干净、个性化的阅读环境

2C 构建一个多源(公众号、RSS)、干净、个性化的阅读环境 作为一名微信公众号的重度用户,公众号一直被我设为汲取知识的地方。随着使用程度的增加,相信大家或多或少会有一个比较头疼的问题——广告问题。 假设你关注的公众号有十来个,若一个公众号两周接一次广告,理论上你会面临二十多次广告,实际上会更多,运

howie.hu 678 Dec 28, 2022
Example code for "Real-World Natural Language Processing"

Real-World Natural Language Processing This repository contains example code for the book "Real-World Natural Language Processing." AllenNLP (2.5.0 or

Masato Hagiwara 303 Dec 17, 2022
A simple Streamlit App to classify swahili news into different categories.

Swahili News Classifier Streamlit App A simple app to classify swahili news into different categories. Installation Install all streamlit requirements

Davis David 4 May 01, 2022
JaQuAD: Japanese Question Answering Dataset

JaQuAD: Japanese Question Answering Dataset for Machine Reading Comprehension (2022, Skelter Labs)

SkelterLabs 84 Dec 27, 2022
HuggingTweets - Train a model to generate tweets

HuggingTweets - Train a model to generate tweets Create in 5 minutes a tweet generator based on your favorite Tweeter Make my own model with the demo

Boris Dayma 318 Jan 04, 2023
PyKaldi is a Python scripting layer for the Kaldi speech recognition toolkit.

PyKaldi is a Python scripting layer for the Kaldi speech recognition toolkit. It provides easy-to-use, low-overhead, first-class Python wrappers for t

922 Dec 31, 2022
MicBot - MicBot uses Google Translate to speak everyone's chat messages

MicBot MicBot uses Google Translate to speak everyone's chat messages. It can al

2 Mar 09, 2022
American Sign Language (ASL) to Text Converter

Signterpreter American Sign Language (ASL) to Text Converter Recommendations Although there is grayscale and gaussian blur, we recommend that you use

0 Feb 20, 2022
Transformer related optimization, including BERT, GPT

This repository provides a script and recipe to run the highly optimized transformer-based encoder and decoder component, and it is tested and maintained by NVIDIA.

NVIDIA Corporation 1.7k Jan 04, 2023
Python Implementation of ``Modeling the Influence of Verb Aspect on the Activation of Typical Event Locations with BERT'' (Findings of ACL: ACL 2021)

BERT-for-Surprisal Python Implementation of ``Modeling the Influence of Verb Aspect on the Activation of Typical Event Locations with BERT'' (Findings

7 Dec 05, 2022
A python package to fine-tune transformer-based models for named entity recognition (NER).

nerblackbox A python package to fine-tune transformer-based language models for named entity recognition (NER). Resources Source Code: https://github.

Felix Stollenwerk 13 Jul 30, 2022
AI_Assistant - This is a Python based Voice Assistant.

This is a Python based Voice Assistant. This was programmed to increase my understanding of python and also how the in-general Voice Assistants work.

1 Jan 06, 2022