PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)

Last update: Jan 04, 2023

Overview

PyTorch implementation of Conformer: Convolution-augmented Transformer for Speech Recognition.

Transformer models are good at capturing content-based global interactions, while CNNs exploit local features effectively. Conformer combines convolutional neural networks and Transformers to model both the local and global dependencies of an audio sequence in a parameter-efficient way. It significantly outperforms previous Transformer- and CNN-based models, achieving state-of-the-art accuracy.

This repository contains only the model code, but you can still train a Conformer with it.

Installation

This project recommends Python 3.7 or higher. We recommend creating a new virtual environment for this project (using virtualenv or conda).

Prerequisites

NumPy: pip install numpy (refer here if you have problems installing NumPy).
PyTorch: refer to the PyTorch website to install the version that matches your environment.

Install from source

Currently, we only support installation from source using setuptools. Check out the source code and run the following command:

pip install -e .

Usage

import torch
import torch.nn as nn
from conformer import Conformer

batch_size, sequence_length, dim = 3, 12345, 80

cuda = torch.cuda.is_available()  
device = torch.device('cuda' if cuda else 'cpu')

inputs = torch.rand(batch_size, sequence_length, dim).to(device)
input_lengths = torch.IntTensor([12345, 12300, 12000])
targets = torch.LongTensor([[1, 3, 3, 3, 3, 3, 4, 5, 6, 2],
                            [1, 3, 3, 3, 3, 3, 4, 5, 2, 0],
                            [1, 3, 3, 3, 3, 3, 4, 2, 0, 0]]).to(device)
target_lengths = torch.LongTensor([9, 8, 7])

model = nn.DataParallel(Conformer(num_classes=10, input_dim=dim, 
                                  encoder_dim=32, num_encoder_layers=3, 
                                  decoder_dim=32, device=device)).to(device)

# Forward propagate
outputs = model(inputs, input_lengths, targets, target_lengths)

# Recognize input speech
outputs = model.module.recognize(inputs, input_lengths)

Troubleshooting and Contributing

If you have any questions, bug reports, or feature requests, please open an issue on GitHub or contact [email protected].

I appreciate any kind of feedback or contribution. Feel free to proceed with small issues such as bug fixes and documentation improvements. For major contributions and new features, please discuss them with the collaborators in the corresponding issues.

Code Style

I follow PEP 8 for code style. Docstring style is especially important because the docstrings are used to generate the documentation.

Author

Soohwan Kim @sooftware
Contacts: [email protected]

Comments

Outputs differ from Targets

@sooftware Could you kindly explain why the output lengths and target lengths are so different? :/ (I also get negative floats in the outputs.) An example is shown below.

The outputs have shape [32, 490, 16121] (where 16121 is the size of my vocabulary). What is the 490 dimension? Also, the outputs are probabilities, right?

(outputs)
tensor([[[-9.7001, -9.6490, -9.6463,  ..., -9.6936, -9.6430, -9.7431],
         [-9.6997, -9.6487, -9.6470,  ..., -9.6903, -9.6450, -9.7416],
         [-9.6999, -9.6477, -9.6479,  ..., -9.6898, -9.6453, -9.7417],
         ...,
         [-9.7006, -9.6449, -9.6513,  ..., -9.6889, -9.6477, -9.7405],
         [-9.7003, -9.6448, -9.6512,  ..., -9.6893, -9.6477, -9.7410],
         [-9.7007, -9.6453, -9.6513,  ..., -9.6892, -9.6466, -9.7403]],

        [[-9.6844, -9.6316, -9.6387,  ..., -9.6880, -9.6269, -9.7657],
         [-9.6834, -9.6299, -9.6404,  ..., -9.6872, -9.6283, -9.7642],
         [-9.6834, -9.6334, -9.6387,  ..., -9.6864, -9.6290, -9.7616],
         ...,
         [-9.6840, -9.6299, -9.6431,  ..., -9.6830, -9.6304, -9.7608],
         [-9.6838, -9.6297, -9.6428,  ..., -9.6834, -9.6303, -9.7609],
         [-9.6842, -9.6300, -9.6428,  ..., -9.6837, -9.6292, -9.7599]],

        [[-9.6966, -9.6386, -9.6458,  ..., -9.6896, -9.6375, -9.7521],
         [-9.6974, -9.6374, -9.6462,  ..., -9.6890, -9.6369, -9.7516],
         [-9.6974, -9.6405, -9.6456,  ..., -9.6876, -9.6378, -9.7491],
         ...,
         [-9.6978, -9.6336, -9.6493,  ..., -9.6851, -9.6419, -9.7490],
         [-9.6971, -9.6334, -9.6487,  ..., -9.6863, -9.6411, -9.7501],
         [-9.6972, -9.6338, -9.6489,  ..., -9.6867, -9.6396, -9.7497]],

        ...,

        [[-9.7005, -9.6249, -9.6588,  ..., -9.6762, -9.6557, -9.7555],
         [-9.7028, -9.6266, -9.6597,  ..., -9.6765, -9.6574, -9.7542],
         [-9.7016, -9.6240, -9.6605,  ..., -9.6761, -9.6576, -9.7553],
         ...,
         [-9.7036, -9.6237, -9.6624,  ..., -9.6728, -9.6590, -9.7524],
         [-9.7034, -9.6235, -9.6620,  ..., -9.6735, -9.6589, -9.7530],
         [-9.7038, -9.6240, -9.6622,  ..., -9.6738, -9.6582, -9.7524]],

        [[-9.7058, -9.6305, -9.6566,  ..., -9.6739, -9.6557, -9.7466],
         [-9.7061, -9.6273, -9.6569,  ..., -9.6774, -9.6564, -9.7499],
         [-9.7046, -9.6280, -9.6576,  ..., -9.6772, -9.6575, -9.7498],
         ...,
         [-9.7060, -9.6263, -9.6609,  ..., -9.6714, -9.6561, -9.7461],
         [-9.7055, -9.6262, -9.6605,  ..., -9.6723, -9.6558, -9.7469],
         [-9.7058, -9.6270, -9.6606,  ..., -9.6725, -9.6552, -9.7460]],

        [[-9.7101, -9.6312, -9.6570,  ..., -9.6736, -9.6551, -9.7420],
         [-9.7102, -9.6307, -9.6579,  ..., -9.6733, -9.6576, -9.7418],
         [-9.7078, -9.6281, -9.6598,  ..., -9.6704, -9.6596, -9.7418],
         ...,
         [-9.7084, -9.6288, -9.6605,  ..., -9.6706, -9.6588, -9.7399],
         [-9.7081, -9.6286, -9.6600,  ..., -9.6714, -9.6584, -9.7406],
         [-9.7085, -9.6291, -9.6601,  ..., -9.6717, -9.6577, -9.7398]]],
       device='cuda:0', grad_fn=<LogSoftmaxBackward0>)
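
The `grad_fn=<LogSoftmaxBackward0>` on the tensor above suggests that the model returns log-probabilities rather than probabilities, which would explain the negative values: exponentiating them gives probabilities that sum to 1 over the vocabulary. As a rough sanity check (a plain-Python sketch, not the repository's code), a uniform distribution over 16121 classes has log-probability -log(16121), roughly -9.69, so values clustered around -9.6 to -9.7 are what a barely trained model would be expected to produce.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

vocab_size = 16121  # vocabulary size quoted in the question

# Equal logits for every class give a uniform distribution.
uniform = log_softmax([0.0] * vocab_size)
print(round(uniform[0], 4))  # equals log(1 / 16121)

# exp() turns log-probabilities back into probabilities that sum to 1.
probs = [math.exp(lp) for lp in log_softmax([2.0, 1.0, 0.1])]
print(round(sum(probs), 6))
```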

(output_lengths)
tensor([312, 260, 315, 320, 317, 275, 308, 291, 272, 300, 262, 227, 303, 252,
        298, 256, 303, 251, 284, 259, 263, 286, 209, 262, 166, 194, 149, 212,
        121, 114, 110,  57], device='cuda:0', dtype=torch.int32)

(target_lengths)
tensor([57, 55, 54, 50, 49, 49, 49, 48, 48, 47, 43, 42, 41, 40, 40, 39, 37, 37,
        36, 36, 36, 35, 34, 33, 29, 27, 26, 24, 20, 19, 17,  9])
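
On the length mismatch: output lengths count encoder frames while target lengths count tokens, so the two are not expected to match; CTC only needs each output length to be at least as long as its target. The 490 in the output shape is consistent with two unpadded kernel-3, stride-2 convolutions applied to the 1963-frame inputs used in the code below, and the same arithmetic maps the README's 12345 frames to the 3085 that appears in the error messages further down. This is a sketch under that assumption; `subsampled_length` is a hypothetical helper, not a function from the repository.

```python
def subsampled_length(n_frames, num_convs=2, kernel=3, stride=2):
    """Time steps left after `num_convs` unpadded strided convolutions.

    Assumption: kernel-3, stride-2 convolutions without padding.
    """
    for _ in range(num_convs):
        n_frames = (n_frames - kernel) // stride + 1
    return n_frames

print(subsampled_length(1963))   # 490
print(subsampled_length(12345))  # 3085

# Lengths printed in the question (first two and last entries only).
output_lengths = [312, 260, 57]
target_lengths = [57, 55, 9]
# CTC can align a pair only if there are at least as many frames as tokens.
print(all(o >= t for o, t in zip(output_lengths, target_lengths)))  # True
```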

I am using the following code for training and evaluation

import torch
import time
import sys
from google.colab import output
import torch.nn as nn
from conformer import Conformer
import torchmetrics
import random

cuda = torch.cuda.is_available()  
device = torch.device('cuda' if cuda else 'cpu')
print('Device:', device)

################################################################################

def train_model(model, optimizer, criterion, loader, metric):
  running_loss = 0.0
  for i, (audio,audio_len, translations, translation_len) in enumerate(loader):
    # with output.use_tags('some_outputs'):
    #   sys.stdout.write('Batch: '+ str(i+1)+'/290')
    #   sys.stdout.flush();

    #sorting inputs and targets to have targets in descending order based on len
    sorted_list,sorted_indices=torch.sort(translation_len,descending=True)

    sorted_audio=torch.zeros((32,201,1963),dtype=torch.float)
    sorted_audio_len=torch.zeros(32,dtype=torch.int)
    sorted_translations=torch.zeros((32,78),dtype=torch.int)
    sorted_translation_len=sorted_list

    for index, contentof in enumerate(translation_len):
      sorted_audio[index]=audio[sorted_indices[index]]
      sorted_audio_len[index]=audio_len[sorted_indices[index]]
      sorted_translations[index]=translations[sorted_indices[index]]

    #transpose inputs from (batch, dim, seq_len) to (batch, seq_len, dim)
    inputs=sorted_audio.to(device)
    inputs=torch.transpose(inputs, 1, 2)
    input_lengths=sorted_audio_len
    targets=sorted_translations.to(device)
    target_lengths=sorted_translation_len

    optimizer.zero_grad()
  
    # Forward propagate
    outputs, output_lengths = model(inputs, input_lengths)
    # print(outputs)

    # Calculate CTC Loss
    loss = criterion(outputs.transpose(0, 1), targets, output_lengths, target_lengths)

    loss.backward()
    optimizer.step()

    # print statistics
    running_loss += loss.item()

    output.clear(output_tags='some_outputs')

  loss_per_epoch=running_loss/(i+1)
  # print(f'Loss: {loss_per_epoch:.3f}')

  return loss_per_epoch

################################################################################

def eval_model(model, optimizer, criterion, loader, metric):
  running_loss = 0.0
  wer_calc=0.0
  random_index_per_epoch= random.randint(0, 178)

  for i, (audio,audio_len, translations, translation_len) in enumerate(loader):
    # with output.use_tags('some_outputs'):
    #   sys.stdout.write('Batch: '+ str(i+1)+'/72')
    #   sys.stdout.flush();

    #sorting inputs and targets to have targets in descending order based on len
    sorted_list,sorted_indices=torch.sort(translation_len,descending=True)

    sorted_audio=torch.zeros((32,201,1963),dtype=torch.float)
    sorted_audio_len=torch.zeros(32,dtype=torch.int)
    sorted_translations=torch.zeros((32,78),dtype=torch.int)
    sorted_translation_len=sorted_list

    for index, contentof in enumerate(translation_len):
      sorted_audio[index]=audio[sorted_indices[index]]
      sorted_audio_len[index]=audio_len[sorted_indices[index]]
      sorted_translations[index]=translations[sorted_indices[index]]

    #transpose inputs from (batch, dim, seq_len) to (batch, seq_len, dim)
    inputs=sorted_audio.to(device)
    inputs=torch.transpose(inputs, 1, 2)
    input_lengths=sorted_audio_len
    targets=sorted_translations.to(device)
    target_lengths=sorted_translation_len

    # Forward propagate
    outputs, output_lengths = model(inputs, input_lengths)
    # print(outputs)

    # Calculate CTC Loss
    loss = criterion(outputs.transpose(0, 1), targets, output_lengths, target_lengths)

    print(output_lengths)
    print(target_lengths)
    # outputs_in_words=words_vocab.convert_pred_to_words(outputs.transpose(0, 1))
    # targets_in_words=words_vocab.convert_pred_to_words(targets)
    # wer=metrics_calculation(metric, outputs_in_words,targets_in_words)
    
    break

    if (i==random_index_per_epoch):
        print(outputs_in_words,targets_in_words)

    running_loss += loss.item()
    # wer_calc += wer

    output.clear(output_tags='some_outputs')

  loss_per_epoch=running_loss/(i+1)
  wer_per_epoch=wer_calc/(i+1)

  return loss_per_epoch, wer_per_epoch

################################################################################

def train_eval_model(epochs):
  #conformer model init
  model = nn.DataParallel(Conformer(num_classes=16121, input_dim=201, encoder_dim=32, num_encoder_layers=1)).to(device)

  # Optimizers specified in the torch.optim package
  optimizer = torch.optim.Adam(model.parameters(), lr=0.0001, betas=(0.9, 0.98), eps=1e-9)

  #loss function
  criterion = nn.CTCLoss().to(device)

  #metrics init
  metric=torchmetrics.WordErrorRate()

  for epoch in range(epochs):
    print("Epoch", epoch+1)

    ############################################################################
    #TRAINING      
    model.train()
    print("Training")

    # epoch_loss=train_model(model=model,optimizer=optimizer, criterion=criterion, loader=train_loader, metric=metric)

    # print(f'Loss: {epoch_loss:.3f}')
    # print(f'WER: {epoch_wer:.3f}')

    ############################################################################
    #EVALUATION
    model.train(False)
    print("Validation")

    epoch_val_loss, epoch_val_wer=eval_model(model=model,optimizer=optimizer, criterion=criterion, loader=test_loader, metric=metric)
    
    print(f'Loss: {epoch_val_loss:.3f}')     
    print(f'WER: {epoch_val_wer:.3f}')   

################################################################################

def metrics_calculation(metric, predictions, targets):
    print(predictions)
    print(targets)
    wer=metric(predictions, targets)

    return wer



train_eval_model(1)

opened by jcgeo9 8

Question about the relative shift function

Hi @sooftware, thank you for coding this repo. I have a question about the relative shift function: https://github.com/sooftware/conformer/blob/c76ff16d01b149ae518f3fe66a3dd89c9ecff2fc/conformer/attention.py#L105 I don't quite understand how this function works. Could you elaborate on this?

An example input and output of size 4 is shown below, which does not really make sense to me.

Input:

tensor([[[[-0.9623, -0.3168, -1.1478, -1.3076],
          [ 0.5907, -0.0391, -0.1849, -0.6368],
          [-0.3956,  0.2142, -0.6415,  0.2196],
          [-0.8194, -0.2601,  1.1337, -0.3478]]]])

Output:

tensor([[[[-1.3076,  0.0000,  0.5907, -0.0391],
          [-0.1849, -0.6368,  0.0000, -0.3956],
          [ 0.2142, -0.6415,  0.2196,  0.0000],
          [-0.8194, -0.2601,  1.1337, -0.3478]]]])

Thank you!
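
The output above can be reproduced with the pad-and-reshape trick used for relative positional attention in Transformer-XL: prepend a zero column, reinterpret the T x (T+1) buffer as (T+1) x T, and drop the first row. The effect is that row i is shifted left by T-1-i, so if column k of the input holds the score for relative offset k-(T-1), then output[i][j] for j <= i holds the score for offset j-i; entries above the diagonal are leftovers of the reshape. The following is a plain-Python sketch for a single T x T matrix, not the repository's implementation, and `relative_shift` is a name chosen here for illustration.

```python
def relative_shift(scores):
    """Shift row i of a square matrix left by (T - 1 - i) via pad and reshape."""
    t = len(scores)
    # 1. Prepend a zero to every row and flatten: T rows of length T + 1.
    flat = [v for row in scores for v in [0.0] + list(row)]
    # 2. View the same buffer as T + 1 rows of length T.
    rows = [flat[r * t:(r + 1) * t] for r in range(t + 1)]
    # 3. Drop the first row, leaving a T x T matrix.
    return rows[1:]

x = [[-0.9623, -0.3168, -1.1478, -1.3076],
     [ 0.5907, -0.0391, -0.1849, -0.6368],
     [-0.3956,  0.2142, -0.6415,  0.2196],
     [-0.8194, -0.2601,  1.1337, -0.3478]]

for row in relative_shift(x):
    print(row)
```

The printed rows match the output tensor quoted in the question, including the zeros that trail the diagonal.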
opened by ChanganVR 6

Decoding predictions to strings

Hi, thanks for the great repo.

The README usage example returns outputs as a torch tensor of ints. How would you suggest decoding these to strings (the actual speech)?

Thanks!
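
One common approach is to map each predicted id back to its token with the same vocabulary that produced the targets, dropping special tokens and stopping at the end-of-sentence id. The sketch below uses a made-up vocabulary; the special ids (0 = pad, 1 = sos, 2 = eos) are an assumption inferred from the README's example targets, and `ids_to_text` and `decode_batch` are hypothetical helpers rather than part of this repository.

```python
# Hypothetical vocabulary; a real one comes from the tokenizer used for the targets.
ID_TO_TOKEN = {0: "<pad>", 1: "<sos>", 2: "<eos>", 3: "a", 4: "b", 5: "c", 6: "d"}
SKIP_IDS = {0, 1}  # assumed pad and start-of-sentence ids
EOS_ID = 2         # assumed end-of-sentence id

def ids_to_text(ids, id_to_token=ID_TO_TOKEN):
    """Convert one sequence of token ids to a string."""
    tokens = []
    for i in ids:
        if i == EOS_ID:      # stop at end of sentence
            break
        if i in SKIP_IDS:    # drop padding and start markers
            continue
        tokens.append(id_to_token[i])
    return "".join(tokens)

def decode_batch(batch):
    """Decode a batch given as a list of id lists."""
    return [ids_to_text(row) for row in batch]

predictions = [[1, 3, 3, 3, 3, 3, 4, 5, 6, 2],
               [1, 3, 3, 3, 3, 3, 4, 5, 2, 0],
               [1, 3, 3, 3, 3, 3, 4, 2, 0, 0]]
print(decode_batch(predictions))
# ['aaaaabcd', 'aaaaabc', 'aaaaab']
```

For a torch tensor of ids, calling `.tolist()` first gives the nested lists this sketch expects.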

opened by Andrew-Brown1 3
mat1 and mat2 shapes cannot be multiplied (1323x9248 and 1568x32)

These are the shapes of my input, input_len, target, target_len where batch size=27

This is the setup I am running (only using the first batch to check that it is working before training with all the batches)

This is the error I am getting

I need some assistance here please:)
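
The numbers in the title hint at a likely cause. Assuming the encoder front end is two unpadded kernel-3, stride-2 convolutions followed by a linear layer whose input size is encoder_dim times the subsampled feature dimension (an assumption about the model, not something confirmed in this issue), 1568 = 32 x 49 corresponds to an input_dim of about 201, while 9248 = 32 x 289 corresponds to a feature size of about 1160, and the 1323 = 27 x 49 rows correspond to about 201 time steps. That pattern would arise if the input were passed as (batch, dim, seq_len) instead of (batch, seq_len, dim). The sketch below reproduces the arithmetic; the helpers and the value 1160 are illustrative.

```python
def subsample(n, num_convs=2, kernel=3, stride=2):
    """Size of one axis after unpadded strided convolutions (assumed front end)."""
    for _ in range(num_convs):
        n = (n - kernel) // stride + 1
    return n

def projection_shapes(batch, seq_len, feat_dim, input_dim, encoder_dim=32):
    """Shapes (mat1, mat2) seen by the linear layer after subsampling."""
    mat1 = (batch * subsample(seq_len), encoder_dim * subsample(feat_dim))
    mat2 = (encoder_dim * subsample(input_dim), encoder_dim)
    return mat1, mat2

# Model built with input_dim=201, but the tensor laid out as (batch, 201, 1160),
# i.e. the time and feature axes swapped. 1160 is an illustrative value.
print(projection_shapes(batch=27, seq_len=201, feat_dim=1160, input_dim=201))
# ((1323, 9248), (1568, 32))

# After transposing to (batch, seq_len, dim) the inner dimensions agree.
print(projection_shapes(batch=27, seq_len=1160, feat_dim=201, input_dim=201))
# ((7803, 1568), (1568, 32))
```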

opened by jcgeo9 2

error when reproducing the example of use (RuntimeError: Input tensor at index 1 has invalid shape [1, 3085, 8, 10], but expected [1, 3085, 9, 10])

Running the code results in an error:

import torch
print(torch.__version__)
import torch.nn as nn
from conformer import Conformer

batch_size, sequence_length, dim = 3, 12345, 80

cuda = torch.cuda.is_available()  
device = torch.device('cuda' if cuda else 'cpu')

inputs = torch.rand(batch_size, sequence_length, dim).to(device)
input_lengths = torch.IntTensor([12345, 12300, 12000])
targets = torch.LongTensor([[1, 3, 3, 3, 3, 3, 4, 5, 6, 2],
                            [1, 3, 3, 3, 3, 3, 4, 5, 2, 0],
                            [1, 3, 3, 3, 3, 3, 4, 2, 0, 0]]).to(device)
target_lengths = torch.LongTensor([9, 8, 7])

model = nn.DataParallel(Conformer(num_classes=10, input_dim=dim, 
                                  encoder_dim=32, num_encoder_layers=3, 
                                  decoder_dim=32, device=device)).to(device)

# Forward propagate
outputs = model(inputs, input_lengths, targets, target_lengths)

# Recognize input speech
outputs = model.module.recognize(inputs, input_lengths)

1.9.0+cu111
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-12-eea3aeffaf58> in <module>
     21 
     22 # Forward propagate
---> 23 outputs = model(inputs, input_lengths, targets, target_lengths)
     24 
     25 # Recognize input speech

/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

/opt/conda/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py in forward(self, *inputs, **kwargs)
    167             replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
    168             outputs = self.parallel_apply(replicas, inputs, kwargs)
--> 169             return self.gather(outputs, self.output_device)
    170 
    171     def replicate(self, module, device_ids):

/opt/conda/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py in gather(self, outputs, output_device)
    179 
    180     def gather(self, outputs, output_device):
--> 181         return gather(outputs, output_device, dim=self.dim)
    182 
    183 

/opt/conda/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py in gather(outputs, target_device, dim)
     76     # Setting the function to None clears the refcycle.
     77     try:
---> 78         res = gather_map(outputs)
     79     finally:
     80         gather_map = None

/opt/conda/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py in gather_map(outputs)
     61         out = outputs[0]
     62         if isinstance(out, torch.Tensor):
---> 63             return Gather.apply(target_device, dim, *outputs)
     64         if out is None:
     65             return None

/opt/conda/lib/python3.8/site-packages/torch/nn/parallel/_functions.py in forward(ctx, target_device, dim, *inputs)
     73             ctx.unsqueezed_scalar = False
     74         ctx.input_sizes = tuple(i.size(ctx.dim) for i in inputs)
---> 75         return comm.gather(inputs, ctx.dim, ctx.target_device)
     76 
     77     @staticmethod

/opt/conda/lib/python3.8/site-packages/torch/nn/parallel/comm.py in gather(tensors, dim, destination, out)
    233                 'device object or string instead, e.g., "cpu".')
    234         destination = _get_device_index(destination, allow_cpu=True, optional=True)
--> 235         return torch._C._gather(tensors, dim, destination)
    236     else:
    237         if destination is not None:

RuntimeError: Input tensor at index 1 has invalid shape [1, 3085, 8, 10], but expected [1, 3085, 9, 10]

I am using version Python 3.8.8. Which version should it work with?
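
One possible reading of the error (an interpretation, not confirmed in this thread): `nn.DataParallel` splits the batch across GPUs, and if each replica pads its decoder output only to the longest target in its own chunk, the replicas return tensors whose target-length dimension differs (9 versus 8 here), which `gather` cannot concatenate. The sketch below mimics the split in plain Python; the `chunk` helper and the three-replica setting are assumptions inferred from the batch dimension of 1 in the error message.

```python
import math

def chunk(items, num_replicas):
    """Split a batch into contiguous chunks, one per replica."""
    size = math.ceil(len(items) / num_replicas)
    return [items[i:i + size] for i in range(0, len(items), size)]

target_lengths = [9, 8, 7]      # from the README example
frames, num_classes = 3085, 10  # taken from the error message

# Shape each replica would return if it pads to its own longest target.
shapes = [(len(c), frames, max(c), num_classes)
          for c in chunk(target_lengths, num_replicas=3)]
print(shapes)
# [(1, 3085, 9, 10), (1, 3085, 8, 10), (1, 3085, 7, 10)]

# Gathering along the batch axis needs all other dimensions to match.
print(len({s[1:] for s in shapes}) == 1)  # False
```

In this sketch a single replica (`num_replicas=1`) yields one tensor of shape (3, 3085, 9, 10), so there is nothing to mismatch.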

opened by sovse 2

The .to(self.device) in return

The inputs.to(self.device) call in ConformerConvModule and FeedForwardModule causes the network graph in TensorBoard to fork and look rather messy. Is there a particular reason it is written this way? In most cases both the model and the tensors have already been sent to the device before the tensors are passed to the model, so an additional transfer is probably unnecessary.

opened by panjiashu 2

Invalid size error when running usage in README

Hello sooftware, thank you very much for your wonderful work!

When I run the sample code in Usage of README:

import torch
import torch.nn as nn
from conformer import Conformer

batch_size, sequence_length, dim = 3, 12345, 80

cuda = torch.cuda.is_available()  
device = torch.device('cuda' if cuda else 'cpu')

inputs = torch.rand(batch_size, sequence_length, dim).to(device)
input_lengths = torch.IntTensor([12345, 12300, 12000])
targets = torch.LongTensor([[1, 3, 3, 3, 3, 3, 4, 5, 6, 2],
                            [1, 3, 3, 3, 3, 3, 4, 5, 2, 0],
                            [1, 3, 3, 3, 3, 3, 4, 2, 0, 0]]).to(device)
target_lengths = torch.LongTensor([9, 8, 7])

model = nn.DataParallel(Conformer(num_classes=10, input_dim=dim, 
                                  encoder_dim=32, num_encoder_layers=3, 
                                  decoder_dim=32, device=device)).to(device)

# Forward propagate
outputs = model(inputs, input_lengths, targets, target_lengths)

# Recognize input speech
outputs = model.module.recognize(inputs, input_lengths)

I got this error:

Traceback (most recent call last):
  File "/home/xuchutian/ASR/sooftware-conformer/try.py", line 36, in <module>
    outputs = model(inputs, input_lengths, targets, target_lengths)
  File "/home/yangyi/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yangyi/anaconda3/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 156, in forward
    return self.gather(outputs, self.output_device)
  File "/home/yangyi/anaconda3/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 168, in gather
    return gather(outputs, output_device, dim=self.dim)
  File "/home/yangyi/anaconda3/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
    res = gather_map(outputs)
  File "/home/yangyi/anaconda3/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 55, in gather_map
    return Gather.apply(target_device, dim, *outputs)
  File "/home/yangyi/anaconda3/lib/python3.8/site-packages/torch/nn/parallel/_functions.py", line 68, in forward
    return comm.gather(inputs, ctx.dim, ctx.target_device)
  File "/home/yangyi/anaconda3/lib/python3.8/site-packages/torch/cuda/comm.py", line 165, in gather
    return torch._C._gather(tensors, dim, destination)
RuntimeError: Gather got an input of invalid size: got [1, 3085, 8, 10], but expected [1, 3085, 9, 10]

May I ask how to solve this error?

Thank you very much.

opened by chutianxu 2

use relative import

The import path is currently absolute, which requires users to install the package or configure the Python path before using it. This can be improved with relative imports, so that users can use the package without installing it first.

opened by bridgream 1

Remove device from the argument list

This PR solves #33 by removing device from the argument list, which requires the user to move input tensors to the device manually, as is done in the example code in the README.

The property solution mentioned in #33 is not adopted, as it does not work with nn.DataParallel.

When the devices of the input tensor and the module parameters match, the following .to(device) call on the input tensor is not required, so it is removed in this PR:

https://github.com/sooftware/conformer/blob/348e8af6c156dae19e311697cbb22b9581880a12/conformer/encoder.py#L117

Besides, as the positional encoding is created from a buffer whose device changes with the module, we don't have to call .to(device) here either, so that call is also removed in this PR:

https://github.com/sooftware/conformer/blob/610a77667aafe533a85001298c522e7079503da4/conformer/attention.py#L147

opened by enhuiz 1

Switching device

Hi. I notice the model requires passing the device as an argument, which may not have been decided yet at the point of module initialization. Once the device is decided, it seems we cannot easily change it. Would you consider making the device switchable? One solution may be, instead of passing the device, to add a property:

@property
def device(self):
    return next(self.parameters()).device
opened by enhuiz 1

cannot import name 'Conformer'

Hi, when I tried to import conformer, I got this error:

>>> from conformer import Conformer
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/workspace/bert/conformer/conformer.py", line 3, in <module>
    from conformer import Conformer
ImportError: cannot import name 'Conformer' from partially initialized module 'conformer' (most likely due to a circular import) (/workspace/bert/conformer/conformer.py)

I followed the installation instructions. Could you please help me see where I might have gone wrong? Thanks.

opened by cyy857 1

Fix relative positional multi-head attention layer

I referred to fairseq's conformer layer multi-head attention. [code] I also confirmed that it is training.

math.sqrt(dim) -> math.sqrt(d_head)

Add relative positional encoding module

Fix _relative_shift method
- input: B x n_head x T x (2T-1)
- output: B x n_head x T x T
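
For context on the `math.sqrt(dim) -> math.sqrt(d_head)` change: in multi-head attention each head computes dot products over vectors of size d_head = dim / num_heads, and the usual scaled dot-product convention divides by the square root of that per-head size. The following plain-Python sketch illustrates the scaling for a single head; the sizes and the `scaled_scores` helper are illustrative and not taken from the repository.

```python
import math

def scaled_scores(queries, keys, d_head):
    """Dot-product attention scores for one head, scaled by sqrt(d_head)."""
    return [[sum(q * k for q, k in zip(qv, kv)) / math.sqrt(d_head)
             for kv in keys] for qv in queries]

dim, num_heads = 32, 8     # illustrative sizes
d_head = dim // num_heads  # each head sees 4-dimensional vectors

q = [[1.0] * d_head, [0.5] * d_head]
k = [[1.0] * d_head, [2.0] * d_head]
print(scaled_scores(q, k, d_head))
# [[2.0, 4.0], [1.0, 2.0]]
```

Dividing by sqrt(dim) instead would shrink these scores by a further factor of sqrt(num_heads).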
opened by upskyy 0

Feature Extraction using Pre-trained Conformer Model

Is it possible to use a pre-trained Conformer model for feature extraction on another speech dataset? Have you uploaded your pre-trained model, and is there a tutorial on how to extract embeddings? Thank you.

opened by shakeel608 0

export onnx

Hi, I am a little confused: if I want to export the model to ONNX, should I use the forward or the recognize function? The difference seems to be that, in the recognize function, the number of decoder loop iterations adapts to the encoder outputs.

opened by pengaoao 1

Releases (v1.0)

v1.0 (Feb 21, 2022)
Conformer encoder only

Update README.md (example with CTC loss)

Owner

Soohwan Kim

Current AI Research Engineer at Kakao Brain.

GitHub Repository https://sooftware.github.io/conformer/
