PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)

Overview

PyTorch implementation of Conformer: Convolution-augmented Transformer for Speech Recognition.


Transformer models are good at capturing content-based global interactions, while CNNs exploit local features effectively. Conformer combine convolution neural networks and transformers to model both local and global dependencies of an audio sequence in a parameter-efficient way. Conformer significantly outperforms the previous Transformer and CNN based models achieving state-of-the-art accuracies.

This repository contains only model code, but you can train with conformer with this repository.

Installation

This project recommends Python 3.7 or higher. We recommend creating a new virtual environment for this project (using virtual env or conda).

Prerequisites

  • Numpy: pip install numpy (Refer here for problem installing Numpy).
  • Pytorch: Refer to PyTorch website to install the version w.r.t. your environment.

Install from source

Currently we only support installation from source code using setuptools. Checkout the source code and run the following commands:

pip install -e .

Usage

import torch
import torch.nn as nn
from conformer import Conformer

batch_size, sequence_length, dim = 3, 12345, 80

cuda = torch.cuda.is_available()  
device = torch.device('cuda' if cuda else 'cpu')

inputs = torch.rand(batch_size, sequence_length, dim).to(device)
input_lengths = torch.IntTensor([12345, 12300, 12000])
targets = torch.LongTensor([[1, 3, 3, 3, 3, 3, 4, 5, 6, 2],
                            [1, 3, 3, 3, 3, 3, 4, 5, 2, 0],
                            [1, 3, 3, 3, 3, 3, 4, 2, 0, 0]]).to(device)
target_lengths = torch.LongTensor([9, 8, 7])

model = nn.DataParallel(Conformer(num_classes=10, input_dim=dim, 
                                  encoder_dim=32, num_encoder_layers=3, 
                                  decoder_dim=32, device=device)).to(device)

# Forward propagate
outputs = model(inputs, input_lengths, targets, target_lengths)

# Recognize input speech
outputs = model.module.recognize(inputs, input_lengths)

Troubleshoots and Contributing

If you have any questions, bug reports, and feature requests, please open an issue on github or
contacts [email protected] please.

I appreciate any kind of feedback or contribution. Feel free to proceed with small issues like bug fixes, documentation improvement. For major contributions and new features, please discuss with the collaborators in corresponding issues.

Code Style

I follow PEP-8 for code style. Especially the style of docstrings is important to generate documentation.

Reference

Author

Comments
  • Outputs differ from Targets

    Outputs differ from Targets

    @sooftware Can you kindly explain to me why the output lengths and targets are so different? :/ (also in outputs I get negative floats). Example shown below

    The outputs are of shape [32,490,16121] (where 16121 is the len of my vocab) What is the 490 dimensions Also the outputs are probabilities right?

    (outputs)
    tensor([[[-9.7001, -9.6490, -9.6463,  ..., -9.6936, -9.6430, -9.7431],
             [-9.6997, -9.6487, -9.6470,  ..., -9.6903, -9.6450, -9.7416],
             [-9.6999, -9.6477, -9.6479,  ..., -9.6898, -9.6453, -9.7417],
             ...,
             [-9.7006, -9.6449, -9.6513,  ..., -9.6889, -9.6477, -9.7405],
             [-9.7003, -9.6448, -9.6512,  ..., -9.6893, -9.6477, -9.7410],
             [-9.7007, -9.6453, -9.6513,  ..., -9.6892, -9.6466, -9.7403]],
    
            [[-9.6844, -9.6316, -9.6387,  ..., -9.6880, -9.6269, -9.7657],
             [-9.6834, -9.6299, -9.6404,  ..., -9.6872, -9.6283, -9.7642],
             [-9.6834, -9.6334, -9.6387,  ..., -9.6864, -9.6290, -9.7616],
             ...,
             [-9.6840, -9.6299, -9.6431,  ..., -9.6830, -9.6304, -9.7608],
             [-9.6838, -9.6297, -9.6428,  ..., -9.6834, -9.6303, -9.7609],
             [-9.6842, -9.6300, -9.6428,  ..., -9.6837, -9.6292, -9.7599]],
    
            [[-9.6966, -9.6386, -9.6458,  ..., -9.6896, -9.6375, -9.7521],
             [-9.6974, -9.6374, -9.6462,  ..., -9.6890, -9.6369, -9.7516],
             [-9.6974, -9.6405, -9.6456,  ..., -9.6876, -9.6378, -9.7491],
             ...,
             [-9.6978, -9.6336, -9.6493,  ..., -9.6851, -9.6419, -9.7490],
             [-9.6971, -9.6334, -9.6487,  ..., -9.6863, -9.6411, -9.7501],
             [-9.6972, -9.6338, -9.6489,  ..., -9.6867, -9.6396, -9.7497]],
    
            ...,
    
            [[-9.7005, -9.6249, -9.6588,  ..., -9.6762, -9.6557, -9.7555],
             [-9.7028, -9.6266, -9.6597,  ..., -9.6765, -9.6574, -9.7542],
             [-9.7016, -9.6240, -9.6605,  ..., -9.6761, -9.6576, -9.7553],
             ...,
             [-9.7036, -9.6237, -9.6624,  ..., -9.6728, -9.6590, -9.7524],
             [-9.7034, -9.6235, -9.6620,  ..., -9.6735, -9.6589, -9.7530],
             [-9.7038, -9.6240, -9.6622,  ..., -9.6738, -9.6582, -9.7524]],
    
            [[-9.7058, -9.6305, -9.6566,  ..., -9.6739, -9.6557, -9.7466],
             [-9.7061, -9.6273, -9.6569,  ..., -9.6774, -9.6564, -9.7499],
             [-9.7046, -9.6280, -9.6576,  ..., -9.6772, -9.6575, -9.7498],
             ...,
             [-9.7060, -9.6263, -9.6609,  ..., -9.6714, -9.6561, -9.7461],
             [-9.7055, -9.6262, -9.6605,  ..., -9.6723, -9.6558, -9.7469],
             [-9.7058, -9.6270, -9.6606,  ..., -9.6725, -9.6552, -9.7460]],
    
            [[-9.7101, -9.6312, -9.6570,  ..., -9.6736, -9.6551, -9.7420],
             [-9.7102, -9.6307, -9.6579,  ..., -9.6733, -9.6576, -9.7418],
             [-9.7078, -9.6281, -9.6598,  ..., -9.6704, -9.6596, -9.7418],
             ...,
             [-9.7084, -9.6288, -9.6605,  ..., -9.6706, -9.6588, -9.7399],
             [-9.7081, -9.6286, -9.6600,  ..., -9.6714, -9.6584, -9.7406],
             [-9.7085, -9.6291, -9.6601,  ..., -9.6717, -9.6577, -9.7398]]],
           device='cuda:0', grad_fn=<LogSoftmaxBackward0>)
    
    (output_lengths)
    tensor([312, 260, 315, 320, 317, 275, 308, 291, 272, 300, 262, 227, 303, 252,
            298, 256, 303, 251, 284, 259, 263, 286, 209, 262, 166, 194, 149, 212,
            121, 114, 110,  57], device='cuda:0', dtype=torch.int32)
    
    (target_lengths)
    tensor([57, 55, 54, 50, 49, 49, 49, 48, 48, 47, 43, 42, 41, 40, 40, 39, 37, 37,
            36, 36, 36, 35, 34, 33, 29, 27, 26, 24, 20, 19, 17,  9])
    
    

    I am using the following code for training and evaluation

    import torch
    import time
    import sys
    from google.colab import output
    import torch.nn as nn
    from conformer import Conformer
    import torchmetrics
    import random
    
    cuda = torch.cuda.is_available()  
    device = torch.device('cuda' if cuda else 'cpu')
    print('Device:', device)
    
    ################################################################################
    
    def train_model(model, optimizer, criterion, loader, metric):
      running_loss = 0.0
      for i, (audio,audio_len, translations, translation_len) in enumerate(loader):
        # with output.use_tags('some_outputs'):
        #   sys.stdout.write('Batch: '+ str(i+1)+'/290')
        #   sys.stdout.flush();
    
        #sorting inputs and targets to have targets in descending order based on len
        sorted_list,sorted_indices=torch.sort(translation_len,descending=True)
    
        sorted_audio=torch.zeros((32,201,1963),dtype=torch.float)
        sorted_audio_len=torch.zeros(32,dtype=torch.int)
        sorted_translations=torch.zeros((32,78),dtype=torch.int)
        sorted_translation_len=sorted_list
    
        for index, contentof in enumerate(translation_len):
          sorted_audio[index]=audio[sorted_indices[index]]
          sorted_audio_len[index]=audio_len[sorted_indices[index]]
          sorted_translations[index]=translations[sorted_indices[index]]
    
        #transpose inputs from (batch, dim, seq_len) to (batch, seq_len, dim)
        inputs=sorted_audio.to(device)
        inputs=torch.transpose(inputs, 1, 2)
        input_lengths=sorted_audio_len
        targets=sorted_translations.to(device)
        target_lengths=sorted_translation_len
    
        optimizer.zero_grad()
      
        # Forward propagate
        outputs, output_lengths = model(inputs, input_lengths)
        # print(outputs)
    
        # Calculate CTC Loss
        loss = criterion(outputs.transpose(0, 1), targets, output_lengths, target_lengths)
    
        loss.backward()
        optimizer.step()
    
        # print statistics
        running_loss += loss.item()
    
        output.clear(output_tags='some_outputs')
    
      loss_per_epoch=running_loss/(i+1)
      # print(f'Loss: {loss_per_epoch:.3f}')
    
      return loss_per_epoch
    
    ################################################################################
    
    def eval_model(model, optimizer, criterion, loader, metric):
      running_loss = 0.0
      wer_calc=0.0
      random_index_per_epoch= random.randint(0, 178)
    
      for i, (audio,audio_len, translations, translation_len) in enumerate(loader):
        # with output.use_tags('some_outputs'):
        #   sys.stdout.write('Batch: '+ str(i+1)+'/72')
        #   sys.stdout.flush();
    
        #sorting inputs and targets to have targets in descending order based on len
        sorted_list,sorted_indices=torch.sort(translation_len,descending=True)
    
        sorted_audio=torch.zeros((32,201,1963),dtype=torch.float)
        sorted_audio_len=torch.zeros(32,dtype=torch.int)
        sorted_translations=torch.zeros((32,78),dtype=torch.int)
        sorted_translation_len=sorted_list
    
        for index, contentof in enumerate(translation_len):
          sorted_audio[index]=audio[sorted_indices[index]]
          sorted_audio_len[index]=audio_len[sorted_indices[index]]
          sorted_translations[index]=translations[sorted_indices[index]]
    
        #transpose inputs from (batch, dim, seq_len) to (batch, seq_len, dim)
        inputs=sorted_audio.to(device)
        inputs=torch.transpose(inputs, 1, 2)
        input_lengths=sorted_audio_len
        targets=sorted_translations.to(device)
        target_lengths=sorted_translation_len
    
        # Forward propagate
        outputs, output_lengths = model(inputs, input_lengths)
        # print(outputs)
    
        # Calculate CTC Loss
        loss = criterion(outputs.transpose(0, 1), targets, output_lengths, target_lengths)
    
        print(output_lengths)
        print(target_lengths)
        # outputs_in_words=words_vocab.convert_pred_to_words(outputs.transpose(0, 1))
        # targets_in_words=words_vocab.convert_pred_to_words(targets)
        # wer=metrics_calculation(metric, outputs_in_words,targets_in_words)
        
        break
    
        if (i==random_index_per_epoch):
            print(outputs_in_words,targets_in_words)
    
        running_loss += loss.item()
        # wer_calc += wer
    
        output.clear(output_tags='some_outputs')
    
      loss_per_epoch=running_loss/(i+1)
      wer_per_epoch=wer_calc/(i+1)
    
      return loss_per_epoch, wer_per_epoch
    
    ################################################################################
    
    def train_eval_model(epochs):
      #conformer model init
      model = nn.DataParallel(Conformer(num_classes=16121, input_dim=201, encoder_dim=32, num_encoder_layers=1)).to(device)
    
      # Optimizers specified in the torch.optim package
      optimizer = torch.optim.Adam(model.parameters(), lr=0.0001, betas=(0.9, 0.98), eps=1e-9)
    
      #loss function
      criterion = nn.CTCLoss().to(device)
    
      #metrics init
      metric=torchmetrics.WordErrorRate()
    
      for epoch in range(epochs):
        print("Epoch", epoch+1)
    
        ############################################################################
        #TRAINING      
        model.train()
        print("Training")
    
        # epoch_loss=train_model(model=model,optimizer=optimizer, criterion=criterion, loader=train_loader, metric=metric)
    
        # print(f'Loss: {epoch_loss:.3f}')
        # print(f'WER: {epoch_wer:.3f}')
    
        ############################################################################
        #EVALUATION
        model.train(False)
        print("Validation")
    
        epoch_val_loss, epoch_val_wer=eval_model(model=model,optimizer=optimizer, criterion=criterion, loader=test_loader, metric=metric)
        
        print(f'Loss: {epoch_val_loss:.3f}')     
        print(f'WER: {epoch_val_wer:.3f}')   
    
    ################################################################################
    
    def metrics_calculation(metric, predictions, targets):
        print(predictions)
        print(targets)
        wer=metric(predictions, targets)
    
        return wer
    
    
    
    train_eval_model(1)
    
    opened by jcgeo9 8
  • question about the relative shift function

    question about the relative shift function

    Hi @sooftware, thank you for coding this repo. I have a question about the relative shift function: https://github.com/sooftware/conformer/blob/c76ff16d01b149ae518f3fe66a3dd89c9ecff2fc/conformer/attention.py#L105 I don't quite understand how this function works. Could you elaborate on this?

    An example input and output of size 4 is shown below, which does not really make sense to me.

    Input:

    tensor([[[[-0.9623, -0.3168, -1.1478, -1.3076],
              [ 0.5907, -0.0391, -0.1849, -0.6368],
              [-0.3956,  0.2142, -0.6415,  0.2196],
              [-0.8194, -0.2601,  1.1337, -0.3478]]]])
    

    output:

    tensor([[[[-1.3076,  0.0000,  0.5907, -0.0391],
              [-0.1849, -0.6368,  0.0000, -0.3956],
              [ 0.2142, -0.6415,  0.2196,  0.0000],
              [-0.8194, -0.2601,  1.1337, -0.3478]]]])
    

    Thank you!

    opened by ChanganVR 6
  • Decoding predictions to strings

    Decoding predictions to strings

    Hi, thanks for the great repo.

    the README Usage example gives outputs as a torch tensor of ints. How would you suggest decoding these to strings (the actual speech)?

    Thanks!

    opened by Andrew-Brown1 3
  • mat1 and mat2 shapes cannot be multiplied (1323x9248 and 1568x32)

    mat1 and mat2 shapes cannot be multiplied (1323x9248 and 1568x32)

    These are the shapes of my input, input_len, target, target_len where batch size=27 image

    This is the setup I am running (only using first batch to check that is working before training with all the batches) image

    This is the error I am getting image

    I need some assistance here please:)

    opened by jcgeo9 2
  • error when reproducing the example of use (RuntimeError: Input tensor at index 1 has invalid shape [1, 3085, 8, 10], but expected [1, 3085, 9, 10])

    error when reproducing the example of use (RuntimeError: Input tensor at index 1 has invalid shape [1, 3085, 8, 10], but expected [1, 3085, 9, 10])

    Running the code results in an error:

    import torch
    print(torch.__version__)
    import torch.nn as nn
    from conformer import Conformer
    
    batch_size, sequence_length, dim = 3, 12345, 80
    
    cuda = torch.cuda.is_available()  
    device = torch.device('cuda' if cuda else 'cpu')
    
    inputs = torch.rand(batch_size, sequence_length, dim).to(device)
    input_lengths = torch.IntTensor([12345, 12300, 12000])
    targets = torch.LongTensor([[1, 3, 3, 3, 3, 3, 4, 5, 6, 2],
                                [1, 3, 3, 3, 3, 3, 4, 5, 2, 0],
                                [1, 3, 3, 3, 3, 3, 4, 2, 0, 0]]).to(device)
    target_lengths = torch.LongTensor([9, 8, 7])
    
    model = nn.DataParallel(Conformer(num_classes=10, input_dim=dim, 
                                      encoder_dim=32, num_encoder_layers=3, 
                                      decoder_dim=32, device=device)).to(device)
    
    # Forward propagate
    outputs = model(inputs, input_lengths, targets, target_lengths)
    
    # Recognize input speech
    outputs = model.module.recognize(inputs, input_lengths)
    
    
    
    1.9.0+cu111
    ---------------------------------------------------------------------------
    RuntimeError                              Traceback (most recent call last)
    <ipython-input-12-eea3aeffaf58> in <module>
         21 
         22 # Forward propagate
    ---> 23 outputs = model(inputs, input_lengths, targets, target_lengths)
         24 
         25 # Recognize input speech
    
    /opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
       1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
       1050                 or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1051             return forward_call(*input, **kwargs)
       1052         # Do not call functions when jit is used
       1053         full_backward_hooks, non_full_backward_hooks = [], []
    
    /opt/conda/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py in forward(self, *inputs, **kwargs)
        167             replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
        168             outputs = self.parallel_apply(replicas, inputs, kwargs)
    --> 169             return self.gather(outputs, self.output_device)
        170 
        171     def replicate(self, module, device_ids):
    
    /opt/conda/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py in gather(self, outputs, output_device)
        179 
        180     def gather(self, outputs, output_device):
    --> 181         return gather(outputs, output_device, dim=self.dim)
        182 
        183 
    
    /opt/conda/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py in gather(outputs, target_device, dim)
         76     # Setting the function to None clears the refcycle.
         77     try:
    ---> 78         res = gather_map(outputs)
         79     finally:
         80         gather_map = None
    
    /opt/conda/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py in gather_map(outputs)
         61         out = outputs[0]
         62         if isinstance(out, torch.Tensor):
    ---> 63             return Gather.apply(target_device, dim, *outputs)
         64         if out is None:
         65             return None
    
    /opt/conda/lib/python3.8/site-packages/torch/nn/parallel/_functions.py in forward(ctx, target_device, dim, *inputs)
         73             ctx.unsqueezed_scalar = False
         74         ctx.input_sizes = tuple(i.size(ctx.dim) for i in inputs)
    ---> 75         return comm.gather(inputs, ctx.dim, ctx.target_device)
         76 
         77     @staticmethod
    
    /opt/conda/lib/python3.8/site-packages/torch/nn/parallel/comm.py in gather(tensors, dim, destination, out)
        233                 'device object or string instead, e.g., "cpu".')
        234         destination = _get_device_index(destination, allow_cpu=True, optional=True)
    --> 235         return torch._C._gather(tensors, dim, destination)
        236     else:
        237         if destination is not None:
    
    RuntimeError: Input tensor at index 1 has invalid shape [1, 3085, 8, 10], but expected [1, 3085, 9, 10]
    

    I am using version Python 3.8.8. Which version should it work with?

    opened by sovse 2
  • The to.(self.device) in return

    The to.(self.device) in return

    The inputs.to(self.device) in ConformerConvmodule and FeedForwardModule will cause the network graph in tensorboard to fork and appear kind of messy. Is there any special reason to write like that? Since in most cases we should have send both the model and tensor to the device before we input the tensor to the model, probably no more sending action is needed?

    opened by panjiashu 2
  • Invalid size error when running usage in README

    Invalid size error when running usage in README

    Hello sooftware, thank you very much for your wonderful work!

    When I run the sample code in Usage of README:

    import torch
    import torch.nn as nn
    from conformer import Conformer
    
    batch_size, sequence_length, dim = 3, 12345, 80
    
    cuda = torch.cuda.is_available()  
    device = torch.device('cuda' if cuda else 'cpu')
    
    inputs = torch.rand(batch_size, sequence_length, dim).to(device)
    input_lengths = torch.IntTensor([12345, 12300, 12000])
    targets = torch.LongTensor([[1, 3, 3, 3, 3, 3, 4, 5, 6, 2],
                                [1, 3, 3, 3, 3, 3, 4, 5, 2, 0],
                                [1, 3, 3, 3, 3, 3, 4, 2, 0, 0]]).to(device)
    target_lengths = torch.LongTensor([9, 8, 7])
    
    model = nn.DataParallel(Conformer(num_classes=10, input_dim=dim, 
                                      encoder_dim=32, num_encoder_layers=3, 
                                      decoder_dim=32, device=device)).to(device)
    
    # Forward propagate
    outputs = model(inputs, input_lengths, targets, target_lengths)
    
    # Recognize input speech
    outputs = model.module.recognize(inputs, input_lengths)
    

    I got this error:

    Traceback (most recent call last):
      File "/home/xuchutian/ASR/sooftware-conformer/try.py", line 36, in <module>
        outputs = model(inputs, input_lengths, targets, target_lengths)
      File "/home/yangyi/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/yangyi/anaconda3/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 156, in forward
        return self.gather(outputs, self.output_device)
      File "/home/yangyi/anaconda3/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 168, in gather
        return gather(outputs, output_device, dim=self.dim)
      File "/home/yangyi/anaconda3/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
        res = gather_map(outputs)
      File "/home/yangyi/anaconda3/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 55, in gather_map
        return Gather.apply(target_device, dim, *outputs)
      File "/home/yangyi/anaconda3/lib/python3.8/site-packages/torch/nn/parallel/_functions.py", line 68, in forward
        return comm.gather(inputs, ctx.dim, ctx.target_device)
      File "/home/yangyi/anaconda3/lib/python3.8/site-packages/torch/cuda/comm.py", line 165, in gather
        return torch._C._gather(tensors, dim, destination)
    RuntimeError: Gather got an input of invalid size: got [1, 3085, 8, 10], but expected [1, 3085, 9, 10]
    

    May I ask how to solve this error?

    Thank you very much.

    opened by chutianxu 2
  • use relative import

    use relative import

    The import path is now absolute, which requires users to install or configuring the python path before using. However, this can be improved with relative import, so users can use the package without installing it first.

    opened by bridgream 1
  • Remove device from the argument list

    Remove device from the argument list

    This PR solve #33 by removing device from the argument list, which will require the user to manually put input tensors to device as done in the example code in README.

    The property solution mentioned in #33 is not adopted as it does work with nn.DataParallel.

    When the devices of input tensor and module parameters match, the following to device on the input tensor is not required, which are removed in this PR:

    https://github.com/sooftware/conformer/blob/348e8af6c156dae19e311697cbb22b9581880a12/conformer/encoder.py#L117

    Besides, as positional encoding is created from a buffer whose device is changed with the module, we don't have to call to device here, which is also removed in this PR.

    https://github.com/sooftware/conformer/blob/610a77667aafe533a85001298c522e7079503da4/conformer/attention.py#L147

    opened by enhuiz 1
  • Switching device

    Switching device

    Hi. I notice the model requires passing the device as an argument, which may have not been decided yet at the point of the module initialization. Once the device is decided, it seems we cannot easily change it. Do you consider making the device switchable? One solution may be instead of passing the device, add an attribute:

    @property
    def device(self):
        return next(self.parameters()).device
    
    opened by enhuiz 1
  • cannot import name 'Conformer'

    cannot import name 'Conformer'

    Hi when I tried to import conformer, I got this issue >>> from conformer import Conformer Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/workspace/bert/conformer/conformer.py", line 3, in <module> from conformer import Conformer ImportError: cannot import name 'Conformer' from partially initialized module 'conformer' (most likely due to a circular import) (/workspace/bert/conformer/conformer.py) I did as the installation instruction. Would you please see where I might be wrong? Thanks.

    opened by cyy857 1
  • Fix relative positional multi-head attention layer

    Fix relative positional multi-head attention layer

    I referred to fairseq's conformer layer multi-head attention. [code] I also confirmed that it is training.

    1. math.sqrt(dim) -> math.sqrt(d_head)
    2. Add relative positional encoding module
    3. Fix _relative_shift method - input : B X n_head X T X 2T-1 - output : B X n_head X T X T
    opened by upskyy 0
  • Feature Extraction using Pre-trained Conformer Model

    Feature Extraction using Pre-trained Conformer Model

    Is there any possibility to use pre-trained conformer model for feature extraction on another speech dataset. Have you uploaded your pre-trained model and is there any tutorial how to extract embeddings ? Thank you

    opened by shakeel608 0
  • export onnx

    export onnx

    Hi, I am a little confused, if I want to export the onnx, should I use the forward or the recognize function? The difference seems to be that in the recognize function, the decoder loop num is adaptive according to the encoder outputs

    opened by pengaoao 1
Releases(v1.0)
Owner
Soohwan Kim
Current AI Research Engineer at Kakao Brain.
Soohwan Kim
This porject is intented to build the most accurate model for predicting the porbability of loan default

Estimating-Loan-Default-Probability IBA ML2 Mid-project / Kaggle Competition This porject is intented to build the most accurate model for predicting

Adil Gahramanov 1 Jan 24, 2022
YOLOv5 in PyTorch > ONNX > CoreML > TFLite

This repository represents Ultralytics open-source research into future object detection methods, and incorporates lessons learned and best practices evolved over thousands of hours of training and e

Ultralytics 34.1k Dec 31, 2022
Scripts for training an AI to play the endless runner Subway Surfers using a supervised machine learning approach by imitation and a convolutional neural network (CNN) for image classification

About subwAI subwAI - a project for training an AI to play the endless runner Subway Surfers using a supervised machine learning approach by imitation

82 Jan 01, 2023
Lux AI environment interface for RLlib multi-agents

Lux AI interface to RLlib MultiAgentsEnv For Lux AI Season 1 Kaggle competition. LuxAI repo RLlib-multiagents docs Kaggle environments repo Please let

Jaime 12 Nov 07, 2022
ReSSL: Relational Self-Supervised Learning with Weak Augmentation

ReSSL: Relational Self-Supervised Learning with Weak Augmentation This repository contains PyTorch evaluation code, training code and pretrained model

mingkai 45 Oct 25, 2022
To provide 100 JAX exercises over different sections structured as a course or tutorials to teach and learn for beginners, intermediates as well as experts

JaxTon 💯 JAX exercises Mission 🚀 To provide 100 JAX exercises over different sections structured as a course or tutorials to teach and learn for beg

Rohan Rao 512 Jan 01, 2023
A computational block to solve entity alignment over textual attributes in a knowledge graph creation pipeline.

How to apply? Create your config.ini file following the example provided in config.ini Choose one of the options below to run: Run with Python3 pip in

Scientific Data Management Group 3 Jun 23, 2022
Repo for parser tensorflow(.pb) and tflite(.tflite)

tfmodel_parser .pb file is the format of tensorflow model .tflite file is the format of tflite model, which usually used in mobile devices before star

1 Dec 23, 2021
Lex Rosetta: Transfer of Predictive Models Across Languages, Jurisdictions, and Legal Domains

Lex Rosetta: Transfer of Predictive Models Across Languages, Jurisdictions, and Legal Domains This is an accompanying repository to the ICAIL 2021 pap

4 Dec 16, 2021
LBK 26 Dec 28, 2022
[3DV 2021] Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation

Channel-Wise Attention-Based Network for Self-Supervised Monocular Depth Estimation This is the official implementation for the method described in Ch

Jiaxing Yan 27 Dec 30, 2022
The PyTorch improved version of TPAMI 2017 paper: Face Alignment in Full Pose Range: A 3D Total Solution.

Face Alignment in Full Pose Range: A 3D Total Solution By Jianzhu Guo. [Updates] 2020.8.30: The pre-trained model and code of ECCV-20 are made public

Jianzhu Guo 3.4k Jan 02, 2023
使用OpenCV部署全景驾驶感知网络YOLOP,可同时处理交通目标检测、可驾驶区域分割、车道线检测,三项视觉感知任务,包含C++和Python两种版本的程序实现。本套程序只依赖opencv库就可以运行, 从而彻底摆脱对任何深度学习框架的依赖。

YOLOP-opencv-dnn 使用OpenCV部署全景驾驶感知网络YOLOP,可同时处理交通目标检测、可驾驶区域分割、车道线检测,三项视觉感知任务,依然是包含C++和Python两种版本的程序实现 onnx文件从百度云盘下载,链接:https://pan.baidu.com/s/1A_9cldU

178 Jan 07, 2023
This repository contains the code for the paper "PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization"

PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization News: [2020/05/04] Added EGL rendering option for training data g

Shunsuke Saito 1.5k Jan 03, 2023
Structured Data Gradient Pruning (SDGP)

Structured Data Gradient Pruning (SDGP) Weight pruning is a technique to make Deep Neural Network (DNN) inference more computationally efficient by re

Bradley McDanel 10 Nov 11, 2022
SoGCN: Second-Order Graph Convolutional Networks

SoGCN: Second-Order Graph Convolutional Networks This is the authors' implementation of paper "SoGCN: Second-Order Graph Convolutional Networks" in Py

Yuehao 7 Aug 16, 2022
Anomaly detection in multi-agent trajectories: Code for training, evaluation and the OpenAI highway simulation.

Anomaly Detection in Multi-Agent Trajectories for Automated Driving This is the official project page including the paper, code, simulation, baseline

12 Dec 02, 2022
Spectralformer: Rethinking hyperspectral image classification with transformers

Spectralformer: Rethinking hyperspectral image classification with transformers Danfeng Hong, Zhu Han, Jing Yao, Lianru Gao, Bing Zhang, Antonio Plaza

Danfeng Hong 102 Dec 29, 2022
Official PyTorch implementation of the paper: DeepSIM: Image Shape Manipulation from a Single Augmented Training Sample

DeepSIM: Image Shape Manipulation from a Single Augmented Training Sample (ICCV 2021 Oral) Project | Paper Official PyTorch implementation of the pape

Eliahu Horwitz 393 Dec 22, 2022
This code provides a PyTorch implementation for OTTER (Optimal Transport distillation for Efficient zero-shot Recognition), as described in the paper.

Data Efficient Language-Supervised Zero-Shot Recognition with Optimal Transport Distillation This repository contains PyTorch evaluation code, trainin

Meta Research 45 Dec 20, 2022