Official implementation of the paper WAV2CLIP: LEARNING ROBUST AUDIO REPRESENTATIONS FROM CLIP

Overview

Wav2CLIP

🚧 WIP 🚧

Official implementation of the paper WAV2CLIP: LEARNING ROBUST AUDIO REPRESENTATIONS FROM CLIP 📄 🔗

Ho-Hsiang Wu, Prem Seetharaman, Kundan Kumar, Juan Pablo Bello

We propose Wav2CLIP, a robust audio representation learning method by distilling from Contrastive Language-Image Pre-training (CLIP). We systematically evaluate Wav2CLIP on a variety of audio tasks including classification, retrieval, and generation, and show that Wav2CLIP can outperform several publicly available pre-trained audio representation algorithms. Wav2CLIP projects audio into a shared embedding space with images and text, which enables multimodal applications such as zero-shot classification, and cross-modal retrieval. Furthermore, Wav2CLIP needs just ~10% of the data to achieve competitive performance on downstream tasks compared with fully supervised models, and is more efficient to pre-train than competing methods as it does not require learning a visual model in concert with an auditory model. Finally, we demonstrate image generation from Wav2CLIP as qualitative assessment of the shared embedding space. Our code and model weights are open sourced and made available for further applications.

Installation

pip install wav2clip

Usage

Clip-Level Embeddings

import wav2clip

model = wav2clip.get_model()
embeddings = wav2clip.embed_audio(audio, model)

Frame-Level Embeddings

import wav2clip

model = wav2clip.get_model(frame_length=16000, hop_length=16000)
embeddings = wav2clip.embed_audio(audio, model)
Comments
  • request of projection layer weight

    request of projection layer weight

    Hi @hohsiangwu , Thanks for great work! Request pre-trained weights of image_transform (MLP layer) for audio-image-language joint embedding space.

    Currently, only audio encoders seem to exist in the get_model function. Is there any big problem if I use CLIP embedding (text or image) without projection layer?

    opened by SeungHeonDoh 2
  • Initial checkin for accessing pre-trained model via pip install

    Initial checkin for accessing pre-trained model via pip install

    I am considering using the release feature of GitHub to host model weights, once the url is added to MODEL_WEIGHTS_URL, and the repository is made public, we should be able to model = torch.hub.load('descriptinc/lyrebird-wav2clip', 'wav2clip', pretrained=True)

    opened by hohsiangwu 1
  • Adding VQGAN-CLIP with modification to generate audio

    Adding VQGAN-CLIP with modification to generate audio

    • Adding a working snapshot of original generate.py from https://github.com/nerdyrodent/VQGAN-CLIP/
    • Modify to add audio related params and functions
    • Add scripts to generate image and video with options for conditioning and interpolation
    opened by hohsiangwu 0
  • Supervised scenario no transform

    Supervised scenario no transform

    In the supervise scenario in the __init__.py the transform flag is not set to True, so the model doesn't contain the MLP layer after training. I'm wondering how you train the MLP layer when using as pretrained.

    opened by alirezadir 0
  • Integrated into VQGAN+CLIP 3D Zooming notebook

    Integrated into VQGAN+CLIP 3D Zooming notebook

    Dear researchers,

    I integrated Wav2CLIP into a VQGAN+CLIP animation notebook.

    It is available on colab here: https://colab.research.google.com/github/pollinations/hive/blob/main/notebooks/2%20Text-To-Video/1%20CLIP-Guided%20VQGAN%203D%20Turbo%20Zoom.ipynb

    I'm part of a team creating an open-source generative art platform called Pollinations.AI. It's also possible to use through our frontend if you are interested. https://pollinations.ai/p/QmT7yt67DF3GF4wd2vyw6bAgN3QZx7Xpnoyx98YWEsEuV7/create

    Here is an example output: https://user-images.githubusercontent.com/5099901/168467451-f633468d-e596-48f5-8c2c-2dc54648ead3.mp4

    opened by voodoohop 0
  • The details concerning loading raw audio files

    The details concerning loading raw audio files

    Hi !

    I haved imported the wave2clip as a package, however when testing, the inputs for the model to extract features are not original audio files. Thus can you provided the details to load the audio files to processed data for the model?

    opened by jinx2018 0
  • torch version

    torch version

    Hi, thanks for sharing the wonderful work! I encountered some issues during pip installing it, so may I ask what is the torch version you used? I cannot find the requirement of this project. Thanks!

    opened by annahung31 0
  • Error when importing after fresh installation on colab

    Error when importing after fresh installation on colab

    What CUDA and Python versions have you tested the pip package in? After installation on a fresh collab I receive the following error:


    OSError Traceback (most recent call last) in () ----> 1 import wav2clip

    7 frames /usr/local/lib/python3.7/dist-packages/wav2clip/init.py in () 2 import torch 3 ----> 4 from .model.encoder import ResNetExtractor 5 6

    /usr/local/lib/python3.7/dist-packages/wav2clip/model/encoder.py in () 4 from torch import nn 5 ----> 6 from .resnet import BasicBlock 7 from .resnet import ResNet 8

    /usr/local/lib/python3.7/dist-packages/wav2clip/model/resnet.py in () 3 import torch.nn as nn 4 import torch.nn.functional as F ----> 5 import torchaudio 6 7

    /usr/local/lib/python3.7/dist-packages/torchaudio/init.py in () ----> 1 from torchaudio import _extension # noqa: F401 2 from torchaudio import ( 3 compliance, 4 datasets, 5 functional,

    /usr/local/lib/python3.7/dist-packages/torchaudio/_extension.py in () 25 26 ---> 27 _init_extension()

    /usr/local/lib/python3.7/dist-packages/torchaudio/_extension.py in _init_extension() 19 # which depends on libtorchaudio and dynamic loader will handle it for us. 20 if path.exists(): ---> 21 torch.ops.load_library(path) 22 torch.classes.load_library(path) 23 # This import is for initializing the methods registered via PyBind11

    /usr/local/lib/python3.7/dist-packages/torch/_ops.py in load_library(self, path) 108 # static (global) initialization code in order to register custom 109 # operators with the JIT. --> 110 ctypes.CDLL(path) 111 self.loaded_libraries.add(path) 112

    /usr/lib/python3.7/ctypes/init.py in init(self, name, mode, handle, use_errno, use_last_error) 362 363 if handle is None: --> 364 self._handle = _dlopen(self._name, mode) 365 else: 366 self._handle = handle

    OSError: libcudart.so.10.2: cannot open shared object file: No such file or directory

    opened by janzuiderveld 0
Releases(v0.1.0-alpha)
Owner
Descript
Descript
Simple, but essential Bayesian optimization package

BayesO: A Bayesian optimization framework in Python Simple, but essential Bayesian optimization package. http://bayeso.org Online documentation Instal

Jungtaek Kim 74 Dec 05, 2022
Open standard for machine learning interoperability

Open Neural Network Exchange (ONNX) is an open ecosystem that empowers AI developers to choose the right tools as their project evolves. ONNX provides

Open Neural Network Exchange 13.9k Dec 30, 2022
Airbus Ship Detection Challenge

Airbus Ship Detection Challenge This is an open solution to the Airbus Ship Detection Challenge. Our goals We are building entirely open solution to t

minerva.ml 55 Nov 29, 2022
A privacy-focused, intelligent security camera system.

Self-Hosted Home Security Camera System A privacy-focused, intelligent security camera system. Features: Multi-camera support w/ minimal configuration

Scott Barnes 175 Jan 01, 2023
Differentiable Wavetable Synthesis

Differentiable Wavetable Synthesis

4 Feb 11, 2022
PyTorch implementation of the ACL, 2021 paper Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks.

Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks This repo contains the PyTorch implementation of the ACL, 2021 pa

Rabeeh Karimi Mahabadi 98 Dec 28, 2022
Some useful blender add-ons for SMPL skeleton's poses and global translation.

Blender add-ons for SMPL skeleton's poses and trans There are two blender add-ons for SMPL skeleton's poses and trans.The first is for making an offli

犹在镜中 154 Jan 04, 2023
Non-Metric Space Library (NMSLIB): An efficient similarity search library and a toolkit for evaluation of k-NN methods for generic non-metric spaces.

Non-Metric Space Library (NMSLIB) Important Notes NMSLIB is generic but fast, see the results of ANN benchmarks. A standalone implementation of our fa

2.9k Jan 04, 2023
Jittor implementation of Recursive-NeRF: An Efficient and Dynamically Growing NeRF

Recursive-NeRF: An Efficient and Dynamically Growing NeRF This is a Jittor implementation of Recursive-NeRF: An Efficient and Dynamically Growing NeRF

33 Nov 30, 2022
A collection of SOTA Image Classification Models in PyTorch

A collection of SOTA Image Classification Models in PyTorch

sithu3 85 Dec 30, 2022
ScriptProfilerPy - Module to visualize where your python script is slow

ScriptProfiler helps you track where your code is slow It provides: Code lines t

Lucas BLP 3 Jun 02, 2022
An Implicit Function Theorem (IFT) optimizer for bi-level optimizations

iftopt An Implicit Function Theorem (IFT) optimizer for bi-level optimizations. Requirements Python 3.7+ PyTorch 1.x Installation $ pip install git+ht

The Money Shredder Lab 2 Dec 02, 2021
Pytorch implemenation of Stochastic Multi-Label Image-to-image Translation (SMIT)

SMIT: Stochastic Multi-Label Image-to-image Translation This repository provides a PyTorch implementation of SMIT. SMIT can stochastically translate a

Biomedical Computer Vision Group @ Uniandes 37 Mar 01, 2022
🏅 The Most Comprehensive List of Kaggle Solutions and Ideas 🏅

🏅 Collection of Kaggle Solutions and Ideas 🏅

Farid Rashidi 2.3k Jan 08, 2023
HDMapNet: A Local Semantic Map Learning and Evaluation Framework

HDMapNet_devkit Devkit for HDMapNet. HDMapNet: A Local Semantic Map Learning and Evaluation Framework Qi Li, Yue Wang, Yilun Wang, Hang Zhao [Paper] [

Tsinghua MARS Lab 421 Jan 04, 2023
Garbage Detection system which will detect objects based on whether it is plastic waste or plastics or just garbage.

Garbage Detection using Yolov5 on Jetson Nano 2gb Developer Kit. Garbage detection system which will detect objects based on whether it is plastic was

Rishikesh A. Bondade 2 May 13, 2022
Image segmentation with private İstanbul Dataset

Image Segmentation This repo was created for academic research and test result. Repo will update after academic article online. This repo contains wei

İrem KÖMÜRCÜ 9 Dec 11, 2022
Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network)

Deep Daze mist over green hills shattered plates on the grass cosmic love and attention a time traveler in the crowd life during the plague meditative

Phil Wang 4.4k Jan 03, 2023
pixelNeRF: Neural Radiance Fields from One or Few Images

pixelNeRF: Neural Radiance Fields from One or Few Images Alex Yu, Vickie Ye, Matthew Tancik, Angjoo Kanazawa UC Berkeley arXiv: http://arxiv.org/abs/2

Alex Yu 1k Jan 04, 2023
Improving the robustness and performance of biomedical NLP models through adversarial training

RobustBioNLP Improving the robustness and performance of biomedical NLP models through adversarial training In this repository you can find suppliment

Milad Moradi 3 Sep 20, 2022