SinGlow: Generative Flow for SVS tasks in Tensorflow 2

Related tags

AudioSinGlow
Overview

SinGlow: Generative Flow for SVS tasks in Tensorflow 2

python 3 tensorflow 2

See more in the paper: SinGlow: Singing Voice Synthesis with Glow ---- Help Virtual Singers More Human-like

SinGlow is a part of my Singing voice synthesis system. It can extract features of sound, particularly songs and musics. Then we can use these features (or perfect encoding) for feature migrating tasks. For example migrate features of real singers' song to those virtual singers' songs.

This project is developed above the project GLOW-tf2 under MIT licence, and the following words are from its developers.

My implementation of GLOW from the paper https://arxiv.org/pdf/1807.03039 in Tensorflow 2. GLOW is an interesting generative model as it uses invertible neural network to transform images to normal distribution and vice versa. Additionally, it is strongly based on RealNVP, so knowing it would be helpful to understand GLOW's contribution.

Table of Contents


Abstract

Singing voice synthesis (SVS) is a task using the computer to generate songs with lyrics. So far, researchers are focusing on tunning the pre-recorded sound pieces according to rigid rules. For example, in Vocaloid, one of the commercial SVS systems, there are 8 principal parameters modifiable by song creators. The system uses these parameters to synthesise sound pieces pre-recorded from professional voice actors. We notice a common difference between computer-generated songs and real singers' songs. This difference can be addressed to help the generated ones become more like the real-singer ones.

In this paper, we propose SinGlow, as a solution to minimise this difference. SinGlow is one of the Normalising Flow that directly uses the calculated Negative Log-Likelihood value to optimise the trainable parameters. This feature gives SinGlow the ability to perfectly encode inputs into feature vectors, which allows us to manipulate the feature space to minimise the difference we discussed before. To our best knowledge, we are the first to propose an application of Normalising Flow in SVS fields.

In our experiments, SinGlow shows the ability to encode sound and make the input virtual-singer songs more human-like.

Structure

SinGlow
│   train.py //need modification, replace data dirs with yours
│   common_definitions.py //model configurations are located here
│   data_loarder.py //construct tfrecord dataset from wav or mp3 data, and load it
│   model.py //Glow Model / SinGlow Model
│   pipeline.py //training pipeline
│   README.md
├───utils
│       utils.py //originate from Glow-OpenAI
│       weightnorm.py //originate from Tensorflow
├───checkpoints
│       weights.h5 //the model weights file
├───runs //outputs and rerecords
├───logs //tensorboard logdir
├───design //model architecture information
└───notebooks
        run.ipynb //dataset construction and applying model
        experiment.ipynb //evaluate model
        README.md //some user guide

Requirements

pip3 install -r requirements.txt

Training

After every epoch, the network's weights will be stored in the checkpoints directory defined in common_definitions.py.

There are also some sampling of the network (image generation mode) that are going to be stored in results directory. Additionally, TensorBoard is used to track z's mean and variance, as well as the negative log-likelihood.

In optimal state, z should have zero mean and one variance. Additionally, the TensorBoard stores sampling with temperature of 0.7.

python3 train.py [-h] [--dataset [DATASET]] [--k_glow [K_GLOW]] [--l_glow [L_GLOW]]
       [--img_size [IMG_SIZE]] [--channel_size [CHANNEL_SIZE]]

optional arguments:
  -h, --help            show this help message and exit
  --dataset [DATASET]   The dataset to train on ("mnist", "cifar10", "cifar100")
  --k_glow [K_GLOW]     The amount of blocks per layer
  --l_glow [L_GLOW]     The amount of layers
  --img_size [IMG_SIZE] The width and height of the input images
  --channel_size [CHANNEL_SIZE]
                        The channel size of the input images

CONTRIBUTING

To contribute to the project, these steps can be followed. Anyone that contributes will surely be recognized and mentioned here!

Contributions to the project are made using the "Fork & Pull" model. The typical steps would be:

  1. create an account on github
  2. fork this repository
  3. make a local clone
  4. make changes on the local copy
  5. commit changes git commit -m "my message"
  6. push to your GitHub account: git push origin
  7. create a Pull Request (PR) from your GitHub fork (go to your fork's webpage and click on "Pull Request." You can then add a message to describe your proposal.)

LICENSE

This open-source project is licensed under MIT License.

Reference

TODO the reference information

中文注释

这是一个基于流模型的歌曲特征提取,并进行风格迁移的项目。我们一定程度上实现了将真实人声歌曲的特征迁移到虚拟歌手的歌曲上。

我们接下来的计划是继续优化模型,并在歌曲切割上取得进展,向着研究落地努力。


我们有一个堆满创意点子的秘密基地,里面有很多有意思的小伙伴。生活什么的、技术什么的、二次元什么的都可以聊得开。

欢迎加入我们的小群:兔叽的魔术工房。群内会经常发布各种各样的企划,总会遇上你感兴趣的。

Owner
Haobo Yang
A 3rd-year undergraduate student, hope to be an AI Architect in the future.
Haobo Yang
The official repository for Audio ALBERT

AALBERT Here is also the official repository of AALBERT, which is Pytorch lightning reimplementation of the paper, Audio ALBERT: A Lite Bert for Self-

pohan 55 Dec 11, 2022
Sync Toolbox - Python package with reference implementations for efficient, robust, and accurate music synchronization based on dynamic time warping (DTW)

Sync Toolbox - Python package with reference implementations for efficient, robust, and accurate music synchronization based on dynamic time warping (DTW)

Meinard Mueller 66 Jan 02, 2023
A2DP agent for promiscuous/permissive audio sinc.

Promiscuous Bluetooth audio sinc A2DP agent for promiscuous/permissive audio sinc for Linux. Once installed, a Bluetooth client, such as a smart phone

Jasper Aorangi 4 May 27, 2022
?️ Open Source Audio Matching and Mastering

Matching + Mastering = ❤️ Matchering 2.0 is a novel Containerized Web Application and Python Library for audio matching and mastering. It follows a si

Sergey Grishakov 781 Jan 05, 2023
Music bot of # Owner

Pokimane-Music Music bot of # Owner How To Host The easiest way to deploy this Bot Support Channel :- TeamDlt Support Group :- TeamDlt Please fork thi

5 Dec 23, 2022
Enhanced Audio Player for Discord

Discodo is an enhanced audio player for discord

Mary 42 Oct 05, 2022
praudio provides audio preprocessing framework for Deep Learning audio applications

praudio provides objects and a script for performing complex preprocessing operations on entire audio datasets with one command.

Valerio Velardo 105 Dec 26, 2022
SomaFM Plugin for Kodi

SomaFM XBMC Plugin This description is a bit outdated. You can simply install this addon by browsing the official repositories from within Kodi. Insta

7 Jan 21, 2022
A GUI-based audio player with support for a large variety of formats

Miza-Player A GUI-based audio player with support for a large variety of formats, able to play from web-hosted media platforms such as YouTube, includ

Thomas Xin 3 Dec 14, 2022
A python package for calculating the PESQ.

PyPESQ (WIP) Pypesq is a python wrapper for the PESQ score calculation C routine. It only can be used in evaluation purpose. INSTALL pip install https

Jingdong Li 269 Dec 18, 2022
A voice control utility for Spotify

Spotify Voice Control A voice control utility for Spotify · Report Bug · Request

Shoubhit Dash 27 Jan 01, 2023
A fast MDCT implementation using SciPy and FFTs

MDCT A fast MDCT implementation using SciPy and FFTs Installation As usual pip install mdct Dependencies NumPy SciPy STFT Usage import mdct spectrum

Nils Werner 43 Sep 02, 2022
Anki vector Music ❤ is the best and only Telegram VC player with playlists, Multi Playback, Channel play and more

Anki Vector Music 🎵 A bot that can play music on Telegram Group and Channel Voice Chats Available on telegram as @Anki Vector Music Features 🔥 Thumb

Damantha Jasinghe 12 Nov 12, 2022
A python wrapper for REAPER

pyreaper A python wrapper for REAPER (Robust Epoch And Pitch EstimatoR) Installation pip install pyreaper Demonstration notebnook http://nbviewer.jupy

Ryuichi Yamamoto 56 Dec 27, 2022
Terminal-based audio-to-text converter

att Terminal-based audio-to-text converter Project description A terminal-based audio-to-text converter written in python, enabling you to convert .wa

Sven Eschlbeck 4 Dec 15, 2022
Inner ear models for Python

cochlea cochlea is a collection of inner ear models. All models are easily accessible as Python functions. They take sound signal as input and return

98 Jan 05, 2023
Python game programming in Jupyter notebooks.

Jupylet Jupylet is a Python library for programming 2D and 3D games, graphics, music and sound synthesizers, interactively in a Jupyter notebook. It i

Nir Aides 178 Dec 09, 2022
Graphical interface to control granular sound synthesis.

Granular sound synthesis interface SoundGrain is a graphical interface where users can draw and edit trajectories to control granular sound synthesis

Olivier Bélanger 122 Dec 10, 2022
User-friendly Voice Cloning Application

Multi-Language-RTVC stands for Multi-Language Real Time Voice Cloning and is a Voice Cloning Tool capable of transfering speaker-specific audio featur

Sven Eschlbeck 19 Dec 30, 2022
A Music Player Bot for Discord Servers

A Music Player Bot for Discord Servers

Halil Acar 2 Oct 25, 2021