A Python wrapper for the high-quality vocoder "World"

Overview

PyWORLD - A Python wrapper of WORLD Vocoder


WORLD Vocoder is a fast and high-quality vocoder which parameterizes speech into three components:

  1. f0: Pitch contour
  2. sp: Harmonic spectral envelope
  3. ap: Aperiodic spectral envelope (relative to the harmonic spectral envelope)

It can also (re)synthesize speech using these features (see examples below).
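To make the three components concrete, the sketch below estimates the array shapes pyworld returns for a signal of a given length: f0 is one value per frame, while sp and ap are frame-by-frequency matrices of the same shape. It assumes WORLD's defaults (frame_period = 5 ms, f0_floor = 71 Hz) and mirrors, as far as I understand them, the frame-count and CheapTrick FFT-size rules of the C++ WORLD sources. The helper name `world_feature_shapes` is hypothetical; pyworld itself exposes `get_cheaptrick_fft_size(fs)` for the FFT size.

```python
import math


def world_feature_shapes(n_samples, fs, frame_period=5.0, f0_floor=71.0):
    """Return the expected shapes of (f0, sp, ap) for n_samples of audio at fs Hz.

    Hypothetical helper; assumes WORLD's default frame period (ms) and f0 floor (Hz).
    """
    # One f0 value per frame; frames are frame_period ms apart, starting at t = 0.
    n_frames = int(1000.0 * n_samples / fs / frame_period) + 1
    # CheapTrick's FFT size: the smallest power of two exceeding
    # 3 * fs / f0_floor + 1 samples (about three periods of the lowest f0).
    fft_size = 2 ** (1 + int(math.log(3.0 * fs / f0_floor + 1) / math.log(2)))
    n_bins = fft_size // 2 + 1  # one-sided spectrum
    return (n_frames,), (n_frames, n_bins), (n_frames, n_bins)
```

For one second of 16 kHz audio this gives f0 of shape (201,) and sp/ap of shape (201, 513).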

For more information, please visit Dr. Morise's WORLD repository and the official website of WORLD Vocoder.

APIs

Vocoder Functions

import soundfile as sf
import pyworld as pw

x, fs = sf.read('input.wav')  # hypothetical mono speech file; soundfile returns float64 by default
_f0, t = pw.dio(x, fs)            # raw pitch extractor
f0 = pw.stonemask(x, _f0, t, fs)  # pitch refinement
sp = pw.cheaptrick(x, f0, t, fs)  # extract smoothed spectrogram
ap = pw.d4c(x, f0, t, fs)         # extract aperiodicity

y = pw.synthesize(f0, sp, ap, fs) # synthesize an utterance using the parameters

Utility

# Convert speech into features (using default arguments)
f0, sp, ap = pw.wav2world(x, fs)

You can also change the function's default arguments; see help(pw.wav2world) for details.

Installation

Using Pip

pip install pyworld

Building from Source

git clone https://github.com/JeremyCCHsu/Python-Wrapper-for-World-Vocoder.git
cd Python-Wrapper-for-World-Vocoder
git submodule update --init
pip install -U pip
pip install -r requirements.txt
pip install .

The git submodule update --init step fetches Morise's World Vocoder (C++ version), which is then compiled by pip install .
(It seems to me that using virtualenv or conda is the best practice.)

Installation Validation

You can validate installation by running

cd demo
python demo.py

and check whether results are written to the test/ directory. (For now, please avoid writing and executing code inside the Python-Wrapper-for-World-Vocoder folder, where the local pyworld/ source folder may shadow the installed package.)

Environment/Dependencies

  • Operating systems
    • Linux Ubuntu 14.04+
    • Windows (thanks to wuaalb)
    • WSL
  • Python
    • 2.7 (Windows is currently not supported)
    • 3.7/3.6/3.5

You can install these dependencies with pip install -r requirements.txt

Notice

  • WORLD vocoder is designed for speech sampled ≥ 16 kHz. Applying WORLD to 8 kHz speech will fail. See a possible workaround here.
  • When the SNR is low, extracting pitch using harvest instead of dio is a better option.

Troubleshooting

  1. Upgrade your Cython version to 0.24.
    (I failed to build it on Cython 0.20.1post0.)
    This requires downloading Cython from http://cython.org/,
    unzipping it, and installing it with python setup.py install.
    (I tried pip install Cython, but the upgrade didn't seem to work correctly.)
    (Add --user if you don't have root access.)
  2. When executing demo/demo.py, you may need the following code in some environments (e.g. when working on a remote Linux server without a display):
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; call this before importing matplotlib.pyplot
  3. If you encounter a "library not found: sndfile" error when executing demo.py,
    you might have to install the library with apt-get install libsndfile1.
    You can also replace pysoundfile with scipy or librosa, but some modification is needed:

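    • librosa:
      • load(filename, sr=None, dtype=np.float64); sr=None keeps the file's original sampling rate instead of resampling to librosa's default of 22050 Hz
      • output.write_wav(filename, wav, fs); this function was removed in librosa 0.8, so newer versions need another writer such as soundfile.write
      • remember to pass the dtype argument so that the method returns doubles (float64).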
    • scipy:
      • You'll have to write a customized utility function based on the following methods:
      • scipy.io.wavfile.read (for 16-bit PCM files this returns int16 samples, which must be converted to float64)
      • scipy.io.wavfile.write
  4. If you have installation issues on Windows, I probably cannot provide much help, because my development environment is Ubuntu and Windows Subsystem for Linux (read this if you are interested in installing it).
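The scipy route mentioned above can be sketched as a pair of helpers. This is a minimal sketch, not code from this repository: the names `read_wav_float64` and `write_wav_int16` are hypothetical, multichannel files are simply averaged down to mono, and float WAV files are assumed to already lie in [-1, 1].

```python
import numpy as np
from scipy.io import wavfile


def read_wav_float64(path):
    """Read a WAV file and return (x, fs) with x as mono float64 in [-1, 1]."""
    fs, data = wavfile.read(path)
    if data.dtype == np.uint8:
        # 8-bit WAV is unsigned with an offset of 128.
        x = (data.astype(np.float64) - 128.0) / 128.0
    elif np.issubdtype(data.dtype, np.integer):
        # int16/int32 PCM: scale by the magnitude of the most negative value.
        x = data.astype(np.float64) / -float(np.iinfo(data.dtype).min)
    else:
        # Float WAV files are assumed to already be in [-1, 1].
        x = data.astype(np.float64)
    if x.ndim > 1:
        x = x.mean(axis=1)  # downmix multichannel audio to mono (a simple choice)
    return np.ascontiguousarray(x), fs  # pyworld expects C-contiguous doubles


def write_wav_int16(path, x, fs):
    """Write float audio in [-1, 1] as 16-bit PCM; out-of-range samples are clipped."""
    pcm = np.clip(x, -1.0, 1.0) * 32767.0
    wavfile.write(path, fs, np.round(pcm).astype(np.int16))
```

The returned pair follows soundfile's (data, samplerate) order, so `x, fs = read_wav_float64(path)` can replace `sf.read` in the demo.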

Other Installation Suggestions

  1. Using pip install . is safer, and you can easily uninstall pyworld with pip uninstall pyworld.
  • For Mac users: you might need to run MACOSX_DEPLOYMENT_TARGET=10.9 pip install . See issue.
  2. Another way to install pyworld is via
    python setup.py install
    • Add --user if you don't have root access
    • Add --record install.txt to track the installation dir
  3. If you just want to try out some experiments, execute
    python setup.py build_ext --inplace
    Then you can use PyWorld from this directory.
    You can also copy the resulting pyworld.so (pyworld.{arch}.pyd on Windows) file to ~/.local/lib/python2.7/site-packages (or corresponding Windows directory) so that you can use it everywhere like an installed package.
    Alternatively you can copy/symlink the compiled files using pip, e.g. pip install -e .
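When several copies may exist (an in-place build, a copied extension file, an editable install), it can help to check which one Python will actually import. This sketch uses only the standard library's `importlib.util.find_spec`; the helper name `locate_module` is hypothetical.

```python
import importlib.util


def locate_module(name="pyworld"):
    """Return the file Python would import for `name`, or None if it is not importable."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


print(locate_module())
```

If the printed path points into the Python-Wrapper-for-World-Vocoder checkout rather than site-packages, the local source folder is the copy being imported.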

Acknowledgement

Thanks to all contributors (tats-u, wuaalb, r9y9, rikrd, kudan2510) for making this repo better, and to sotelo, whose world.py inspired this repo.

Owner
Jeremy Hsu
A PhD student drowning in the ocean of generative models.