A Python wrapper for the high-quality vocoder "World"

Last update: Dec 15, 2022

Overview

PyWORLD - A Python wrapper of WORLD Vocoder

`Linux`	`Windows`

WORLD Vocoder is a fast and high-quality vocoder which parameterizes speech into three components:

f0: Pitch contour
sp: Harmonic spectral envelope
ap: Aperiodic spectral envelope (relative to the harmonic spectral envelope)

It can also (re)synthesize speech using these features (see examples below).

For more information, please visit Dr. Morise's WORLD repository and the official website of WORLD Vocoder.

APIs

Vocoder Functions

import pyworld as pw
# x: 1-D float64 NumPy array of audio samples; fs: sampling rate in Hz
_f0, t = pw.dio(x, fs)    # raw pitch extractor
f0 = pw.stonemask(x, _f0, t, fs)  # pitch refinement
sp = pw.cheaptrick(x, f0, t, fs)  # extract smoothed spectrogram
ap = pw.d4c(x, f0, t, fs)         # extract aperiodicity

y = pw.synthesize(f0, sp, ap, fs) # synthesize an utterance using the parameters
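
To make the calls above concrete, here is a minimal sketch that runs the same analysis/synthesis chain on a synthetic tone instead of recorded speech. The helper names `make_tone` and `analysis_synthesis` are hypothetical (they are not part of pyworld), the 220 Hz tone is only a stand-in input, and the sketch assumes pyworld and NumPy are installed.

```python
import numpy as np


def make_tone(fs, seconds, freq):
    """Return a sine wave as a 1-D float64 array (the dtype pyworld expects)."""
    t = np.arange(int(fs * seconds)) / fs
    return 0.5 * np.sin(2.0 * np.pi * freq * t)


def analysis_synthesis(x, fs):
    """Decompose x into (f0, sp, ap) and resynthesize a waveform from them."""
    import pyworld as pw  # imported here so make_tone works without pyworld

    _f0, t = pw.dio(x, fs)            # raw pitch contour and frame times
    f0 = pw.stonemask(x, _f0, t, fs)  # refined pitch contour
    sp = pw.cheaptrick(x, f0, t, fs)  # smoothed spectral envelope per frame
    ap = pw.d4c(x, f0, t, fs)         # aperiodicity per frame
    return pw.synthesize(f0, sp, ap, fs)
```

For instance, `y = analysis_synthesis(make_tone(16000, 1.0, 220.0), 16000)` should give a resynthesized waveform at the same sampling rate.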

Utility

# Convert speech into features (using default arguments)
f0, sp, ap = pw.wav2world(x, fs)

You can also override the default arguments of this function. Run help(pw.wav2world) for details.
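
As a rough guide to what those defaults control, the sketch below estimates the feature shapes for a given input length. The two formulas are my reading of the WORLD C++ sources (`GetSamplesForDIO` and `GetFFTSizeForCheapTrick`), and the defaults of a 5 ms frame period and a 71 Hz F0 floor are assumptions to check against `help(pw.wav2world)`. The function names are hypothetical.

```python
import math


def expected_num_frames(num_samples, fs, frame_period=5.0):
    """Estimated number of analysis frames (frame_period is in milliseconds)."""
    return int(1000.0 * num_samples / fs / frame_period) + 1


def expected_fft_size(fs, f0_floor=71.0):
    """Estimated FFT size: a power of two large enough for 3 periods of f0_floor."""
    return 2 ** (1 + int(math.log2(3.0 * fs / f0_floor + 1.0)))


def expected_feature_shape(num_samples, fs, frame_period=5.0, f0_floor=71.0):
    """Estimated (frames, bins) shape of sp and ap."""
    bins = expected_fft_size(fs, f0_floor) // 2 + 1
    return expected_num_frames(num_samples, fs, frame_period), bins
```

Under these assumptions, one second of 16 kHz audio gives 201 frames of 513 bins, and raising `frame_period` to 10 ms roughly halves the frame count.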

Installation

Using Pip

pip install pyworld

Building from Source

git clone https://github.com/JeremyCCHsu/Python-Wrapper-for-World-Vocoder.git
cd Python-Wrapper-for-World-Vocoder
git submodule update --init
pip install -U pip
pip install -r requirements.txt
pip install .

The git submodule step automatically clones Morise's WORLD Vocoder (C++ version).
(I recommend installing into a virtualenv or conda environment.)

Installation Validation

You can validate installation by running

cd demo
python demo.py

to see if you get results in the test/ directory. (Please avoid writing and executing code in the Python-Wrapper-for-World-Vocoder folder for now.)

Environment/Dependencies

Operating systems
- Linux Ubuntu 14.04+
- Windows (thanks to wuaalb)
- WSL
Python
- 2.7 (Windows is currently not supported)
- 3.7/3.6/3.5

You can install these dependencies with pip install -r requirements.txt

Notice

WORLD vocoder is designed for speech sampled ≥ 16 kHz. Applying WORLD to 8 kHz speech will fail. See a possible workaround here.
When the SNR is low, extracting pitch using harvest instead of dio is a better option.
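
One generic way to get low-rate audio into the supported range is to upsample it before analysis. This is a sketch of that idea only, not necessarily the workaround linked above; the helper name `ensure_min_rate` is hypothetical and SciPy is an assumed extra dependency. Upsampling does not restore any content above the original Nyquist frequency, so the upper bands of sp and ap will carry little information.

```python
import math

import numpy as np
from scipy.signal import resample_poly

MIN_FS = 16000  # lowest sampling rate WORLD is designed for, per the note above


def ensure_min_rate(x, fs, min_fs=MIN_FS):
    """Return (samples, rate) with rate >= min_fs, upsampling by an integer factor."""
    if fs >= min_fs:
        return np.ascontiguousarray(x, dtype=np.float64), fs
    up = math.ceil(min_fs / fs)  # e.g. 8000 Hz -> factor 2 -> 16000 Hz
    y = resample_poly(np.asarray(x, dtype=np.float64), up, 1)
    return np.ascontiguousarray(y), fs * up
```

After `y, fs = ensure_min_rate(x, 8000)`, the result can be passed to the vocoder functions; for noisy recordings, `f0, t = pw.harvest(y, fs)` takes the place of the `pw.dio` call.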

Troubleshooting

Upgrade your Cython version to 0.24.
(The build failed for me on Cython 0.20.1post0.)
To do so, download Cython from http://cython.org/,
unzip it, and run python setup.py install in the extracted folder.
(I tried pip install Cython, but the upgrade did not seem to work correctly.)
(Add --user if you don't have root access.)
Upon executing demo/demo.py, the following code might be needed in some environments (e.g. when you're working on a remote Linux server):

import matplotlib
matplotlib.use('Agg')

If you encounter a library not found: sndfile error when running demo.py,
you might have to install the library with apt-get install libsndfile1.
You can also replace pysoundfile with scipy or librosa, but some modifications are needed:
- librosa:
  - librosa.load(filename, dtype=np.float64)
  - librosa.output.write_wav(filename, wav, fs) (removed in librosa 0.8.0)
  - Remember to pass the dtype argument so that the method returns double-precision (float64) samples.
- scipy:
  - You'll have to write a custom utility function based on the following methods:
  - scipy.io.wavfile.read (but this returns 16-bit integers (short) for typical PCM files, not doubles)
  - scipy.io.wavfile.write
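
The scipy route could look roughly like the sketch below. The function names `read_wav_as_float64` and `write_wav_int16` are hypothetical, and the choices to scale integer PCM into [-1, 1), to mix multi-channel files down to mono, and to write 16-bit output are assumptions made for this example rather than anything the demo requires.

```python
import numpy as np
from scipy.io import wavfile


def read_wav_as_float64(path):
    """Read a WAV file and return (samples, fs) with samples as mono float64."""
    fs, data = wavfile.read(path)
    if data.dtype.kind == "i":    # signed integer PCM (e.g. int16, int32)
        x = data.astype(np.float64) / (np.iinfo(data.dtype).max + 1)
    elif data.dtype.kind == "u":  # 8-bit unsigned PCM, centered at 128
        x = (data.astype(np.float64) - 128.0) / 128.0
    else:                         # already floating point
        x = data.astype(np.float64)
    if x.ndim > 1:                # average channels to get a mono signal
        x = x.mean(axis=1)
    return np.ascontiguousarray(x), fs


def write_wav_int16(path, x, fs):
    """Write float samples in [-1, 1] as 16-bit PCM."""
    clipped = np.clip(x, -1.0, 1.0)
    wavfile.write(path, fs, np.round(clipped * 32767.0).astype(np.int16))
```

The return order `(samples, fs)` mirrors `soundfile.read`, so the helper can stand in for it with few other changes.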
If you run into installation issues on Windows, I probably cannot provide much help, because my development environment is Ubuntu and Windows Subsystem for Linux (read this if you are interested in installing it).

Other Installation Suggestions

Using pip install . is safer, and you can easily uninstall pyworld with pip uninstall pyworld

For Mac users: You might need to do MACOSX_DEPLOYMENT_TARGET=10.9 pip install . See issue.

Another way to install pyworld is via
python setup.py install
- Add --user if you don't have root access
- Add --record install.txt to track the installation dir
If you just want to try out some experiments, execute
python setup.py build_ext --inplace
Then you can use PyWorld from this directory.
You can also copy the resulting pyworld.so (pyworld.{arch}.pyd on Windows) file to ~/.local/lib/python2.7/site-packages (or corresponding Windows directory) so that you can use it everywhere like an installed package.
Alternatively you can copy/symlink the compiled files using pip, e.g. pip install -e .

Acknowledgement

Thanks to all contributors (tats-u, wuaalb, r9y9, rikrd, kudan2510) for making this repo better, and to sotelo, whose world.py inspired this repo.

Owner

Jeremy Hsu
