
Unofficial implementation of the FFTNet vocoder paper.
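For orientation, here is a rough reading of the FFTNet layer. Each layer combines an earlier and a later slice of its input as z = W_L·x_L + W_R·x_R, then applies ReLU, a 1x1 convolution and ReLU again. The split distance halves from layer to layer, so L layers see 2^L samples. The sketch below is a minimal pure-Python illustration of that structure only. It assumes no biases, no spectrogram conditioning and no output softmax. The names (`fftnet_layer`, `fftnet_forward`, the `(w_l, w_r, w_out)` tuple layout) are hypothetical and are not taken from this repository's code.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]  # shape (out_channels, in_channels), i.e. a 1x1 conv
Layer = Tuple[Matrix, Matrix, Matrix]  # (w_l, w_r, w_out); assumed layout, no biases


def matvec(w: Matrix, v: Vector) -> Vector:
    # A 1x1 convolution evaluated at a single time step.
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def fftnet_layer(x: List[Vector], layer: Layer, shift: int) -> List[Vector]:
    """out[t] = ReLU(W_out @ ReLU(W_L @ x[t] + W_R @ x[t + shift])).

    No padding, so the output is `shift` steps shorter than the input.
    """
    w_l, w_r, w_out = layer
    out = []
    for t in range(len(x) - shift):
        z = [a + b for a, b in zip(matvec(w_l, x[t]), matvec(w_r, x[t + shift]))]
        out.append(relu(matvec(w_out, relu(z))))
    return out


def fftnet_forward(x: List[Vector], layers: Sequence[Layer]) -> List[Vector]:
    # Shifts halve from layer to layer: 2**(L-1), ..., 2, 1, so each output
    # sees 2**L consecutive input samples.
    num_layers = len(layers)
    for i, layer in enumerate(layers):
        x = fftnet_layer(x, layer, 2 ** (num_layers - 1 - i))
    return x


# Usage: with identity weights and non-negative inputs, every output is just the
# sum of its 2**L-sample window, which makes the receptive field easy to see.
identity = [[1.0, 0.0], [0.0, 1.0]]
layers = [(identity, identity, identity)] * 3  # 3 layers -> receptive field of 8
signal = [[float(t), 0.0] for t in range(10)]
print(fftnet_forward(signal, layers))
# -> [[28.0, 0.0], [36.0, 0.0], [44.0, 0.0]]
```

Ten input samples and a receptive field of 8 leave 10 - (8 - 1) = 3 outputs. Each output is the sum of its 8-sample window (0+...+7 = 28, then 36, then 44).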

Implement the model.
Implement tests.
Overfit on a single batch (sanity check).
Linearize the weights for evaluation.
Measure the run time on GPU and CPU (currently, 1 sec of audio takes ~47 secs). If anyone knows additional tricks from the paper, let me know; I have asked the authors, but nobody has replied so far.
Train on LJSpeech spectrograms.
Distill the model as in the Parallel WaveNet paper.
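On the run-time item: sample-by-sample generation speed depends heavily on how past activations are reused. A common approach for dilated autoregressive vocoders, popularized by the Fast WaveNet generation algorithm, is to cache each layer's recent inputs in a FIFO queue. Producing one new sample then costs one small layer evaluation per layer instead of a pass over the whole receptive field. The sketch below is a swapped-in illustration of that idea, not necessarily what this repository does or what the FFTNet authors use. It assumes the same simplified layer as a plain FFTNet-style layer: no biases, no conditioning, no output softmax. `IncrementalFFTNet`, `layer_step` and `forward_padded` are hypothetical names. The demo checks that the streamed outputs match a whole-sequence pass with zero padding.

```python
import random
from collections import deque
from typing import Deque, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]  # (out_channels, in_channels)
Layer = Tuple[Matrix, Matrix, Matrix]  # (w_l, w_r, w_out); assumed layout, no biases


def matvec(w: Matrix, v: Vector) -> Vector:
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def layer_step(layer: Layer, left: Vector, right: Vector) -> Vector:
    # One FFTNet-style layer evaluated at a single time step.
    w_l, w_r, w_out = layer
    z = [a + b for a, b in zip(matvec(w_l, left), matvec(w_r, right))]
    return relu(matvec(w_out, relu(z)))


def layer_shifts(num_layers: int) -> List[int]:
    return [2 ** (num_layers - 1 - i) for i in range(num_layers)]


def forward_padded(x: List[Vector], layers: Sequence[Layer]) -> List[Vector]:
    """Reference: whole-sequence pass; each layer sees `shift` zero vectors
    before t = 0, so the output has one vector per input sample."""
    for layer, s in zip(layers, layer_shifts(len(layers))):
        padded = [[0.0] * len(x[0]) for _ in range(s)] + x
        x = [layer_step(layer, padded[t], padded[t + s]) for t in range(len(x))]
    return x


class IncrementalFFTNet:
    """Streaming evaluation: each layer keeps a FIFO of its last `shift` inputs,
    so one new sample costs a single layer_step per layer."""

    def __init__(self, layers: Sequence[Layer]) -> None:
        self.layers = list(layers)
        # Queues start full of zeros, matching the zero padding in forward_padded.
        # len(w_l[0]) is the layer's input channel count.
        self.queues: List[Deque[Vector]] = [
            deque([[0.0] * len(layer[0][0]) for _ in range(s)])
            for layer, s in zip(self.layers, layer_shifts(len(self.layers)))
        ]

    def step(self, sample: Vector) -> Vector:
        x = sample
        for layer, queue in zip(self.layers, self.queues):
            left = queue.popleft()  # this layer's input from `shift` steps ago
            queue.append(x)
            x = layer_step(layer, left, x)
        return x


# Usage with hypothetical sizes: 1 input channel, 4 hidden channels, 3 layers.
rng = random.Random(0)


def rand_matrix(rows: int, cols: int) -> Matrix:
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


layers = [(rand_matrix(4, 1), rand_matrix(4, 1), rand_matrix(4, 4))]
layers += [(rand_matrix(4, 4), rand_matrix(4, 4), rand_matrix(4, 4)) for _ in range(2)]
signal = [[rng.uniform(-1.0, 1.0)] for _ in range(20)]

net = IncrementalFFTNet(layers)
streamed = [net.step(sample) for sample in signal]
print(streamed == forward_padded(signal, layers))  # True
```

Both paths perform the same floating-point operations in the same order, so the comparison is exact. The difference is cost per generated sample. The streaming version touches each layer once. A naive implementation would recompute the whole 2^L-sample window for every new sample.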
