A Python/Pytorch app for easily synthesising human voices

Overview

Voice Cloning App

CircleCI Discord codecov comment comment

A Python/Pytorch app for easily synthesising human voices

Preview

Documentation

Discord Server

Video guide

Voice Sharing Hub

FAQ's

System Requirements

  • Windows 10 or Ubuntu 20.04+ operating system
  • 5GB+ Disk space
  • NVIDIA GPU with at least 4GB of memory & driver version 456.38+ (optional)

Key features

  • Automatic dataset generation (with support for subtitles and audiobooks)
  • Additional language support
  • Local & remote training
  • Easy train start/stop
  • Data importing/exporting
  • Multi GPU support

Manual Guides

Future Improvements

  • Add support for Talknet
  • Add GTA alignment for Hifi-gan
  • Improved batch size estimation
  • AMD GPU support

Other resources

Acknowledgements

This project uses a reworked version of Tacotron2. All rights for belong to NVIDIA and follow the requirements of their BSD-3 licence.

Additionally, the project uses DSAlign, Silero, DeepSpeech & hifi-gan.

Thank you to Dr. John Bustard at Queen's University Belfast for his support throughout the project.

Supported by uberduck.ai, reach out to them for live model hosting.

Also a big thanks to the members of the VocalSynthesis subreddit for their feedback.

Finally thank you to everyone raising issues and contributing to the project.

Comments
  • Transcription error: wav file is empty

    Transcription error: wav file is empty

    Hello

    I am running the Voice-Cloning-App.exe on Windows 10. I have a GeForce RTX 2060 Graphics Card with the GeForce Game Ready Driver Version 461.92.

    When I attempt build the data set, the windows console stops after the following:

    [12644] WARNING: file already exists but should not: C:\Users\GREGOR~1\AppData\Local\Temp_MEI126442\torch_C.cp38-win_amd64.pyd Server initialized for threading. Server initialized for threading. pydub\utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work torchaudio\extension\extension.py:14: UserWarning: torchaudio C++ extension is not available. torchaudio\backend\utils.py:63: UserWarning: The interface of "soundfile" backend is planned to change in 0.8.0 to match that of "sox_io" backend and the current interface will be removed in 0.9.0. To use the new interface, do torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False before setting the backend to "soundfile". Please refer to https://github.com/pytorch/audio/issues/903 for the detail. INFO:matplotlib.font_manager:Generating new fontManager, this may take some time... [nltk_data] Downloading package wordnet to C:\Users\GREGOR~1\AppData\L [nltk_data] ocal\Temp_MEI126442\nltk_data... [nltk_data] Package wordnet is already up-to-date! WARNING:werkzeug:WebSocket transport not available. Install eventlet or gevent and gevent-websocket for improved performance.

    • Serving Flask app "main" (lazy loading)
    • Environment: production WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
    • Debug mode: off INFO:werkzeug: * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit) INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:56:25] "GET / HTTP/1.1" 200 - INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:56:57] "POST / HTTP/1.1" 200 - INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:56:57] "GET /static/error.css HTTP/1.1" 200 - INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:56:57] "GET /favicon.ico HTTP/1.1" 200 - INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:57:11] "GET / HTTP/1.1" 200 - Starting Thread INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:57:42] "POST / HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Sending packet OPEN data {'sid': 'qINJoZN0iSsAW66FAAAA', 'upgrades': [], 'pingTimeout': 5000, 'pingInterval': 25000} INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet OPEN data {'sid': 'qINJoZN0iSsAW66FAAAA', 'upgrades': [], 'pingTimeout': 5000, 'pingInterval': 25000} INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:57:42] "GET /socket.io/?EIO=4&transport=polling&t=NXjKmkr HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Received packet MESSAGE data 0/voice, INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet MESSAGE data 0/voice, qINJoZN0iSsAW66FAAAA: Sending packet MESSAGE data 0/voice,{"sid":"hvDlhnRAa1GAVtomAAAB"} INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet MESSAGE data 0/voice,{"sid":"hvDlhnRAa1GAVtomAAAB"} INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:57:42] "POST /socket.io/?EIO=4&transport=polling&t=NXjKmlA&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:57:42] "GET /socket.io/?EIO=4&transport=polling&t=NXjKmlB&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - emitting event "logs" to all [/voice] INFO:socketio.server:emitting event "logs" to all [/voice] qINJoZN0iSsAW66FAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Loading audio from data\datasets\JamesEarlJones\audio.mp3..."}] INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:57:47] "GET /socket.io/?EIO=4&transport=polling&t=NXjKmlb&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Loading audio from data\datasets\JamesEarlJones\audio.mp3..."}] INFO:voice:Loading audio from data\datasets\JamesEarlJones\audio.mp3... emitting event "logs" to all [/voice] INFO:socketio.server:emitting event "logs" to all [/voice] qINJoZN0iSsAW66FAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Loading script from data\datasets\JamesEarlJones\text.txt..."}] INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:58:02] "GET /socket.io/?EIO=4&transport=polling&t=NXjKnxd&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Loading script from data\datasets\JamesEarlJones\text.txt..."}] INFO:voice:Loading script from data\datasets\JamesEarlJones\text.txt... emitting event "logs" to all [/voice] INFO:socketio.server:emitting event "logs" to all [/voice] qINJoZN0iSsAW66FAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Fetching segments..."}] INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Fetching segments..."}] INFO:voice:Fetching segments... INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:58:02] "GET /socket.io/?EIO=4&transport=polling&t=NXjKrgH&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:58:07] "GET /socket.io/?EIO=4&transport=polling&t=NXjKrgS&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:58:07] "POST /socket.io/?EIO=4&transport=polling&t=NXjKsst&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - emitting event "logs" to all [/voice] INFO:socketio.server:emitting event "logs" to all [/voice] qINJoZN0iSsAW66FAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Transcribing segments..."}] INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:58:20] "GET /socket.io/?EIO=4&transport=polling&t=NXjKssu&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Transcribing segments..."}] INFO:voice:Transcribing segments... Using cache found in C:\Users\Gregory Betsey/.cache\torch\hub\snakers4_silero-models_master torchaudio\backend\utils.py:63: UserWarning: The interface of "soundfile" backend is planned to change in 0.8.0 to match that of "sox_io" backend and the current interface will be removed in 0.9.0. To use the new interface, do torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False before setting the backend to "soundfile". Please refer to https://github.com/pytorch/audio/issues/903 for the detail. Exception in thread Thread-13: Traceback (most recent call last): File "application\utils.py", line 47, in background_task max_seqlength = max(max([len(_) for _ in batch]), 12800) File "application\utils.py", line 32, in create_dataset if wav.size(0) > 1: File "dataset\forced_alignment\align.py", line 123, in align File "dataset\transcribe.py", line 34, in stt File "dataset\transcribe.py", line 16, in transcribe File "torch\hub.py", line 370, in load File "torch\hub.py", line 399, in _load_local File "C:\Users\Gregory Betsey/.cache\torch\hub\snakers4_silero-models_master\hubconf.py", line 24, in silero_stt model, decoder = init_jit_model(model_url=models.stt_models.get(language).latest.jit, File "C:\Users\Gregory Betsey/.cache\torch\hub\snakers4_silero-models_master\utils.py", line 135, in init_jit_model model = torch.jit.load(model_path, map_location=device) File "torch\jit_serialization.py", line 161, in load RuntimeError: [enforce fail at ..\caffe2\serialize\inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last): File "threading.py", line 932, in bootstrap_inner File "threading.py", line 870, in run File "application\utils.py", line 50, in background_task inputs[i, :len(wav)].copy(wav) NameError: name 'traceback' is not defined qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:58:32] "GET /socket.io/?EIO=4&transport=polling&t=NXjKvy4&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:58:32] "POST /socket.io/?EIO=4&transport=polling&t=NXjKyzw&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:58:57] "GET /socket.io/?EIO=4&transport=polling&t=NXjKyzw.0&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:58:57] "POST /socket.io/?EIO=4&transport=polling&t=NXjL358&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:59:22] "GET /socket.io/?EIO=4&transport=polling&t=NXjL358.0&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:59:22] "POST /socket.io/?EIO=4&transport=polling&t=NXjL9CA&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:59:47] "GET /socket.io/?EIO=4&transport=polling&t=NXjL9CB&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 00:59:47] "POST /socket.io/?EIO=4&transport=polling&t=NXjLFJ8&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:00:12] "GET /socket.io/?EIO=4&transport=polling&t=NXjLFJ8.0&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:00:12] "POST /socket.io/?EIO=4&transport=polling&t=NXjLLQ8&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:00:37] "GET /socket.io/?EIO=4&transport=polling&t=NXjLLQ9&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:00:37] "POST /socket.io/?EIO=4&transport=polling&t=NXjLRWz&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:01:02] "GET /socket.io/?EIO=4&transport=polling&t=NXjLRW-&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:01:02] "POST /socket.io/?EIO=4&transport=polling&t=NXjLXe3&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:01:27] "GET /socket.io/?EIO=4&transport=polling&t=NXjLXe4&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:01:27] "POST /socket.io/?EIO=4&transport=polling&t=NXjLdkv&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:01:52] "GET /socket.io/?EIO=4&transport=polling&t=NXjLdkv.0&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:01:52] "POST /socket.io/?EIO=4&transport=polling&t=NXjLjrp&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:02:17] "GET /socket.io/?EIO=4&transport=polling&t=NXjLjrq&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:02:17] "POST /socket.io/?EIO=4&transport=polling&t=NXjLpyh&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:02:42] "GET /socket.io/?EIO=4&transport=polling&t=NXjLpyh.0&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Sending packet PING data None qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:engineio.server:qINJoZN0iSsAW66FAAAA: Received packet PONG data INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:02:42] "POST /socket.io/?EIO=4&transport=polling&t=NXjLw3g&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1" 200 - qINJoZN0iSsAW66FAAAA: Sending packet PING data None INFO:werkzeug:127.0.0.1 - - [26/Mar/2021 01:03:07] "qINJoZN0iSsAW66FAAAA: Received packet CLOSE data GET /socket.io/?EIO=4&transport=polling&t=NXjLw3g.0&sid=qINJoZN0iSsAW66FAAAA HTTP/1.1qINJoZN0iSsAW66FAAAA: Client is gone, closing socket Error.txt

    bug 
    opened by GregoryBetsey 58
  • GPU Memory exhausted (training)

    GPU Memory exhausted (training)

    Label for this issue is help wanted.

    Hardware

    Processor: Intel Core i9 - 9900K CPU @ 3.60GHz Installed RAM: 64.0 GB GPU: NVIDIA GeForce RTX 2070 - 8GB


    Attempt

    command


    Failure

    gpu blasted


    Tried Steps

    1. Epoch selection:
      • 8000
      • 5000
      • 3000
      • 1000 and 10 iterations

    Steps that might help to resolve

    I found some articles that might be be helpful to resolve this:

    1. https://pytorch.org/docs/stable/notes/faq.html
    2. https://forums.fast.ai/t/clearing-gpu-memory-pytorch/14637/4

    Questions

    1. Would you suggest to reduce the iterator counter from 1000 to a smaller number? If yes then what number would be ideal.
    2. Would you suggest to reduce the epochs from 8000 to a smaller number? If yes then what number would be ideal.
    bug 
    opened by vinamramunot-tech 15
  • HTTPError

    HTTPError

    First time user - after uploading .txt and .wav I get the following error. Any guidance?

    Type: HTTPError Text: HTTP Error 503: Service Temporarily Unavailable Full: Traceback (most recent call last): File "flask\app.py", line 1950, in full_dispatch_request File "flask\app.py", line 1936, in dispatch_request File "application\views.py", line 99, in create_dataset_post File "dataset\transcribe.py", line 85, in create_transcription_model File "dataset\transcribe.py", line 58, in init File "torch\hub.py", line 364, in load model = _load_local(repo_or_dir, model, *args, **kwargs) File "torch\hub.py", line 393, in _load_local model = entry(*args, **kwargs) File "C:\Users\Sam London/.cache\torch\hub\snakers4_silero-models_master\hubconf.py", line 28, in silero_stt **kwargs) File "C:\Users\Sam London/.cache\torch\hub\snakers4_silero-models_master\utils.py", line 128, in init_jit_model progress=True) File "torch\hub.py", line 419, in download_url_to_file u = urlopen(req) File "urllib\request.py", line 223, in urlopen File "urllib\request.py", line 532, in open File "urllib\request.py", line 642, in http_response File "urllib\request.py", line 570, in error File "urllib\request.py", line 504, in _call_chain File "urllib\request.py", line 650, in http_error_default urllib.error.HTTPError: HTTP Error 503: Service Temporarily Unavailable

    opened by ManBearPig87 13
  • Training unexpectedly crashes

    Training unexpectedly crashes

    Okay I have my wavs folder and my metadata.csv, all is working good except that when I train it loads all data (I can see it on the "embedded console", and then after some seconds opens a new tab in my browser with the main page (localhost:5000). If I switch to the training tab and I wait for some time it doesn't make any progress. Can you help me with this please?

    bug 
    opened by gbh4x 12
  • Error fetching silero model

    Error fetching silero model

    Full: Traceback (most recent call last): File "C:\Users\superuser\Anaconda3\lib\site-packages\flask\app.py", line 1950, in full_dispatch_request rv = self.dispatch_request() File "C:\Users\superuser\Anaconda3\lib\site-packages\flask\app.py", line 1936, in dispatch_request return self.view_functionsrule.endpoint File "C:\Users\superuser\Desktop\Voice-Cloning-App\application\views.py", line 249, in synthesis_post alertnative_words=get_alternative_word_suggestions(audio_path, text), File "C:\Users\superuser\Desktop\Voice-Cloning-App\synthesis\synonyms.py", line 29, in get_alternative_word_suggestions poor_words = evalulate_audio(audio, text) File "C:\Users\superuser\Desktop\Voice-Cloning-App\synthesis\synonyms.py", line 20, in evalulate_audio results = transcribe(audio) File "C:\Users\superuser\Desktop\Voice-Cloning-App\dataset\transcribe.py", line 27, in transcribe repo_or_dir="snakers4/silero-models", model="silero_stt", language="en", device=device File "C:\Users\superuser\Anaconda3\lib\site-packages\torch\hub.py", line 370, in load model = _load_local(repo_or_dir, model, *args, **kwargs) File "C:\Users\superuser\Anaconda3\lib\site-packages\torch\hub.py", line 399, in _load_local model = entry(*args, **kwargs) File "C:\Users\superuser/.cache\torch\hub\snakers4_silero-models_master\hubconf.py", line 25, in silero_stt **kwargs) File "C:\Users\superuser/.cache\torch\hub\snakers4_silero-models_master\utils.py", line 133, in init_jit_model progress=True) File "C:\Users\superuser\Anaconda3\lib\site-packages\torch\hub.py", line 445, in download_url_to_file unit='B', unit_scale=True, unit_divisor=1024) as pbar: File "C:\Users\superuser\Anaconda3\lib\site-packages\tqdm_tqdm.py", line 662, in init TqdmKeyError("Unknown argument(s): " + str(kwargs))) tqdm._tqdm.TqdmKeyError: "Unknown argument(s): {'unit_divisor': 1024}"

    bug 
    opened by CrazyPlaysHD 9
  • Force Align text and Audio (dataset)

    Force Align text and Audio (dataset)

    Hi @BenAAndrew I am on the step where I am trying to align the text and audio of an audiobook. I have acquired the audio and text from amazon audible. Unfortunately, I was not able to assign the help label to this issue. I don't think I have the permission for that.

    • [X] using virtualenv

    In order to work through align.py, I had to modify it. After modifying I was able to run the file. Below is the modified part of the file. Also in the screenshot category I have mentioned how I am trying to execute this file.

    import os
    import sys
    import json
    import logging
    import argparse
    from pydub import AudioSegment
    
    sys.path.append(".")
    
    from search import FuzzySearch
    from audio import DEFAULT_RATE, read_frames_from_file, vad_split
    from dataset.transcribe import stt
    

    Screenshots

    Failure Point

    failure point


    Questions

    1. Do you suggest to use a virtualenv?
    2. Do I need to reduce the quality of wav file or the mp3 file?

    Link to the dataset

    Audio Dataset

    I have the book.txt and the mp3 file. I have converted that mp3 to wav file when I am trying to use the align. Please let me know if you can try using my dataset. Thanks for the help in advance.

    bug question 
    opened by vinamramunot-tech 9
  • All unlabeled clips unplayable in the Manage datasets interface for a freshly built dataset.

    All unlabeled clips unplayable in the Manage datasets interface for a freshly built dataset.

    This is with VCA release 1.0.2 using the compiled executable on Windows 10. The play option is simply grayed out for all listed clips. Have checked, and all the listed clips are playable from within file explorer, and in the case of this dataset, ranging in length from 1-6 seconds.

    bug 
    opened by RayDAnt3D 8
  • Imported model cannot be loaded

    Imported model cannot be loaded

    Was exporting a model beforehand. (Using a google colab notebook.)

    Type: ValueError Text: invalid literal for int() with base 10: 'checkpoints' Full: Traceback (most recent call last): File "/usr/local/lib/python3.7/dist-packages/flask/app.py", line 1950, in full_dispatch_request rv = self.dispatch_request() File "/usr/local/lib/python3.7/dist-packages/flask/app.py", line 1936, in dispatch_request return self.view_functionsrule.endpoint File "/content/Voice-Cloning-App/application/views.py", line 391, in download_model model_path = get_latest_checkpoint(os.path.join(paths["models"], model_name)) File "/content/Voice-Cloning-App/training/checkpoint.py", line 27, in get_latest_checkpoint if int(checkpoint.split("")[1].split(".")[0]) > int(latest_checkpoint.split("")[1].split(".")[0]): ValueError: invalid literal for int() with base 10: 'checkpoints'

    bug 
    opened by ericstheguy 8
  • Hang-up when attempting to process dataset on Ampere card [dataset]

    Hang-up when attempting to process dataset on Ampere card [dataset]

    Description: Application freezes during dataset preparation process, I believe this is due to the bundled version of pytorch not supporting ampere cards.

    Version used: 0.6.1 (Windows Executable)

    OS: Win 10 21H1

    Device Specificiations: r9 3900x, dual 3090s, 32gb system memory

    Nvidia Driver version: 461.92

    Log:

    INFO:werkzeug: * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
    INFO:werkzeug:127.0.0.1 - - [16/Apr/2021 10:27:24] "GET / HTTP/1.1" 200 -
    Starting Thread
    INFO:werkzeug:127.0.0.1 - - [16/Apr/2021 10:27:48] "POST / HTTP/1.1" 200 -
    INFO:werkzeug:127.0.0.1 - - [16/Apr/2021 10:27:48] "GET /static/application.js HTTP/1.1" 200 -
    PQVUzoQCJzc-rrA_AAAA: Sending packet OPEN data {'sid': 'PQVUzoQCJzc-rrA_AAAA', 'upgrades': [], 'pingTimeout': 5000, 'pingInterval': 25000}
    INFO:engineio.server:PQVUzoQCJzc-rrA_AAAA: Sending packet OPEN data {'sid': 'PQVUzoQCJzc-rrA_AAAA', 'upgrades': [], 'pingTimeout': 5000, 'pingInterval': 25000}
    INFO:werkzeug:127.0.0.1 - - [16/Apr/2021 10:27:48] "GET /socket.io/?EIO=4&transport=polling&t=NZQ0WEU HTTP/1.1" 200 -
    PQVUzoQCJzc-rrA_AAAA: Received packet MESSAGE data 0/voice,
    INFO:engineio.server:PQVUzoQCJzc-rrA_AAAA: Received packet MESSAGE data 0/voice,
    PQVUzoQCJzc-rrA_AAAA: Sending packet MESSAGE data 0/voice,{"sid":"iBJGih1vKc84-OLJAAAB"}
    INFO:engineio.server:PQVUzoQCJzc-rrA_AAAA: Sending packet MESSAGE data 0/voice,{"sid":"iBJGih1vKc84-OLJAAAB"}
    INFO:werkzeug:127.0.0.1 - - [16/Apr/2021 10:27:48] "POST /socket.io/?EIO=4&transport=polling&t=NZQ0WII&sid=PQVUzoQCJzc-rrA_AAAA HTTP/1.1" 200 -
    INFO:werkzeug:127.0.0.1 - - [16/Apr/2021 10:27:48] "GET /socket.io/?EIO=4&transport=polling&t=NZQ0WII.0&sid=PQVUzoQCJzc-rrA_AAAA HTTP/1.1" 200 -
    emitting event "logs" to all [/voice]
    INFO:socketio.server:emitting event "logs" to all [/voice]
    PQVUzoQCJzc-rrA_AAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Coverting data\\datasets\\Diana\\audio.mp3..."}]
    INFO:werkzeug:127.0.0.1 - - [16/Apr/2021 10:27:53] "GET /socket.io/?EIO=4&transport=polling&t=NZQ0WN2&sid=PQVUzoQCJzc-rrA_AAAA HTTP/1.1" 200 -
    INFO:engineio.server:PQVUzoQCJzc-rrA_AAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Coverting data\\datasets\\Diana\\audio.mp3..."}]
    INFO:voice:Coverting data\datasets\Diana\audio.mp3...
    emitting event "logs" to all [/voice]
    INFO:socketio.server:emitting event "logs" to all [/voice]
    PQVUzoQCJzc-rrA_AAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Loading script from data\\datasets\\Diana\\text.txt..."}]
    INFO:werkzeug:127.0.0.1 - - [16/Apr/2021 10:27:56] "GET /socket.io/?EIO=4&transport=polling&t=NZQ0XNM&sid=PQVUzoQCJzc-rrA_AAAA HTTP/1.1" 200 -
    INFO:engineio.server:PQVUzoQCJzc-rrA_AAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Loading script from data\\datasets\\Diana\\text.txt..."}]
    INFO:voice:Loading script from data\datasets\Diana\text.txt...
    emitting event "logs" to all [/voice]
    INFO:socketio.server:emitting event "logs" to all [/voice]
    PQVUzoQCJzc-rrA_AAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Searching text for matching fragments..."}]
    INFO:engineio.server:PQVUzoQCJzc-rrA_AAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Searching text for matching fragments..."}]
    INFO:voice:Searching text for matching fragments...
    INFO:werkzeug:127.0.0.1 - - [16/Apr/2021 10:27:56] "emitting event "logs" to all [/voice]
    GET /socket.io/?EIO=4&transport=polling&t=NZQ0YFA&sid=PQVUzoQCJzc-rrA_AAAA HTTP/1.1" 200 -
    INFO:socketio.server:emitting event "logs" to all [/voice]
    PQVUzoQCJzc-rrA_AAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Changing sample rate..."}]
    INFO:engineio.server:PQVUzoQCJzc-rrA_AAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Changing sample rate..."}]
    INFO:voice:Changing sample rate...
    INFO:werkzeug:127.0.0.1 - - [16/Apr/2021 10:27:56] "GET /socket.io/?EIO=4&transport=polling&t=NZQ0YFR&sid=PQVUzoQCJzc-rrA_AAAA HTTP/1.1" 200 -
    emitting event "logs" to all [/voice]
    INFO:socketio.server:emitting event "logs" to all [/voice]
    PQVUzoQCJzc-rrA_AAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Fetching segments..."}]
    INFO:werkzeug:127.0.0.1 - - [16/Apr/2021 10:27:57] "GET /socket.io/?EIO=4&transport=polling&t=NZQ0YKC&sid=PQVUzoQCJzc-rrA_AAAA HTTP/1.1" 200 -
    INFO:engineio.server:PQVUzoQCJzc-rrA_AAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Fetching segments..."}]
    INFO:voice:Fetching segments...
    emitting event "logs" to all [/voice]
    INFO:socketio.server:emitting event "logs" to all [/voice]
    PQVUzoQCJzc-rrA_AAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Matching segments..."}]
    INFO:werkzeug:127.0.0.1 - - [16/Apr/2021 10:27:57] "GET /socket.io/?EIO=4&transport=polling&t=NZQ0YOL&sid=PQVUzoQCJzc-rrA_AAAA HTTP/1.1" 200 -
    INFO:engineio.server:PQVUzoQCJzc-rrA_AAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Matching segments..."}]
    INFO:voice:Matching segments...
    emitting event "logs" to all [/voice]
    INFO:socketio.server:emitting event "logs" to all [/voice]
    PQVUzoQCJzc-rrA_AAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Generating segments..."}]
    INFO:engineio.server:PQVUzoQCJzc-rrA_AAAA: Sending packet MESSAGE data 2/voice,["logs",{"text":"Generating segments..."}]
    INFO:voice:Generating segments...
    INFO:werkzeug:127.0.0.1 - - [16/Apr/2021 10:27:57] "GET /socket.io/?EIO=4&transport=polling&t=NZQ0YYk&sid=PQVUzoQCJzc-rrA_AAAA HTTP/1.1" 200 -
    Using cache found in C:\Users\thefi/.cache\torch\hub\snakers4_silero-models_master
    torch\cuda\__init__.py:104: UserWarning:
    GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
    The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
    If you want to use the GeForce RTX 3090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
    
    bug 
    opened by LexCybermac 8
  • cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous

    cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous

    i m trying use my own audio and text, this is the error i m having.

    cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous

    using windows version. python version is giving another error.

    File "main.py", line 4, in from engineio.async_drivers import threading ModuleNotFoundError: No module named 'engineio'

    I m new to both worlds, is there anything I m not doing?

    bug 
    opened by Syed044 8
  • Invalid characters in text

    Invalid characters in text

    I tried training with it and I got this: Invalid characters in text (for alphabet): ” (RIGHT DOUBLE QUOTATION MARK),­ (SOFT HYPHEN),’ (RIGHT SINGLE QUOTATION MARK),“ (LEFT DOUBLE QUOTATION MARK),— (EM DASH) and it refused to start. Does this mean the text shouldn't have punctuations anymore with this version? If I change to this version will I have to start all over?

    question 
    opened by LeeroyJenkinsss 7
  • Get Audio file via Command

    Get Audio file via Command

    I use this command to get an audio file from cli but here the audio file Im getting is not saying anything at all, via GUI it works perfectly. Im using a more extended alphabet than english python synthesis/synthesize.py -m data/models/..... -vm "data/hifigan/vocoder/model.pt" -hc "data/hifigan/vocoder/config.json" -t "test this is a test" -a audio.wav Any idea? Thanks in advance

    opened by miguelgh65 2
  • JSONDecodeError

    JSONDecodeError

    When I go to synthesize and submit, this is the error message that appears: Text: Expecting value: line 1 column 1 (char 0) Full: Traceback (most recent call last): File "flask\app.py", line 1950, in full_dispatch_request File "flask\app.py", line 1936, in dispatch_request File "application\views.py", line 302, in synthesis_setup_post File "synthesis\vocoders\hifigan.py", line 25, in init File "json_init_.py", line 354, in loads File "json\decoder.py", line 339, in decode File "json\decoder.py", line 357, in raw_decode json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

    opened by skydam 1
  • App is not starting.

    App is not starting.

    I installed all requirements with python 3.6.15. When i try to start app i am getting this error

    Traceback (most recent call last):
      File "main.py", line 52, in <module>
        from application.views import *  # noqa
      File "/home/libir/Desktop/voiceCloning/Voice-Cloning-App/application/views.py", line 11, in <module>
        from application.utils import (
      File "/home/libir/Desktop/voiceCloning/Voice-Cloning-App/application/utils.py", line 8, in <module>
        import librosa
      File "/home/libir/.pyenv/versions/3.6.15/lib/python3.6/site-packages/librosa/__init__.py", line 12, in <module>
        from . import core
      File "/home/libir/.pyenv/versions/3.6.15/lib/python3.6/site-packages/librosa/core/__init__.py", line 103, in <module>
        from .audio import *  # pylint: disable=wildcard-import
      File "/home/libir/.pyenv/versions/3.6.15/lib/python3.6/site-packages/librosa/core/audio.py", line 12, in <module>
        import resampy
      File "/home/libir/.pyenv/versions/3.6.15/lib/python3.6/site-packages/resampy/__init__.py", line 7, in <module>
        from .core import *
      File "/home/libir/.pyenv/versions/3.6.15/lib/python3.6/site-packages/resampy/core.py", line 9, in <module>
        from .interpn import resample_f_s, resample_f_p
      File "/home/libir/.pyenv/versions/3.6.15/lib/python3.6/site-packages/resampy/interpn.py", line 75, in <module>
        nopython=True,
    TypeError: guvectorize() missing 1 required positional argument: 'signature'
    

    Also i am using EndeavourOS.

    opened by LibirSoft 3
  • Is it possible for me to extract a runtime/environment from this?

    Is it possible for me to extract a runtime/environment from this?

    I can't find a python runtime/environment in the temp folder for this program. This is important as installing a correct environment to run tacotron2 is the biggest pain in the world, and I want to use the voice-cloning-app environment to run tacotron2 in another program, without being restricted to using the app. I could not find a single trace of python.exe, but i did find python related files, so how is it able to run? What is it doing to call the synthesis script?

    opened by FlashlightET 1
Releases(v1.1.1)
Owner
Ben Andrew
Developer working on open source Machine Learning, IoT, CAD & electronics projects.
Ben Andrew
Understand Text Summarization and create your own summarizer in python

Automatic summarization is the process of shortening a text document with software, in order to create a summary with the major points of the original document. Technologies that can make a coherent

Sreekanth M 1 Oct 18, 2022
HF's ML for Audio study group

Hugging Face Machine Learning for Audio Study Group Welcome to the ML for Audio Study Group. Through a series of presentations, paper reading and disc

Vaibhav Srivastav 110 Jan 01, 2023
OpenChat: Opensource chatting framework for generative models

OpenChat is opensource chatting framework for generative models.

Hyunwoong Ko 427 Jan 06, 2023
Question and answer retrieval in Turkish with BERT

trfaq Google supported this work by providing Google Cloud credit. Thank you Google for supporting the open source! 🎉 What is this? At this repo, I'm

M. Yusuf Sarıgöz 13 Oct 10, 2022
This repository contains the code, data, and models of the paper titled "CrossSum: Beyond English-Centric Cross-Lingual Abstractive Text Summarization for 1500+ Language Pairs".

CrossSum This repository contains the code, data, and models of the paper titled "CrossSum: Beyond English-Centric Cross-Lingual Abstractive Text Summ

BUET CSE NLP Group 29 Nov 19, 2022
Modular and extensible speech recognition library leveraging pytorch-lightning and hydra.

Lightning ASR Modular and extensible speech recognition library leveraging pytorch-lightning and hydra What is Lightning ASR • Installation • Get Star

Soohwan Kim 40 Sep 19, 2022
T‘rex Park is a Youzan sponsored project. Offering Chinese NLP and image models pretrained from E-commerce datasets

T‘rex Park is a Youzan sponsored project. Offering Chinese NLP and image models pretrained from E-commerce datasets (product titles, images, comments, etc.).

55 Nov 22, 2022
The source code of HeCo

HeCo This repo is for source code of KDD 2021 paper "Self-supervised Heterogeneous Graph Neural Network with Co-contrastive Learning". Paper Link: htt

Nian Liu 106 Dec 27, 2022
TweebankNLP - Pre-trained Tweet NLP Pipeline (NER, tokenization, lemmatization, POS tagging, dependency parsing) + Models + Tweebank-NER

TweebankNLP This repo contains the new Tweebank-NER dataset and off-the-shelf Twitter-Stanza pipeline for state-of-the-art Tweet NLP, as described in

Laboratory for Social Machines 84 Dec 20, 2022
Official source for spanish Language Models and resources made @ BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL).

Spanish Language Models 💃🏻 Corpora 📃 Corpora Number of documents Size (GB) BNE 201,080,084 570GB Models 🤖 RoBERTa-base BNE: https://huggingface.co

PlanTL-SANIDAD 203 Dec 20, 2022
MiCECo - Misskey Custom Emoji Counter

MiCECo Misskey Custom Emoji Counter Introduction This little script counts custo

7 Dec 25, 2022
Fine-tune GPT-3 with a Google Chat conversation history

Google Chat GPT-3 This repo will help you fine-tune GPT-3 with a Google Chat conversation history. The trained model will be able to converse as one o

Nate Baer 7 Dec 10, 2022
Simple NLP based project without any use of AI

Simple NLP based project without any use of AI

Shripad Rao 1 Apr 26, 2022
gaiic2021-track3-小布助手对话短文本语义匹配复赛rank3、决赛rank4

决赛答辩已经过去一段时间了,我们队伍ac milan最终获得了复赛第3,决赛第4的成绩。在此首先感谢一些队友的carry~ 经过2个多月的比赛,学习收获了很多,也认识了很多大佬,在这里记录一下自己的参赛体验和学习收获。

102 Dec 19, 2022
Simple text to phones converter for multiple languages

Phonemizer -- foʊnmaɪzɚ The phonemizer allows simple phonemization of words and texts in many languages. Provides both the phonemize command-line tool

CoML 762 Dec 29, 2022
:id: A python library for accurate and scalable fuzzy matching, record deduplication and entity-resolution.

Dedupe Python Library dedupe is a python library that uses machine learning to perform fuzzy matching, deduplication and entity resolution quickly on

Dedupe.io 3.6k Jan 02, 2023
AllenNLP integration for Shiba: Japanese CANINE model

Allennlp Integration for Shiba allennlp-shiab-model is a Python library that provides AllenNLP integration for shiba-model. SHIBA is an approximate re

Shunsuke KITADA 12 Feb 16, 2022
A simple Flask site that allows users to create, update, and delete posts in a database, as well as perform basic NLP tasks on the posts.

A simple Flask site that allows users to create, update, and delete posts in a database, as well as perform basic NLP tasks on the posts.

Ian 1 Jan 15, 2022
A paper list of pre-trained language models (PLMs).

Large-scale pre-trained language models (PLMs) such as BERT and GPT have achieved great success and become a milestone in NLP.

RUCAIBox 124 Jan 02, 2023
结巴中文分词

jieba “结巴”中文分词:做最好的 Python 中文分词组件 "Jieba" (Chinese for "to stutter") Chinese text segmentation: built to be the best Python Chinese word segmentation

Sun Junyi 29.8k Jan 02, 2023