Analysis of voices based on the Mel-frequency band

Last update: Feb 06, 2022

Related tags

Audio Speaker_partition_module

Overview

Speaker_partition_module

Analysis of voices based on the Mel-frequency band.
Goal: Identification of voices speaking (diarization) and calculation of speech partition (in %).

Methodology:

Collect voice data
Sample audio data of x speakers that talk y times to represent a round of people talking
Annotate samples with labels and merge audio file
Create train & test split of samples
Train unsupervised clustering module to detect number of people
Train supervised RNN classifier to determine who is speaking at time x

Preprocessing

Convert files to .wav convertFlac2Wav.py
Collect data via LibriSpeech voices library (audiofiles) audio_manipulation02.py
Extract x random speakers with y audio samples per speaker Result: Generated audio samples of length 30-60 seconds

Feature extraction:

Create mel-frequency spectrum for each audio file feature_extraction.py
Define overlapping feature window for training

Training:

Implementation of google-diarizer module
Training accuracy is only at 40 %

Further activity

Create own unsupervised clustering module
Try out different libraries

Owner

GitHub Repository

Library for Python 3 to communicate with the Google Chromecast.

pychromecast Library for Python 3.6+ to communicate with the Google Chromecast. It currently supports: Auto discovering connected Chromecasts on the n

2.4k Jan 02, 2023

A Python library and tools AUCTUS A6 based radios.

A Python library and tools AUCTUS A6 based radios.

6 Nov 23, 2022

Frescobaldi LilyPond Editor

README for Frescobaldi Homepage: http://www.frescobaldi.org/ Main author: Wilbert Berendsen Frescobaldi is a LilyPond sheet music text editor. It aims

600 Dec 29, 2022

Improved Python UI to convert Youtube URL to .mp3 file.

YT-MP3 Improved Python UI to convert Youtube URL to .mp3 file. How to use? Just run python3 main.py Enter the URL of the video Enter the PATH of where

8 Jun 19, 2022

Praat in Python, the Pythonic way

Parselmouth - Praat in Python, the Pythonic way Parselmouth is a Python library for the Praat software. Though other attempts have been made at portin

786 Jan 09, 2023

Muzic: Music Understanding and Generation with Artificial Intelligence

Muzic is a research project on AI music that empowers music understanding and generation with deep learning and artificial intelligence.

2.6k Dec 30, 2022

Audio Retrieval with Natural Language Queries: A Benchmark Study

Audio Retrieval with Natural Language Queries: A Benchmark Study Paper | Project page | Text-to-audio search demo This repository is the implementatio

21 Oct 31, 2022

gentle forced aligner

Gentle Robust yet lenient forced-aligner built on Kaldi. A tool for aligning speech with text. Getting Started There are three ways to install Gentle.

1.2k Dec 30, 2022

Terminal-based audio-to-text converter

att Terminal-based audio-to-text converter Project description A terminal-based audio-to-text converter written in python, enabling you to convert .wa

4 Dec 15, 2022

Small Python application that links a Digico console and Reaper, handling automatic marker insertion and tracking.

Digico-Reaper-Link This is a small GUI based helper application designed to help with using Digico's Copy Audio function with a Reaper DAW used for re

10 Oct 24, 2022

A collection of free MIDI chords and progressions ready to be used in your DAW, Akai MPC, or Roland MC-707/101

A collection of free MIDI chords and progressions ready to be used in your DAW, Akai MPC, or Roland MC-707/101

921 Jan 05, 2023

A rofi-blocks script that searches youtube and plays the selected audio on mpv.

rofi-ytm A rofi-blocks script that searches youtube and plays the selected audio on mpv. To use the script, run the following command rofi -modi block

26 Dec 21, 2022

Xbot-Music - Bot Play Music and Video in Voice Chat Group Telegram

XBOT-MUSIC A Telegram Music+video Bot written in Python using Pyrogram and Py-Tg

2 Jan 20, 2022

Library for working with sound files of the format: .ogg, .mp3, .wav

Library for working with sound files of the format: .ogg, .mp3, .wav. By work is meant - playing sound files in a straight line and in the background, obtaining information about the sound file (auth

2 Dec 15, 2022

Desktop music recognition application for windows

MusicRecognizer Music recognition application for windows You can choose from which of the devices the recording will be made. If you choose speakers,

28 Dec 13, 2022

Okaeri-Music is a telegram music bot project, allow you to play music on voice chat group telegram.

Okaeri-Music is a telegram bot project that's allow you to play music on telegram voice chat group

1 Dec 22, 2021

Gammatone-based spectrograms, using gammatone filterbanks or Fourier transform weightings.

Gammatone Filterbank Toolkit Utilities for analysing sound using perceptual models of human hearing. Jason Heeris, 2013 Summary This is a port of Malc

188 Dec 14, 2022

Graphical interface to control granular sound synthesis.

Granular sound synthesis interface SoundGrain is a graphical interface where users can draw and edit trajectories to control granular sound synthesis

122 Dec 10, 2022

📺Headless全自动B站直播录播、切片、上传一体工具

DDRecorder Headless全自动B站直播录播、切片、上传一体工具感谢 FortuneDayssss/BilibiliUploader 安装指南（Windows）在Release下载zip包解压。修改配置文件config.json 双击运行DDRecorder.exe （这将使用co

322 Dec 27, 2022

ianZiPu is a way to write notation for Guqin (古琴) music.

PyBetween Wrapper for Between - 비트윈을 위한 파이썬 라이브러리 Legal Disclaimer 오직 교육적 목적으로만 사용할수 있으며, 비트윈은 VCNC의 자산입니다. 악의적 공격에 이용할시 처벌 받을수 있습니다. 사용에 따른 책임은 사용자가

8 Nov 25, 2022