C++ library for audio and music analysis, description and synthesis, including Python bindings

Last update: Jan 03, 2023

Overview

Essentia

Essentia is an open-source C++ library for audio analysis and audio-based music information retrieval released under the Affero GPL license. It contains an extensive collection of reusable algorithms which implement audio input/output functionality, standard digital signal processing blocks, statistical characterization of data, and a large set of spectral, temporal, tonal and high-level music descriptors. The library is also wrapped in Python and includes a number of predefined executable extractors for the available music descriptors, which makes it suitable for fast prototyping and for setting up research experiments rapidly. Furthermore, it includes a Vamp plugin to be used with Sonic Visualiser for visualization purposes. Essentia is designed with a focus on the robustness of the provided music descriptors and is optimized in terms of the computational cost of the algorithms. The provided functionality, specifically the music descriptors and signal processing algorithms included out of the box, is easily expandable and supports both research experiments and the development of large-scale industrial applications.

Documentation online: http://essentia.upf.edu

Installation

The library is cross-platform and currently supports Linux, Mac OS X, Windows, iOS and Android systems. Read installation instructions:

You can download and use prebuilt static binaries for a number of Essentia's command-line music extractors instead of installing the complete library

doc/sphinxdoc/extractors_out_of_box.rst

Quick start

Quick start using python:

Command-line tools to compute common music descriptors:

doc/sphinxdoc/extractors_out_of_box.rst

Asking for help

Read the frequently asked questions
Create an issue on GitHub if your question has not been answered before

Versions

Official releases:

https://github.com/MTG/essentia/releases

Github branches:

master: the most up-to-date version of Essentia (Ubuntu 14.10 or higher, OSX); if you run into any problem, try it first.

If you use example extractors (located in src/examples), or your own code employing Essentia algorithms to compute descriptors, you should be aware of possible incompatibilities when using different versions of Essentia.

How to contribute

We are more than happy to collaborate and receive your contributions to Essentia. The best way to submit your code is to create a pull request against our GitHub repository, following our contribution policy. By submitting your code you certify that it complies with the Developer's Certificate of Origin. For more details see: http://essentia.upf.edu/documentation/contribute.html

You are also more than welcome to suggest any improvements, including proposals for new algorithms, etc.

Comments

Remove support for libswresample as we have libavresample
I've installed all of the dependencies that I can uncover, and when I do: $ ./waf configure --mode=release --with-python --with-examples --with-vamp --with-cpptest

I get:

Setting top to : /home/roger/AudioSignalProcessing/essentia-2.0.1
Setting out to : /home/roger/AudioSignalProcessing/essentia-2.0.1/build
→ configuring the project in /home/roger/AudioSignalProcessing/essentia-2.0.1
→ Building in release mode
Checking for 'g++' (c++ compiler) : /usr/bin/g++
Checking for 'gcc' (c compiler) : /usr/bin/gcc
Checking for program pkg-config : /usr/bin/pkg-config
Checking for 'libavcodec' : yes
Checking for 'libavformat' : yes
Checking for 'libavutil' : yes
Checking for 'libswresample' : yes
Checking for 'taglib' : yes
Checking for 'yaml-0.1' : yes
Checking for 'fftw3f' : yes
Checking for 'samplerate' : yes
Checking for 'gaia2' : yes
Checking for program python : /usr/bin/python
Checking for python version : (2, 7, 6, 'final', 0)
Checking for library python2.7 in LIBDIR : yes
Checking for program /usr/bin/python-config,python2.7-config,python-config-2.7,python2.7m-config : /usr/bin/python-config
Checking for header Python.h : yes
================================
CONFIGURATION SUMMARY

FFmpeg / libav detected! The following algorithms will be included: ['AudioLoader', 'MonoLoader', 'EqloudLoader', 'EasyLoader', 'MonoWriter', 'AudioWriter']

libsamplerate (SRC) detected! The following algorithms will be included: ['Resample']

TagLib detected! The following algorithms will be included: ['MetadataReader']

Gaia2 detected! The following algorithms will be included: ['GaiaTransform']
'configure' finished successfully (1.766s)

But when I do: $ ./waf

I get a bunch of errors. Some are below, and all seem to be related:

../src/essentia/utils/audiocontext.cpp: In member function ‘int essentia::AudioContext::create(const string&, const string&, int, int, int)’:
../src/essentia/utils/audiocontext.cpp:107:10: error: ‘CODEC_ID_PCM_S16LE’ was not declared in this scope
     case CODEC_ID_PCM_S16LE:
          ^
../src/essentia/utils/audiocontext.cpp:108:10: error: ‘CODEC_ID_PCM_S16BE’ was not declared in this scope
     case CODEC_ID_PCM_S16BE:
          ^
../src/essentia/utils/audiocontext.cpp:109:10: error: ‘CODEC_ID_PCM_U16LE’ was not declared in this scope
     case CODEC_ID_PCM_U16LE:
          ^
../src/essentia/utils/audiocontext.cpp:110:10: error: ‘CODEC_ID_PCM_U16BE’ was not declared in this scope
     case CODEC_ID_PCM_U16BE:
          ^

and I end up with:

Build failed -> task in 'essentia' failed (exit status 1): ...

Can anyone help? I am using Ubuntu 14.04.
bug
opened by rgonnering 30
configuration issue on mac (Getting pyembed flags from python-config: Could not build a python embedded interpreter)

After ./waf configure --mode=release --with-python --with-cpptests --with-examples --with-vamp

I got this

python executable ... differs from system...
...
Checking for library python2.7 in LIBPATH_PYEMBED: not found
Checking for library python2.7 in LIBDIR: not found
Checking for library python2.7 in python_LIBPL: not found
Checking for library python2.7 in $prefix/libs: not found
...
Getting pyembed flags from python-config: Could not build a python embedded interpreter
...

The configuration failed

any pointer on how to resolve this? thanks.

opened by yyf 28
Probabilistic Yin and CREPE
As the monophonic pitch extraction algorithms in Essentia are out of date, it would be valuable to implement two state-of-the-art pitch extraction algorithms that lead to better pitch extraction accuracy:

[x] Pyin: https://code.soundsoftware.ac.uk/projects/pyin

[ ] CREPE: https://github.com/marl/crepe

algorithms wishlist
opened by ronggong 21
GaiaTransform not found in registry

Hello,

I have compiled and installed first Gaia then Essentia library to my Ubuntu 16.04. I want to use the out of box streaming_extractor_music executable. When I run streaming_extractor_music without any profile I get no problem and a nice output.

However, when I create a profile file that includes:

highlevel:
    compute: 1
    svm_models: ['svm_models/genre_tzanetakis.history', 'svm_models/mood_sad.history']

I get GaiaTransform not found in the registry error when it processes the high level svm models.

Any help will be appreciated.

opened by oak94 20
Allow filtering negative energy values

The PredominantPitchMelodia algorithm can return negative confidence values if guessUnvoiced=True. This adds a new option to PitchFilterMakam to automatically take the absolute value of any negative values. Also fix a problem where the octaveFilter parameter wasn't being loaded properly

opened by alastair 19
./waf build fail - TagLib
Hello, I'm trying to run the script ./waf and when I use flags --mode=release --build-static --with-python --with-cpptests --with-examples --with-vamp, I always get stuck at the file metadatareader.cpp. Stacktrace:

[338/374] Linking build/src/examples/essentia_standard_beatsmarker
[339/374] Linking build/src/examples/essentia_standard_onsetrate
src/libessentia.a(metadatareader.cpp.1.o): In function `formatString(TagLib::StringList const&)':
metadatareader.cpp:(.text+0x14f1): undefined reference to `TagLib::String::to8Bit(bool) const'
metadatareader.cpp:(.text+0x157d): undefined reference to `TagLib::String::to8Bit(bool) const'
metadatareader.cpp:(.text+0x15b4): undefined reference to `TagLib::String::to8Bit(bool) const'
src/libessentia.a(metadatareader.cpp.1.o): In function `essentia::standard::MetadataReader::compute()':
metadatareader.cpp:(.text+0x2a85): undefined reference to `TagLib::String::to8Bit(bool) const'
metadatareader.cpp:(.text+0x2c67): undefined reference to `TagLib::String::to8Bit(bool) const'
collect2: error: ld returned 1 exit status
src/libessentia.a(metadatareader.cpp.1.o): In function `formatString(TagLib::StringList const&)':
metadatareader.cpp:(.text+0x14f1): undefined reference to `TagLib::String::to8Bit(bool) const'
metadatareader.cpp:(.text+0x157d): undefined reference to `TagLib::String::to8Bit(bool) const'
metadatareader.cpp:(.text+0x15b4): undefined reference to `TagLib::String::to8Bit(bool) const'
src/libessentia.a(metadatareader.cpp.1.o): In function `essentia::standard::MetadataReader::compute()':
metadatareader.cpp:(.text+0x2a85): undefined reference to `TagLib::String::to8Bit(bool) const'
metadatareader.cpp:(.text+0x2c67): undefined reference to `TagLib::String::to8Bit(bool) const'
collect2: error: ld returned 1 exit status
Waf: Leaving directory `/home/kapi/essentia/build'
Build failed
-> task in 'essentia_standard_beatsmarker' failed with exit status 1 (run with -v to display more information)
-> task in 'essentia_standard_onsetrate' failed with exit status 1 (run with -v to display more information)

I tried installing both the newest (1.11.1) and one of the older (1.9) versions of TagLib. What can I do to make it work? My operating system is Ubuntu 16.04 LTS.
builds
opened by katpi 16
ConstantQ Transform?

My search in the algorithm reference documentation and a quick search of the repository proved fruitless. Is there an implementation of it (e.g. like this) available in Essentia?
algorithms wishlist

opened by constd 16
cannot import the essentia.standard nor essentia.streaming

When I import essentia it works fine, but when I try to import essentia.standard or essentia.streaming I get: no module named '..........'. I don't know what the problem is.

opened by ahmed-jbeli 15
PitchYIN error on stationary signals

Hello

Lately we have used the YIN implementation in Essentia a lot. However, for many applications (speech, instruments) I found a constant error compared to other pitch estimators like RAPT.

I tried to produce some more systematic results by running a simple test script (https://gist.github.com/faroit/2ebcf956633f63d92ace) which generates a stationary sine wave of constant f0. The signal then is processed by the YIN algorithm and the mean of the estimate is compared to the (constant) ground truth.

This is what I get:

Obviously the estimation error is frequency dependent, which is expected. Above 1 kHz, however, the estimate looks unstable.

Has anyone tested the estimate against the original C YIN implementation?
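The stationary-sine test described above can be sketched without Essentia. The following is a minimal, plain-Python YIN-style estimator written for illustration only: it is not Essentia's PitchYin implementation and not the linked gist, and the function names, frame size, lag range and threshold are all assumptions. It shows one generic reason the error depends on frequency: the period is searched over integer lags, so the relative quantization step grows as the period gets shorter.

```python
import math


def yin_f0(frame, sample_rate, tau_max=256, threshold=0.1):
    """Estimate f0 of one frame with a basic YIN-style search (illustrative)."""
    window = len(frame) - tau_max
    # Difference function d(tau) over a fixed-size window.
    diff = [0.0] * (tau_max + 1)
    for tau in range(1, tau_max + 1):
        diff[tau] = sum((frame[i] - frame[i + tau]) ** 2 for i in range(window))
    # Cumulative mean normalized difference.
    cmnd = [1.0] * (tau_max + 1)
    running = 0.0
    for tau in range(1, tau_max + 1):
        running += diff[tau]
        cmnd[tau] = diff[tau] * tau / running if running > 0 else 1.0
    # First lag below the threshold, then follow the dip to its local minimum.
    tau = 2
    while tau < tau_max and cmnd[tau] >= threshold:
        tau += 1
    if tau >= tau_max:
        return 0.0  # no periodicity found in the searched lag range
    while tau + 1 < tau_max and cmnd[tau + 1] < cmnd[tau]:
        tau += 1
    # Parabolic interpolation around the integer lag.
    a, b, c = cmnd[tau - 1], cmnd[tau], cmnd[tau + 1]
    denom = a - 2.0 * b + c
    shift = 0.5 * (a - c) / denom if denom != 0 else 0.0
    return sample_rate / (tau + shift)


def estimate_sine(f0, sample_rate=44100.0, frame_size=1024):
    """Generate a stationary sine of frequency f0 and estimate its pitch."""
    frame = [math.sin(2.0 * math.pi * f0 * n / sample_rate)
             for n in range(frame_size)]
    return yin_f0(frame, sample_rate)


for f0 in (440.0, 1000.0, 2000.0):
    estimate = estimate_sine(f0)
    print(f0, estimate, estimate - f0)
```

Comparing the printed errors across frequencies gives a baseline for what lag quantization alone contributes, which can then be compared against the behaviour reported here.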
bug

opened by faroit 15
Experimental windows support
Hey,

Here's the modifications I did to get things building on Windows with MinGW, with the outcome that with the correct environment setup it should be a case of just supplying these three commands:-

python waf configure --prefix="C:\Program Files (x86)\CodeBlocks\MinGW"

python waf

python waf install

You need to install python and MinGW with pthreads (I used Codeblocks with built in TDM-GCC). During the configure stage it copies the dependencies into bin/include/lib in the MinGW root specified by the prefix option.

I took the built dependencies from the mingw_port and made a few changes:-

I removed the pthread headers from libav as TDM-GCC has them already.

Recompiled libsamplerate to fix def file / dll inconsistency

moved taglib headers down a level in /include/taglib and added missing "tnmap.tcc"
opened by carthach 15
not finding actual directory of libessentia.so

I am new to Linux, Python and Essentia. I am using Debian Jessie.

When I call (in python) import essentia:

Traceback (most recent call last):
  File "", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/essentia/__init__.py", line 1, in <module>
    import _essentia
ImportError: libessentia.so: cannot open shared object file: No such file or directory

(I see the file in /usr/local/lib/)

opened by ErnestoAcc 14
Please add FreeBSD install instructions
The Installing Essentia page can have FreeBSD installation instructions:

To install essentia's C++ library: pkg install essentia

To install essentia's Python binding: pkg install py39-essentia

The FreeBSD ports are now available:

https://cgit.freebsd.org/ports/tree/audio/essentia/Makefile

https://cgit.freebsd.org/ports/tree/audio/py-essentia/Makefile
opened by yurivict 0
libessentia.so does not have a SONAME

When I build the Python binding in the FreeBSD ports framework it complains:

Error: /usr/local/lib/python3.9/site-packages/essentia/_essentia.cpython-39.so is linked to /usr/local/lib/libessentia.so which does not have a SONAME. audio/essentia needs to be fixed.

libessentia.so doesn't have a SONAME field set.

opened by yurivict 0
using ios_simulator results in an empty lib !
Hello all

I'm on macOS (Ventura 13.0).

I am having difficulties building Essentia for ios-simulator. I've written all the necessary glue code, calling a simple essentia::init() to test the basics.

But Xcode tells me it cannot find any symbols, and indeed it appears that the resulting library may be defective.

When running ranlib, I get this message:

/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib: for architecture: i386 file: build_ios/src/libessentia.a(essentiautil.cpp.1.o) has no symbols
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib: for architecture: x86_64 file: build_ios/src/libessentia.a(essentiautil.cpp.1.o) has no symbols

I attach the library and the log in case they help.

Thank you :)
opened by simdax 3

Creating an example for EffnetDiscogs

Hello all

I have to admit I'm not very familiar with AI, so I'm struggling to create a simple example that would work as the other tensorflow examples, musicnn or vggish, in CPP.

In python I have a result with this code:


import numpy as np
from essentia.standard import MonoLoader, TensorflowPredictEffnetDiscogs

audio = MonoLoader(filename="../data/raw/blues/blues.00000.wav", sampleRate=16000)()
model = TensorflowPredictEffnetDiscogs(graphFilename="../models/discogs-effnet-bs64-1.pb")
activations = model(audio)

#    [   INFO   ] TensorflowPredict: Successfully loaded graph file: `../models/discogs-effnet-bs64-1.pb`

activations_mean = np.mean(activations, axis=0)
top_n_idx = np.argsort(activations_mean)[::-1][0]

For C++, I copied the existing example files, which are all similar, but sadly it does not seem to work for me.

Could anyone help? :) Thank you


#include <algorithm>  // std::find
#include <iostream>
#include <essentia/algorithmfactory.h>
#include <essentia/streaming/algorithms/poolstorage.h>
#include <essentia/scheduler/network.h>
#include "credit_libav.h"

using namespace std;
using namespace essentia;
using namespace essentia::streaming;
using namespace essentia::scheduler;


bool hasFlag(char** begin, char** end, const string& option) {
  return find(begin, end, option) != end;
}

string getArgument(char** begin, char** end, const string& option) {
  char** iter = find(begin, end, option);
  if (iter != end && ++iter != end) return *iter;

  return string();
}

void printHelp(string fileName) {
    cout << "Usage: " << fileName << " pb_graph audio_input output_json [--help|-h] [--list-nodes|-l] [--patchwise|-p] [[--output-node|-o] node_name]" << endl;
    cout << "  -h, --help: print this help" << endl;
    cout << "  -l, --list-nodes: list the nodes in the input graph (model)" << endl;
    cout << "  -p, --patchwise: write out patch-wise predictions (one per patch) instead of averaging them" << endl;
    cout << "  -o, --output-node: node (layer) name to retrieve from the graph (default: model/Sigmoid)" << endl;
    creditLibAV();
}

vector<string> flags({"-h", "--help",
                      "-l", "--list-nodes",
                      "-p", "--patchwise",
                      "-o", "--output-node"});


int main(int argc, char* argv[]) {
  // Sanity check for the command line options.
  for (char** iter = argv; iter < argv + argc; ++iter) {
    if (**iter == '-') {
      string flag(*iter);
      if (find(flags.begin(), flags.end(), flag) == flags.end()){
        cout << argv[0] << ": invalid option '" << flag << "'" << endl;
        printHelp(argv[0]);
        exit(1);
      }
    }
  }

  string outputLayer = "PartitionedCall";

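  // Three positional arguments are required: graph, audio input, JSON output.
  if (argc < 4) {
    printHelp(argv[0]);
    exit(1);
  }

  string graphName = argv[1];
  string audioFilename = argv[2];
  string outputFilename = argv[3];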

  // Whether to output the patch-wise predictions or to average them.
  const bool average = (hasFlag(argv, argv + argc, "--patchwise") ||
                        hasFlag(argv, argv + argc, "-p")) ? false : true;

  // register the algorithms in the factory(ies)
  essentia::init();

  Pool pool;
  Pool aggrPool;  // a pool for the aggregated predictions
  Pool* poolPtr = &pool;

  /////// PARAMS //////////////
  Real sampleRate = 16000.0;

  AlgorithmFactory& factory = streaming::AlgorithmFactory::instance();

  Algorithm* audio = factory.create("MonoLoader",
                                    "filename", audioFilename,
                                    "sampleRate", sampleRate);

  Algorithm* tfp   = factory.create("TensorflowPredictEffnetDiscogs",
                                    "graphFilename", graphName,
                                    "output", outputLayer);
  // If the output layer is empty, we have already printed the list of nodes.
  // Exit now.
  if (outputLayer.empty()){
    essentia::shutdown();

    return 0;
  }

  /////////// CONNECTING THE ALGORITHMS ////////////////
  cout << "-------- connecting algos --------" << endl;

  audio->output("audio")     >>  tfp->input("signal");
  tfp->output("predictions") >>  PC(pool, "predictions");


  /////////// STARTING THE ALGORITHMS //////////////////
  cout << "-------- start processing " << audioFilename << " --------" << endl;

  // create a network with our algorithms...
  Network n(audio);
  // ...and run it, easy as that!
  n.run();

  if (average) {
    // aggregate the results
    cout << "-------- averaging the predictions --------" << endl;

    const char* stats[] = {"mean"};

    standard::Algorithm* aggr = standard::AlgorithmFactory::create("PoolAggregator",
                                                                  "defaultStats", arrayToVector<string>(stats));

    aggr->input("input").set(pool);
    aggr->output("output").set(aggrPool);
    aggr->compute();

    poolPtr = &aggrPool;

    delete aggr;
  }

  // write results to file
  cout << "-------- writing results to json file " << outputFilename << " --------" << endl;

  standard::Algorithm* output = standard::AlgorithmFactory::create("YamlOutput",
                                                                   "format", "json",
                                                                   "filename", outputFilename);
  output->input("pool").set(*poolPtr);
  output->compute();
  n.clear();

  delete output;
  essentia::shutdown();

  return 0;
}

Compiling it and running it like this:

./build/src/examples/essentia_streaming_discogs test/models/effnetdiscogs/effnetdiscogs-bs64-1.pb test/audio/recorded/mozart_c_major_30sec.wav outpout.json

the result is empty :(

{
"metadata": {
    "version": {
        "essentia": "2.1-beta6-dev"
    }
}
}

Thank you very much :)

opened by simdax 1

Update static builds for Qt 5.15.6
Wishlist of TODOs before merge:

[ ] merge (https://github.com/MTG/gaia/pull/121) and update gaia version in build_config.sh accordingly.

[ ] check if full build for static examples works

builds
opened by dbogdanov 0

Releases(v2.1_beta5)

v2.1_beta5 (Sep 5, 2019)
Essentia 2.1 beta5 is our current preliminary version of the forthcoming 2.1 release. This pre-release includes the following changes:

Algorithms updates and bug-fixes

Fix the slaneyMel scale implementation in MelBands and MFCC (#849). Introduced in 2.1-beta4, it was erroneously computing the HTK Mel scale. Set htkMel as the default scale to ensure backward compatibility with all previous versions of MelBands/MFCC.

New option unit_tri for triangle area normalization in MelBands, MFCC, and TriangularBands.

New parameter silenceThreshold in MFCC and GFCC. Set default threshold to 1e-10 (#543).

TriangularBands: faster unit-sum normalization and an improved check for insufficient spectrum resolution (#142).

ConstantQ and the related Chromagram and SpectrumCQ are reimplemented from scratch and now function correctly. The maxFrequency parameter is replaced by numberBins.

New negativeFrequencies parameter in FFTC to include negative frequencies in the output.

New normalize parameter for IFFT size normalization.

FFTC now supports KissFFT and Accelerate.

PoolAggregator: new aggregation method last to get the last value. Fix possible nan/inf values in kurtosis and skewness (#689). Apply aggregation for pool values that contain only one vector too.

New checkRange parameter in Trimmer and StereoTrimmer.

PitchFilter: improve consistency between input and output stream types (#674).

PitchMelodia: fix missing output pitchConfidence in streaming mode.

MultiPitchMelodia: peakFrameThreshold and peakFrameThreshold parameters now work correctly (they were overridden by hardcoded values).

New tolerance parameter in PitchYinFFT. When the pitch confidence is lower than the tolerance value the output pitch is set to 0. A tolerance of 1 disables this feature.

Fix occasional negative values output by Danceability (#483).

LoudnessEBUR128:

Fix memory leaks and warnings on empty input. Set a larger internal buffer size to avoid buffer resizes.

New parameter startFromZero to zero-center the first window for loudness estimation.

Fix a memory leak in AudioLoader.

BeatTrackerDegara output is now deterministic (#860).

ChordsDetectionBeats: add new parameter chromaPick and fix a beat segment indexing bug in the case of very close consecutive beats.

New minPeakDistance parameter in PeakDetection.

Fix invalid memory access in PCA (#727).

Update Key and KeyExtractor algorithms with new pitch class profiles and new parameters for detuning correction and low-energy HPCP bin thresholding. Use the new bgate profile by default. Add spectral whitening step to KeyExtractor. Change output key naming. Add a new function equivalentKey to match between equivalent names.

Proper mutex implementation for all FFT* algorithms.
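The htkMel versus slaneyMel distinction in the MelBands/MFCC fix listed above can be illustrated with the two Hz-to-mel mappings as they are commonly published: the HTK formula, and the Slaney (Auditory Toolbox) formula that is linear below 1 kHz and logarithmic above. This is a minimal sketch, not Essentia's code; the function names are hypothetical and the constants are the commonly cited ones, assumed here rather than taken from Essentia's source.

```python
import math


def hz_to_mel_htk(hz):
    """HTK-style mel scale: purely logarithmic in (1 + f/700)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def hz_to_mel_slaney(hz):
    """Slaney-style mel scale: linear below 1 kHz, logarithmic above."""
    f_sp = 200.0 / 3.0                   # Hz per mel in the linear region
    min_log_hz = 1000.0                  # start of the logarithmic region
    min_log_mel = min_log_hz / f_sp      # mel value at 1 kHz
    log_step = math.log(6.4) / 27.0      # 27 log-spaced steps from 1 kHz to 6.4 kHz
    if hz < min_log_hz:
        return hz / f_sp
    return min_log_mel + math.log(hz / min_log_hz) / log_step


# Print both mappings side by side for a few frequencies.
for hz in (100.0, 500.0, 1000.0, 4000.0, 8000.0):
    print(hz, hz_to_mel_htk(hz), hz_to_mel_slaney(hz))
```

The two scales use different units and different shapes, so band edges placed uniformly on one scale do not coincide with band edges placed uniformly on the other. That is why computing one scale where the other was requested changes the resulting band energies.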

New algorithms

Invertible Constant-Q based on Non-Stationary Gabor frames: NSGConstantQ, NSGIConstantQ, NSGConstantQStreaming.

Chromaprinter (fingerprinting) wrapper for the Chromaprint library.

NNLSChroma and LogSpectrum (derived from the original NNLS Chroma code).

TriangularBarkBands (more configurable than BarkBands) and BFCC (bark-frequency cepstrum coefficients).

New algorithms for audio problems detection: ClickDetector, DiscontinuityDetector, FalseStereoDetector, GapsDetector, HumDetector, NoiseBurstDetector, SNR, SaturationDetector, StartStopCut, TruePeakDetector.

New algorithms for probabilistic Yin (pYIN) pitch estimation: PitchYinProbabilistic, PitchYinProbabilities, PitchYinProbabilitiesHMM.

StereoTrimmer and StereoMuxer.

Welch (power spectral density estimation).

New algorithm IFFTC for inverse complex STFT.

Histogram.

Updated music and sound feature extractors streaming_extractor_music and streaming_extractor_freesound. Both extractors are now also available as algorithms: MusicExtractor and FreesoundExtractor. New MusicExtractorSVM algorithm allows applying SVM models to the output of MusicExtractor.

Fix possible memory leaks in MusicExtractor

Proper logging for "out of memory" errors

Skip aggregation for some descriptors

Add audio length to metadata and remove end_time

Add number of audio channels to metadata (number_channels)

Better grouping of metadata related to audio analysis

Updated key/chords estimation parameters

Estimate key using three different key profiles (temperley, krumhansl, edma)

Updated descriptors in MusicExtractor:

New LoudnessEBUR128 loudness descriptors

Add melbands128 high-resolution melbands

Compute hpcp_crest

Compute bpm_histogram

New stdev aggregate statistics in addition to var

Updated descriptors in FreesoundExtractor

Add melbands96 high-resolution melbands

Add stdev statistic

Remove frequency_bands

Do not output bpm_confidence when configured to use 'degara' for beat tracking

spectral_contrast and scvalleys are now called spectral_contrast_coeffs and spectral_contrast_valleys for consistency with MusicExtractor

startFrame and stopFrame are now called sound_start_frame and sound_stop_frame

New extractors

Add a new extractor for spectrograms and log-energy Mel-spectrograms (streaming_spectrogram).

Python bindings updates

Add support for Python 3.

Update all tutorials and code examples to Python 3.

New essentia.pyutils submodule provides useful functions for a number of use-cases (spectrograms, CQ-grams, batch processing with extractors, etc.)

Fix a memory bug in Pool on a isSingleValue check in Python.

Faster VECTOR_VECTOR_REAL conversion from Python types.

Build scripts updates

Add script for Python packaging (python.py) and wheels.

Travis CI and build scripts for manylinux wheels.

Update Waf to 2.0.10.

The code is now partly C++11.

Build flags for MSVC.

Fixes for cross-compilation with Mingw-w64.

Default --prefix=$VIRTUAL_ENV when inside a virtualenv.

Read PKG_CONFIG_PATH and add new flag --pkg-config-path for custom lib paths.

New flag --only-python to build Python extension separately from libessentia.

Link only to libessentia when building examples.

Generate a proper essentia.pc pkg-config file.

Static builds updates.

Replace LibAv with FFmpeg, build with muxers.

Update Taglib version to 1.11.1, build with zlib.

Update Gaia to 2.4.5.

Miscellaneous

Fix segfault in the Vamp plugin (#635, #371).

Add support for SingleVectorString to Pool.

Added support for Bessel functions via the third-party Cephes library.

Updated documentation, tutorials, and examples including a significant web redesign.

Improve build scripts for documentation.

Every algorithm page now has links to related algorithms.

An updated list of research works using Essentia.

New python examples.

New QA scripts for audio problems detection and HPCPs.

A usual assortment of code cleanup, updated and expanded unit tests, and better logging (more informative log and exception messages).

v2.1_beta4 (May 23, 2018)
This pre-release includes the following changes:

Improved algorithms

AudioLoader now supports audio sources with multiple audio streams (new parameter 'audioStream')

PoolAggregator now outputs stdev in addition to var (#342)

SpectralContrast: Improve precision for computation of subband bin intervals

Danceability now also outputs a DFA exponent vector

HPCP can now optionally apply unit sum normalization (#348)

HPCP: 'splitFrequency' parameter is now called 'bandSplitFrequency'

LoudnessEBUR128: Warn on empty input in the streaming mode

Updates to Mel and ERB energy band algorithms

Add support for extracting MelBands and MFCCs 'the htk way'

Add support for DCT type III in DCT algorithm

New parameter 'dctType' in DCT, MFCC and GFCC

New 'liftering' parameter in DCT and MFCC

New parameters 'normalize', 'type', 'scale' and 'weighting' in MelBands and MFCC

New 'type' parameter in GFCC

New 'logType' parameter in MFCC, GFCC

New 'log' parameter in TriangularBands and MelBands

ERBBands: 'type' parameter value "energy" is now called "power"

TriangularBands is now faster
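The DCT type II and type III options mentioned in the list above are related in a simple way: with orthonormal scaling, the type III transform is the inverse of type II. The sketch below uses the textbook orthonormal definitions in plain Python. It is an illustration only; Essentia's DCT and IDCT algorithms may use a different scaling convention, and the function names are hypothetical.

```python
import math


def dct_ii(x):
    """Orthonormal DCT type II of a list of floats (textbook definition)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def dct_iii(c):
    """Orthonormal DCT type III, the inverse of dct_ii above."""
    n = len(c)
    out = []
    for i in range(n):
        s = math.sqrt(1.0 / n) * c[0]
        s += sum(math.sqrt(2.0 / n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out


signal = [1.0, 2.0, 3.0, 4.0]
coeffs = dct_ii(signal)
restored = dct_iii(coeffs)
print(coeffs)
print(restored)
```

Applying dct_iii to the output of dct_ii recovers the input up to floating-point rounding, which is the relationship an inverse DCT relies on.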

New algorithms

SpectrumToCent for computing cent scale from frequency bins

New algorithm IDCT for inverse DCT

New algorithm SpectrumCQ
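The cent scale that SpectrumToCent refers to is the standard logarithmic pitch unit: 1200 cents per octave relative to a reference frequency. A minimal sketch of that conversion follows. The function name and the 55 Hz default reference are assumptions made for illustration, and this does not reproduce how the Essentia algorithm groups FFT bins into bands.

```python
import math


def hz_to_cents(freq_hz, ref_hz=55.0):
    """Distance of freq_hz from ref_hz in cents (1200 cents per octave)."""
    if freq_hz <= 0 or ref_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 1200.0 * math.log2(freq_hz / ref_hz)


# Centre frequencies of the first FFT bins for a hypothetical 2048-point
# FFT at 44100 Hz, mapped to cents above the reference.
sample_rate = 44100.0
fft_size = 2048
for bin_index in range(1, 6):
    freq = bin_index * sample_rate / fft_size
    print(bin_index, freq, hz_to_cents(freq))
```

Because the mapping is logarithmic, equally spaced FFT bins become progressively denser on the cent axis as frequency increases.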

Bug-fixes in algorithms:

MelBands and TriangularBands: Add checks for insufficient spectrum resolution (#142)

Fix PitchYin out of range error (#376)

Fix Inf values in OddToEvenHarmonicEnergyRatio

Fix reset() in LowLevelSpectralExtractor and LowLevelSpectralEqloudExtractor

Fix occasional exception in BeatsLoudness (#199)

Danceability: Fix NaN danceability value occurring on very short input signals

Fix memory leak in MelBands

Fix memory bug in Vibrato

SpectralContrast: Force non-zero 'lowFrequencyBound' parameter to avoid division by zero (#568)

AudioLoader: Fix memory bug on exceptions while opening an audio file in AudioLoader

Updates to Python wrapper:

FrameGenerator now inherits the default parameters from FrameCutter

FrameGenerator now has a new method frame_times() to compute frame positions in time

Fix array memory corruption when passing NumPy array views to Essentia algorithms (#240)

Fix memory deallocation for streaming algorithms to avoid a memory leak

Extractors:

Freesound extractor now stores all results in json

Logging:

Remove colors in log messages when piped to file; do not print colors on Windows

Build scripts updates:

Update waf to 1.9.5

Update script for computing algorithm dependencies

Code cleanup and unit tests updates

Re-designed and expanded documentation:

Updated installation instructions

Reorganized and improved Python tutorials. Notebook tutorials are now also rendered as html

Updated algorithm descriptions

Added examples of industrial applications and academic studies using Essentia

v2.1_beta3 (Sep 29, 2016)
This pre-release includes the following changes:

Build script updates:

Cross-compilation for iOS and Android

Support for JavaScript using Emscripten

Updated dependencies in static extractors (LibAv 11.2, Taglib 1.10)

Fixed cross-compilation for Windows

Homebrew formula for easy installation on OSX

Updated Debian packaging

All dependencies are now optional. Algorithms and examples relying on missing dependencies will be ignored.

New flags for building lightweight versions of Essentia

--lightweight=LIBS to specify dependencies to be included

--include-algos=ALGOS and --ignore-algos=ALGOS to specify algorithms to be included

New algorithms:

SuperFlux algorithm for real-time onset detection (SuperFluxExtractor, SuperFluxNovelty)

Algorithms for sound modeling

Overlap-add (OverlapAdd)

Sine model analysis/synthesis (SineModelAnal, SineModelSynth)

Sine subtraction (SineSubtraction)

Sinusoidal plus Residual model analysis/synthesis (SprModelAnal, SprModelSynth)

Melody Analysis (monophonic/predominant)

HarmonicMask

Signal resampling (ResampleFFT)

New pitch-related algorithms

Multi-pitch estimation in polyphonic music (MultiPitchKlapuri, MultiPitchMelodia)

Adaptation of Melodia algorithm for monophonic signals (PitchMelodia)

Yin pitch detection algorithm (PitchYin)

Pitch contour segmentation into notes (PitchContourSegmentation)

Vibrato detection (Vibrato)

BPM estimation on loops (PercivalEnhanceHarmonics, PercivalEvaluatePulseTrains, LoopBpmConfidence, LoopBpmEstimator, PercivalBpmEstimator)

STFT on complex inputs (FFTC)

ConstantQ and Chromagram (still in experimental stage)

TriangularBands

Lightweight spectral centroid implementation (SpectralCentroidTime)

Chords detection on beat segments (ChordsDetectionBeats)

VectorRealAccumulator

Improved algorithms:

LoudnessEBUR128 algorithms are now finalized (includes bug-fixes)

FFT now supports KissFFT and Accelerate FFT libraries as an alternative to FFTW

New profiles for Key estimation (including profiles for electronic music)

New 'generalized' parameter in Autocorrelation algorithm

New 'scale' and 'shift' parameters in UnaryOperator algorithm

New 'normalized' parameter in Windowing algorithm

New 'inputSize' parameter in GFCC algorithm

Added support for 8kHz for EqualLoudness algorithm

LogAttackTime now outputs attack times

BpmHistogramDescriptors now outputs a complete histogram

ChordsDescriptors now throws an exception on incorrect chords

Refactored AudioLoader and AudioWriter algorithms. They now use libavresample; support for libswresample has been removed

Rename PitchFilterMakam to PitchFilter. Allow filtering negative energy values. Remove optional 'octaveFilter' parameter

Rename PredominantMelody algorithm to PredominantPitchMelodia

Bug-fixes:

Fix wrong behavior of HarmonicPeaks that was indirectly affecting results in HPCP, Key, Tristimulus and OddToEvenHarmonicEnergy

Fixed filter coefficients in BandReject and BandPass

Fixed weightings in NoveltyCurve

Different key profiles in Key streaming algorithm now work correctly

Bug fixes in Envelope, TonicIndianArtMusic, RhythmExtractor2013, PitchYinFFT, BpmHistogramDescriptors, ReplayGain streaming

Updated extractors (including Freesound extractor)

Improved documentation

Fresh new design

Algorithms are now organized by categories.

Improved and rewritten algorithm descriptions

New python examples and tutorials

More minor fixes, improvements and code cleanup

Updated unit tests. Audio files for tests are now hosted in a separate repository

Known issues:

Some unit tests fail (#316)

v2.1_beta2(Mar 26, 2015)
Changes:

Build scripts updates:

New scripts for static builds on Linux, OSX and (cross-compilation) Windows

New flag --with-example to build only specific examples

New git commit SHA hash value accessible via Essentia library API for better versioning

Algorithm updates:

AudioLoader now outputs codec and bitrate, and computes md5 hash values over undecoded audio

MetadataReader now uses new TagLib 1.9 API and is able to read any tags

YamlInput now supports json

New Entropy algorithm

EffectiveDuration now accepts a threshold parameter

Fixed incorrect computation of onset rate in OnsetRate

New algorithm LoudnessEBUR128 for measuring loudness according to the EBU R128 standard (still in experimental stage)

New BinaryOperator algorithm

PitchYinFFT algorithm now includes peak interpolation
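The note does not say which interpolation scheme is used; the 2.0.1 notes mention parabolic interpolation for PitchYinFFT. As an illustration only, here is a minimal sketch of three-point parabolic peak interpolation. It is not the library's code, and the function name `parabolic_peak` is hypothetical.

```python
def parabolic_peak(values, i):
    """Refine a local extremum at index i using its two neighbours.

    Fits a parabola through the samples at i-1, i and i+1 and returns
    (position, height) of its vertex, with position in fractional bins.
    """
    if i <= 0 or i >= len(values) - 1:
        return float(i), values[i]  # no neighbour on one side: keep the raw bin
    a, b, c = values[i - 1], values[i], values[i + 1]
    denom = a - 2.0 * b + c
    if denom == 0.0:
        return float(i), b  # the three points are collinear: no vertex
    offset = 0.5 * (a - c) / denom          # vertex offset from bin i
    height = b - 0.25 * (a - c) * offset    # parabola value at the vertex
    return i + offset, height


# Samples of y = 5 - (x - 2.3)^2 at x = 0..4; the coarse peak is at bin 2.
samples = [5.0 - (x - 2.3) ** 2 for x in range(5)]
position, height = parabolic_peak(samples, 2)
```

With exactly parabolic data the sketch recovers the true vertex; on real spectra or difference functions it only gives an estimate that is finer than the bin spacing.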

Revised and updated extractors:

Revised, refactored and expanded music extractor (streaming_extractor_music) including new functionality and descriptors

Updated Freesound extractor, including new descriptors

Some updates in core Essentia code

Updated documentation and examples

Bugfixes and unit tests updates

Dependencies: Libav 9, Taglib 1.9

Ubuntu/Debian Libav/Taglib compatibility:

Debian Jessie - the required package versions are already in the repository

Debian Wheezy - install libav/libtag1-dev packages from wheezy-backports repository

libav 6:10.1

libtag1-dev 1.9.1

Ubuntu Trusty (14.04 LTS), Utopic (14.10) and Vivid (15.04) - the required package versions are already in the repository

v2.0.1(Feb 11, 2014)
Essentia 2.0.1:

Added pre-trained high-level classifier models for genres, moods, rhythm and instrumentation (to be used with the streaming_extractor_archivemusic extractor)

Fixed scheduler in streaming mode

Fixed compilation with clang/libc++/c++11

PitchYinFFT now supports parabolic interpolation

Updated Vamp plugin

Updated documentation and tutorials

Minor bugfixes, more unittests, etc.

For post-release bugfixes (including Ubuntu 14.04 compatibility) use the 2.0.1 branch.

Ubuntu/Debian Libav compatibility:

Debian Wheezy - libav 6:0.8.17

Ubuntu Precise (12.04 LTS) - libav 4:0.8.17

Ubuntu Trusty (14.04 LTS) - libav 6:9.18

v2.0(Mar 31, 2015)
First release to be publicly available as free software released under AGPLv3

Refactoring of the core API

Fixed small API annoyances in the standard mode

Refactored the streaming mode. It is now much better defined and rests on sound computer science techniques: the visible network is a directed acyclic graph, the composites have better-defined semantics, and the order of execution of the algorithms is the topological sort of the transitive reduction of the visible network after the composites have been expanded. In particular, the scheduler that runs the algorithms in the streaming mode is now far more correct, which made it possible to remove the small hacks that had accumulated in the algorithms themselves during the 1.x releases to compensate for the deficiencies of the initial scheduler.
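To illustrate the ordering rule described above, here is a minimal sketch of a topological sort (Kahn's algorithm) over a small network. It is not the Essentia scheduler: it skips composite expansion and the transitive reduction step, and the network layout and the function name `execution_order` are hypothetical.

```python
from collections import deque

# Hypothetical network: each algorithm maps to the algorithms that
# consume its output. Every algorithm must appear as a key.
network = {
    "AudioLoader": ["FrameCutter"],
    "FrameCutter": ["Windowing"],
    "Windowing": ["Spectrum"],
    "Spectrum": ["MFCC", "SpectralPeaks"],
    "SpectralPeaks": ["HPCP"],
    "MFCC": [],
    "HPCP": [],
}


def execution_order(consumers):
    """Return a run order in which every producer precedes its consumers."""
    # Count how many producers feed each algorithm.
    indegree = {node: 0 for node in consumers}
    for sinks in consumers.values():
        for sink in sinks:
            indegree[sink] += 1
    # Algorithms with no pending producers are ready to run.
    ready = deque(node for node, degree in indegree.items() if degree == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for sink in consumers[node]:
            indegree[sink] -= 1
            if indegree[sink] == 0:
                ready.append(sink)
    if len(order) != len(consumers):
        raise ValueError("network contains a cycle, so it is not a DAG")
    return order


order = execution_order(network)
```

A cycle leaves some algorithms permanently waiting on a producer, which is why the sketch rejects any network that is not acyclic.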

New algorithms for onset detection, beat tracking and melody extraction

New and updated feature extractors

Updated Vamp plugin

Much better documentation, more python examples

Bugfixes, more unittests, etc.

For post-release bugfixes use the 2.0 branch.

Ubuntu/Debian Libav compatibility:

Debian Wheezy - libav 6:0.8.17

Ubuntu Precise (12.04 LTS) - libav 4:0.8.17

Ubuntu Trusty (14.04 LTS) - libav 6:9.18


Owner

Music Technology Group - Universitat Pompeu Fabra

Software tools developed by the MTG

GitHub Repository http://essentia.upf.edu
