kapre: Keras Audio Preprocessors

Last update: Dec 29, 2022

Overview

Kapre

Keras Audio Preprocessors - compute STFT, ISTFT, Melspectrogram, and others on GPU in real time.

Tested on Python 3.6 and 3.7

Why Kapre?

vs. Pre-computation

You can optimize DSP parameters.
Your model deployment becomes much simpler and more consistent.
Your code and model have fewer dependencies.

vs. Your own implementation

Quick and easy!
Consistent with 1D/2D tensorflow batch shapes
Data format agnostic (channels_first and channels_last)
Less error prone - Kapre layers are tested against Librosa (stft, decibel, etc) - which is (trust me) trickier than you think.
Kapre layers have some extended APIs beyond the default tf.signal implementation, such as:
- A perfectly invertible STFT and InverseSTFT pair
- Mel-spectrogram with more options
Reproducibility - Kapre is available on pip with versioning

Workflow with Kapre

Preprocess your audio dataset. Resample the audio to the right sampling rate and store the audio signals (waveforms).
In your ML model, add a Kapre layer, e.g. kapre.time_frequency.STFT(), as the first layer of the model.
The data loader simply loads audio signals and feeds them into the model.
In your hyperparameter search, include DSP parameters like n_fft to boost the performance.
When deploying the final model, all you need to remember is the sampling rate of the signal. No dependency or preprocessing!

Installation

pip install kapre

API Documentation

Please refer to Kapre API Documentation at https://kapre.readthedocs.io

One-shot example

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, BatchNormalization, ReLU, GlobalAveragePooling2D, Dense, Softmax
from kapre import STFT, Magnitude, MagnitudeToDecibel
from kapre.composed import get_melspectrogram_layer, get_log_frequency_spectrogram_layer

# 6 channels (!), maybe 1-sec audio signal, for an example.
input_shape = (44100, 6)
sr = 44100
model = Sequential()
# A STFT layer
model.add(STFT(n_fft=2048, win_length=2018, hop_length=1024,
               window_name=None, pad_end=False,
               input_data_format='channels_last', output_data_format='channels_last',
               input_shape=input_shape))
model.add(Magnitude())
model.add(MagnitudeToDecibel())  # these three layers can be replaced with get_stft_magnitude_layer()
# Alternatively, you may want to use a melspectrogram layer
# melgram_layer = get_melspectrogram_layer()
# or log-frequency layer
# log_stft_layer = get_log_frequency_spectrogram_layer() 

# add more layers as you want
model.add(Conv2D(32, (3, 3), strides=(2, 2)))
model.add(BatchNormalization())
model.add(ReLU())
model.add(GlobalAveragePooling2D())
model.add(Dense(10))
model.add(Softmax())

# Compile the model
model.compile('adam', 'categorical_crossentropy') # if single-label classification

# train it with raw audio sample inputs
# for example, you may have functions that load your data as below.
x = load_x() # e.g., x.shape = (10000, 44100, 6), i.e. (batch, time, channel) for channels_last
y = load_y() # e.g., y.shape = (10000, 10) if it's 10-class classification
# then..
model.fit(x, y)
# Done!

See the Jupyter notebook in the example folder.

Citation

Please cite this paper if you use Kapre for your work.

@inproceedings{choi2017kapre,
  title={Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras},
  author={Choi, Keunwoo and Joo, Deokjin and Kim, Juho},
  booktitle={Machine Learning for Music Discovery Workshop at 34th International Conference on Machine Learning},
  year={2017},
  organization={ICML}
}

Comments

Migrated functions to tf.keras

This PR addresses #52 by removing the dependency on keras and switching to tensorflow.keras

Proposed version is 0.1.6 because of pull request #56

In particular, #56 keeps the dependency on keras with from keras.utils.conv_utils import conv_output_length

opened by douglas125 27
Spectrogram?

I have an older version of Kapre that has time_frequency.Spectrogram, which is trainable.

However, the new version of Kapre doesn't have Spectrogram anymore. Why?

opened by turian 8
Melspectrogram can't be set 'trainable_fb=False'

Melspectrogram can't be set to 'trainable_fb=False'. I set trainable_fb=False and trainable_kernel=False, but it does not seem to work: the layer is still trainable.

opened by zhh6 8
htk=true for mel frequencies

We noticed the current implementation of the mel_frequencies function (based on Librosa) doesn't include the htk=True option, which is handy when training CNNs because the frequency scale is then fully logarithmic, which in principle makes more sense for frequency-invariant convolutional filters.

What was the motivation for removing this? Any chance it can be added?
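
For reference, the two mel conventions this issue contrasts can be compared in a few lines of plain Python. This is a minimal sketch, not Kapre's or Librosa's code: the function names `hz_to_mel_htk` and `hz_to_mel_slaney` are hypothetical, and the constants are the commonly cited HTK and Slaney (Auditory Toolbox) values.

```python
import math

def hz_to_mel_htk(freq_hz):
    """HTK-style mel scale: a single logarithmic formula for all frequencies."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def hz_to_mel_slaney(freq_hz):
    """Slaney-style mel scale: linear below 1 kHz, logarithmic above."""
    f_sp = 200.0 / 3.0                # Hz per mel in the linear region
    min_log_hz = 1000.0               # start of the logarithmic region
    min_log_mel = min_log_hz / f_sp   # 15 mels at 1 kHz
    logstep = math.log(6.4) / 27.0    # step size in the logarithmic region
    if freq_hz < min_log_hz:
        return freq_hz / f_sp
    return min_log_mel + math.log(freq_hz / min_log_hz) / logstep

# Print both scales side by side for a few frequencies.
for f in (100.0, 500.0, 1000.0, 4000.0):
    print(f, round(hz_to_mel_htk(f), 2), round(hz_to_mel_slaney(f), 2))
```

The two scales use different units, so only the shape of each curve is comparable, not the absolute numbers.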

opened by justinsalamon 8
Added parallel STFT implementation

Hi!

As I commented in #98, I added a parallel STFT implementation based on the map_fn function, following the suggestions of @zaccharieramzi.

I've added a use_parallel_stft parameter (disabled by default) that enables this function. I've added this parameter to as many functions as I could. I also added test cases for every function I could, including a specific test that checks that the result of tf.signal.stft equals the result of the parallel_stft function.

I hope this serves us well until TensorFlow resolves its issues with the FFT implementation.

opened by JPery 7
Amplitude-to-decibel conversion produces different results on different batches

Related to #16, I found another issue that contributes to different prediction results depending on batch size (and the batches themselves). In particular, it occurs when converting spectrograms to decibels.

https://github.com/keunwoochoi/kapre/blob/master/kapre/backend_keras.py#L17

The maximum is taken over the entire tensor, instead of per example in the batch. This results in different normalization when the examples in a batch are different.
bug
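
The effect described in this issue can be reproduced without TensorFlow. The following is a minimal sketch, not Kapre's backend code: the helper names and the `amin` and `dynamic_range` defaults are assumptions chosen for illustration. It normalises a batch once by the maximum of the whole batch and once by each example's own maximum.

```python
import math

def to_decibel(values, ref_max, amin=1e-10, dynamic_range=80.0):
    """Convert power values to dB relative to ref_max, clipped at -dynamic_range."""
    ref_db = 10.0 * math.log10(max(ref_max, amin))
    out = []
    for v in values:
        db = 10.0 * math.log10(max(v, amin)) - ref_db
        out.append(max(db, -dynamic_range))
    return out

def batch_to_decibel_global(batch):
    """Normalise every example by the maximum over the whole batch."""
    global_max = max(max(example) for example in batch)
    return [to_decibel(example, global_max) for example in batch]

def batch_to_decibel_per_example(batch):
    """Normalise each example by its own maximum."""
    return [to_decibel(example, max(example)) for example in batch]

quiet = [1e-4, 1e-3, 1e-2]
loud = [1.0, 10.0, 100.0]

# With a global maximum, the quiet example's output depends on its batch mates.
print(batch_to_decibel_global([quiet])[0])
print(batch_to_decibel_global([quiet, loud])[0])
# With a per-example maximum, it does not.
print(batch_to_decibel_per_example([quiet, loud])[0])
```

In this sketch the quiet example is shifted by 40 dB when it shares a batch with the loud one, which is the kind of batch-dependent output the issue reports.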

opened by auroracramer 7
Inverse Spectrogram and Mel-Spectrogram Layer?

Namaste!

kapre has become an integral part of all my audio Deep Learning experiments. Powerful! Thanks for providing such a great software!

I was thinking... I guess it would make sense to have layers for inverse spectrogram and inverse mel-spectrogram. Thinking about Autoencoders, this would be even more powerful. I know that reconstructing samples from spectrograms is not the best, but it is possible to a certain degree.

What do you think about that feature request?

Best, Tristan

opened by AI-Guru 7
Hey! The input is too short!

Hi,

I'm encountering an assertion problem when calling your code with a Tensorflow backend.

input_shape = (44100,1)

Could this be a problem with "channels_first" / "channels_last"?

Best, Alex

opened by slychief 6
Pip?

It seems you were on pip, but are no longer. Is there anything I could do to help get kapre back on there? We want to use this library in a commercial application, and for our process pip packages are easier to support than a git repository.

opened by ff-rfeather 6

trainable_stft error

Following your example but missing layer definition trainable_stft or something, can you provide example with error resolution?

`# 6 channels (!), maybe 1-sec audio signal
input_shape = (6, 44100) 
sr = 44100
model = Sequential()
model.add(Melspectrogram(n_dft=512, n_hop=256, input_shape=src_shape,
                         border_mode='same', sr=sr, n_mels=128,
                         fmin=0.0, fmax=sr/2, power=1.0,
                         return_decibel=False, trainable_fb=False,
                         trainable_kernel=False
                         name='trainable_stft'))`

  File "<ipython-input-24-cea5588ddf1e>", line 13
    name='trainable_stft'))
       ^
SyntaxError: invalid syntax

opened by sildeag 6

`STFT` layer output shape deviates from `STFTTflite` layer in batch dimension
Use Case

I want to convert a STFT layer in my model to a STFTTflite to deploy it to my mobile device. In the documentation I found that another dimension is added to account for complex numbers. But I also encountered a behaviour that is not documented.

Expected Behaviour

import kapre
from tensorflow import keras

input_shape = (2048, 1)  # mono signal

model = keras.models.Sequential()  # TFLite incompatible model
model.add(kapre.STFT(n_fft=1024, hop_length=512, input_shape=input_shape))

tflite_model = keras.models.Sequential()  # TFLite compatible model
tflite_model.add(kapre.STFTTflite(n_fft=1024, hop_length=512, input_shape=input_shape))

model has the output shape (None, 3, 513, 1). Therefore, tflite_model should have the output shape (None, 3, 513, 1, 2).
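
The (3, 513, 1) part of that shape follows from the usual STFT framing arithmetic. Here is a minimal sketch of that arithmetic, assuming the window spans n_fft samples, no padding at the end, and channels_last output; the helper name `stft_output_shape` is hypothetical and is not part of Kapre.

```python
def stft_output_shape(n_samples, n_channels, n_fft, hop_length):
    """Return (n_frames, n_freq_bins, n_channels) for an unpadded STFT.

    Assumes the analysis window spans n_fft samples, no end padding, and
    'channels_last' output. The batch dimension is left out.
    """
    if n_samples < n_fft:
        raise ValueError("input is shorter than one analysis frame")
    n_frames = 1 + (n_samples - n_fft) // hop_length
    n_freq_bins = n_fft // 2 + 1
    return (n_frames, n_freq_bins, n_channels)

print(stft_output_shape(2048, 1, n_fft=1024, hop_length=512))  # (3, 513, 1)
```

The same arithmetic gives 397 frames for a 205000-sample clip with n_fft=2048 and hop_length=512, which matches the model summary quoted in a later comment.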

Observed Behaviour

The output shape of tflite_model is (1, 3, 513, 1, 2) instead of (None, 3, 513, 1, 2).

Problem Solution

If this behaviour is unwanted:

Change the model output format so that the batch dimension is correctly shaped.

Otherwise:

Explain in the documentation why the batch dimension is shaped to 1.

Explain in the documentation how to include this layer into models which expect the batch dimension to be shaped None.
opened by PhilippMatthes 5

Problem incorporating SpecAugment in the training process

Hi,

I'm trying to add a SpecAug layer in the training process of a CNN using the code below:


CLIP_DURATION = 5 
SAMPLING_RATE = 41000
NUM_CHANNELS = 1

INPUT_SHAPE = ((CLIP_DURATION * SAMPLING_RATE), NUM_CHANNELS)

melgram = get_melspectrogram_layer(input_shape = INPUT_SHAPE,
                          n_fft = 2048,
                          hop_length = 512,
                          return_decibel=True,
                          n_mels = 40,
                          mel_f_min = 500,
                          mel_f_max = 15000,
                          input_data_format='channels_last',
                          output_data_format='channels_last')

spec_augment = SpecAugment(freq_mask_param=5,
                          time_mask_param=10,
                          n_freq_masks=2,
                          n_time_masks=3,
                          mask_value=-100)   

model = Sequential()
model.add(melgram)
model.add(spec_augment)

The CNN summary looks like this:

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 melspectrogram (Sequential)  (None, 397, 40, 1)       0         
                                                                 
 spec_augment_1 (SpecAugment  (None, 397, 40, 1)       0         
 )                                                               
                                                                 
=================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0
_________________________________________________________________

Compiling and fitting the model

model.compile(loss = 'sparse_categorical_crossentropy', optimizer='adam', metrics = 'accuracy')

early_stop = EarlyStopping(monitor='loss', patience=5)

reduce_LR = ReduceLROnPlateau(monitor="val_loss",factor=0.1,patience=4)

checkpointer = ModelCheckpoint(filepath = 'saved_models/bird_song_classification.hdf5')

model.fit(X_train, y_train, validation_data = (X_val, y_val), epochs = 50, batch_size = 32, callbacks = [early_stop, checkpointer, reduce_LR])

Then I get the following error:

Epoch 1/50
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-35-e58a056ab523> in <module>
      7 checkpointer = ModelCheckpoint(filepath = 'saved_models/bird_song_classification.hdf5')
      8 
----> 9 model.fit(X_train, y_train, validation_data = (X_val, y_val), epochs = 50, batch_size = 32, callbacks = [early_stop, checkpointer, reduce_LR])

6 frames
/usr/local/lib/python3.7/dist-packages/kapre/augmentation.py in tf___apply_masks_to_axis(self, x, axis, mask_param, n_masks)
     78                 try:
     79                     do_return = True
---> 80                     retval_ = ag__.converted_call(ag__.ld(tf).where, (ag__.ld(mask), ag__.ld(self).mask_value, ag__.ld(x)), None, fscope)
     81                 except:
     82                     do_return = False

TypeError: in user code:

    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1051, in train_function  *
        return step_function(self, iterator)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1040, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1030, in run_step  **
        outputs = model.train_step(data)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 889, in train_step
        y_pred = self(x, training=True)
    File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "/tmp/__autograph_generated_filepzvfxhgz.py", line 63, in tf__call
        ag__.if_stmt((ag__.ld(training) in (None, False)), if_body_2, else_body_2, get_state_2, set_state_2, ('do_return', 'retval_'), 2)
    File "/tmp/__autograph_generated_filepzvfxhgz.py", line 58, in else_body_2
        retval_ = ag__.converted_call(ag__.ld(tf).map_fn, (), dict(elems=ag__.ld(x), fn=ag__.ld(self)._apply_spec_augment, dtype=ag__.ld(tf).float32, fn_output_signature=ag__.ld(tf).float32), fscope)
    File "/tmp/__autograph_generated_filef27o6c1f.py", line 44, in tf___apply_spec_augment
        ag__.if_stmt((ag__.ld(self).n_time_masks >= 1), if_body_1, else_body_1, get_state_1, set_state_1, ('x',), 1)
    File "/tmp/__autograph_generated_filef27o6c1f.py", line 39, in if_body_1
        x = ag__.converted_call(ag__.ld(self)._apply_masks_to_axis, (ag__.ld(x),), dict(axis=ag__.ld(time_axis), mask_param=ag__.ld(self).time_mask_param, n_masks=ag__.ld(self).n_time_masks), fscope)
    File "/tmp/__autograph_generated_file3vip8w4x.py", line 80, in tf___apply_masks_to_axis
        retval_ = ag__.converted_call(ag__.ld(tf).where, (ag__.ld(mask), ag__.ld(self).mask_value, ag__.ld(x)), None, fscope)

    TypeError: Exception encountered when calling layer "spec_augment_1" (type SpecAugment).
    
    in user code:
    
        File "/usr/local/lib/python3.7/dist-packages/kapre/augmentation.py", line 299, in call  *
            elems=x, fn=self._apply_spec_augment, dtype=tf.float32, fn_output_signature=tf.float32
        File "/usr/local/lib/python3.7/dist-packages/kapre/augmentation.py", line 273, in _apply_spec_augment  *
            x = self._apply_masks_to_axis(
        File "/usr/local/lib/python3.7/dist-packages/kapre/augmentation.py", line 254, in _apply_masks_to_axis  *
            return tf.where(mask, self.mask_value, x)
    
        TypeError: Input 'e' of 'SelectV2' Op has type float32 that does not match type int32 of argument 't'.
    
    
    Call arguments received by layer "spec_augment_1" (type SpecAugment):
      • x=tf.Tensor(shape=(None, 397, 40, 1), dtype=float32)
      • training=True
      • kwargs=<class 'inspect._empty'>

The shape of X_train is

(2182, 205000, 1)

I'm using Tensorflow 2.9.2, and Python 3.7.15

When I remove the SpecAug layer everything runs fine. I've tested using only the melspec + a mobile net at the end and it runs smoothly. The problem is apparently related to the SpecAug layer.

Do you have any idea what could be going wrong here? I appreciate any guidance related to the problem. Best regards.

opened by nnbuainain 2

Full-integer quantization and kapre layers
I am training a model which includes the mel-spectrogram block from get_melspectrogram_layer() right after the input layer. Training goes well, and I am able to change the specific mel-spec-layers to their TFLite-counterparts (STFTTflite, MagnitudeTflite) afterwards. I have checked also that the model performs as well as before.

The model also performs as expected when converting the model to .tflite using dynamic range quantization. However, when using full-integer quantization, the model loses its accuracy (see https://www.tensorflow.org/lite/performance/post_training_quantization#integer_only).

I suppose the mel-spectrogram starts to differ significantly because, in full-integer quantization, the input values are projected to a new range (int8). Is there any way to make it work with full-integer quantization?

I guess I need to separate the mel-spec-layer from the model as a preprocessing step in order to succeed with full-integer quantization, i.e., apply the input quantization to the output values of mel-spec layer. But then I would have to deploy two models to the edge device, where the input goes first to the mel-spec-block and then to the rest of the model (?).

I am using TensorFlow 2.7.0 and kapre 0.3.7.

Here is my code for testing the tflite-model:

preds = []
# Test and evaluate the TFLite-converted model on unseen test data
for i, sample in enumerate(X_test_full_scaled):
    X = sample
    if input_details['dtype'] == np.int8:
        input_scale, input_zero_point = input_details["quantization"]
        X = sample / input_scale + input_zero_point
    X = X.reshape((1, 8000, 1)).astype(input_details["dtype"])
    interpreter.set_tensor(input_index, X)
    interpreter.invoke()
    pred = interpreter.get_tensor(output_index)
    output_scale, output_zero_point = output_details['quantization']
    if output_details['dtype'] == np.int8:
        pred = pred.astype(np.float32)
        pred = (pred - output_zero_point) * output_scale
    pred = np.argmax(pred, axis=1)[0]
    preds.append(pred)
preds = np.array(preds)
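
To make the int8 projection mentioned above concrete, here is a minimal sketch of affine quantization in plain Python. It is an illustration under stated assumptions, not TFLite's implementation: the helper names, the scale of 1/127 and the zero point of 0 are hypothetical values picked for a waveform in [-1, 1].

```python
def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to int8 codes: q = round(x / scale) + zero_point, then clip."""
    codes = []
    for x in values:
        q = round(x / scale) + zero_point
        codes.append(min(max(q, qmin), qmax))
    return codes

def dequantize(codes, scale, zero_point):
    """Map int8 codes back to floats: x is approximately (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in codes]

# Hypothetical quantization parameters for a waveform in [-1, 1].
scale, zero_point = 1.0 / 127.0, 0
waveform = [0.0, 0.001, 0.002, 0.25, -1.0]

codes = quantize(waveform, scale, zero_point)
restored = dequantize(codes, scale, zero_point)
print(codes)
print(restored)
```

With only 256 levels, distinct low-amplitude samples collapse onto the same code, which is one plausible reason a spectrogram computed after input quantization could differ from the float one.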
opened by eppane 3
Calling Magnitude() and Phase() simultaneously

Hi,

I am looking to call Magnitude() and Phase() simultaneously for the same STFT input and concatenate the magnitude and phase before feeding into the convolution layers in my CNN sequential Keras model.

Is this possible?

Best,

Yang

opened by HsuanYang-Wang 1
about kapre.utils

Hi, when I used "from kapre.utils import Normalization2D", I got an error saying No module named 'kapre.utils'. I looked at your package and found that there is indeed no utils.py. I am wondering how to solve it.

Best wishes, Daisy

opened by YiningWang2 1
Function missing in updated version

I noticed there is a function "kapre.utils.Normalization2D" in the old version, but I cannot find it in the updated version. Why? Are there any alternative functions?

opened by v3551G 1
trainable DSP parameters

hello contributors and community.

I love your repo! It makes things so much easier for me! Although having the precomputation in the model is already great, I'd like to know how you can optimize DSP parameters. It looks like this is a feature from old versions (e.g. 0.2), and by default I don't see any trainable params in this layer.

Could you please state if this is still available and how to use it?

happy hacking Paul

opened by bytosaur 2

Releases (Kapre-0.3.7)

Kapre-0.3.7 (Jan 21, 2022)
- Add SpecAugment layer

Kapre-0.3.6 (Nov 14, 2021)
- Bugfix (tflite)

Kapre-0.3.5 (Mar 18, 2021)
- Add tflite-compatible stft layer

Kapre-0.3.4 (Sep 29, 2020)
- Bugfix for get_window_fn()

0.3.3 (Sep 15, 2020)
- kapre.augmentation is added
- kapre.time_frequency.ConcatenateFrequencyMap is added
- kapre.composed.get_frequency_aware_conv2d is added
- In STFT and InverseSTFT, keyword arg window_fn is renamed to window_name and it expects string value, not function.
- With this update, models with Kapre layers can be loaded with h5 file format.
- kapre.backend.get_window_fn is added

0.3.2 (Aug 30, 2020)
- `kapre.signal.Frame` and `kapre.signal.Energy` are added
- `kapre.signal.LogmelToMFCC` is added
- `kapre.signal.MuLawEncoder` and `kapre.signal.MuLawDecoder` are added
- `kapre.composed.get_stft_magnitude_layer()` is added
- doc is hosted at https://kapre.readthedocs.io/

0.3.1 (Aug 21, 2020)
- InverseSTFT and etc.

0.3.0 (Aug 16, 2020)
- Breaking and simplifying changes with Tensorflow 2.0 and more tests. Some features are removed. New layer - STFT(). New approach for more complicated representations - see kapre.composed.

v0.1.8 (May 18, 2020)
- Added Delta layer

v0.1.7 (Feb 20, 2020)

Owner

Keunwoo Choi

MIR, machine learning, music recommendation.
