Start-to-finish tutorial for interactive music co-creation in PyTorch and Tensorflow.js

Overview

Interactive music co-creation with PyTorch and TensorFlow.js

This tutorial is a start-to-finish demonstration (click here for result) of building an interactive music co-creation system in two parts:

  1. Training a generative model of music in Python (via PyTorch)
  2. Deploying it in JavaScript for interaction (via TensorFlow.js)

This demonstration was prepared by Chris Donahue as part of an ISMIR 2021 tutorial on Designing generative models for interactive co-creation, co-organized by Anna Huang and Jon Gillick.

The example generative model we will train and deploy is Piano Genie (Donahue et al. 2019). Piano Genie allows anyone to improvise on the piano by mapping performances on a miniature 8-button keyboard to realistic performances on a full 88-key piano in real time. We train Piano Genie by autoencoding expert piano performances: an encoder maps 88-key piano performances into 8-button "button performances", and a decoder attempts to reconstruct the piano performance from the button performance. At interaction time, we replace the encoder with a user performing on the buttons. At a low-level, the decoder is an LSTM that operates on symbolic music data (i.e., MIDI), and is lightweight enough for real-time performance on mobile CPUs.

Part 1: Training in Python

Open In Colab

This part of the tutorial involves training a music generative model (Piano Genie) from scratch in PyTorch, which comes in the form of a self-contained Google Colab notebook. The instructions for this part are embedded in the Colab, and the model takes about an hour to train on Colab's free GPUs. The outputs of this part are: (1) a model checkpoint, and (2) serialized inputs and outputs for a test case, which we will use to check correctness of our JavaScript port in the next part.

Part 2: Deploying in JavaScript

Remix on Glitch

This part of the tutorial involves porting the trained generative model from PyTorch to TensorFlow.js, and hooking the model up to a simple UI to allow users to interact with the model. The final result is this simple web demo. The static files for this demo are located in the part-2-js-interaction directory.

We use JavaScript as the target language for interaction because, unlike Python, it allows for straightforward prototyping and sharing of interactive UIs. However, note that JavaScript is likely not the best target when building tools that musicians can integrate into their workflows. For this, you probably want to integrate with digital audio workstations through C++ plugin frameworks like VSTs or visual programming environments like Max/MSP.

At time of writing, TensorFlow.js is the most mature JavaScript framework for client-side inference of machine learning models. If your Python model is written in TensorFlow, porting it to TensorFlow.js will be much easier, or perhaps even automatic. However, if you prefer to use a different Python framework like PyTorch or JAX, porting to TensorFlow.js can be a bit tricky, but not insurmountable!

Porting weights from PyTorch to TensorFlow.js

At the end of Part 1, we exported our model parameters from PyTorch to a format that TensorFlow.js can recognize. For convenience, we've also included a relevant snippet here:

import pathlib
from tensorflowjs.write_weights import write_weights
import torch

# Load saved model params
d = torch.load("model.pt", map_location=torch.device("cpu"))

# Convert to simple dictionary of named numpy arrays
d = {k: v.numpy() for k, v in d.items()}

# Save in TensorFlow.js format
pathlib.Path("output").mkdir(exist_ok=True)
write_weights([[{"name": k, "data": v} for k, v in d.items()]], "output")

This snippet will produce output/group1-shard1of1.bin, a binary file containing the model parameters, and output/weights_manifest.json a JSON spec which informs TensorFlow.js how to unpack the binary file into a JavaScript Object. Both of these files must be hosted in the same directory when loading the weights with TensorFlow.js

Creating a test case

At the end of Part 1, we also created a test case—a pair of raw inputs to and outputs from our trained model. For convenience, we've also included a relevant snippet here:

import json
import torch

# Restore model from saved checkpoint
model = Model()
model.load_state_dict(torch.load("model.pt", map_location=device))
model.eval()

# Serialize a batch of inputs/outputs as JSON
with torch.no_grad():
    inputs = get_batch()
    outputs = model(inputs)
    test = {
        "inputs": inputs.cpu().numpy().tolist(),
        "outputs": outputs.cpu().numpy().tolist(),
    }
    with open("test.json", "w") as f:
        f.write(json.dumps(test))

This snippet will produce test.json, a JSON-encoded file containing serialized inputs and outputs for our model. We can use this later to check our ported model for correctness.

Redefining the model in TensorFlow.js

Next, we will write equivalent TensorFlow.js code to reimplement our PyTorch model. This is tricky, and will likely require unit testing (I have ported several models from Python to JavaScript and have yet to get it right on the first try) as well as a lot of poking around in the documentation.

One tip is to try to make the APIs for the Python and JavaScript models as similar as possible. Here is a side-by-side comparison between the reference PyTorch model (from the Colab notebook) and the TensorFlow.js equivalent (from part-2-js-interaction/modules.js):

PyTorch model TensorFlow.js equivalent
class PianoGenieDecoder(nn.Module):
    def __init__(self, rnn_dim=128, rnn_num_layers=2):
        super().__init__()
        self.rnn_dim = rnn_dim
        self.rnn_num_layers = rnn_num_layers
        self.input = nn.Linear(PIANO_NUM_KEYS + 3, rnn_dim)
        self.lstm = nn.LSTM(
            rnn_dim,
            rnn_dim,
            rnn_num_layers,
            batch_first=True,
            bidirectional=False,
        )
        self.output = nn.Linear(rnn_dim, 88)
    
    def init_hidden(self, batch_size, device=None):
        h = torch.zeros(self.rnn_num_layers, batch_size, self.rnn_dim, device=device)
        c = torch.zeros(self.rnn_num_layers, batch_size, self.rnn_dim, device=device)
        return (h, c)

    def forward(self, k_m1, t, b, h_0=None):
        # Encode input
        inputs = [
            F.one_hot(k_m1, PIANO_NUM_KEYS + 1),
            t.unsqueeze(dim=2),
            b.unsqueeze(dim=2),
        ]
        x = torch.cat(inputs, dim=2)

        # Project encoded inputs
        x = self.input(x)

        # Run RNN
        if h_0 is None:
            h_0 = self.init_hidden(k.shape[0], device=k.device)
        x, h_N = self.lstm(x, h_0)

        # Compute logits
        hat_k = self.output(x)

        return hat_k, h_N
class PianoGenieDecoder extends Module {
  constructor(rnnDim, rnnNumLayers) {
    super();
    this.rnnDim = rnnDim === undefined ? 128 : rnnDim;
    this.rnnNumLayers = rnnNumLayers === undefined ? 2 : rnnNumLayers;
  }

  initHidden(batchSize) {
    // NOTE: This allocates memory that must later be freed
    const c = [];
    const h = [];
    for (let i = 0; i < this.rnnNumLayers; ++i) {
      c.push(tf.zeros([batchSize, this.rnnDim], "float32"));
      h.push(tf.zeros([batchSize, this.rnnDim], "float32"));
    }
    return new LSTMHiddenState(c, h);
  }

  forward(kim1, ti, bi, him1) {
    // Encode input
    const inputs = [
      tf.oneHot(kim1, PIANO_NUM_KEYS + 1),
      tf.expandDims(ti, 1),
      tf.expandDims(bi, 1)
    ];
    let x = tf.concat(inputs, 1);

    // Project encoded inputs
    x = tf.add(
      tf.matMul(x, this._params["dec.input.weight"], false, true),
      this._params[`dec.input.bias`]
    );

    // Run RNN
    if (him1 === undefined || him1 === null) {
      him1 = this.initHidden(kim1.shape[0]);
    }
    const [hic, hih] = tf.multiRNNCell(this._cells, x, him1.c, him1.h);
    x = hih[this.rnnNumLayers - 1];
    const hi = new LSTMHiddenState(hic, hih);

    // Compute logits
    const hatki = tf.add(
      tf.matMul(x, this._params["dec.output.weight"], false, true),
      this._params[`dec.output.bias`]
    );

    return [hatki, hi];
  }
}

While these models have similar APIs, note that the PyTorch model takes as input entire sequences, i.e., tensors of shape [batch_size, seq_len], while the TensorFlow.js equivalent takes an individual timestep as input, i.e., tensors of shape [batch_size]. This is because in Python, we are passing as input full sequences of training data, while in JavaScript, we will be using the model in an on-demand fashion, passing in user inputs as they become available.

Note that this implementation makes use of several helpers, such as a Module class which mocks behavior in torch.nn.Module, and a factory method pyTorchLSTMCellFactory which handles subtle differences in the LSTM implementations between PyTorch and TensorFlow.js. See part-2-js-interaction/modules.js for full implementation.

Testing for correctness

Now that we have our model redefined in JavaScript, it is critical that we test it to ensure the behavior is identical to that of the original Python version. The easiest way to do this is to serialize a batch of inputs to and outputs from your Python model as JSON. Then, you can run those inputs through your JavaScript model and check if the outputs are identical (modulo some inevitable numerical precision error). Such a test case might look like this (from part-2-js-interaction/modules.js):

0.015) { console.log(totalErr); throw "Failed test"; } // Check for memory leaks him1.dispose(); decoder.dispose(); if (tf.memory().numBytes !== numBytesBefore) { throw "Memory leak"; } quantizer.dispose(); console.log("Passed test"); } ">
async function testPianoGenieDecoder() {
  const numBytesBefore = tf.memory().numBytes;

  // Create model
  const quantizer = new IntegerQuantizer();
  const decoder = new PianoGenieDecoder();
  await decoder.init();

  // Fetch test case
  const t = await fetch(TEST_CASE_URI).then(r => r.json());

  // Run test
  let totalErr = 0;
  let him1 = null;
  for (let i = 0; i < 128; ++i) {
    him1 = tf.tidy(() => {
      const kim1 = tf.tensor(t["input_keys"][i], [1], "int32");
      const ti = tf.tensor(t["input_dts"][i], [1], "float32");
      let bi = tf.tensor(t["input_buttons"][i], [1], "float32");
      bi = quantizer.discreteToReal(bi);
      const [khati, hi] = decoder.forward(kim1, ti, bi, him1);

      const expectedLogits = tf.tensor(
        t["output_logits"][i],
        [1, 88],
        "float32"
      );
      const err = tf.sum(tf.abs(tf.sub(khati, expectedLogits))).arraySync();
      totalErr += err;

      if (him1 !== null) him1.dispose();
      return hi;
    });
  }

  // Check equivalence to expected outputs
  if (isNaN(totalErr) || totalErr > 0.015) {
    console.log(totalErr);
    throw "Failed test";
  }

  // Check for memory leaks
  him1.dispose();
  decoder.dispose();
  if (tf.memory().numBytes !== numBytesBefore) {
    throw "Memory leak";
  }
  quantizer.dispose();

  console.log("Passed test");
}

Note that this function makes use of the tf.tidy wrapper. TensorFlow.js is unable to automatically manage memory when running on GPUs via WebGL, so best practices for writing TensorFlow.js code involve some amount of manual memory management. The tf.tidy wrapper makes this easy—any tensor that is allocated during (but not returned by) the wrapped function will be automatically freed when the wrapped function finishes. However, in this case we have two sets of tensors that must persist across model calls: the model's parameters and the RNN memory. Unfortunately, we will have to carefully and manually dispose of these to prevent memory leaks.

Hooking model up to simple UI

Now that we have ported and tested our model, we can finally have some fun and start building out the interactive elements! Our demo includes a simple HTML UI with 8 buttons, and a script which hooks the frontend up to the model. Our script makes use of the wonderful Tone.js library to quickly build out a polyphonic FM synthesizer. We also build a higher-level API around our low-level model port, to handle things like keeping track of state and sampling from a model's distribution.

While this is the stopping point of the tutorial, I would encourage you to experiment further. You're now at the point where the benefits of porting the model to JavaScript are clear: JavaScript makes it fairly straightforward to add additional functionality to enrich the interaction. For example, you could add an piano keyboard display to visualize Piano Genie's outputs, bind the space bar to act as a sustain pedal to Piano Genie, or you could use the WebMIDI API to output notes to a hardware synthesizer (hint: all of this functionality is built into Monica Dinculescu's official Piano Genie demo). The possibilities are endless!

End matter

Licensing info

This tutorial uses the MAESTRO dataset (Hawthorne et al. 2018), which is distributed under a CC BY-NC-SA 4.0 license. Because of the ShareAlike clause, the material in this tutorial is also distributed under that same license.

Acknowledgements

Thanks to Ian Simon and Sander Dieleman, co-authors on Piano Genie, and to Monica Dinculescu for creating the original Piano Genie demo.

Attribution

If this tutorial was useful to you, please consider citing this repository.

@software{donahue2021tutorial,
  author={Donahue, Chris and Huang, Cheng-Zhi Anna and Gillick, Jon},
  title={Interactive music co-creation with {PyTorch} and {TensorFlow.js}},
  url={https://github.com/chrisdonahue/music-cocreation-tutorial},
  year={2021}
}

If Piano Genie was useful to you, please consider citing our original paper.

@inproceedings{donahue2019genie,
  title={Piano Genie},
  author={Donahue, Chris and Simon, Ian and Dieleman, Sander},
  booktitle={Proceedings of the 24th ACM Conference on Intelligent User Interfaces},
  year={2019}
}
Owner
Chris Donahue
Postdoc @ Stanford CS. Machine learning for music, creativity, and humans.
Chris Donahue
A trusty face recognition research platform developed by Tencent Youtu Lab

Introduction TFace: A trusty face recognition research platform developed by Tencent Youtu Lab. It provides a high-performance distributed training fr

Tencent 956 Jan 01, 2023
Voxel Transformer for 3D object detection

Voxel Transformer This is a reproduced repo of Voxel Transformer for 3D object detection. The code is mainly based on OpenPCDet. Introduction We provi

173 Dec 25, 2022
CVPR 2021 - Official code repository for the paper: On Self-Contact and Human Pose.

TUCH This repo is part of our project: On Self-Contact and Human Pose. [Project Page] [Paper] [MPI Project Page] License Software Copyright License fo

Lea Müller 45 Jan 07, 2023
Code for the paper "Location-aware Single Image Reflection Removal"

Location-aware Single Image Reflection Removal The shown images are provided by the datasets from IBCLN, ERRNet, SIR2 and the Internet images. The cod

72 Dec 08, 2022
VQMIVC - Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion

VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion (Interspeech

Disong Wang 262 Dec 31, 2022
Image processing in Python

scikit-image: Image processing in Python Website (including documentation): https://scikit-image.org/ Mailing list: https://mail.python.org/mailman3/l

Image Processing Toolbox for SciPy 5.2k Dec 31, 2022
Interpretable-contrastive-word-mover-s-embedding

Interpretable-contrastive-word-mover-s-embedding Paper Datasets Here is a Dropbox link to the datasets used in the paper: https://www.dropbox.com/sh/n

0 Nov 02, 2021
shufflev2-yolov5:lighter, faster and easier to deploy

shufflev2-yolov5: lighter, faster and easier to deploy. Evolved from yolov5 and the size of model is only 1.7M (int8) and 3.3M (fp16). It can reach 10+ FPS on the Raspberry Pi 4B when the input size

pogg 1.5k Jan 05, 2023
Contrastive Language-Image Pretraining

CLIP [Blog] [Paper] [Model Card] [Colab] CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a variety of (image, text) pair

OpenAI 11.5k Jan 08, 2023
Official implementation of NeuralFusion: Online Depth Map Fusion in Latent Space

NeuralFusion This is the official implementation of NeuralFusion: Online Depth Map Fusion in Latent Space. We provide code to train the proposed pipel

53 Jan 01, 2023
Data loaders and abstractions for text and NLP

torchtext This repository consists of: torchtext.datasets: The raw text iterators for common NLP datasets torchtext.data: Some basic NLP building bloc

3.2k Jan 08, 2023
SatelliteNeRF - PyTorch-based Neural Radiance Fields adapted to satellite domain

SatelliteNeRF PyTorch-based Neural Radiance Fields adapted to satellite domain.

Kai Zhang 46 Nov 20, 2022
Deep Semisupervised Multiview Learning With Increasing Views (IEEE TCYB 2021, PyTorch Code)

Deep Semisupervised Multiview Learning With Increasing Views (ISVN, IEEE TCYB) Peng Hu, Xi Peng, Hongyuan Zhu, Liangli Zhen, Jie Lin, Huaibai Yan, Dez

3 Nov 19, 2022
Official PyTorch implementation of Synergies Between Affordance and Geometry: 6-DoF Grasp Detection via Implicit Representations

Synergies Between Affordance and Geometry: 6-DoF Grasp Detection via Implicit Representations Zhenyu Jiang, Yifeng Zhu, Maxwell Svetlik, Kuan Fang, Yu

UT-Austin Robot Perception and Learning Lab 63 Jan 03, 2023
Software for Multimodalty 2D+3D Facial Expression Recognition (FER) UI

EmotionUI Software for Multimodalty 2D+3D Facial Expression Recognition (FER) UI. demo screenshot (with RealSense) required packages Python = 3.6 num

Yang Jiao 2 Dec 23, 2021
Efficient Two-Step Networks for Temporal Action Segmentation (Neurocomputing 2021)

Efficient Two-Step Networks for Temporal Action Segmentation This repository provides a PyTorch implementation of the paper Efficient Two-Step Network

8 Apr 16, 2022
Detection of drones using their thermal signatures from thermal camera through YOLO-V3 based CNN with modifications to encapsulate drone motion

Drone Detection using Thermal Signature This repository highlights the work for night-time drone detection using a using an Optris PI Lightweight ther

Chong Yu Quan 6 Dec 31, 2022
Apply a perspective transformation to a raster image inside Inkscape (no need to use an external software such as GIMP or Krita).

Raster Perspective Apply a perspective transformation to bitmap image using the selected path as envelope, without the need to use an external softwar

s.ouchene 19 Dec 22, 2022
This repo contains the code and data used in the paper "Wizard of Search Engine: Access to Information Through Conversations with Search Engines"

Wizard of Search Engine: Access to Information Through Conversations with Search Engines by Pengjie Ren, Zhongkun Liu, Xiaomeng Song, Hongtao Tian, Zh

19 Oct 27, 2022
FindFunc is an IDA PRO plugin to find code functions that contain a certain assembly or byte pattern, reference a certain name or string, or conform to various other constraints.

FindFunc: Advanced Filtering/Finding of Functions in IDA Pro FindFunc is an IDA Pro plugin to find code functions that contain a certain assembly or b

213 Dec 17, 2022