MoCoGAN: Decomposing Motion and Content for Video Generation

Last update: Dec 18, 2022

Overview

MoCoGAN: Decomposing Motion and Content for Video Generation

This repository contains an implementation and further details of MoCoGAN: Decomposing Motion and Content for Video Generation by Sergey Tulyakov, Ming-Yu Liu, Xiaodong Yang, Jan Kautz.

CVPR Poster:

Representation

MoCoGAN is a generative model for videos, which generates videos from random inputs. It features separated representations of motion and content, offering control over what is generated. For example, MoCoGAN can generate the same object performing different actions, as well as the same action performed by different objects

Examples of generated videos

We trained MoCoGAN on the MUG Facial Expression Database to generate facial expressions. When fixing the content code and changing the motion code, it generated the same person performs different expressions. When fixing the motion code and changing the content code, it generated different people performs the same expression. In the figure shown below, each column has fixed identity, each row shows the same action:

We trained MoCoGAN on a human action dataset where content is represented by the performer, executing several actions. When fixing the content code and changing the motion code, it generated the same person performs different actions. When fixing the motion code and changing the content code, it generated different people performs the same action. Each pair of images represents the same action executed by different people:

We have collected a large-scale TaiChi dataset including 4.5K videos of TaiChi performers. Below are videos generated by MoCoGAN.

Training MoCoGAN

Please refer to a wiki page

Citation

If you use MoCoGAN in your research please cite our paper:

Sergey Tulyakov, Ming-Yu Liu, Xiaodong Yang, Jan Kautz, "MoCoGAN: Decomposing Motion and Content for Video Generation"

@inproceedings{Tulyakov:2018:MoCoGAN,
 title={{MoCoGAN}: Decomposing motion and content for video generation},
 author={Tulyakov, Sergey and Liu, Ming-Yu and Yang, Xiaodong and Kautz, Jan},
 booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
 pages = {1526--1535},
 year={2018}
}

MoCoGAN: Decomposing Motion and Content for Video Generation

Related tags

Overview

MoCoGAN: Decomposing Motion and Content for Video Generation

Representation

Examples of generated videos

Training MoCoGAN

Citation

Other implementations:

Owner

Sergey Tulyakov

Alphabetical Letter Recognition

RMNA: A Neighbor Aggregation-Based Knowledge Graph Representation Learning Model Using Rule Mining

Facial Action Unit Intensity Estimation via Semantic Correspondence Learning with Dynamic Graph Convolution

Deep Learning segmentation suite designed for 2D microscopy image segmentation

Myia prototyping

Learning to Initialize Neural Networks for Stable and Efficient Training

Official Pytorch implementation for AAAI2021 paper (RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning)

Code for the paper "Attention Approximates Sparse Distributed Memory"

Source code for TACL paper "KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation".

Official implementation of Deep Convolutional Dictionary Learning for Image Denoising.

[SIGIR22] Official PyTorch implementation for "CORE: Simple and Effective Session-based Recommendation within Consistent Representation Space".

A U-Net combined with a variational auto-encoder that is able to learn conditional distributions over semantic segmentations.

《Deep Single Portrait Image Relighting》(ICCV 2019)

Pytorch Implementations of large number classical backbone CNNs, data enhancement, torch loss, attention, visualization and some common algorithms.

Multivariate Time Series Transformer, public version

The Surprising Effectiveness of Visual Odometry Techniques for Embodied PointGoal Navigation

B-cos Networks: Attention is All we Need for Interpretability

PyTorch implementation of Hierarchical Multi-label Text Classification: An Attention-based Recurrent Network

Does MAML Only Work via Feature Re-use? A Data Set Centric Perspective

One-line your code easily but still with the fun of doing so!