This repository implements and evaluates convolutional networks on the Möbius strip as toy model instantiations of Coordinate Independent Convolutional Networks.

Overview

Orientation independent Möbius CNNs





This repository implements and evaluates convolutional networks on the Möbius strip as toy model instantiations of Coordinate Independent Convolutional Networks.

Background (tl;dr)

All derivations and a detailed description of the models are found in Section 5 of our paper. What follows is an informal tl;dr, summarizing the central aspects of Möbius CNNs.

Feature fields on the Möbius strip: A key characteristic of the Möbius strip is its topological twist, making it a non-orientable manifold. Convolutional weight sharing on the Möbius strip is therefore only well defined up to a reflection of kernels. To account for the ambiguity of kernel orientations, one needs to demand that the kernel responses (feature vectors) transform in a predictable way when different orientations are chosen. Mathematically, this transformation is specified by a group representation ρ of the reflection group. We implement three different feature field types, each characterized by a choice of group representation:

  • scalar fields are modeled by the trivial representation. Scalars stay invariant under reflective gauge transformations:

  • sign-flip fields transform according to the sign-flip representation of the reflection group. Reflective gauge transformations negate the single numerical coefficient of a sign-flip feature:

  • regular feature fields are associated to the regular representation. For the reflection group, this implies 2-dimensional features whose two values (channels) are swapped by gauge transformations:

Reflection steerable kernels (gauge equivariance):

Convolution kernels on the Möbius strip are parameterized maps

whose numbers of input and output channels depend on the types of feature fields between which they map. Since a reflection of a kernel should result in a corresponding transformation of its output feature field, the kernel has to obey certain symmetry constraints. Specifically, kernels have to be reflection steerable (or gauge equivariant), i.e. should satisfy:

The following table visualizes this symmetry constraint for any pair of input and output field types that we implement:

Similar equivariance constraints are imposed on biases and nonlinearities; see the paper for more details.

Isometry equivariance: Shifts of the Möbius strip along itself are isometries. After one revolution (a shift by 2π), points on the strip do not return to themselves, but end up reflected along the width of the strip:

Such reflections of patterns are explained away by the reflection equivariance of the convolution kernels. Orientation independent convolutions are therefore automatically equivariant w.r.t. the action of such isometries on feature fields. Our empirical results, shown in the table below, confirm that this theoretical guarantee holds in practice. Conventional CNNs, on the other hand, are explicitly coordinate dependent, and are therefore in particular not isometry equivariant.

Implementation

Neural network layers are implemented in nn_layers.py while the models are found in models.py. All individual layers and all models are unit tested in unit_tests.py.

Feature fields: We assume Möbius strips with a locally flat geometry, i.e. strips which can be thought of as being constructed by gluing two opposite ends of a rectangular flat stripe together in a twisted way. Feature fields are therefore discretized on a regular sampling grid on a rectangular domain of pixels. Note that this choice induces a global gauge (frame field), which is discontinuous at the cut.

In practice, a neural network operates on multiple feature fields which are stacked in the channel dimension (a direct sum). Feature spaces are therefore characterized by their feature field multiplicities. For instance, one could have 10 scalar fields, 4 sign-flip fields and 8 regular feature fields, which consume in total channels. Denoting the batch size by , a feature space is encoded by a tensor of shape .

The correct transformation law of the feature fields is guaranteed by the coordinate independence (steerability) of the network layers operating on it.

Orientation independent convolutions and bias summation: The class MobiusConv implements orientation independent convolutions and bias summations between input and output feature spaces as specified by the multiplicity constructor arguments in_fields and out_fields, respectively. Kernels are as usual discretized by a grid of size*size pixels. The steerability constraints on convolution kernels and biases are implemented by allocating a reduced number of parameters, from which the symmetric (steerable) kernels and biases are expanded during the forward pass.

Coordinate independent convolutions rely furthermore on parallel transporters of feature vectors, which are implemented as a transport padding operation. This operation pads both sides of the cut with size//2 columns of pixels which are 1) spatially reflected and 2) reflection-steered according to the field types. The stripes are furthermore zero-padded along their width.

The forward pass operates then by:

  • expanding steerable kernels and biases from their non-redundant parameter arrays
  • transport padding the input field array
  • running a conventional Euclidean convolution

As the padding added size//2 pixels around the strip, the spatial resolution of the output field agrees with that of the input field.

Orientation independent nonlinearities: Scalar fields and regular feature fields are acted on by conventional ELU nonlinearities, which are equivariant for these field types. Sign-flip fields are processed by applying ELU nonlinearities to their absolute value after summing a learnable bias parameter. To ensure that the resulting fields are again transforming according to the sign-flip representation, we multiply them subsequently with the signs of the input features. See the paper and the class EquivNonlin for more details.

Feature field pooling: The module MobiusPool implements an orientation independent pooling operation with a stride and kernel size of two pixels, thus halving the fields' spatial resolution. Scalar and regular feature fields are pooled with a conventional max pooling operation, which is for these field types coordinate independent. As the coefficients of sign-flip fields negate under gauge transformations, they are pooled based on their (gauge invariant) absolute value.

While the pooling operation is tested to be exactly gauge equivariant, its spatial subsampling interferes inevitably with its isometry equivariance. Specifically, the pooling operation is only isometry equivariant w.r.t. shifts by an even number of pixels. Note that the same issue applies to conventional Euclidean CNNs as well; see e.g. (Azulay and Weiss, 2019) or (Zhang, 2019).

Models: All models are implemented in models.py. The orientation independent models, which differ only in their field type multiplicities but agree in their total number of channels, are implemented as class MobiusGaugeCNN. We furthermore implement conventional CNN baselines, one with the same number of channels and thus more parameters (α=1) and one with the same number of parameters but less channels (α=2). Since conventional CNNs are explicitly coordinate dependent they utilize a naive padding operation (MobiusPadNaive), which performs a spatial reflection of feature maps but does not apply the unspecified gauge transformation. The following table gives an overview of the different models:

Data - Möbius MNIST

We benchmark our models on Möbius MNIST, a simple classification dataset which consists of MNIST digits that are projected on the Möbius strip. Since MNIST digits are gray-scale images, they are geometrically identified as scalar fields. The size of the training set is by default set to 12000 digits, which agrees with the rotated MNIST dataset.

There are two versions of the training and test sets which consist of centered and shifted digits. All digits in the centered datasets occur at the same location (and the same orientation) of the strip. The isometry shifted digits appear at uniformly sampled locations. Recall that shifts once around the strip lead to a reflection of the digits as visualized above. The following digits show isometry shifted digits (note the reflection at the cut):

To generate the datasets it is sufficient to call convert_mnist.py, which downloads the original MNIST dataset via torchvision and saves the Möbius MNIST datasets in data/mobius_MNIST.npz.

Results

The models can then be trained by calling, for instance,

python train.py --model mobius_regular

For more options and further model types, consult the help message: python train.py -h

The following table gives an overview of the performance of all models in two different settings, averaged over 32 runs:

The setting "shifted train digits" trains and evaluates on isometry shifted digits. To test the isometry equivariance of the models, we train them furthermore on "centered train digits", testing them then out-of-distribution on shifted digits. As one can see, the orientation independent models generalize well over these unseen variations while the conventional coordinate dependent CNNs' performance deteriorates.

Dependencies

This library is based on Python3.7. It requires the following packages:

numpy
torch>=1.1
torchvision>=0.3

Logging via tensorboard is optional.

Owner
Maurice Weiler
AI researcher with a focus on geometric and equivariant deep learning. PhD candidate under the supervision of Max Welling. Master's degree in Physics.
Maurice Weiler
An implementation of based on pytorch and mmcv

FisherPruning-Pytorch An implementation of Group Fisher Pruning for Practical Network Compression based on pytorch and mmcv Main Functions Pruning f

Peng Lu 15 Dec 17, 2022
Implementation of experiments in the paper Clockwork Variational Autoencoders (project website) using JAX and Flax

Clockwork VAEs in JAX/Flax Implementation of experiments in the paper Clockwork Variational Autoencoders (project website) using JAX and Flax, ported

Julius Kunze 26 Oct 05, 2022
Distilling Motion Planner Augmented Policies into Visual Control Policies for Robot Manipulation (CoRL 2021)

Distilling Motion Planner Augmented Policies into Visual Control Policies for Robot Manipulation [Project website] [Paper] This project is a PyTorch i

Cognitive Learning for Vision and Robotics (CLVR) lab @ USC 6 Feb 28, 2022
ChatBot-Pytorch - A GPT-2 ChatBot implemented using Pytorch and Huggingface-transformers

ChatBot-Pytorch A GPT-2 ChatBot implemented using Pytorch and Huggingface-transf

ParZival 42 Dec 09, 2022
Official implementation of "One-Shot Voice Conversion with Weight Adaptive Instance Normalization".

One-Shot Voice Conversion with Weight Adaptive Instance Normalization By Shengjie Huang, Yanyan Xu*, Dengfeng Ke*, Mingjie Chen, Thomas Hain. This rep

31 Dec 07, 2022
Simple improvement of VQVAE that allow to generate x2 sized images compared to baseline

vqvae_dwt_distiller.pytorch Simple improvement of VQVAE that allow to generate x2 sized images compared to baseline. It allows to generate 512x512 ima

Sergei Belousov 25 Jul 19, 2022
Official pytorch code for "APP: Anytime Progressive Pruning"

APP: Anytime Progressive Pruning Diganta Misra1,2,3, Bharat Runwal2,4, Tianlong Chen5, Zhangyang Wang5, Irina Rish1,3 1 Mila - Quebec AI Institute,2 L

Landskape AI 12 Nov 22, 2022
FCOSR: A Simple Anchor-free Rotated Detector for Aerial Object Detection

FCOSR: A Simple Anchor-free Rotated Detector for Aerial Object Detection FCOSR: A Simple Anchor-free Rotated Detector for Aerial Object Detection arXi

59 Nov 29, 2022
SeqFormer: a Frustratingly Simple Model for Video Instance Segmentation

SeqFormer: a Frustratingly Simple Model for Video Instance Segmentation SeqFormer SeqFormer: a Frustratingly Simple Model for Video Instance Segmentat

Junfeng Wu 298 Dec 22, 2022
Light-Head R-CNN

Light-head R-CNN Introduction We release code for Light-Head R-CNN. This is my best practice for my research. This repo is organized as follows: light

jemmy li 835 Dec 06, 2022
Automatic deep learning for image classification.

AutoDL AutoDL automates machine learning tasks enabling you to easily achieve strong predictive performance in your applications. With just a few line

wenqi 2 Oct 12, 2022
No Code AI/ML platform

NoCodeAIML No Code AI/ML platform - Community Edition Video credits: Uday Kiran Typical No Code AI/ML Platform will have features like drag and drop,

Bhagvan Kommadi 5 Jan 28, 2022
Spatial Action Maps for Mobile Manipulation (RSS 2020)

spatial-action-maps Update: Please see our new spatial-intention-maps repository, which extends this work to multi-agent settings. It contains many ne

Jimmy Wu 27 Nov 30, 2022
Intent parsing and slot filling in PyTorch with seq2seq + attention

PyTorch Seq2Seq Intent Parsing Reframing intent parsing as a human - machine translation task. Work in progress successor to torch-seq2seq-intent-pars

Sean Robertson 160 Jan 07, 2023
Residual Pathway Priors for Soft Equivariance Constraints

Residual Pathway Priors for Soft Equivariance Constraints This repo contains the implementation and the experiments for the paper Residual Pathway Pri

Marc Finzi 13 Oct 12, 2022
NasirKhusraw - The TSP solved using genetic algorithm and show TSP path overlaid on a map of the Iran provinces & their capitals.

Nasir Khusraw : Travelling Salesman Problem The TSP solved using genetic algorithm. This project show TSP path overlaid on a map of the Iran provinces

J Brave 2 Sep 01, 2022
Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!

Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!

Peter Lin 6.5k Jan 04, 2023
Useful materials and tutorials for 110-1 NTU DBME5028 (Application of Deep Learning in Medical Imaging)

Useful materials and tutorials for 110-1 NTU DBME5028 (Application of Deep Learning in Medical Imaging)

7 Jun 22, 2022
Doosan robotic arm, simulation, control, visualization in Gazebo and ROS2 for Reinforcement Learning.

Robotic Arm Simulation in ROS2 and Gazebo General Overview This repository includes: First, how to simulate a 6DoF Robotic Arm from scratch using GAZE

David Valencia 12 Jan 02, 2023
基于AlphaPose的TensorRT加速

1. Requirements CUDA 11.1 TensorRT 7.2.2 Python 3.8.5 Cython PyTorch 1.8.1 torchvision 0.9.1 numpy 1.17.4 (numpy版本过高会出报错 this issue ) python-package s

52 Dec 06, 2022