Memoized coduals - Shows that it is possible to implement reverse mode autodiff using a variation on the dual numbers called the codual numbers

Overview

The dual numbers can do efficient autodiff!

The codual numbers are a simple method of doing automatic differentiation in reverse mode. They contrast with the dual numbers which provide an easy way of doing automatic differentiation in forward mode. The difference between the two modes is that sometimes one is faster than the other.

The folklore appears to be that forward mode autodiff is easy to implement because it can be done using the beautiful algebra of dual numbers, while the same is assumed to not be the case for reverse mode. This repository presents a counterargument that a variant of the dual numbers – called the codual numbers – can be used to represent an implementation of reverse mode autodiff that is just as elegant and terse as can be done for forward mode. This idea was first suggested by Sandro Magi (pseudonym: Naasking).

This implementation of the codual numbers differs from Sandro Magi’s by using simple memoisation to eliminate the exponential worst-case behaviour he encountered. In Magi’s original implementation, this idea seems obscured, largely because the code was more effectful and therefore the opportunity for memoisation was less apparent. The memoisation is achieved using only one additional line of code!

This implementation should be simpler and more transparent than Magi’s, I hope. It also suggests that Magi’s reasoning behind the term “codual numbers” is perhaps misleading.

Definition of dual number and codual number

The codual numbers are the set

\mathbb R \times \mathbb R,

while the codual numbers are a subset of

\mathbb R \times \mathbb R ^ {\mathbb R}

where the second component is always a linear map.

A notation that’s used to write a dual number is a + b \varepsilon, which stands for (a,b). Formally, \varepsilon^2 = 0 while \varepsilon \neq 0.

The codual numbers may be represented using exactly the same notation as the dual numbers. They are no different than the dual numbers, except in how they’re represented on a computer! Using lambda calculus notation (which I assume you are familiar with) any dual number (a,b) can be turned into the codual number (a, \lambda k. \,kb), and conversely every codual number (a,B) can be turned into the dual number (a,B(1)). The difference is merely one of data structure; we need a closure to represent the codual numbers.

The definition of an operation on the codual numbers can be inferred from its definition on the dual numbers. We demonstrate this using multiplication. For dual numbers, we may define multiplication by:

(a,a') \times (b,b') = (ab, ab' + ba').

For the codual numbers, we may use the correspondence (a,b') \mapsto (a, \lambda k. \,kb) to get:

(a,A) \times (b,B) = (ab, \lambda k. \,k\cdot(a\cdot B(1) + b\cdot A(1))),

where by “\cdot”, we mean multiplication of real numbers. Using the fact that A and B are linear maps, we can rearrange this to:

(a,A) \times (b,B) = (ab, \lambda k. \,B(ak) + A(bk))).

This is precisely how we define multiplication of codual numbers in the code.

Relationship with other autodiff strategies

It appears that there are three ways of doing reverse-mode autodiff, which correspond directly to the three “stages” of solving a problem using dynamic programming. See the table below:

Idea Example Corresponding autodiff algorithm
Unmemoised recursion Exhibit A Unmemoised coduals
Memoised recursion, or
top-down dynamic programming
Exhibit B Memoised coduals
Bottom-up dynamic programming Exhibit C Tape-based autodiff

This suggests that the tape-based approach can be derived from the coduals.

Exhibit A:

def fib(n):
    if n == 0 or n == 1:
        return n
    else:
        return fib(n-1) + fib(n-2)

Exhibit B:

from functools import cache

@cache
def fib(n):
    if n == 0 or n == 1:
        return n
    else:
        return fib(n-1) + fib(n-2)

Exhibit C:

def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
Owner
wlad
wlad
Code for "Neural 3D Scene Reconstruction with the Manhattan-world Assumption" CVPR 2022 Oral

News 05/10/2022 To make the comparison on ScanNet easier, we provide all quantitative and qualitative results of baselines here, including COLMAP, COL

ZJU3DV 365 Dec 30, 2022
DARTS-: Robustly Stepping out of Performance Collapse Without Indicators

[ICLR'21] DARTS-: Robustly Stepping out of Performance Collapse Without Indicators [openreview] Authors: Xiangxiang Chu, Xiaoxing Wang, Bo Zhang, Shun

55 Nov 01, 2022
Multi Agent Path Finding Algorithms

MATP-solver Simulator collision check path step random initial states or given states Traditional method Seperate A* algorithem Confict-based Search S

30 Dec 12, 2022
Tutorial on active learning with the Nvidia Transfer Learning Toolkit (TLT).

Active Learning with the Nvidia TLT Tutorial on active learning with the Nvidia Transfer Learning Toolkit (TLT). In this tutorial, we will show you ho

Lightly 25 Dec 03, 2022
NeRF visualization library under construction

NeRF visualization library using PlenOctrees, under construction pip install nerfvis Docs will be at: https://nerfvis.readthedocs.org import nerfvis s

Alex Yu 196 Jan 04, 2023
Simulation-based performance analysis of server-less Blockchain-enabled Federated Learning

Blockchain-enabled Server-less Federated Learning Repository containing the files used to reproduce the results of the publication "Blockchain-enabled

Francesc Wilhelmi 9 Sep 27, 2022
Implementation of Graph Transformer in Pytorch, for potential use in replicating Alphafold2

Graph Transformer - Pytorch Implementation of Graph Transformer in Pytorch, for potential use in replicating Alphafold2. This was recently used by bot

Phil Wang 97 Dec 28, 2022
Library for implementing reservoir computing models (echo state networks) for multivariate time series classification and clustering.

Framework overview This library allows to quickly implement different architectures based on Reservoir Computing (the family of approaches popularized

Filippo Bianchi 249 Dec 21, 2022
A PyTorch port of the Neural 3D Mesh Renderer

Neural 3D Mesh Renderer (CVPR 2018) This repo contains a PyTorch implementation of the paper Neural 3D Mesh Renderer by Hiroharu Kato, Yoshitaka Ushik

Daniilidis Group University of Pennsylvania 1k Jan 09, 2023
This repository is all about spending some time the with the original problem posed by Minsky and Papert

This repository is all about spending some time the with the original problem posed by Minsky and Papert. Working through this problem is a great way to begin learning computer vision.

Jaissruti Nanthakumar 1 Jan 23, 2022
[ICCV 2021] Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation

MAED: Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation Getting Started Our codes are implemented and tested with pyth

ZiNiU WaN 176 Dec 15, 2022
A Python package for faster, safer, and simpler ML processes

Bender 🤖 A Python package for faster, safer, and simpler ML processes. Why use bender? Bender will make your machine learning processes, faster, safe

Otovo 6 Dec 13, 2022
Capture all information throughout your model's development in a reproducible way and tie results directly to the model code!

Rubicon Purpose Rubicon is a data science tool that captures and stores model training and execution information, like parameters and outcomes, in a r

Capital One 97 Jan 03, 2023
PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models

PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models This repository is the official implementation of the fol

DistributedML 41 Dec 06, 2022
Orthogonal Over-Parameterized Training

The inductive bias of a neural network is largely determined by the architecture and the training algorithm. To achieve good generalization, how to effectively train a neural network is of great impo

Weiyang Liu 11 Apr 18, 2022
Code for Multinomial Diffusion

Code for Multinomial Diffusion Abstract Generative flows and diffusion models have been predominantly trained on ordinal data, for example natural ima

104 Jan 04, 2023
Microscopy Image Cytometry Toolkit

Cytokit Cytokit is a collection of tools for quantifying and analyzing properties of individual cells in large fluorescent microscopy datasets with a

Hammer Lab 106 Jan 06, 2023
Automated Melanoma Recognition in Dermoscopy Images via Very Deep Residual Networks

Introduction This repository contains the modified caffe library and network architectures for our paper "Automated Melanoma Recognition in Dermoscopy

Lequan Yu 47 Nov 24, 2022
Attention-driven Robot Manipulation (ARM) which includes Q-attention

Attention-driven Robotic Manipulation (ARM) This codebase is home to: Q-attention: Enabling Efficient Learning for Vision-based Robotic Manipulation I

Stephen James 84 Dec 29, 2022
some classic model used to segment the medical images like CT、X-ray and so on

github_project This is a project for medical image segmentation. This project includes common medical image segmentation models such as U-net, FCN, De

2 Mar 30, 2022