Official code base for the poster "On the use of Cortical Magnification and Saccades as Biological Proxies for Data Augmentation" published in NeurIPS 2021 Workshop (SVRHM)

Overview

Self-Supervised Learning (SimCLR) with Biologically Plausible Image Augmentations

The poster appeared at the NeurIPS 2021 workshop Shared Visual Representations in Human and Machine Intelligence (SVRHM); the paper is available on OpenReview.

Is it possible that humans learn their visual representations with a self-supervised learning framework similar to that of machines? Popular self-supervised learning frameworks encourage the model to learn representations that are invariant to augmentations of the images. Is it possible to learn good visual representations using the natural "image augmentations" available to the human visual system?
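
As an illustration of what "invariant to augmentations" means in practice, here is a minimal NumPy sketch of the NT-Xent contrastive loss used by SimCLR. It is not the training code of this repository (which is PyTorch-based); the function name nt_xent_loss and the temperature default are assumptions chosen for the example.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss for two batches of embeddings, each of shape (N, D).

    Row i of z1 and row i of z2 are embeddings of two augmented views
    of the same image (a positive pair); all other rows are negatives.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)               # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-normalize rows
    sim = z @ z.T / temperature                        # scaled cosine similarity
    np.fill_diagonal(sim, -np.inf)                     # a view is not its own negative
    # Index of the positive partner for every row
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # Numerically stable log-softmax over each row
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()
```

The loss is low when the two views of each image map to nearby embeddings and high when they do not, which is why the choice of augmentations shapes the learned representation.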

In this project, we reverse-engineered the key data augmentations that support the quality of the learned representations, namely random resized crop and blur. We hypothesized that saccades and foveation in human visual processing are the biological equivalents of random crops and blur. We implemented these biologically plausible image transformations and tested whether they confer the same representation quality as the engineered ones.

Our experimental pipeline is based on the PyTorch SimCLR implementations by sthalles and by Spijkervet. Our additions support our biologically inspired data augmentations, visualization, and post hoc data analysis.

Usage

Colab Tutorials

  • Tutorial: Demo of biological transformations
  • Tutorial: Augmentation pipeline applied to the STL10 dataset
  • Tutorial: Demo of training on STL10
  • Tutorial: Sample training and evaluation curves

Local Testing

To run a quick training demo, replace $Datasets_path with the parent folder of stl10_binary (e.g. .\Datasets). You can download and extract STL10 from here. Replace $logdir with the folder in which to save all run logs and checkpoints; you can then run tensorboard --logdir $logdir to monitor training.

python run_magnif.py -data $Datasets_path -dataset-name stl10 --workers 16 --log_root $logdir\
	--ckpt_every_n_epocs 5 --epochs 100  --batch-size 256  --out_dim 256  \
	--run_label proj256_eval_magnif_cvr_0_05-0_35 --magnif \
	--cover_ratio 0.05 0.35  --fov_size 20  --K  20  --sampling_bdr 16 

The code has been tested on Ubuntu and Windows 10.

Cluster Testing

To run in Docker or on a cluster, we used the following PyTorch Docker image: pytorchlightning/pytorch_lightning:base-cuda-py3.9-torch1.9. For the settings of an LSF Spectrum cluster, refer to scripts. These jobs are submitted via bsub < $name_of_script

To support multi-worker data preprocessing, LSF_DOCKER_SHM_SIZE=16g must be exported beforehand. The following example script sets up an interactive environment for testing the code.

export LSF_DOCKER_SHM_SIZE=16g 
bsub -Is -M 32GB -q general-interactive -R 'gpuhost' -R  'rusage[mem=32GB]'  -gpu "num=1:gmodel=TeslaV100_SXM2_32GB" -a 'docker(pytorchlightning/pytorch_lightning:base-cuda-py3.9-torch1.9)' /bin/bash

Multi-GPU training has not been tested.

Implementation

We implemented foveation in two ways: one approximating our perception, the other approximating the cortical representation of the image. Perceptually, we see with the highest resolution at the fixation point, while peripheral vision is blurred and fewer details can be recognized (Arturo; Simoncelli 2011). Moreover, when we shift fixation across the image, the whole scene still feels stable rather than shifting. We therefore model this perception as a spatially varying blur of the image, as has classically been done.
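
The idea of a spatially varying blur can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the implementation in data_aug: it blends the sharp image with a single box-blurred copy, with a blend weight that grows with distance from the fixation point. The names foveate_blur, fov_radius and blur_radius are hypothetical and do not correspond to the command line options of this repository.

```python
import numpy as np

def box_blur(img, radius):
    """Blur a 2D grayscale image with a (2*radius+1)^2 box filter, edge-padded."""
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def foveate_blur(img, fixation, fov_radius=8.0, blur_radius=3):
    """Keep `img` sharp near `fixation` (row, col) and blur it in the periphery."""
    h, w = img.shape
    rows, cols = np.mgrid[0:h, 0:w]
    # Eccentricity: pixel distance from the fixation point
    ecc = np.hypot(rows - fixation[0], cols - fixation[1])
    # Blend weight: 0 inside the fovea, rising linearly to 1 at 3 * fov_radius
    weight = np.clip((ecc - fov_radius) / (2 * fov_radius), 0.0, 1.0)
    return (1 - weight) * img + weight * box_blur(img, blur_radius)
```

Note that the output keeps the geometry of the input: moving the fixation point changes where the image is sharp, but nothing shifts, which matches the perceptual stability described above.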

In contrast, from a neurobiological view, our visual cortex distorts the retinal input: for the same image area, a larger cortical area processes input at the fovea than in the periphery. This is known as cortical magnification. Pictorially, it magnifies and over-represents the image around the fixation point. We model this transform by sampling the original image with a warped grid.
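
Sampling with a warped grid can be illustrated with the following NumPy sketch. It is an assumption-laden toy version, not the code in data_aug: it uses an exponential radial mapping (the inverse of a log-like cortical map) and nearest-neighbor lookup, and the names magnify_warp and strength are hypothetical rather than options of this repository.

```python
import numpy as np

def magnify_warp(img, fixation, strength=2.0):
    """Resample a 2D image on a radial grid that magnifies the area near `fixation`."""
    h, w = img.shape
    rows, cols = np.mgrid[0:h, 0:w].astype(float)
    # Offsets of each output pixel from the output center, radius normalized to [0, 1]
    dy, dx = rows - (h - 1) / 2, cols - (w - 1) / 2
    r_max = np.hypot((h - 1) / 2, (w - 1) / 2)
    r_out = np.hypot(dy, dx) / r_max
    # Source radius grows exponentially with output radius, so central
    # output pixels sample a small neighborhood of the fixation point
    r_src = np.expm1(strength * r_out) / np.expm1(strength)
    scale = np.where(r_out > 0, r_src / np.maximum(r_out, 1e-12), 0.0)
    # Source coordinates are centered on the fixation; clamp to the image bounds
    src_r = np.clip(np.rint(fixation[0] + dy * scale), 0, h - 1).astype(int)
    src_c = np.clip(np.rint(fixation[1] + dx * scale), 0, w - 1).astype(int)
    return img[src_r, src_c]
```

Calling this with two different fixation points on the same image yields two differently magnified views, which is one way to think of saccades as a source of augmentation. A smoother result would need bilinear rather than nearest-neighbor sampling.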

These two different views of foveation (perceptual vs neurobiological) were implemented and compared as data augmentations in SimCLR.

Structure of Repo

  • Main command line interface
    • run.py: runs the baseline training pipeline without bio-inspired augmentations.
    • run_salcrop.py: runs the training pipeline with options for foveation transforms and saliency-based sampling.
    • run_magnif.py: runs the training pipeline with options for foveation transforms and saliency-based sampling.
  • data_aug\, implementation of our bio-inspired augmentations.
  • posthoc\, analysis code for training results.
  • scripts\, scripts that run experiments on cluster.

Dependency

  • PyTorch. Tested with versions 1.7.1 to 1.10.0.
  • kornia (pip install kornia). Tested with versions 0.3.1 to 0.6.1.
  • FastSal. We forked the original and modified a few lines to make it compatible with Python 3.9 and current PyTorch and torchvision.

Inquiries: [email protected]

Owner
Binxu
PhD student in System Neuro @PonceLab @Harvard, using generative models, CNN and optimization to understand brain Previously: Louis Tao