Are Convolutional Neural Networks or Transformers more like human vision?

This repository contains the code and fine-tuned models for popular Convolutional Neural Networks (CNNs) and the recently proposed Vision Transformer (ViT), fine-tuned on the augmented ImageNet dataset, together with the shape/texture bias tests run on the Stylized ImageNet dataset.
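As a rough illustration of what a shape/texture bias test measures, the sketch below computes a shape-bias score the way it is commonly defined for cue-conflict images: among predictions that match either the shape label or the texture label, the fraction that match the shape label. The function name `shape_bias` and its arguments are hypothetical and are not taken from this repository's code; the repository's own evaluation may differ in detail.

```python
from typing import Sequence


def shape_bias(
    predictions: Sequence[str],
    shape_labels: Sequence[str],
    texture_labels: Sequence[str],
) -> float:
    """Fraction of shape decisions among all shape-or-texture decisions.

    Hypothetical helper for illustration only (not the repository's API).
    Each image carries a shape label and a conflicting texture label.
    """
    if not (len(predictions) == len(shape_labels) == len(texture_labels)):
        raise ValueError("all inputs must have the same length")

    shape_hits = 0
    texture_hits = 0
    for pred, shape, texture in zip(predictions, shape_labels, texture_labels):
        if shape == texture:
            # Not a cue-conflict image, so it says nothing about the bias.
            continue
        if pred == shape:
            shape_hits += 1
        elif pred == texture:
            texture_hits += 1
        # Predictions matching neither cue are ignored.

    total = shape_hits + texture_hits
    if total == 0:
        raise ValueError("no prediction matched a shape or texture label")
    return shape_hits / total
```

A score near 1 would indicate mostly shape-based decisions, and a score near 0 mostly texture-based decisions.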

This work compares CNNs and the ViT against humans in terms of error consistency, going beyond traditional metrics. Through these tests, we show that recently proposed self-attention-based Transformer models make more human-like errors than traditional CNNs.
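For readers unfamiliar with error consistency, the following is a minimal sketch, assuming the usual definition as Cohen's kappa over per-trial correctness: the observed rate at which two observers are both right or both wrong on the same trials, corrected for the agreement expected from their accuracies alone. The function name `error_consistency` and its inputs are hypothetical and are not taken from this repository's code.

```python
from typing import Sequence


def error_consistency(correct_a: Sequence[bool], correct_b: Sequence[bool]) -> float:
    """Cohen's kappa over per-trial correctness of two observers.

    Hypothetical helper for illustration only (not the repository's API).
    correct_a[i] / correct_b[i] say whether each observer classified
    trial i correctly.
    """
    if len(correct_a) != len(correct_b) or len(correct_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(correct_a)
    acc_a = sum(correct_a) / n
    acc_b = sum(correct_b) / n

    # Observed consistency: both correct or both wrong on the same trial.
    c_obs = sum(a == b for a, b in zip(correct_a, correct_b)) / n

    # Consistency expected by chance from the two accuracies alone.
    c_exp = acc_a * acc_b + (1 - acc_a) * (1 - acc_b)

    if c_exp == 1.0:
        # Both observers are always right or always wrong: kappa is undefined.
        return float("nan")
    return (c_obs - c_exp) / (1 - c_exp)
```

Under this definition, a value of 0 means the two observers agree no more than their accuracies predict, and a value of 1 means they err on exactly the same trials.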

Colab

You can run tests on the results directly in Google Colaboratory, without installing anything on your local machine. Click "Open in Colab" below:

Developer

Shikhar Tuli. For any questions, comments or suggestions, please reach me at [email protected].

Cite this work

If you use our experimental results or fine-tuned models, please cite:

@article{tuli2021cogsci,
      title={Are Convolutional Neural Networks or Transformers more like human vision?}, 
      author={Shikhar Tuli and Ishita Dasgupta and Erin Grant and Thomas L. Griffiths},
      year={2021},
      eprint={2105.07197},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

