Learning to Draw: Emergent Communication through Sketching


This is the official code for the paper "Learning to Draw: Emergent Communication through Sketching".


About

We demonstrate that it is possible for a communication channel based on line drawing to emerge between agents playing a visual referential communication game. Furthermore, we show that with a simple additional self-supervised loss, the drawings the agents produce are interpretable by humans.

Getting started

You'll need to install the required dependencies listed in requirements.txt. This includes installing the differentiable rasteriser from the DifferentiableSketching repository, and the source version of https://github.com/pytorchbearer/torchbearer:

pip install git+https://github.com/jonhare/DifferentiableSketching.git
pip install git+https://github.com/pytorchbearer/torchbearer.git
pip install -r requirements.txt

Once the dependencies are installed, you can run the commgame.py script to train and test models:

python commgame.py train [args]
python commgame.py test [args]

For example, to train a pair of agents on the original game using the STL10 dataset (which will be downloaded if required), you would run:

python commgame.py train --dataset STL10 --output stl10-original-model --sigma2 5e-4 --nlines 20 --learning-rate 0.0001 --imagenet-weights --freeze-vgg --imagenet-norm --epochs 250 --invert --batch-size 100

The options --sigma2 and --nlines control the thickness and number of lines respectively. --imagenet-weights uses the standard pretrained imagenet vgg16 weights (use --sin-weights for stylized imagenet weights instead). Finally, --freeze-vgg freezes the backbone CNN, --imagenet-norm applies imagenet normalisation to the input images (this should be used with either imagenet or stylized imagenet weights), and --invert draws black strokes on a white canvas.
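
For instance, a minimal variant of the command above that swaps in the stylized imagenet weights (the output name and other values are kept purely for illustration) might look like:

python commgame.py train --dataset STL10 --output stl10-sin-model --sigma2 5e-4 --nlines 20 --learning-rate 0.0001 --sin-weights --freeze-vgg --imagenet-norm --epochs 250 --invert --batch-size 100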

The training script computes a running communication rate in addition to the loss, and this is displayed as training progresses. After each epoch a validation pass is performed, and images of the sketches, the sender inputs and the receiver targets are saved to the output directory along with a model snapshot. The output directory also contains a log file with the per-epoch training and validation statistics.

Example commands for running the experiments in the paper are given in commands.md.

Further details of the command-line arguments are given below.

Game setups

All the setups involve a referential game in which the receiver tries to select the "correct" image from a pool on the basis of a "sketch" provided by the sender. The primary measure of success is the communication rate. The command-line arguments controlling the different game variants are listed in the following subsections:

Havrylov and Titov's Original Game Setup

The sender sees one image; the receiver sees many, one of which is exactly the same as the sender's.

The number of receiver images (target + distractors) is controlled by the batch size. The number of sender images per iteration can also be controlled for completeness, but defaults to the batch size (i.e. each forward pass with a batch plays all possible game combinations, using each of the images as a target).

arguments:
--batch-size
[--sender-images-per-iter]
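
As an illustrative sketch (the dataset, output name and --sender-images-per-iter value here are arbitrary), the original game might be configured with:

python commgame.py train --dataset CIFAR10 --output cifar10-original-model --batch-size 100 --sender-images-per-iter 100 --sigma2 5e-4 --nlines 20 --epochs 250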

Object-oriented Game Setup (same)

The sender sees one image; the receiver sees many, where one is exactly the same as the sender's and the others all belong to different classes.

arguments:
--object-oriented same
[--num-targets]
[--sender-images-per-iter]
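
As a hedged example (the --num-targets value and output name are illustrative), the object-oriented "same" game might be launched with:

python commgame.py train --dataset STL10 --object-oriented same --num-targets 10 --output stl10-oo-same-model --sigma2 5e-4 --nlines 20 --epochs 250 --batch-size 100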

Object-oriented Game Setup (different)

The sender sees one image; the receiver sees many, each belonging to a different class; one of the receiver's images is of the same class as the sender's, but is a completely different image.

arguments:
--object-oriented different 
[--num-targets]
[--sender-images-per-iter]
[--random-transform-sender]
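
Similarly, a hedged sketch of the "different" variant with the optional random sender transform (values again illustrative):

python commgame.py train --dataset STL10 --object-oriented different --num-targets 10 --random-transform-sender --output stl10-oo-diff-model --sigma2 5e-4 --nlines 20 --epochs 250 --batch-size 100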

Model setup

Sender

The "sender" consists of a backbone VGG16 CNN which translates the input image into a latent vector and a "decoder" with an MLP that projects the latent representation from the backbone to a set of drawing commands that are differentiably rendered into an image which is sent to the "reciever".

The backbone can optionally be initialised with pretrained weights and can also optionally be frozen (except for the final linear projection). The backbone, including the linear projection, can be shared between the sender and receiver (the default) or kept separate (--separate_encoders).

arguments:
[--freeze-vgg]
[--imagenet-weights --imagenet-norm] 
[--sin-weights --imagenet-norm] 
[--separate_encoders]
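
As an illustrative combination of these options (not a command taken from the paper), separate sender and receiver encoders with frozen imagenet-pretrained backbones could be requested with:

python commgame.py train --dataset STL10 --imagenet-weights --imagenet-norm --freeze-vgg --separate_encoders --output stl10-separate-model --sigma2 5e-4 --nlines 20 --epochs 250 --batch-size 100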

Receiver

The "receiver" consists of a backbone CNN which is used to convert visual inputs (both the images in the pool and the sketch) into a latent vector which is then transformed into a different latent representation by an MLP. These projected latent vectors are used for prediction and in the loss as described below.

The actual backbone CNN architecture is the same as the sender's. The backbone can optionally share parameters with the "sender" agent. Alternatively, it can be initialised with pre-trained weights and optionally frozen.

arguments:
[--freeze-vgg]
[--imagenet-weights --imagenet-norm]
[--separate_encoders]

Datasets

  • MNIST
  • CIFAR-10 / CIFAR-100
  • TinyImageNet
  • CelebA (--image-size to control size; default 64px)
  • STL-10
  • Caltech101 (training data is balanced by supersampling with augmentation)

Datasets will be downloaded to the dataset root directory (default ./data) as required.

arguments: 
--dataset {CIFAR10,CelebA,MNIST,STL10,TinyImageNet,Caltech101}  
[--dataset-root]
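
For example, a sketch of a CelebA run with a non-default dataset location (the path and output name are placeholders):

python commgame.py train --dataset CelebA --dataset-root /path/to/data --image-size 64 --output celeba-model --sigma2 5e-4 --nlines 20 --epochs 250 --batch-size 100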

Citation

If you find this repository useful for your research, please cite our paper using the following BibTeX entry:

  @inproceedings{mihai2021learning,
    title={Learning to Draw: Emergent Communication through Sketching},
    author={Daniela Mihai and Jonathon Hare},
    booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
    year={2021},
    url={https://openreview.net/forum?id=YIyYkoJX2eA}
  }