Learning to Draw: Emergent Communication through Sketching

Overview

Learning to Draw: Emergent Communication through Sketching

This is the official code for the paper "Learning to Draw: Emergent Communication through Sketching".

ArXivPapers With CodeGetting StartedGame setupsModel setupDatasets

About

We demonstrate that it is possible for a communication channel based on line drawing to emerge between agents playing a visual referential communication game. Furthermore we show that with a simple additional self-supervised loss that the drawings the agent produces are interpretable by humans.

Getting started

You'll need to install the required dependencies listed in requirements.txt. This includes installing the differentiable rasteriser from the DifferentiableSketching repository, and the source version of https://github.com/pytorchbearer/torchbearer:

pip install git+https://github.com/jonhare/DifferentiableSketching.git
pip install git+https://github.com/pytorchbearer/torchbearer.git
pip install -r requirements.txt

Once the dependencies are installed, you can run the commgame.py script to train and test models:

python commgame.py train [args]
python commgame.py test [args]

For example, to train a pair of agents on the original game using the STL10 dataset (which will be downloaded if required), you would run:

python commgame.py train --dataset STL10 --output stl10-original-model --sigma2 5e-4 --nlines 20 --learning-rate 0.0001 --imagenet-weights --freeze-vgg --imagenet-norm --epochs 250 --invert --batch-size 100

The options --sigma2 and --nlines control the thickness and number of lines respectively. --imagenet-weights uses the standard pretrained imagenet vgg16 weights (use --sin-weights for stylized imagenet weights). Finally, --freeze-vgg freezes the backbone CNN, --imagenet-norm specifies to apply the imagenet normalisation to images (this should be used when using either imagenet or stylized imagenet weights), and --invert draws black strokes on a white canvas.

The training scripts compute a running communication rate in addition to loss and this is displayed as training progresses. After each epoch a validation pass is performed and images of the sketches and sender inputs and receiver targets are saved to the output directory along with a model snapshot. The output directory also contains a log file with the training and validation statistics per epoch.

Example commands to run the experiments in the paper are given in commands.md

Further details on commandline arguments are given below.

Game setups

All the setups involve a referential game where the reciever tries to select the "correct" image from a pool on the basis of a "sketch" provided by the sender. The primary measure of success is the communication rate. The different command line arguments to control the different game variants are listed in the following subsections:

Havrylov and Titov's Original Game Setup

Sender sees one image; Reciever sees many, where one is exactly the same as sender.

Number of reciever images (target + distractors) is controlled by the batch-size. Number of sender images per iteration can also be controlled for completeness, but defaults to the same as batch size (e.g. each forward pass with a batch plays all possible game combinations using each of the images as a target).

arguments:
--batch-size
[--sender-images-per-iter]

Object-oriented Game Setup (same)

Sender sees one image; Reciever sees many, where one is exactly the same as sender and the others are all of different classes.

arguments:
--object-oriented same
[--num-targets]
[--sender-images-per-iter]

Object-oriented Game Setup (different)

Sender sees one image; Reciever sees many, each of different classes; one of the images is the same class as the sender, but is a completely different image).

arguments:
--object-oriented different 
[--num-targets]
[--sender-images-per-iter]
[--random-transform-sender]

Model setup

Sender

The "sender" consists of a backbone VGG16 CNN which translates the input image into a latent vector and a "decoder" with an MLP that projects the latent representation from the backbone to a set of drawing commands that are differentiably rendered into an image which is sent to the "reciever".

The backbone can optionally be initialised with pretrained weight and also optionally frozen (except for the final linear projection). The backbone, including linear projection can be shared between sender and reciever (default) or separate (--separate_encoders).

arguments:
[--freeze-vgg]
[--imagenet-weights --imagenet-norm] 
[--sin-weights --imagenet-norm] 
[--separate_encoders]

Receiver

The "receiver" consists of a backbone CNN which is used to convert visual inputs (both the images in the pool and the sketch) into a latent vector which is then transformed into a different latent representation by an MLP. These projected latent vectors are used for prediction and in the loss as described below.

The actual backbone CNN model architecture will be the same as the sender's. The backbone can optionally share parameters with the "sender" agent. Alternatively it can be initialised with pre-trained weights, and also optionally frozen.

arguments:
[--freeze-vgg]
[--imagenet-weights --imagenet-norm]
[--separate_encoders]

Datasets

  • MNIST
  • CIFAR-10 / CIFAR-100
  • TinyImageNet
  • CelebA (--image-size to control size; default 64px)
  • STL-10
  • Caltech101 (training data is balanced by supersampling with augmentation)

Datasets will be downloaded to the dataset root directory (default ./data) as required.

arguments: 
--dataset {CIFAR10,CelebA,MNIST,STL10,TinyImageNet,Caltech101}  
[--dataset-root]

Citation

If you find this repository useful for your research, please cite our paper using the following.

  @@inproceedings{
  mihai2021learning,
  title={Learning to Draw: Emergent Communication through Sketching},
  author={Daniela Mihai and Jonathon Hare},
  booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
  year={2021},
  url={https://openreview.net/forum?id=YIyYkoJX2eA}
  }
An Implementation of SiameseRPN with Feature Pyramid Networks

SiameseRPN with FPN This project is mainly based on HelloRicky123/Siamese-RPN. What I've done is just add a Feature Pyramid Network method to the orig

3 Apr 16, 2022
Image to Image translation, image generataton, few shot learning

Semi-supervised Learning for Few-shot Image-to-Image Translation [paper] Abstract: In the last few years, unpaired image-to-image translation has witn

yaxingwang 49 Nov 18, 2022
The end-to-end platform for building voice products at scale

Picovoice Made in Vancouver, Canada by Picovoice Picovoice is the end-to-end platform for building voice products on your terms. Unlike Alexa and Goog

Picovoice 318 Jan 07, 2023
[ICCV 2021] Group-aware Contrastive Regression for Action Quality Assessment

CoRe Created by Xumin Yu*, Yongming Rao*, Wenliang Zhao, Jiwen Lu, Jie Zhou This is the PyTorch implementation for ICCV paper Group-aware Contrastive

Xumin Yu 31 Dec 24, 2022
SOTR: Segmenting Objects with Transformers [ICCV 2021]

SOTR: Segmenting Objects with Transformers [ICCV 2021] By Ruohao Guo, Dantong Niu, Liao Qu, Zhenbo Li Introduction This is the official implementation

186 Dec 20, 2022
🎁 3,000,000+ Unsplash images made available for research and machine learning

The Unsplash Dataset The Unsplash Dataset is made up of over 250,000+ contributing global photographers and data sourced from hundreds of millions of

Unsplash 2k Jan 03, 2023
A pure PyTorch batched computation implementation of "CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition"

A pure PyTorch batched computation implementation of "CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition"

張致強 14 Dec 02, 2022
Repository to run object detection on a model trained on an autonomous driving dataset.

Autonomous Driving Object Detection on the Raspberry Pi 4 Description of Repository This repository contains code and instructions to configure the ne

Ethan 51 Nov 17, 2022
NaturalProofs: Mathematical Theorem Proving in Natural Language

NaturalProofs: Mathematical Theorem Proving in Natural Language NaturalProofs: Mathematical Theorem Proving in Natural Language Sean Welleck, Jiacheng

Sean Welleck 83 Jan 05, 2023
YOLOv5 🚀 is a family of object detection architectures and models pretrained on the COCO dataset

YOLOv5 🚀 is a family of object detection architectures and models pretrained on the COCO dataset, and represents Ultralytics open-source research int

阿才 73 Dec 16, 2022
A simple and extensible library to create Bayesian Neural Network layers on PyTorch.

Blitz - Bayesian Layers in Torch Zoo BLiTZ is a simple and extensible library to create Bayesian Neural Network Layers (based on whats proposed in Wei

Pi Esposito 722 Jan 08, 2023
Material for my PyConDE & PyData Berlin 2022 Talk "5 Steps to Speed Up Your Data-Analysis on a Single Core"

5 Steps to Speed Up Your Data-Analysis on a Single Core Material for my talk at the PyConDE & PyData Berlin 2022 Description Your data analysis pipeli

Jonathan Striebel 9 Dec 12, 2022
Code for Multiple Instance Active Learning for Object Detection, CVPR 2021

MI-AOD Language: 简体中文 | English Introduction This is the code for Multiple Instance Active Learning for Object Detection (The PDF is not available tem

Tianning Yuan 269 Dec 21, 2022
Semantic Segmentation Architectures Implemented in PyTorch

pytorch-semseg Semantic Segmentation Algorithms Implemented in PyTorch This repository aims at mirroring popular semantic segmentation architectures i

Meet Shah 3.3k Dec 29, 2022
Patient-Survival - Using Python, I developed a Machine Learning model using classification techniques such as Random Forest and SVM classifiers to predict a patient's survival status that have undergone breast cancer surgery.

Patient-Survival - Using Python, I developed a Machine Learning model using classification techniques such as Random Forest and SVM classifiers to predict a patient's survival status that have underg

Nafis Ahmed 1 Dec 28, 2021
Residual Dense Net De-Interlace Filter (RDNDIF)

Residual Dense Net De-Interlace Filter (RDNDIF) Work in progress deep de-interlacer filter. It is based on the architecture proposed by Bernasconi et

Louis 7 Feb 15, 2022
A Robust Non-IoU Alternative to Non-Maxima Suppression in Object Detection

Confluence: A Robust Non-IoU Alternative to Non-Maxima Suppression in Object Detection 1. 介绍 用以替代 NMS,在所有 bbox 中挑选出最优的集合。 NMS 仅考虑了 bbox 的得分,然后根据 IOU 来

44 Sep 15, 2022
Newt - a Gaussian process library in JAX.

Newt __ \/_ (' \`\ _\, \ \\/ /`\/\ \\ \ \\

AaltoML 0 Nov 02, 2021
Crosslingual Segmental Language Model

Crosslingual Segmental Language Model This repository contains the code from Multilingual unsupervised sequence segmentation transfers to extremely lo

C.M. Downey 1 Jun 13, 2022
PyTorch implementation of paper A Fast Knowledge Distillation Framework for Visual Recognition.

FKD: A Fast Knowledge Distillation Framework for Visual Recognition Official PyTorch implementation of paper A Fast Knowledge Distillation Framework f

Zhiqiang Shen 129 Dec 24, 2022