Malmo Collaborative AI Challenge - Team Pig Catcher

Overview

The Malmo Collaborative AI Challenge - Team Pig Catcher

Approach

The challenge involves 2 agents who can either cooperate or defect. The optimal policy, based on stag hunt [1], depends on the policy of the other agent. Not knowing the other agent's policy, the optimal solution is then based on modelling the other agent's policy. Similarly, the challenge can be considered a sequential social dilemma [2], as goals could change over time.

By treating the other agent as part of the environment, we can use model-free RL, and simply aim to maximise the reward of our agent. As a baseline we take a DRL algorithm - ACER [3] - and train it against the evaluation agent (which randomly uses a focused or random strategy every episode).

We chose to approach this challenge using hierarchical RL. We assume there are 2 subpolicies, one for each type of partner agent. To do so, we use option heads [4], whereby the agent has shared features, but separate heads for different subpolicies. In this case, ACER with 2 subpolicies has 2 Q-value heads and 2 policy heads. To choose which subpolicy to use at any given time, the agent also has an additional classifier head that is trained (using an oracle) to distinguish which option to use. Therefore, we ask the following questions:

  • Can the agent distinguish between the two possible behaviours of the evaluation agent?
  • Does the agent learn qualitatively different subpolicies?

Unfortunately, due to technical difficulties and time restrictions, we were unable to successfully train an agent. Full results and more details can be found in our video.

Design Decisions

For our baseline, we implemented ACER [3] in PyTorch based on reference code [5, 6]. In addition, we augmented the state that the agent receives with the previous action, reward and a step counter [7]. Our challenge entry augments the agent with option heads [4], and we aim to distinguish the different policies of the evaluation agent.

We also introduce a novel contribution - a batch version of ACER - which increases stability. We sample a batch of off-policy trajectories, and then truncate them to match the smallest.

Instructions

Dependencies:

Firstly, build the Malmo Docker image. Secondly, enable running Docker as a non-root user.

Run ACER with OMP_NUM_THREADS=1 python pc_main.py. The code automatically opens up Minecraft (Docker) instances.

Discussion

Team Pig Catcher Discussion Video

References

[1] Game Theory of Mind
[2] Multi-agent Reinforcement Learning in Sequential Social Dilemmas
[3] Sample Efficient Actor-Critic with Experience Replay
[4] Classifying Options for Deep Reinforcement Learning
[5] ikostrikov/pytorch-a3c
[6] pfnet/ChainerRL
[7] Learning to Navigate in Complex Environments



This repository contains the task definition and example code for the Malmo Collaborative AI Challenge. This challenge is organized to encourage research in collaborative AI - to work towards AI agents that learn to collaborate to solve problems and achieve goals. You can find additional details, including terms and conditions, prizes and information on how to participate at the Challenge Homepage.

Join the chat at https://gitter.im/malmo-challenge/Lobby license


Notes for challenge participants: Once you and your team decide to participate in the challenge, please make sure to register your team at our Registration Page. On the registration form, you need to provide a link to the GitHub repository that will contain your solution. We recommend that you fork this repository (learn how), and provide address of the forked repo. You can then update your submission as you make progress on the challenge task. We will consider the version of the code on branch master at the time of the submission deadline as your challenge submission. Your submission needs to contain code in working order, a 1-page description of your approach, and a 1-minute video that shows off your agent. Please see the challenge terms and conditions for further details.


Jump to:

Installation

Prerequisites

Minimal installation

pip install -e git+https://github.com/Microsoft/malmo-challenge#egg=malmopy

or

git clone https://github.com/Microsoft/malmo-challenge
cd malmo-challenge
pip install -e .

Optional extensions

Some of the example code uses additional dependencies to provide 'extra' functionality. These can be installed using:

pip install -e '.[extra1, extra2]'

For example to install gym and chainer:

pip install -e '.[gym]'

Or to install all extras:

pip install -e '.[all]'

The following extras are available:

  • gym: OpenAI Gym is an interface to a wide range of reinforcement learning environments. Installing this extra enables the Atari example agents in samples/atari to train on the gym environments. Note that OpenAI gym atari environments are currently not available on Windows.
  • tensorflow: TensorFlow is a popular deep learning framework developed by Google. In our examples it enables visualizations through TensorBoard.

Getting started

Play the challenge task

The challenge task takes the form of a mini game, called Pig Chase. Learn about the game, and try playing it yourself on our Pig Chase Challenge page.

Run your first experiment

See how to run your first baseline experiment on the Pig Chase Challenge page.

Next steps

Run an experiment in Docker on Azure

Docker is a virtualization platform that makes it easy to deploy software with all its dependencies. We use docker to run experiments locally or in the cloud. Details on how to run an example experiment using docker are in the docker README.

Resources

Few-Shot Graph Learning for Molecular Property Prediction

Few-shot Graph Learning for Molecular Property Prediction Introduction This is the source code and dataset for the following paper: Few-shot Graph Lea

Zhichun Guo 94 Dec 12, 2022
Look Who’s Talking: Active Speaker Detection in the Wild

Look Who's Talking: Active Speaker Detection in the Wild Dependencies pip install -r requirements.txt In addition to the Python dependencies, ffmpeg

Clova AI Research 60 Dec 08, 2022
Social Distancing Detector

Computer vision has opened up a lot of opportunities to explore into AI domain that were earlier highly limited. Here is an application of haarcascade classifier and OpenCV to develop a social distan

Ashish Pandey 2 Jul 18, 2022
Official repository of ICCV21 paper "Viewpoint Invariant Dense Matching for Visual Geolocalization"

Viewpoint Invariant Dense Matching for Visual Geolocalization: PyTorch implementation This is the implementation of the ICCV21 paper: G Berton, C. Mas

Gabriele Berton 44 Jan 03, 2023
Fermi Problems: A New Reasoning Challenge for AI

Fermi Problems: A New Reasoning Challenge for AI Fermi Problems are questions whose answer is a number that can only be reasonably estimated as a prec

AI2 15 May 28, 2022
DETReg: Unsupervised Pretraining with Region Priors for Object Detection

DETReg: Unsupervised Pretraining with Region Priors for Object Detection Amir Bar, Xin Wang, Vadim Kantorov, Colorado J Reed, Roei Herzig, Gal Chechik

Amir Bar 283 Dec 27, 2022
Hypersearch weight debugging and losses tutorial

tutorial Activate tensorboard option Running TensorBoard remotely When working on a remote server, you can use SSH tunneling to forward the port of th

1 Dec 11, 2021
Code for AutoNL on ImageNet (CVPR2020)

Neural Architecture Search for Lightweight Non-Local Networks This repository contains the code for CVPR 2020 paper Neural Architecture Search for Lig

Yingwei Li 104 Aug 31, 2022
LineBoard - Python+React+MySQL-白板即時系統改善人群行為

LineBoard-白板即時系統改善人群行為 即時顯示實驗室的使用狀況,並遠端預約排隊,以此來改善人們的工作效率 程式架構 運作流程 使用者先至該實驗室網站預約

Bo-Jyun Huang 1 Feb 22, 2022
Hydra: an Extensible Fuzzing Framework for Finding Semantic Bugs in File Systems

Hydra: An Extensible Fuzzing Framework for Finding Semantic Bugs in File Systems Paper Finding Semantic Bugs in File Systems with an Extensible Fuzzin

gts3.org (<a href=[email protected])"> 129 Dec 15, 2022
Hyperparameters tuning and features selection are two common steps in every machine learning pipeline.

shap-hypetune A python package for simultaneous Hyperparameters Tuning and Features Selection for Gradient Boosting Models. Overview Hyperparameters t

Marco Cerliani 422 Jan 08, 2023
RIFE: Real-Time Intermediate Flow Estimation for Video Frame Interpolation

RIFE - Real Time Video Interpolation arXiv | YouTube | Colab | Tutorial | Demo Table of Contents Introduction Collection Usage Evaluation Training and

hzwer 3k Jan 04, 2023
SBINN: Systems-biology informed neural network

SBINN: Systems-biology informed neural network The source code for the paper M. Daneker, Z. Zhang, G. E. Karniadakis, & L. Lu. Systems biology: Identi

Lu Group 15 Nov 19, 2022
General Assembly Capstone: NBA Game Predictor

Project 6: Predicting NBA Games Problem Statement Can I predict the results of NBA games from the back-half of a season from the opening half of the s

Adam Muhammad Klesc 1 Jan 14, 2022
A hybrid SOTA solution of LiDAR panoptic segmentation with C++ implementations of point cloud clustering algorithms. ICCV21, Workshop on Traditional Computer Vision in the Age of Deep Learning

ICCVW21-TradiCV-Survey-of-LiDAR-Cluster Motivation In contrast to popular end-to-end deep learning LiDAR panoptic segmentation solutions, we propose a

YimingZhao 103 Nov 22, 2022
PSANet: Point-wise Spatial Attention Network for Scene Parsing, ECCV2018.

PSANet: Point-wise Spatial Attention Network for Scene Parsing (in construction) by Hengshuang Zhao*, Yi Zhang*, Shu Liu, Jianping Shi, Chen Change Lo

Hengshuang Zhao 217 Oct 30, 2022
Use .csv files to record, play and evaluate motion capture data.

Purpose These scripts allow you to record mocap data to, and play from .csv files. This approach facilitates parsing of body movement data in statisti

21 Dec 12, 2022
tinykernel - A minimal Python kernel so you can run Python in your Python

tinykernel - A minimal Python kernel so you can run Python in your Python

fast.ai 37 Dec 02, 2022
Frequency Spectrum Augmentation Consistency for Domain Adaptive Object Detection

Frequency Spectrum Augmentation Consistency for Domain Adaptive Object Detection Main requirements torch = 1.0 torchvision = 0.2.0 Python 3 Environm

15 Apr 04, 2022
Multi-Target Adversarial Frameworks for Domain Adaptation in Semantic Segmentation

Multi-Target Adversarial Frameworks for Domain Adaptation in Semantic Segmentation Paper Multi-Target Adversarial Frameworks for Domain Adaptation in

Valeo.ai 20 Jun 21, 2022