Spatial Action Maps for Mobile Manipulation (RSS 2020)

Overview

Update: Please see our new spatial-intention-maps repository, which extends this work to the multi-agent setting. It contains many improvements to the codebase, and while its focus is on multi-agent training, it also supports single-agent training.


This code release accompanies the following paper:

Spatial Action Maps for Mobile Manipulation

Jimmy Wu, Xingyuan Sun, Andy Zeng, Shuran Song, Johnny Lee, Szymon Rusinkiewicz, Thomas Funkhouser

Robotics: Science and Systems (RSS), 2020

Project Page | PDF | arXiv | Video

Abstract: Typical end-to-end formulations for learning robotic navigation involve predicting a small set of steering command actions (e.g., step forward, turn left, turn right, etc.) from images of the current state (e.g., a bird's-eye view of a SLAM reconstruction). Instead, we show that it can be advantageous to learn with dense action representations defined in the same domain as the state. In this work, we present "spatial action maps," in which the set of possible actions is represented by a pixel map (aligned with the input image of the current state), where each pixel represents a local navigational endpoint at the corresponding scene location. Using ConvNets to infer spatial action maps from state images, action predictions are thereby spatially anchored on local visual features in the scene, enabling significantly faster learning of complex behaviors for mobile manipulation tasks with reinforcement learning. In our experiments, we task a robot with pushing objects to a goal location, and find that policies learned with spatial action maps achieve much better performance than traditional alternatives.
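
To make the action representation concrete, here is a minimal sketch (not the repository's actual interface; the function name, map size, and shapes are assumptions) of how a greedy action is read off a dense Q-value map produced by a fully-convolutional network:

import numpy as np

def select_action(q_map):
    # q_map: (H, W) array of Q-values aligned pixel-for-pixel with the local
    # overhead state image; each pixel is a candidate navigational endpoint.
    row, col = np.unravel_index(int(q_map.argmax()), q_map.shape)
    return row, col  # greedy endpoint in image coordinates

# A fully-convolutional network would produce q_map from the state image;
# random values here just demonstrate the selection step.
q_map = np.random.rand(96, 96)
print(select_action(q_map))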

Installation

We recommend using a conda environment for this codebase. The following commands will set up a new conda environment with the correct requirements (tested on Ubuntu 18.04.3 LTS):

# Create and activate new conda env
conda create -y -n my-conda-env python=3.7
conda activate my-conda-env

# Install pytorch and numpy
conda install -y pytorch==1.2.0 torchvision==0.4.0 cudatoolkit=10.0 -c pytorch
conda install -y numpy=1.17.3

# Install pip requirements
pip install -r requirements.txt

# Install shortest path module (used in simulation environment)
cd spfa
python setup.py install

Quickstart

We provide four pretrained policies, one for each test environment. Use download-pretrained.sh to download them:

./download-pretrained.sh

You can then use enjoy.py to run a trained policy in the simulation environment.

For example, to load the pretrained policy for SmallEmpty, you can run:

python enjoy.py --config-path logs/20200125T213536-small_empty/config.yml

You can also run enjoy.py without specifying a config path, and it will find all policies in the logs directory and allow you to pick one to run:

python enjoy.py

Training in the Simulation Environment

The config/experiments directory contains template config files for all experiments in the paper. To start a training run, you can give one of the template config files to the train.py script. For example, the following will train a policy on the SmallEmpty environment:

python train.py config/experiments/base/small_empty.yml

The training script will create a log directory and checkpoint directory for the new training run inside logs/ and checkpoints/, respectively. Inside the log directory, it will also create a new config file called config.yml, which stores training run config variables and can be used to resume training or to load a trained policy for evaluation.
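
As an illustrative example (the exact keys stored in config.yml are not listed here), you could inspect a training run's config with PyYAML:

import yaml

# Load the generated config for a training run and print its variables.
with open('logs/20200125T213536-small_empty/config.yml') as f:
    cfg = yaml.safe_load(f)

for key, value in sorted(cfg.items()):
    print(f'{key}: {value}')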

Simulation Environment

To explore the simulation environment using our proposed dense action space (spatial action maps), you can use the tools_click_agent.py script, which will allow you to click on the local overhead map to select actions and move around in the environment.

python tools_click_agent.py

Evaluation

Trained policies can be evaluated using the evaluate.py script, which takes in the config path for the training run. For example, to evaluate the SmallEmpty pretrained policy, you can run:

python evaluate.py --config-path logs/20200125T213536-small_empty/config.yml

This will load the trained policy from the specified training run, and run evaluation on it. The results are saved to an .npy file in the eval directory. You can then run jupyter notebook and navigate to eval_summary.ipynb to load the .npy files and generate tables and plots of the results.
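
For a quick look at the results outside the notebook, here is an illustrative snippet (the internal structure of the result arrays is not documented here, so it only loads and inspects them):

import glob
import numpy as np

# Print the shape and dtype of every evaluation result file.
for path in sorted(glob.glob('eval/*.npy')):
    data = np.load(path, allow_pickle=True)
    print(path, data.shape, data.dtype)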

Running in the Real Environment

We train policies in simulation and run them directly on the real robot by mirroring the real environment inside the simulation. To do this, we first use ArUco markers to estimate 2D poses of robots and cubes in the real environment, and then use the estimated poses to update the simulation. Note that setting up the real environment, particularly the marker pose estimation, can take a fair amount of time and effort.
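
For reference, the core ArUco detection step looks roughly like the following sketch (illustrative only; the repository's actual implementation is in the aruco directory, and the camera intrinsics, marker dictionary, marker size, and image path below are placeholders):

import cv2
import numpy as np

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)  # placeholder dictionary
camera_matrix = np.array([[600.0, 0, 320], [0, 600.0, 240], [0, 0, 1]])  # placeholder intrinsics
dist_coeffs = np.zeros(5)  # placeholder distortion coefficients
marker_length = 0.018  # marker side length in meters (assumed)

image = cv2.imread('overhead.jpg')  # hypothetical overhead camera frame
corners, ids, _ = cv2.aruco.detectMarkers(image, dictionary)
if ids is not None:
    result = cv2.aruco.estimatePoseSingleMarkers(corners, marker_length, camera_matrix, dist_coeffs)
    rvecs, tvecs = result[0], result[1]
    for marker_id, tvec in zip(ids.flatten(), tvecs):
        print(marker_id, tvec.flatten())  # translation of each marker in camera coordinates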

Vector SDK Setup

If you previously ran pip install -r requirements.txt following the installation instructions above, the anki_vector library should already be installed. Run the following command to set up each robot you plan to use:

python -m anki_vector.configure

After the setup is complete, you can open the Vector config file located at ~/.anki_vector/sdk_config.ini to verify that all of your robots are present.
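
For example, a small illustrative snippet (the section names and keys are assumptions about the file layout) to list the configured robots:

import configparser
import os

# Print every robot entry found in the Vector SDK config file.
config = configparser.ConfigParser()
config.read(os.path.expanduser('~/.anki_vector/sdk_config.ini'))
for section in config.sections():
    print(section, dict(config[section]))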

You can also run some of the official examples to verify that the setup procedure worked. For further reference, please see the Vector SDK documentation.

Connecting to the Vector

The following command will try to connect to all the robots in your Vector config file and keep them still. It will print out a message for each robot it successfully connects to, and can be used to verify that the Vector SDK can connect to all of your robots.

python vector_keep_still.py

Note: If you get the following error, you will need to make a small fix to the anki_vector library.

AttributeError: module 'anki_vector.connection' has no attribute 'CONTROL_PRIORITY_LEVEL'

Locate the anki_vector/behavior.py file inside your installed conda libraries. The full path should be in the error message. At the bottom of anki_vector/behavior.py, change connection.CONTROL_PRIORITY_LEVEL.RESERVE_CONTROL to connection.ControlPriorityLevel.RESERVE_CONTROL.
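
Concretely, the change is:

# before (raises AttributeError)
connection.CONTROL_PRIORITY_LEVEL.RESERVE_CONTROL
# after
connection.ControlPriorityLevel.RESERVE_CONTROL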


Sometimes the IP addresses of your robots will change. To update the Vector config file with the new IP addresses, you can run the following command:

python vector_run_mdns.py

The script uses mDNS to find all Vector robots on the local network and automatically updates their IP addresses in the Vector config file. It also prints the hostname, IP address, and MAC address of every robot it finds. Make sure zeroconf is installed (pip install zeroconf), or mDNS discovery may not work reliably. Alternatively, you can open the Vector config file at ~/.anki_vector/sdk_config.ini in a text editor and update the IP addresses manually.

Controlling the Vector

The vector_keyboard_controller.py script is adapted from the remote control example in the official SDK, and can be used to verify that you are able to control the robot using the Vector SDK. Use it as follows:

python vector_keyboard_controller.py --robot-index ROBOT_INDEX

The --robot-index argument specifies the robot you wish to control and refers to the index of the robot in the Vector config file (~/.anki_vector/sdk_config.ini). If no robot index is specified, the script will check all robots in the Vector config file and select the first robot it is able to connect to.
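
For reference, the mapping from a robot index to a connection could look like the following sketch (illustrative only, not the repository's actual connection code; it assumes the config file's sections are robot serial numbers):

import configparser
import os
import anki_vector

robot_index = 0  # corresponds to --robot-index
config = configparser.ConfigParser()
config.read(os.path.expanduser('~/.anki_vector/sdk_config.ini'))
serial = config.sections()[robot_index]  # assumes sections are robot serials

with anki_vector.Robot(serial=serial) as robot:
    robot.behavior.set_head_angle(anki_vector.util.degrees(0.0))  # simple command to confirm control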

3D Printed Parts

The real environment setup contains some 3D printed parts. We printed them on a Sindoh 3DWOX 1 3D printer using PLA filament, but other printers should work too. All 3D model files are in the stl directory:

  • cube.stl: 3D model for the cubes (objects)
  • blade.stl: 3D model for the bulldozer blade attached to the front of the robot
  • board-corner.stl: 3D model for the board corners, which are used for pose estimation with ArUco markers

Running Trained Policies on the Real Robot

First see the aruco directory for instructions on setting up pose estimation with ArUco markers.

Once the setup is completed, make sure the pose estimation server is started before proceeding:

cd aruco
python server.py

The vector_click_agent.py script is analogous to tools_click_agent.py, and allows you to click on the local overhead map to control the real robot. The script is also useful for verifying that all components of the real environment setup are working correctly, including pose estimation and robot control. The simulation environment should mirror the real setup with millimeter-level precision. You can start it using the following command:

python vector_click_agent.py --robot-index ROBOT_INDEX

If the poses in the simulation do not look correct, you can restart the pose estimation server with the --debug flag to enable debug visualizations:

cd aruco
python server.py --debug

Once you have verified that manual control with vector_click_agent.py works, you can then run a trained policy using the vector_enjoy.py script. For example, to load the SmallEmpty pretrained policy, you can run:

python vector_enjoy.py --robot-index ROBOT_INDEX --config-path logs/20200125T213536-small_empty/config.yml

Citation

If you find this work useful for your research, please consider citing:

@inproceedings{wu2020spatial,
  title = {Spatial Action Maps for Mobile Manipulation},
  author = {Wu, Jimmy and Sun, Xingyuan and Zeng, Andy and Song, Shuran and Lee, Johnny and Rusinkiewicz, Szymon and Funkhouser, Thomas},
  booktitle = {Proceedings of Robotics: Science and Systems (RSS)},
  year = {2020}
}