Spatial Action Maps for Mobile Manipulation (RSS 2020)

Overview

spatial-action-maps

Update: Please see our new spatial-intention-maps repository, which extends this work to multi-agent settings. It contains many new improvements to the codebase, and while the focus is on multi-agent, it also supports single-agent training.


This code release accompanies the following paper:

Spatial Action Maps for Mobile Manipulation

Jimmy Wu, Xingyuan Sun, Andy Zeng, Shuran Song, Johnny Lee, Szymon Rusinkiewicz, Thomas Funkhouser

Robotics: Science and Systems (RSS), 2020

Project Page | PDF | arXiv | Video

Abstract: Typical end-to-end formulations for learning robotic navigation involve predicting a small set of steering command actions (e.g., step forward, turn left, turn right, etc.) from images of the current state (e.g., a bird's-eye view of a SLAM reconstruction). Instead, we show that it can be advantageous to learn with dense action representations defined in the same domain as the state. In this work, we present "spatial action maps," in which the set of possible actions is represented by a pixel map (aligned with the input image of the current state), where each pixel represents a local navigational endpoint at the corresponding scene location. Using ConvNets to infer spatial action maps from state images, action predictions are thereby spatially anchored on local visual features in the scene, enabling significantly faster learning of complex behaviors for mobile manipulation tasks with reinforcement learning. In our experiments, we task a robot with pushing objects to a goal location, and find that policies learned with spatial action maps achieve much better performance than traditional alternatives.

Installation

We recommend using a conda environment for this codebase. The following commands will set up a new conda environment with the correct requirements (tested on Ubuntu 18.04.3 LTS):

# Create and activate new conda env
conda create -y -n my-conda-env python=3.7
conda activate my-conda-env

# Install pytorch and numpy
conda install -y pytorch==1.2.0 torchvision==0.4.0 cudatoolkit=10.0 -c pytorch
conda install -y numpy=1.17.3

# Install pip requirements
pip install -r requirements.txt

# Install shortest path module (used in simulation environment)
cd spfa
python setup.py install

Quickstart

We provide four pretrained policies, one for each test environment. Use download-pretrained.sh to download them:

./download-pretrained.sh

You can then use enjoy.py to run a trained policy in the simulation environment.

For example, to load the pretrained policy for SmallEmpty, you can run:

python enjoy.py --config-path logs/20200125T213536-small_empty/config.yml

You can also run enjoy.py without specifying a config path, and it will find all policies in the logs directory and allow you to pick one to run:

python enjoy.py

Training in the Simulation Environment

The config/experiments directory contains template config files for all experiments in the paper. To start a training run, you can give one of the template config files to the train.py script. For example, the following will train a policy on the SmallEmpty environment:

python train.py config/experiments/base/small_empty.yml

The training script will create a log directory and checkpoint directory for the new training run inside logs/ and checkpoints/, respectively. Inside the log directory, it will also create a new config file called config.yml, which stores training run config variables and can be used to resume training or to load a trained policy for evaluation.

Simulation Environment

To explore the simulation environment using our proposed dense action space (spatial action maps), you can use the tools_click_agent.py script, which will allow you to click on the local overhead map to select actions and move around in the environment.

python tools_click_agent.py

Evaluation

Trained policies can be evaluated using the evaluate.py script, which takes in the config path for the training run. For example, to evaluate the SmallEmpty pretrained policy, you can run:

python evaluate.py --config-path logs/20200125T213536-small_empty/config.yml

This will load the trained policy from the specified training run, and run evaluation on it. The results are saved to an .npy file in the eval directory. You can then run jupyter notebook and navigate to eval_summary.ipynb to load the .npy files and generate tables and plots of the results.

Running in the Real Environment

We train policies in simulation and run them directly on the real robot by mirroring the real environment inside the simulation. To do this, we first use ArUco markers to estimate 2D poses of robots and cubes in the real environment, and then use the estimated poses to update the simulation. Note that setting up the real environment, particularly the marker pose estimation, can take a fair amount of time and effort.

Vector SDK Setup

If you previously ran pip install -r requirements.txt following the installation instructions above, the anki_vector library should already be installed. Run the following command to set up each robot you plan to use:

python -m anki_vector.configure

After the setup is complete, you can open the Vector config file located at ~/.anki_vector/sdk_config.ini to verify that all of your robots are present.

You can also run some of the official examples to verify that the setup procedure worked. For further reference, please see the Vector SDK documentation.

Connecting to the Vector

The following command will try to connect to all the robots in your Vector config file and keep them still. It will print out a message for each robot it successfully connects to, and can be used to verify that the Vector SDK can connect to all of your robots.

python vector_keep_still.py

Note: If you get the following error, you will need to make a small fix to the anki_vector library.

AttributeError: module 'anki_vector.connection' has no attribute 'CONTROL_PRIORITY_LEVEL'

Locate the anki_vector/behavior.py file inside your installed conda libraries. The full path should be in the error message. At the bottom of anki_vector/behavior.py, change connection.CONTROL_PRIORITY_LEVEL.RESERVE_CONTROL to connection.ControlPriorityLevel.RESERVE_CONTROL.


Sometimes the IP addresses of your robots will change. To update the Vector config file with the new IP addresses, you can run the following command:

python vector_run_mdns.py

The script uses mDNS to find all Vector robots on the local network, and will automatically update their IP addresses in the Vector config file. It will also print out the hostname, IP address, and MAC address of every robot found. Make sure zeroconf is installed (pip install zeroconf) or mDNS may not work well. Alternatively, you can just open the Vector config file at ~/.anki_vector/sdk_config.ini in a text editor and manually update the IP addresses.

Controlling the Vector

The vector_keyboard_controller.py script is adapted from the remote control example in the official SDK, and can be used to verify that you are able to control the robot using the Vector SDK. Use it as follows:

python vector_keyboard_controller.py --robot-index ROBOT_INDEX

The --robot-index argument specifies the robot you wish to control and refers to the index of the robot in the Vector config file (~/.anki_vector/sdk_config.ini). If no robot index is specified, the script will check all robots in the Vector config file and select the first robot it is able to connect to.

3D Printed Parts

The real environment setup contains some 3D printed parts. We used the Sindoh 3DWOX 1 3D printer to print them, but other printers should work too. We used PLA filament. All 3D model files are in the stl directory:

  • cube.stl: 3D model for the cubes (objects)
  • blade.stl: 3D model for the bulldozer blade attached to the front of the robot
  • board-corner.stl: 3D model for the board corners, which are used for pose estimation with ArUco markers

Running Trained Policies on the Real Robot

First see the aruco directory for instructions on setting up pose estimation with ArUco markers.

Once the setup is completed, make sure the pose estimation server is started before proceeding:

cd aruco
python server.py

The vector_click_agent.py script is analogous to tools_click_agent.py, and allows you to click on the local overhead map to control the real robot. The script is also useful for verifying that all components of the real environment setup are working correctly, including pose estimation and robot control. The simulation environment should mirror the real setup with millimeter-level precision. You can start it using the following command:

python vector_click_agent.py --robot-index ROBOT_INDEX

If the poses in the simulation do not look correct, you can restart the pose estimation server with the --debug flag to enable debug visualizations:

cd aruco
python server.py --debug

Once you have verified that manual control with vector_click_agent.py works, you can then run a trained policy using the vector_enjoy.py script. For example, to load the SmallEmpty pretrained policy, you can run:

python vector_enjoy.py --robot-index ROBOT_INDEX --config-path logs/20200125T213536-small_empty/config.yml

Citation

If you find this work useful for your research, please consider citing:

@inproceedings{wu2020spatial,
  title = {Spatial Action Maps for Mobile Manipulation},
  author = {Wu, Jimmy and Sun, Xingyuan and Zeng, Andy and Song, Shuran and Lee, Johnny and Rusinkiewicz, Szymon and Funkhouser, Thomas},
  booktitle = {Proceedings of Robotics: Science and Systems (RSS)},
  year = {2020}
}
Code for Domain Adaptive Video Segmentation via Temporal Consistency Regularization in ICCV 2021

Domain Adaptive Video Segmentation via Temporal Consistency Regularization Updates 08/2021: check out our domain adaptation for sematic segmentation p

36 Dec 12, 2022
Code for "Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification", ECCV 2020 Spotlight

Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification Implementation of "Learning From Multiple Experts: Se

27 Nov 05, 2022
Enabling dynamic analysis of Legacy Embedded Systems in full emulated environment

PENecro This project is based on "Enabling dynamic analysis of Legacy Embedded Systems in full emulated environment", published on hardwear.io USA 202

Ta-Lun Yen 10 May 17, 2022
Facestar dataset. High quality audio-visual recordings of human conversational speech.

Facestar Dataset Description Existing audio-visual datasets for human speech are either captured in a clean, controlled environment but contain only a

Meta Research 87 Dec 21, 2022
[ACL-IJCNLP 2021] Improving Named Entity Recognition by External Context Retrieving and Cooperative Learning

CLNER The code is for our ACL-IJCNLP 2021 paper: Improving Named Entity Recognition by External Context Retrieving and Cooperative Learning CLNER is a

71 Dec 08, 2022
MVFNet: Multi-View Fusion Network for Efficient Video Recognition (AAAI 2021)

MVFNet: Multi-View Fusion Network for Efficient Video Recognition (AAAI 2021) Overview We release the code of the MVFNet (Multi-View Fusion Network).

2 Jan 29, 2022
Code for paper "Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs"

This is the codebase for the paper: Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs Directory Structur

Peter Hase 19 Aug 21, 2022
Simple tool to combine(merge) onnx models. Simple Network Combine Tool for ONNX.

snc4onnx Simple tool to combine(merge) onnx models. Simple Network Combine Tool for ONNX. https://github.com/PINTO0309/simple-onnx-processing-tools 1.

Katsuya Hyodo 8 Oct 13, 2022
Next-Best-View Estimation based on Deep Reinforcement Learning for Active Object Classification

next_best_view_rl Setup Clone the repository: git clone --recurse-submodules ... In 'third_party/zed-ros-wrapper': git checkout devel Install mujoco `

Christian Korbach 1 Feb 15, 2022
PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR)

Ilya Kostrikov 3k Dec 31, 2022
Geometric Deep Learning Extension Library for PyTorch

Documentation | Paper | Colab Notebooks | External Resources | OGB Examples PyTorch Geometric (PyG) is a geometric deep learning extension library for

Matthias Fey 16.5k Jan 08, 2023
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Object Detection and Instance Segmentation.

Swin Transformer for Object Detection This repo contains the supported code and configuration files to reproduce object detection results of Swin Tran

Swin Transformer 1.4k Dec 30, 2022
RuDOLPH: One Hyper-Modal Transformer can be creative as DALL-E and smart as CLIP

[Paper] [Хабр] [Model Card] [Colab] [Kaggle] RuDOLPH 🦌 🎄 ☃️ One Hyper-Modal Tr

Sber AI 230 Dec 31, 2022
Very Deep Convolutional Networks for Large-Scale Image Recognition

pytorch-vgg Some scripts to convert the VGG-16 and VGG-19 models [1] from Caffe to PyTorch. The converted models can be used with the PyTorch model zo

Justin Johnson 217 Dec 05, 2022
Gauge equivariant mesh cnn

Geometric Mesh CNN The code in this repository is an implementation of the Gauge Equivariant Mesh CNN introduced in the paper Gauge Equivariant Mesh C

50 Dec 18, 2022
DirectVoxGO reconstructs a scene representation from a set of calibrated images capturing the scene.

DirectVoxGO reconstructs a scene representation from a set of calibrated images capturing the scene. We achieve NeRF-comparable novel-view synthesis quality with super-fast convergence.

sunset 709 Dec 31, 2022
Official pytorch implementation of paper "Image-to-image Translation via Hierarchical Style Disentanglement".

HiSD: Image-to-image Translation via Hierarchical Style Disentanglement Official pytorch implementation of paper "Image-to-image Translation

364 Dec 14, 2022
Easy to use Python camera interface for NVIDIA Jetson

JetCam JetCam is an easy to use Python camera interface for NVIDIA Jetson. Works with various USB and CSI cameras using Jetson's Accelerated GStreamer

NVIDIA AI IOT 358 Jan 02, 2023
Generate saved_model, tfjs, tf-trt, EdgeTPU, CoreML, quantized tflite and .pb from .tflite.

tflite2tensorflow Generate saved_model, tfjs, tf-trt, EdgeTPU, CoreML, quantized tflite and .pb from .tflite. 1. Supported Layers No. TFLite Layer TF

Katsuya Hyodo 214 Dec 29, 2022
Instance Segmentation in 3D Scenes using Semantic Superpoint Tree Networks

SSTNet Instance Segmentation in 3D Scenes using Semantic Superpoint Tree Networks(ICCV2021) by Zhihao Liang, Zhihao Li, Songcen Xu, Mingkui Tan, Kui J

83 Nov 29, 2022