RGB-stacking 🛑 🟩 🔷 for robotic manipulation

Overview

RGB-stacking 🛑 🟩 🔷 for robotic manipulation

BLOG | PAPER | VIDEO

Beyond Pick-and-Place: Tackling Robotic Stacking of Diverse Shapes,
Alex X. Lee*, Coline Devin*, Yuxiang Zhou*, Thomas Lampe*, Konstantinos Bousmalis*, Jost Tobias Springenberg*, Arunkumar Byravan, Abbas Abdolmaleki, Nimrod Gileadi, David Khosid, Claudio Fantacci, Jose Enrique Chen, Akhil Raju, Rae Jeong, Michael Neunert, Antoine Laurens, Stefano Saliceti, Federico Casarini, Martin Riedmiller, Raia Hadsell, Francesco Nori.
In Conference on Robot Learning (CoRL), 2021.

The RGB environment

This repository contains an implementation of the simulation environment described in the paper "Beyond Pick-and-Place: Tackling robotic stacking of diverse shapes". Note that this is a re-implementation of the environment (to remove dependencies on internal libraries). As a result, not all the features described in the paper are available at this point. Noticeably, domain randomization is not included in this release. We also aim to provide reference performance metrics of trained policies on this environment in the near future.

In this environment, the agent controls a robot arm with a parallel gripper above a basket, which contains three objects — one red, one green, and one blue, hence the name RGB. The agent's task is to stack the red object on top of the blue object, within 20 seconds, while the green object serves as an obstacle and distraction. The agent controls the robot using a 4D Cartesian controller. The controlled DOFs are x, y, z and rotation around the z axis. The simulation is a MuJoCo environment built using the Modular Manipulation (MoMa) framework.

Corresponding method

The RGB-stacking paper "Beyond Pick-and-Place: Tackling robotic stacking of diverse shapes" also contains a description and thorough evaluation of our initial solution to both the 'Skill Mastery' (training on the 5 designated test triplets and evaluating on them) and the 'Skill Generalization' (training on triplets of training objects and evaluating on the 5 test triplets). Our approach was to first train a state-based policy in simulation via a standard RL algorithm (we used MPO) followed by interactive distillation of the state-based policy into a vision-based policy (using a domain randomized version of the environment) that we then deployed to the robot via zero-shot sim-to-real transfer. We finally improved the policy further via offline RL based on data collected from the sim-to-real policy (we used CRR). For details on our method and the results please consult the paper.

Installing and visualizing the environment

Please ensure that you have a working MuJoCo200 installation and a valid MuJoCo licence.

  1. Clone this repository:

    git clone https://github.com/deepmind/rgb_stacking.git
    cd rgb_stacking
  2. Prepare a Python 3 environment - venv is recommended.

    python3 -m venv rgb_stacking_venv
    source rgb_stacking_venv/bin/activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Run the environment viewer:

    python -m rgb_stacking.main

Step 2-4 can also be done by running the run.sh script:

./run.sh

Specifying the object triplet

The default environment will load with Triplet 4 (see Sect. 3.2.1 in the paper). If you wish to use a different triplet you can use the following commands:

from rgb_stacking import environment

env = environment.rgb_stacking(object_triplet=NAME_OF_SET)

The possible NAME_OF_SET are:

  • rgb_test_triplet{i} where i is one of 1, 2, 3, 4, 5: Loads test triplet i.
  • rgb_test_random: Randomly loads one of the 5 test triplets.
  • rgb_train_random: Triplet comprised of blocks from the training set.
  • rgb_heldout_random: Triplet comprised of blocks from the held-out set.

For more information on the blocks and the possible options, please refer to the rgb_objects repository.

Specifying the observation space

By default, the observations exposed by the environment are only the ones we used for training our state-based agents. To use another set of observations please use the following code snippet:

from rgb_stacking import environment

env = environment.rgb_stacking(
    observations=environment.ObservationSet.CHOSEN_SET)

The possible CHOSEN_SET are:

  • STATE_ONLY: Only the state observations, used for training expert policies from state in simulation (stage 1).
  • VISION_ONLY: Only image observations.
  • ALL: All observations.
  • INTERACTIVE_IMITATION_LEARNING: Pair of image observations and a subset of proprioception observations, used for interactive imitation learning (stage 2).
  • OFFLINE_POLICY_IMPROVEMENT: Pair of image observations and a subset of proprioception observations, used for the one-step offline policy improvement (stage 3).

Real RGB-Stacking Environment: CAD models and assembly instructions

The CAD model of the setup is available in onshape.

We also provide the following documents for the assembly of the real cell:

  • Assembly instructions for the basket.
  • Assembly instructions for the robot.
  • Assembly instructions for the cell.
  • The bill of materials of all the necessary parts.
  • A diagram with the wiring of cell.

The RGB-objects themselves can be 3D-printed using the STLs available in the rgb_objects repository.

Citing

If you use rgb_stacking in your work, please cite the accompanying paper:

@inproceedings{lee2021rgbstacking,
    title={Beyond Pick-and-Place: Tackling Robotic Stacking of Diverse Shapes},
    author={Alex X. Lee and
            Coline Devin and
            Yuxiang Zhou and
            Thomas Lampe and
            Konstantinos Bousmalis and
            Jost Tobias Springenberg and
            Arunkumar Byravan and
            Abbas Abdolmaleki and
            Nimrod Gileadi and
            David Khosid and
            Claudio Fantacci and
            Jose Enrique Chen and
            Akhil Raju and
            Rae Jeong and
            Michael Neunert and
            Antoine Laurens and
            Stefano Saliceti and
            Federico Casarini and
            Martin Riedmiller and
            Raia Hadsell and
            Francesco Nori},
    booktitle={Conference on Robot Learning (CoRL)},
    year={2021},
    url={https://openreview.net/forum?id=U0Q8CrtBJxJ}
}
Owner
DeepMind
DeepMind
Codes for CVPR2021 paper "PWCLO-Net: Deep LiDAR Odometry in 3D Point Clouds Using Hierarchical Embedding Mask Optimization"

PWCLO-Net: Deep LiDAR Odometry in 3D Point Clouds Using Hierarchical Embedding Mask Optimization (CVPR 2021) This is the official implementation of PW

Intelligent Robotics and Machine Vision Lab 42 Dec 18, 2022
[NeurIPS 2021] A weak-shot object detection approach by transferring semantic similarity and mask prior.

TransMaS This repository is the official pytorch implementation of the following paper: NIPS2021 Mixed Supervised Object Detection by TransferringMask

BCMI 49 Jul 27, 2022
PyTorch implementation of Wide Residual Networks with 1-bit weights by McDonnell (ICLR 2018)

1-bit Wide ResNet PyTorch implementation of training 1-bit Wide ResNets from this paper: Training wide residual networks for deployment using a single

Sergey Zagoruyko 122 Dec 07, 2022
Vision-Language Pre-training for Image Captioning and Question Answering

VLP This repo hosts the source code for our AAAI2020 work Vision-Language Pre-training (VLP). We have released the pre-trained model on Conceptual Cap

Luowei Zhou 373 Jan 03, 2023
1st place solution to the Satellite Image Change Detection Challenge hosted by SenseTime

1st place solution to the Satellite Image Change Detection Challenge hosted by SenseTime

Lihe Yang 209 Jan 01, 2023
Open-Set Recognition: A Good Closed-Set Classifier is All You Need

Open-Set Recognition: A Good Closed-Set Classifier is All You Need Code for our paper: "Open-Set Recognition: A Good Closed-Set Classifier is All You

194 Jan 03, 2023
Learning Representational Invariances for Data-Efficient Action Recognition

Learning Representational Invariances for Data-Efficient Action Recognition Official PyTorch implementation for Learning Representational Invariances

Virginia Tech Vision and Learning Lab 27 Nov 22, 2022
TensorFlow implementation of "TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?"

TokenLearner: What Can 8 Learned Tokens Do for Images and Videos? Source: Improving Vision Transformer Efficiency and Accuracy by Learning to Tokenize

Aritra Roy Gosthipaty 23 Dec 24, 2022
Agent-based model simulator for air quality and pandemic risk assessment in architectural spaces

Agent-based model simulation for air quality and pandemic risk assessment in architectural spaces. User Guide archABM is a fast and open source agent-

Vicomtech 10 Dec 05, 2022
An exploration of log domain "alternative floating point" for hardware ML/AI accelerators.

This repository contains the SystemVerilog RTL, C++, HLS (Intel FPGA OpenCL to wrap RTL code) and Python needed to reproduce the numerical results in

Facebook Research 373 Dec 31, 2022
The official re-implementation of the Neurips 2021 paper, "Targeted Neural Dynamical Modeling".

Targeted Neural Dynamical Modeling Note: This is a re-implementation (in Tensorflow2) of the original TNDM model. We do not plan to further update the

6 Oct 05, 2022
Pytorch implementation of "Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling"

RNN-for-Joint-NLU Pytorch implementation of "Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling"

Kim SungDong 194 Dec 28, 2022
Code for the paper BERT might be Overkill: A Tiny but Effective Biomedical Entity Linker based on Residual Convolutional Neural Networks

Biomedical Entity Linking This repo provides the code for the paper BERT might be Overkill: A Tiny but Effective Biomedical Entity Linker based on Res

Tuan Manh Lai 24 Oct 24, 2022
For holding anime-related object classification and detection models

Animesion An end-to-end framework for anime-related object classification, detection, segmentation, and other models. Update: 01/22/2020. Due to time-

Edwin Arkel Rios 72 Nov 30, 2022
Preprossing-loan-data-with-NumPy - In this project, I have cleaned and pre-processed the loan data that belongs to an affiliate bank based in the United States.

Preprossing-loan-data-with-NumPy In this project, I have cleaned and pre-processed the loan data that belongs to an affiliate bank based in the United

Dhawal Chitnavis 2 Jan 03, 2022
Official PyTorch implementation of the ICRA 2021 paper: Adversarial Differentiable Data Augmentation for Autonomous Systems.

Adversarial Differentiable Data Augmentation This repository provides the official PyTorch implementation of the ICRA 2021 paper: Adversarial Differen

Manli 3 Oct 15, 2022
Detecting drunk people through thermal images using Deep Learning (CNN)

Drunk Detection CNN Detecting drunk people through thermal images using Deep Learning (CNN) Dataset We used thermal images provided by Electronics Lab

Giacomo Ferretti 3 Oct 27, 2022
A distributed, plug-n-play algorithm for multi-robot applications with a priori non-computable objective functions

A distributed, plug-n-play algorithm for multi-robot applications with a priori non-computable objective functions Kapoutsis, A.C., Chatzichristofis,

Athanasios Ch. Kapoutsis 5 Oct 15, 2022
Learning Versatile Neural Architectures by Propagating Network Codes

Learning Versatile Neural Architectures by Propagating Network Codes Mingyu Ding, Yuqi Huo, Haoyu Lu, Linjie Yang, Zhe Wang, Zhiwu Lu, Jingdong Wang,

Mingyu Ding 36 Dec 06, 2022
Keyhole Imaging: Non-Line-of-Sight Imaging and Tracking of Moving Objects Along a Single Optical Path

Keyhole Imaging Code & Dataset Code associated with the paper "Keyhole Imaging: Non-Line-of-Sight Imaging and Tracking of Moving Objects Along a Singl

Stanford Computational Imaging Lab 20 Feb 03, 2022