Seeing All the Angles: Learning Multiview Manipulation Policies for Contact-Rich Tasks from Demonstrations

Overview

Trevor Ablett, Daniel (Yifan) Zhai, Jonathan Kelly

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS’21)

Paper website: https://papers.starslab.ca/multiview-manipulation/
arXiv paper: https://arxiv.org/abs/2104.13907
DOI: https://doi.org/10.1109/IROS51168.2021.9636440


This work was motivated by a relatively simple question: will increasingly popular end-to-end visuomotor policies work on a mobile manipulator, where the angle of the base will not be repeatable from one execution of a task to another? We conducted a variety of experiments showing that policies naively trained on fixed-base data with imitation learning do not generalize to various base poses. To remedy the problem, we generated multiview datasets and trained corresponding multiview policies.

This repository contains the source code for reproducing our results and plots.

Requirements

We have only tested with Python 3.7. Our simulated environments use pybullet, and our training code uses TensorFlow 2.x, specifically relying on our manipulator-learning package. All requirements (for simulated environments) are automatically installed by following Setup below.

Our policies also use the groups argument in TensorFlow Conv2D, which requires a GPU.

Setup

Preliminary note on TensorFlow install

This repository uses TensorFlow with GPU support, which can be a bit of a pain to install. If you already have it installed, skip this section. Otherwise, we have found the following procedure to work:

  1. Install conda.
  2. Create a new conda env to use for this work and activate it.
  3. Run the following to install a version of TensorFlow that may work with conda:
conda install cudatoolkit cudnn
pip install tensorflow==2.6.* tensorflow-probability==0.14

Now you can continue with the regular installation.
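
Before moving on, it may be worth confirming that TensorFlow can actually see a GPU, since the policies rely on one. The snippet below is a minimal sketch and is not part of this repository: the helper name gpu_available is hypothetical, and the only TensorFlow call it uses is tf.config.list_physical_devices, which is available in TensorFlow 2.1 and later.

```python
def gpu_available():
    """Return True if TensorFlow is importable and reports at least one GPU.

    Hypothetical helper for illustration; not part of multiview-manipulation.
    """
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in the active environment.
        return False
    # An empty list means TensorFlow only found CPU devices.
    return len(tf.config.list_physical_devices("GPU")) > 0


if __name__ == "__main__":
    print("TensorFlow GPU available:", gpu_available())
```

If this prints False inside the conda env created above, the cudatoolkit and cudnn versions are a reasonable first thing to check.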

Regular Installation

Clone this repository and install it in your Python environment with pip.

git clone git@github.com:utiasSTARS/multiview-manipulation.git && cd multiview-manipulation
pip install -e .

A Note on Environment Names

The simulated environments that we use are all available in our manipulator-learning package and are called:

  • ThingLiftXYZImage
  • ThingLiftXYZMultiview
  • ThingStackSameImageV2
  • ThingStackSameMultiviewV2
  • ThingPickAndInsertSucDoneImage
  • ThingPickAndInsertSucDoneMultiview
  • ThingDoorImage
  • ThingDoorMultiview
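
The simulated environment names above come in Image/Multiview pairs. As an illustration only, the pairing can be written down in a few lines of Python. The helper image_counterpart is hypothetical and is not part of manipulator-learning, and treating each Image environment as the counterpart of the matching Multiview environment is an assumption based purely on the naming pattern in the list.

```python
# Simulated environment names, copied from the list above.
SIM_ENVS = [
    "ThingLiftXYZImage",
    "ThingLiftXYZMultiview",
    "ThingStackSameImageV2",
    "ThingStackSameMultiviewV2",
    "ThingPickAndInsertSucDoneImage",
    "ThingPickAndInsertSucDoneMultiview",
    "ThingDoorImage",
    "ThingDoorMultiview",
]


def image_counterpart(multiview_name):
    """Map a Multiview environment name to its Image-named counterpart.

    Hypothetical helper: it relies only on the naming pattern, not on any
    manipulator-learning API.
    """
    if "Multiview" not in multiview_name:
        raise ValueError(f"not a Multiview environment: {multiview_name}")
    return multiview_name.replace("Multiview", "Image")


# Build the Multiview -> Image pairing for every Multiview environment.
PAIRS = {name: image_counterpart(name) for name in SIM_ENVS if "Multiview" in name}

if __name__ == "__main__":
    for multiview, image in PAIRS.items():
        print(f"{multiview} <-> {image}")
```

A pairing like this can be handy when choosing which two environments to evaluate a single trained policy in.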

The real environments we use with our mobile manipulator will, of course, be harder to reproduce, but were generated using our thing-gym-ros repository and are called:

  • ThingRosPickAndInsertCloser6DOFImageMB
  • ThingRosDrawerRanGrip6DOFImageMB
  • ThingRosDoorRanGrip6DOFImage
  • ThingRosDoorRanGrip6DOFImageMB

Running and Training Behavioural Cloning (BC) policies

The script in this repository can train and test (multiple) policies all in one shot.

  1. Choose one of:

    i. Train and test policies all at once. Download and uncompress any of the simulated expert data (generated using an HTC Vive hand tracker) from this Google Drive Folder.
    ii. Generate policies using the procedure outlined in the following section.
    iii. Download policies from this Google Drive Folder. We'll assume that you downloaded ThingDoorMultiview_bc_models.zip.

    If you choose i., your folder structure should be:

     .
     └── multiview-manipulation/
         ├── multiview_manipulation/
         └── data/
             ├── bc_models/
             └── demonstrations/
                 ├── ThingDoorMultiview/
                     ├── depth/
                     ├── img/
                     ├── data.npz
                     └── data_swp.npz
    

    If you choose ii. or iii., your folder structure should be:

    .
    └── multiview-manipulation/
        ├── multiview_manipulation/
        └── data/
            └── bc_models/
                ├── ThingDoorMultiview_25_trajs_1/
                ├── ThingDoorMultiview_25_trajs_2/
                ├── ThingDoorMultiview_25_trajs_3/
                ├── ThingDoorMultiview_25_trajs_4/
                ├── ThingDoorMultiview_25_trajs_5/   
                ├── ThingDoorMultiview_50_trajs_1/   
                └── ...   
    
  2. Modify the following options in multiview_manipulation/policies/test_policies.py to match your system and selected data:

    • main_data_dir: top level data directory (default: data)
    • bc_models_dir: top level trained BC models directory (default: bc_models)
    • expert_data_dir: top level expert data directory (default: demonstrations, only required if option i. above was selected).
  3. Change the following options to choose whether to test policies in an environment different from the one they were trained in (e.g., as stated in the paper, you can test a ThingDoorMultiview policy in both ThingDoorMultiview and ThingDoorImage):

    • env_name: environment to test policy in
    • policy_env_name: name of environment that data for policy was generated from.
  4. Modify the options for choosing which policies to train/test:

    • bc_ckpts_num_traj: The different numbers of trajectories to use for training/trained policies (default: range(200, 24, -25))
    • seeds: Which seeds to use (default: [1, 2, 3, 4, 5])
  5. Run the script:

python multiview_manipulation/policies/test_policies.py
  6. Your results will show up in data/bc_results/{env_name}_{env_seed}_{experiment_name}.
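
To make the bc_ckpts_num_traj and seeds options concrete, the sketch below enumerates the model folders that the default values would correspond to. It is an illustration only: the helper expected_model_dirs is hypothetical, and it assumes the folder names follow the {env}_{num_trajs}_trajs_{seed} pattern visible in the folder structure above, with policy_env_name used as the prefix.

```python
from itertools import product


def expected_model_dirs(policy_env_name, bc_ckpts_num_traj, seeds):
    """List model folder names for every (num_trajs, seed) combination.

    Hypothetical helper; assumes the {env}_{num_trajs}_trajs_{seed} naming
    shown in the bc_models/ folder structure.
    """
    return [
        f"{policy_env_name}_{num_trajs}_trajs_{seed}"
        for num_trajs, seed in product(bc_ckpts_num_traj, seeds)
    ]


# Defaults quoted in the steps above: 200, 175, ..., 25 trajectories, five seeds.
dirs = expected_model_dirs("ThingDoorMultiview", range(200, 24, -25), [1, 2, 3, 4, 5])

if __name__ == "__main__":
    print(len(dirs))          # 8 trajectory counts x 5 seeds = 40
    print(dirs[0], dirs[-1])  # ThingDoorMultiview_200_trajs_1 ThingDoorMultiview_25_trajs_5
```

Shrinking either option (for example, a single seed) is a simple way to get a quick first run before committing to the full sweep.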

Training policies with Behavioural Cloning (BC) only

  1. Download and uncompress any of the simulated expert data from this Google Drive Folder. We'll assume that you downloaded ThingDoorMultiview.tar.gz and uncompressed it as ThingDoorMultiview.

  2. Modify the following options in multiview_manipulation/policies/gen_policies.py to match your system and selected data:

    • bc_models_dir: top level directory for trained BC models (default: data/bc_models)
    • expert_data_dir: top level directory for expert data (default: data/demonstrations)
    • dataset_dir: the name of the directory containing depth/, img/, data.npz and data_swp.npz.
    • env_str: The string corresponding to the name of the environment (only used for the saved BC policy name)

    For example, if you're using the default folder structure, your setup should look like this:

    .
    └── multiview-manipulation/
        ├── multiview_manipulation/
        └── data/
            ├── bc_models/
            └── demonstrations/
                ├── ThingDoorMultiview/
                    ├── depth/
                    ├── img/
                    ├── data.npz
                    └── data_swp.npz
    
  3. Modify the options for choosing which policies to train:

    • bc_ckpts_num_traj: The different numbers of trajectories to use for training policies (default: range(25, 201, 25))
    • seeds: Which seeds to train for (default: [1, 2, 3, 4, 5])
  4. Run the file:

python multiview_manipulation/policies/gen_policies.py
  5. Your trained policies will show up in individual folders under the bc_models folder as {env_str}_{num_trajs}_trajs_{seed}/.
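
A common stumbling block is a dataset folder that was uncompressed into the wrong place. The sketch below checks that a dataset directory contains the four entries named in the steps above (depth/, img/, data.npz and data_swp.npz) before training is started. It is not part of this repository, and the helper name missing_dataset_entries is hypothetical.

```python
from pathlib import Path

# Entries that the steps above say a dataset directory contains.
REQUIRED_DIRS = ("depth", "img")
REQUIRED_FILES = ("data.npz", "data_swp.npz")


def missing_dataset_entries(expert_data_dir, dataset_dir):
    """Return the names of expected entries that are absent from the dataset.

    Hypothetical helper; an empty list means the layout matches the one
    described in the training steps.
    """
    root = Path(expert_data_dir) / dataset_dir
    missing = [name + "/" for name in REQUIRED_DIRS if not (root / name).is_dir()]
    missing += [name for name in REQUIRED_FILES if not (root / name).is_file()]
    return missing


if __name__ == "__main__":
    # Default locations quoted in the steps above.
    missing = missing_dataset_entries("data/demonstrations", "ThingDoorMultiview")
    print("missing entries:", missing if missing else "none")
```

Run from the repository root, this reports which of the expected entries, if any, still need to be put in place.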

Collecting Demonstrations

All of our demonstrations were collected using the collect_demos.py file from the manipulator-learning package and an HTC Vive Hand Tracker. To collect demonstrations, you would use, for example:

git clone git@github.com:utiasSTARS/manipulator-learning.git && cd manipulator-learning
pip install -e .
pip install -r device_requirements.txt
python manipulator_learning/learning/imitation/collect_demos.py --device vr --directory demonstrations --demo_name ThingDoorMultiview01 --environment ThingDoorMultiview

You can also try using the keyboard with:

python manipulator_learning/learning/imitation/collect_demos.py --device keyboard --directory demonstrations --demo_name ThingDoorMultiview01 --environment ThingDoorMultiview

More instructions can be found in the manipulator-learning README.

Real Environments

Although it would be nearly impossible to exactly reproduce our results with our real environments, the code we used for generating our real environments can be found in our thing-gym-ros repository.

Citation

If you use this in your work, please cite:

@inproceedings{2021_Ablett_Seeing,
    address = {Prague, Czech Republic},
    author = {Trevor Ablett and Yifan Zhai and Jonathan Kelly},
    booktitle = {Proceedings of the {IEEE/RSJ} International Conference on Intelligent Robots and Systems {(IROS'21)}},
    date = {2021-09-27/2021-10-01},
    month = {Sep. 27--Oct. 1},
    site = {https://papers.starslab.ca/multiview-manipulation/},
    title = {Seeing All the Angles: Learning Multiview Manipulation Policies for Contact-Rich Tasks from Demonstrations},
    url = {http://arxiv.org/abs/2104.13907},
    video1 = {https://youtu.be/oh0JMeyoswg},
    year = {2021}
}