Deep Multimodal Neural Architecture Search

Last update: Dec 21, 2022

MMNas: Deep Multimodal Neural Architecture Search

This repository contains the PyTorch implementation of MMNas for the visual question answering (VQA), visual grounding (VGD), and image-text matching (ITM) tasks.

Prerequisites

Software and Hardware Requirements

You need a machine with at least 4 GPUs (>= 8GB each), 50GB of memory for VQA and VGD or 150GB for ITM, and 50GB of free disk space. We strongly recommend using an SSD to guarantee high-speed I/O.

You should first install some necessary packages.

Install Python >= 3.6
Install CUDA >= 9.0 and cuDNN
Install PyTorch >= 0.4.1 with CUDA (PyTorch 1.x is also supported).

Install spaCy and initialize the GloVe vectors as follows:

$ pip install -r requirements.txt
$ wget https://github.com/explosion/spacy-models/releases/download/en_vectors_web_lg-2.1.0/en_vectors_web_lg-2.1.0.tar.gz -O en_vectors_web_lg-2.1.0.tar.gz
$ pip install en_vectors_web_lg-2.1.0.tar.gz

Dataset Preparations

Please follow the instructions in dataset_setup.md to download the datasets and features.

Search

To search for an optimal architecture for a specific task, run

$ python3 search_[vqa|vgd|itm].py

At the end of each search epoch, the script outputs the optimal architecture according to the current architecture weights, choosing for every block the operator with the largest architecture weight. When the optimal architecture has not changed for several consecutive epochs, you can stop the search process manually.
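The selection rule described above can be sketched as follows. This is a minimal illustration, not the repository's code: the operator names in `OPS`, the `enc`/`dec` grouping, and the `patience` threshold are all hypothetical, and the architecture weights are plain Python lists rather than PyTorch tensors.

```python
from typing import Dict, List, Sequence

# Hypothetical operator names; the real candidate set is defined in the repository.
OPS = ["self_att", "guided_att", "ffn", "relation_att"]


def derive_architecture(
    weights: Dict[str, Sequence[Sequence[float]]],
    ops: Sequence[str] = OPS,
) -> Dict[str, List[str]]:
    """For every block, pick the operator whose architecture weight is largest."""
    arch = {}
    for part, blocks in weights.items():
        arch[part] = [
            ops[max(range(len(row)), key=row.__getitem__)]  # argmax over one block
            for row in blocks
        ]
    return arch


def is_stable(history: Sequence[Dict[str, List[str]]], patience: int = 3) -> bool:
    """Return True when the last `patience` derived architectures are identical."""
    if len(history) < patience:
        return False
    tail = history[-patience:]
    return all(arch == tail[0] for arch in tail)


# Made-up weights: two encoder blocks and one decoder block.
example_weights = {
    "enc": [[0.1, 0.2, 0.6, 0.1], [0.7, 0.1, 0.1, 0.1]],
    "dec": [[0.2, 0.5, 0.2, 0.1]],
}
example_arch = derive_architecture(example_weights)
```

Appending the derived architecture to a list after every epoch and calling `is_stable` on that list gives a simple signal for when the search can be stopped.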

Training

The following script starts training a network with the optimal architecture found by MMNas:

$ python3 train_[vqa|vgd|itm].py --RUN='train' --ARCH_PATH='./arch/train_vqa.json'

Optional arguments:

--VERSION=str, e.g. --VERSION='mmnas_vqa', to assign a name to your model.
--GPU=str, e.g. --GPU='0, 1, 2, 3', to train the model on the specified GPU devices.
--NW=int, e.g. --NW=8, to accelerate I/O speed.

--RESUME to start training from saved checkpoint parameters.
--ARCH_PATH to use a different searched architecture.
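To make the shape of these options concrete, here is a small argparse sketch. It is an assumption about how such flags could be parsed, not the parser shipped with the repository: the defaults, the `gpu_ids` helper, and the use of `CUDA_VISIBLE_DEVICES` are illustrative, and `--RESUME` is left out because its exact form is not specified here.

```python
import argparse
import os
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Illustrative parser for a subset of the options listed above."""
    parser = argparse.ArgumentParser(description="Illustrative parser, not the repository's own")
    parser.add_argument("--RUN", choices=["train", "val", "test"], default="train")
    parser.add_argument("--VERSION", type=str, default="mmnas_vqa")
    parser.add_argument("--GPU", type=str, default="0")
    parser.add_argument("--NW", type=int, default=8)
    parser.add_argument("--ARCH_PATH", type=str, default="./arch/train_vqa.json")
    return parser.parse_args(argv)


def gpu_ids(gpu: str) -> List[int]:
    """Turn a string such as '0, 1, 2, 3' into [0, 1, 2, 3]."""
    return [int(token) for token in gpu.split(",") if token.strip()]


args = parse_args(["--RUN=train", "--GPU=0, 1, 2, 3", "--NW=8"])
ids = gpu_ids(args.GPU)

# Assumption: restrict the visible devices before any CUDA context is created.
os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in ids)
```

Note that the shell quoting in --GPU='0, 1, 2, 3' keeps the whole device list as a single argument, which is why the spaces after the commas are harmless.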

If you want to evaluate an architecture obtained during the search stage, for example the architecture output at the 50th search epoch for the VQA model, you can run

$ python3 train_vqa.py --RUN='train' --ARCH_PATH='[PATH_TO_YOUR_SEARCHING_LOG]' --ARCH_EPOCH=50

Validation and Testing

Offline Evaluation

To evaluate on the val or test split, set the following arguments: --RUN={'val', 'test'} --CKPT_PATH=[Your Model Path].

Example:

$ python3 train_vqa.py --RUN='test' --CKPT_PATH=[Your Model Path] --ARCH_PATH=[Searched Architecture Path]

Online Evaluation (ONLY FOR VQA)

Test result files will be stored in ./logs/ckpts/result_test/result_train_[Your Version].json

You can upload the resulting file to EvalAI to evaluate the scores on the test-dev and test-std splits.
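Before uploading, it can help to sanity-check the result file. The sketch below assumes the file follows the standard VQA results format, a JSON list of objects with `question_id` and `answer` keys; the function name `check_vqa_results` is hypothetical and not part of this repository.

```python
import json


def check_vqa_results(path: str) -> int:
    """Validate a VQA-style result file and return the number of entries."""
    with open(path, "r", encoding="utf-8") as f:
        results = json.load(f)

    if not isinstance(results, list):
        raise ValueError("expected a JSON list of result entries")

    seen = set()
    for index, entry in enumerate(results):
        # Every entry needs both keys of the assumed format.
        if not isinstance(entry, dict) or not {"question_id", "answer"} <= set(entry):
            raise ValueError(f"entry {index} lacks 'question_id' or 'answer'")
        if entry["question_id"] in seen:
            raise ValueError(f"duplicate question_id {entry['question_id']}")
        seen.add(entry["question_id"])

    return len(results)
```

Calling `check_vqa_results` on the file under ./logs/ckpts/result_test/ returns the entry count, which can be compared against the number of questions in the target split.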

Pretrained Models

We provide the pretrained models in pretrained_models.md to reproduce the experimental results in our paper.

Citation

If this repository is helpful for your research, we'd really appreciate it if you could cite the following paper:

@article{yu2020mmnas,
  title={Deep Multimodal Neural Architecture Search},
  author={Yu, Zhou and Cui, Yuhao and Yu, Jun and Wang, Meng and Tao, Dacheng and Tian, Qi},
  journal={Proceedings of the 28th ACM International Conference on Multimedia},
  pages={3743--3752},
  year={2020}
}

Owner

Vision and Language Group @ MIL
