Cleaned up code for DSTC 10: SIMMC 2.0 track: subtask 2: multimodal coreference resolution

Last update: Dec 05, 2022

Overview

UNITER-Based Situated Coreference Resolution with Rich Multimodal Input: arXiv

MMCoref_cleaned

Code for the MMCoref task of the SIMMC 2.0 dataset.
Pretrained vision-language models adapted from Transformers-VQA.
Zero-shot visual feature extraction using CLIP and BUTD.
Zero-shot non-visual prefab feature (flattened into strings) extraction using BERT and SBERT.
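To illustrate what "flattened into strings" can mean for non-visual prefab features, here is a minimal sketch. The function name `flatten_prefab`, the field names, and the `key: value; ...` layout are assumptions made for illustration; the repository's scripts may format the strings differently.

```python
def flatten_prefab(meta: dict) -> str:
    """Join a prefab's metadata into one string suitable for a text encoder.

    Hypothetical helper: the key order and separators are illustrative only.
    """
    parts = []
    for key in sorted(meta):  # sort keys so the output is deterministic
        value = meta[key]
        if isinstance(value, (list, tuple)):
            # Lists (e.g. available sizes) become a comma-separated run.
            value = ", ".join(str(v) for v in value)
        parts.append(f"{key}: {value}")
    return "; ".join(parts)


# Made-up metadata entry, not taken from the SIMMC 2.0 files.
example = {"brand": "ExampleBrand", "price": 24.99, "available_sizes": ["S", "M"]}
flat = flatten_prefab(example)
print(flat)
```

A string like this could then be passed to BERT or SBERT to obtain a fixed-size embedding per object.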

Dependencies

Listed in requirements.txt.

Download the data and pretrained/trained model checkpoints

Data: Put the data in ./data. Unpack all images into ./data/all_images and all scene JSON files (including the teststd split) into ./data/simmc2_scene_jsons_dstc10_public/public.
Pretrained models: Place the checkpoints in ./pretrained and ./model/Transformers-VQA-master/models/pretrained. Download links are in the placeholder.txt file in each of these folders.
Trained models: Place the checkpoints in ./trained. Download links are in ./trained/placeholder.txt.
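Before preprocessing or training, it can help to confirm that the folders listed above are in place. The sketch below is not part of the repository; the helper `missing_paths` is hypothetical, and only the directory names are taken from this README. The demo runs against a temporary directory so it works anywhere; in practice you would pass the repository root.

```python
import tempfile
from pathlib import Path

# Directories named in this README, relative to the repository root.
EXPECTED_DIRS = [
    "data/all_images",
    "data/simmc2_scene_jsons_dstc10_public/public",
    "pretrained",
    "model/Transformers-VQA-master/models/pretrained",
    "trained",
]


def missing_paths(root):
    """Return the expected directories that do not exist under `root`."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


# Demo on a throwaway directory in which only two of the folders exist.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "data" / "all_images").mkdir(parents=True)
    (Path(tmp) / "trained").mkdir()
    still_missing = missing_paths(tmp)
    print(still_missing)
```

Calling `missing_paths(".")` from the repository root should return an empty list once everything has been downloaded and unpacked.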

Preprocess

Convert the JSON files using ./scripts/converter.py. This step is currently not working (the latest converter.py was lost); download the processed data instead.
Get BERT/SBERT embeddings of the non-visual prefab features using ./scripts/{get_KB_embedding, get_KB_embedding_SBERT, get_KB_embedding_no_duplicate}.py
Get CLIP/BUTD embeddings for the images using ./scripts/get-visual-features-{CLIP, RCNN}.ipynb
Or just download everything from ./processed/placeholder.txt

Train

Training scripts are under ./sh/train. See the arguments in each script for the inputs used.

Inference and evaluate

Scripts are under ./sh/infer_eval (devtest split) and ./sh/infer_eval_dev (dev split).
Outputs are written to ./output (same format as the original dialogue JSON).
Logits are written to ./output/logit in the format {dialogue_idx: {round_idx: [[logit, label], ...]}}.
Run ./scripts/output_filter_error.py to select and reformat error cases.
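The logit files can also be scored directly. The sketch below walks the nested {dialogue_idx: {round_idx: [[logit, label], ...]}} structure and computes precision, recall, and F1. It assumes that labels are 0/1 and that an object counts as predicted when its logit exceeds a threshold of 0; both are assumptions for illustration, and the official evaluation may differ. The function name `f1_from_logits` and the sample data are made up.

```python
import json


def f1_from_logits(logits, threshold=0.0):
    """Compute (precision, recall, F1) over all [logit, label] pairs."""
    tp = fp = fn = 0
    for rounds in logits.values():       # one entry per dialogue
        for pairs in rounds.values():    # one entry per dialogue round
            for logit, label in pairs:   # one pair per candidate object
                predicted = logit > threshold  # assumed decision rule
                if predicted and label:
                    tp += 1
                elif predicted and not label:
                    fp += 1
                elif label:
                    fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Tiny made-up sample in the documented format; real files live in ./output/logit.
sample = json.loads('{"0": {"0": [[2.1, 1], [-0.7, 0]], "1": [[0.3, 0], [-1.2, 1]]}}')
precision, recall, f1 = f1_from_logits(sample)
print(precision, recall, f1)
```

For a real file, replace the `json.loads` call with `json.load` on an opened file from ./output/logit.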

Ensemble

cd script
python ensemble --method optuna

The output is saved to output/logit/blended_devtest.json.
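This README does not describe how the ensemble script combines models. One common approach, shown here as a sketch only, is a weighted average of the per-object logits from several runs that share the same logit file structure. The function `blend_logits` and the fixed weights are hypothetical; this is a plain weighted average, not the Optuna-based method selected by `--method optuna`.

```python
def blend_logits(runs, weights):
    """Weighted average of logits across runs with identical structure.

    Each run is {dialogue_idx: {round_idx: [[logit, label], ...]}}.
    Labels are taken from the first run.
    """
    if len(runs) != len(weights):
        raise ValueError("need exactly one weight per run")
    total = sum(weights)
    blended = {}
    for dialogue_idx, rounds in runs[0].items():
        blended[dialogue_idx] = {}
        for round_idx, pairs in rounds.items():
            merged = []
            for i, (_, label) in enumerate(pairs):
                # Weighted mean of the i-th object's logit over all runs.
                score = sum(
                    w * run[dialogue_idx][round_idx][i][0]
                    for run, w in zip(runs, weights)
                ) / total
                merged.append([score, label])
            blended[dialogue_idx][round_idx] = merged
    return blended


# Two made-up runs over one dialogue round with two candidate objects.
run_a = {"0": {"0": [[2.0, 1], [-1.0, 0]]}}
run_b = {"0": {"0": [[0.0, 1], [1.0, 0]]}}
blended = blend_logits([run_a, run_b], weights=[3.0, 1.0])
print(blended)
```

A hyperparameter search could then tune the weights against the dev split, scoring each candidate blend with the task metric.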

Owner

Yichen (William) Huang
