Cleaned up code for DSTC 10: SIMMC 2.0 track: subtask 2: multimodal coreference resolution

Last update: Dec 05, 2022

Related tags

Overview

UNITER-Based Situated Coreference Resolution with Rich Multimodal Input: arXiv

MMCoref_cleaned

Code for the MMCoref task of the SIMMC 2.0 dataset.
Pretrained vision-language models adapted from Transformers-VQA.
Zero-shot visual feature extraction using CLIP and BUTD.
Zero-shot non-visual prefab feature (flattened into strings) extraction using BERT and SBERT.

Dependencies

requirements.txt

Download the data and pretrained/trained model checkpoints

Data: Put the data in ./data. Unpack all image in ./data/all_images and all scene.jsons (including teststd split) in ./data/simmc2_scene_jsons_dstc10_public/public.
Pretrained models: Checkpoints in ./pretrained and ./model/Transformers-VQA-master/models/pretrained. Download links in placeholder.txt in these folders.
Trained models: Checkpints in ./trained. Download from ./trained/placeholder.txt

Preprocess

Convert json files ~~using ./scripts/converter.py~~ *Currently not working. (Someone managed to lose the latest converter.py.) Download the processed data instead.
Get BERT/SBERT embeddings of non-visual prefab features using ./scripts/{get_KB_embedding, get_KB_embedding_SBERT, get_KB_embedding_no_duplicate}.py
Get CLIP/BUTD embeddigns for images using scripts ./scripts/get-visual-features-{CLIP, RCNN}.ipynb
Or just download everything from ./processed/placeholder.txt

Train

Under ./sh/train. See the arguments for used input.

Inference and evaluate

Under ./sh/infer_eval (devtest split) and ./sh/infer_eval_dev (dev split)
Outputs at ./output (same format as the original dialogue json).
Logits at ./output/logit {dialogue_idx: {round_idx: [[logit, label], ...]}}
run ./scripts/output_filter_error.py to select and reformat error cases.

Ensemble

cd script python ensemble --method optuna

output saved to output/logit/blended_devtest.json

Cleaned up code for DSTC 10: SIMMC 2.0 track: subtask 2: multimodal coreference resolution

Related tags

Overview

UNITER-Based Situated Coreference Resolution with Rich Multimodal Input: arXiv

MMCoref_cleaned

Dependencies

Download the data and pretrained/trained model checkpoints

Preprocess

Train

Inference and evaluate

Ensemble

Owner

Yichen (William) Huang

Python Single Object Tracking Evaluation

Implementation of the paper "Generating Symbolic Reasoning Problems with Transformer GANs"

Public Models considered for emotion estimation from EEG

Learning Efficient Online 3D Bin Packing on Packing Configuration Trees

RGB-stacking 🛑 🟩 🔷 for robotic manipulation

Implement face detection, and age and gender classification, and emotion classification.

[ACMMM 2021 Oral] Enhanced Invertible Encoding for Learned Image Compression

This project provides an unsupervised framework for mining and tagging quality phrases on text corpora with pretrained language models (KDD'21).

Vector.ai assignment

Paddle Graph Learning (PGL) is an efficient and flexible graph learning framework based on PaddlePaddle

Code for: Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification

Official pytorch implementation of the paper: "SinGAN: Learning a Generative Model from a Single Natural Image"

PyTorch Live is an easy to use library of tools for creating on-device ML demos on Android and iOS.

Regularizing Generative Adversarial Networks under Limited Data (CVPR 2021)

Replication package for the manuscript "Using Personality Detection Tools for Software Engineering Research: How Far Can We Go?" submitted to TOSEM

In this project we investigate the performance of the SetCon model on realistic video footage. Therefore, we implemented the model in PyTorch and tested the model on two example videos.

Video Representation Learning by Recognizing Temporal Transformations. In ECCV, 2020.

Defending graph neural networks against adversarial attacks (NeurIPS 2020)

an implementation of Revisiting Adaptive Convolutions for Video Frame Interpolation using PyTorch

A tensorflow/keras implementation of StyleGAN to generate images of new Pokemon.