Pytorch version of VidLanKD: Improving Language Understanding viaVideo-Distilled Knowledge Transfer

Last update: Dec 20, 2022

Related tags

Overview

VidLanKD

Implementation of VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer by Zineng Tang, Jaemin Cho, Hao Tan, Mohit Bansal.

Setup

# Create python environment (optional)
conda create -n vidlankd python=3.7

# Install python dependencies
pip install -r requirements.txt

To speed up the training, we use mixed precision with Apex.

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Dataset Preparation

Text Dataset

We provide scripts to obtain datasets "wiki103" and "wiki".

Wiki103, a seleted subset of English Wikipedia.

bash data/wiki103/get_data_cased.bash

English Wikipedia. The scripts are modified from XLM.

bash data/wiki/get_data_cased.bash en

Video Dataset

Howto100m where you can download official captions and videos features.

Video Features Extraction Code

To be updated.

We extracted our 2D-level video features with ResNet152 from torchvision.
We extracted our 3D-level video features with 3D-RexNext.

Downstream tasks

GLUE dataset

Download dataset

python download_glue_data.py --data_dir data/glue --tasks all

Training

Teacher model pre-training

# bash scripts/small_vlm_howto100m.bash $GPUS #teacher_SNAP_PATH
bash scripts/small_vlm_howto100m.bash 0,1,2,3 howto100m_bert_small_vokenhinge
# bash scripts/base_vlm_howto100m.bash $GPUS #teacher_SNAP_PATH
bash scripts/base_vlm_howto100m.bash 0,1,2,3 howto100m_bert_base_vokenhinge

Knowledge transfer to student model

# bash scripts/small_vlm_wiki103.bash $GPUS #teacher_SNAP_PATH #student_SNAP_PATH
bash scripts/small_vlm_wiki103.bash 0,1,2,3 howto100m_bert_small_vokenhinge/checkpoint-epoch0019 wiki103_bert_small_vokenmmd
# bash scripts/base_vlm_wiki.bash $GPUS #teacher_SNAP_PATH #student_SNAP_PATH
bash scripts/base_vlm_wiki.bash 0,1,2,3 howto100m_bert_base_vokenhinge/checkpoint-epoch0019 wiki_bert_base_vokenmmd

Finetuning on GLUE tasks

# bash scripts/run_glue_at_epoch.bash $GPUS $NumTrainEpochs $SNAP_PATH                        
bash scripts/run_glue_at_epoch.bash 0,1,2,3 3 snap/vlm/wiki103_bert_small_vokenmmd/checkpoint-epoch0019

Acknowledgements

Part of the code is built based on vokenization, huggingface transformers, and facebook faiss.

Pytorch version of VidLanKD: Improving Language Understanding viaVideo-Distilled Knowledge Transfer

Related tags

Overview

VidLanKD

Setup

Dataset Preparation

Text Dataset

Video Dataset

Video Features Extraction Code

Downstream tasks

GLUE dataset

Training

Acknowledgements

Owner

Zineng Tang

Multi-View Radar Semantic Segmentation

Hierarchical probabilistic 3D U-Net, with attention mechanisms (—𝘈𝘵𝘵𝘦𝘯𝘵𝘪𝘰𝘯 𝘜-𝘕𝘦𝘵, 𝘚𝘌𝘙𝘦𝘴𝘕𝘦𝘵) and a nested decoder structure with deep supervision (—𝘜𝘕𝘦𝘵++).

Code for "Single-view robot pose and joint angle estimation via render & compare", CVPR 2021 (Oral).

Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk

Worktory is a python library created with the single purpose of simplifying the inventory management of network automation scripts.

Text-Based Ideal Points

Augmented Traffic Control: A tool to simulate network conditions

Source code for paper "ATP: AMRize Than Parse! Enhancing AMR Parsing with PseudoAMRs" @NAACL-2022

The project is associated with the recently-launched ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge (M2MeT) to provide participants with baseline systems for speech recognition and speaker diarization in conference scenario.

FinEAS: Financial Embedding Analysis of Sentiment 📈

Convert onnx models to pytorch.

Open CV - Convert a picture to look like a cartoon sketch in python

A micro-game "flappy bird".

Official implementation of the Neurips 2021 paper Searching Parameterized AP Loss for Object Detection.

Codebase for INVASE: Instance-wise Variable Selection - 2019 ICLR

MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research

All supplementary material used by me while TA-ing CS3244: Machine Learning

BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment

Do Smart Glasses Dream of Sentimental Visions? Deep Emotionship Analysis for Eyewear Devices

IMBENS: class-imbalanced ensemble learning in Python.