Which Apple Keeps Which Doctor Away? Colorful Word Representations with Visual Oracles

Last update: Apr 14, 2022

Related tags

Text Data & NLP AppleLM

Overview

AppleLM

Which Apple Keeps Which Doctor Away? Colorful Word Representations with Visual Oracles (TASLP 2022)

Setup

This implementation is based on Transformers.

Preparation

Download GLUE datasets

The datasets can be downloaded automatically. Please refer to https://github.com/nyu-mll/GLUE-baselines

git clone https://github.com/nyu-mll/GLUE-baselines.git
python download_glue_data.py --data_dir glue_data --tasks all

It is recommended to put the folder glue_data to data/. The architecture looks like:

AppleLM
└───data
│   └───glue_data
│       │   CoLA/
│       │   MRPC/
│       │   ...

Visual Features

Pre-extracted visual features can be downloaded from Google Drive borrowed from the repo Multi30K.

The features are used in image embedding layer for indexing. Extract train-resnet50-avgpool.npy and put it in the data/ folder.

Training & Evaluate

export GLUE_DIR=data/glue_data/
export CUDA_VISIBLE_DEVICES="0"
export TASK_NAME=CoLA
python ./examples/run_glue_visual-tfidf_att.py \
    --model_type bert \
    --model_name_or_path bert-large-uncased-whole-word-masking \
    --task_name $TASK_NAME \
    --do_eval \
    --do_lower_case \
    --data_dir $GLUE_DIR/$TASK_NAME \
    --max_seq_length 128 \
    --per_gpu_eval_batch_size=32   \
    --per_gpu_train_batch_size=16   \
    --learning_rate 1e-5 \
    --eval_all_checkpoints \
    --save_steps 500 \
    --max_steps 5336 \
    --warmup_steps 320 \
    --image_dir data/train.lc.norm.tok.en \
    --image_embedding_file data/train-resnet50-avgpool.npy \
    --num_img 3 \
    --tfidf 5 \
    --image_merge att-gate \
    --stopwords_dir data/stopwords-en.txt \
    --output_dir experiments/CoLA_bert_wwm

Reference

Please kindly cite this paper in your publications if it helps your research:

@ARTICLE{zhang2022which,
  author={Zhang, Zhuosheng and Yu, Haojie and Zhao, Hai and Utiyama, Masao},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing}, 
  title={Which Apple Keeps Which Doctor Away? Colorful Word Representations With Visual Oracles}, 
  year={2022},
  volume={30},
  number={},
  pages={49-59},
  doi={10.1109/TASLP.2021.3130972}
}

Which Apple Keeps Which Doctor Away? Colorful Word Representations with Visual Oracles

Related tags

Overview

AppleLM

Setup

Preparation

Training & Evaluate

Reference

Owner

Zhuosheng Zhang

Repository to hold code for the cap-bot varient that is being presented at the SIIC Defence Hackathon 2021.

Plugin repository for Macast

Text Classification Using LSTM

End-to-end MLOps pipeline of a BERT model for emotion classification.

Need: Image Search With Python

STonKGs is a Sophisticated Transformer that can be jointly trained on biomedical text and knowledge graphs

New Modeling The Background CodeBase

I can help you convert your images to pdf file.

This library is testing the ethics of language models by using natural adversarial texts.

Crowd sourced training data for Rasa NLU models

OpenChat: Opensource chatting framework for generative models

LCG T-TEST USING EUCLIDEAN METHOD

Python library for parsing resumes using natural language processing and machine learning

Deduplication is the task to combine different representations of the same real world entity.

Code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation".

scikit-learn wrappers for Python fastText.

PyABSA - Open & Efficient for Framework for Aspect-based Sentiment Analysis

Shellcode antivirus evasion framework

Convolutional 2D Knowledge Graph Embeddings resources

Using Bert as the backbone model for lime, designed for NLP task explanation (sentence pair text classification task)