Grading tools for Advanced NLP (11-711)

Installation

You'll need Docker and unzip to use this repo. For Docker, visit the official guide to get started. For unzip, you can install it on Ubuntu via sudo apt-get install unzip.
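
Before installing, it may help to confirm that both tools are on your PATH. The snippet below is a minimal sketch; check_tool is a hypothetical helper name that is not part of this repo.

```shell
# check_tool is a hypothetical helper (not provided by this repo).
# It reports whether a command is available on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
    return 1
  fi
}

# The two prerequisites named above.
check_tool docker || true
check_tool unzip || true
```

If either tool is reported as missing, install it as described above before continuing.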

Install the Python package with:

git clone https://github.com/ProKil/anlp-grading-tools
cd anlp-grading-tools
pip install -e .

Usage

To evaluate your code, you'll need to change the environment variables in test.sh.

ANLP_TMP_DIR: create a new folder, e.g. mkdir tmp, and point this variable to the absolute path of that folder.

SUBMISSION_DIR: this should point to the folder containing your submission zip file. Note that the toolkit will automatically evaluate all zip files in the folder.

SCORES_DIR: this should point to an empty folder. Your score will be logged in a text file there.

DATA_DIR: this should point to the data folder of minnn-assignment. Please copy the original minnn-assignment/classifier.py to minnn-assignment/data/classifier_orig.py to test if your code can be executed with the original classifier.

Example code to prepare the folders:

mkdir tmp
mkdir scores
cp -r path/to/minnn-assignment/data ./
cp path/to/minnn-assignment/classifier.py data/classifier_orig.py
mkdir submission
cp your/submission.zip submission
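
With the folders prepared as above, the four variables can be given absolute paths derived from the current directory. This is only a sketch: the exact syntax used inside test.sh is not shown here, so the plain assignments below are an assumption and should be adapted to whatever form the script already uses.

```shell
# Assumption: test.sh accepts ordinary shell variable assignments.
# Run this from the directory that contains tmp/, submission/, scores/ and data/.
ANLP_TMP_DIR="$(pwd)/tmp"
SUBMISSION_DIR="$(pwd)/submission"
SCORES_DIR="$(pwd)/scores"
DATA_DIR="$(pwd)/data"

# Print the values so they can be copied into test.sh.
echo "ANLP_TMP_DIR=$ANLP_TMP_DIR"
echo "SUBMISSION_DIR=$SUBMISSION_DIR"
echo "SCORES_DIR=$SCORES_DIR"
echo "DATA_DIR=$DATA_DIR"
```

Using $(pwd) keeps every path absolute, which is what ANLP_TMP_DIR requires.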

Now you can evaluate your code by running bash test.sh, after which your scores are written to SCORES_DIR/andrewid. It is normal to get 0s for the last two (the correct labels for the IMDb test set are not available), but you should get reasonable accuracies for the first two (~40).
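
To inspect the results after a run, something like the following can print every score file in the scores folder. It is a sketch: show_scores is a hypothetical helper, and the ./scores default assumes the folder layout from the example above.

```shell
# show_scores is a hypothetical helper (not part of the toolkit).
# It prints the name and contents of each regular file in the given folder.
show_scores() {
  for f in "$1"/*; do
    [ -f "$f" ] || continue   # skip non-files and an unmatched glob
    echo "== $(basename "$f")"
    cat "$f"
  done
}

# Falls back to ./scores when SCORES_DIR is not set in this shell.
show_scores "${SCORES_DIR:-./scores}"
```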

Troubleshooting

You may find that writing files inside ANLP_TMP_DIR and SCORES_DIR requires elevated permissions. You can either use sudo, or open a shell in the container with docker run -v FOLDER_TO_WRITE:/mnt -it --entrypoint /bin/bash anlp and then cd /mnt to write those files.
You may experience other permission issues with Docker. Please refer to this page to use Docker without sudo.
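
One common cause of Docker permission errors on Linux is that the current user is not in the docker group. The check below is a sketch that only reads group membership and changes nothing. in_docker_group is a hypothetical helper, and the usermod command it prints is the step described in Docker's Linux post-installation guide, which may or may not be the page referred to above.

```shell
# in_docker_group is a hypothetical helper (not part of this repo).
# It succeeds if the given user belongs to a group named "docker".
in_docker_group() {
  id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx docker
}

if in_docker_group "$(id -un)"; then
  echo "current user is in the docker group"
else
  echo "current user is not in the docker group"
  echo "Docker's post-install guide suggests: sudo usermod -aG docker \$USER"
fi
```

After changing group membership you normally need to log out and back in for it to take effect.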

Owner

Hao Zhu
