TFIDF-based QA system for AIO2 competition

Last update: Feb 19, 2022

Overview

AIO2 TF-IDF Baseline

This is a very simple question answering system, developed as a lightweight baseline for the AIO2 competition.

In the training stage, the model builds a sparse matrix of TF-IDF features from the questions in the training dataset. In the inference stage, it predicts the answer to an unseen question by computing dot-product scores between TF-IDF features, finding the most similar training question, and returning that question's answer.

Therefore, in principle, the model cannot predict answers unseen in the training data.
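
The retrieval idea described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code in this repository: the whitespace tokenizer, the smoothed IDF formula, the L2 normalisation, and all function names are assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization, for illustration only.
    return text.lower().split()


def vectorize(counts, idf):
    """Build an L2-normalised sparse TF-IDF vector (a dict); unknown terms are dropped."""
    vec = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def fit_tfidf(questions):
    """Training stage: compute IDF weights and one sparse vector per training question."""
    docs = [Counter(tokenize(q)) for q in questions]
    n_docs = len(docs)
    # Document frequency: iterating a Counter yields each distinct term once.
    df = Counter(term for doc in docs for term in doc)
    # Smoothed IDF (an assumed variant) so every term gets a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    return idf, [vectorize(doc, idf) for doc in docs]


def dot(a, b):
    """Dot product of two sparse vectors stored as dicts."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def predict(question, idf, train_vecs, train_answers):
    """Inference stage: return the answer of the most similar training question."""
    query = vectorize(Counter(tokenize(question)), idf)
    scores = [dot(query, v) for v in train_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return train_answers[best]


train_questions = [
    "what is the capital of japan",
    "who wrote the tale of genji",
    "what is the highest mountain in japan",
]
train_answers = ["Tokyo", "Murasaki Shikibu", "Mount Fuji"]

idf, train_vecs = fit_tfidf(train_questions)
print(predict("which mountain is the highest in japan", idf, train_vecs, train_answers))
# prints "Mount Fuji"
```

Because `predict` only ever indexes into `train_answers`, the sketch also makes the limitation concrete: an answer that never occurs in the training data can never be returned.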

Steps to experiment with the model

Install requirements

$ pip install -r requirements.txt

Train

$ python train.py \
--train_file <data dir>/aio_02_train.jsonl \
--output_dir model \
--pos_list 名詞 \
--stop_words でしょ う \
--max_features 10000
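
The flag names above suggest that questions are filtered by part of speech and stop words, and that the vocabulary is capped at a maximum size. What `train.py` actually does with them is not shown here, so the following is only a sketch of one plausible reading. The pre-tokenised input, the function names `select_terms` and `build_vocabulary`, and the choice of keeping the most frequent terms are all hypothetical.

```python
from collections import Counter

# Hypothetical pre-tokenised questions as (surface, part-of-speech) pairs.
# In practice a Japanese morphological analyser would produce these.
tokenized_questions = [
    [("日本", "名詞"), ("の", "助詞"), ("首都", "名詞"), ("は", "助詞"),
     ("どこ", "名詞"), ("でしょ", "助動詞"), ("う", "助動詞")],
    [("日本", "名詞"), ("で", "助詞"), ("一番", "名詞"), ("高い", "形容詞"),
     ("山", "名詞"), ("は", "助詞"), ("何", "名詞"), ("でしょ", "助動詞"), ("う", "助動詞")],
]


def select_terms(tokens, pos_list, stop_words):
    """Keep surfaces whose POS is in pos_list and that are not stop words."""
    return [s for s, pos in tokens if pos in pos_list and s not in stop_words]


def build_vocabulary(docs, max_features):
    """Keep at most max_features terms, preferring the most frequent ones (assumed policy)."""
    counts = Counter(term for doc in docs for term in doc)
    return [term for term, _ in counts.most_common(max_features)]


pos_list = {"名詞"}            # mirrors --pos_list 名詞
stop_words = {"でしょ", "う"}   # mirrors --stop_words でしょ う

docs = [select_terms(q, pos_list, stop_words) for q in tokenized_questions]
vocabulary = build_vocabulary(docs, max_features=3)
print(docs[0])        # nouns of the first question
print(vocabulary[0])  # the most frequent remaining term
```

Restricting features to nouns keeps the content words that tend to distinguish one question from another, while the vocabulary cap bounds the width of the sparse feature matrix.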

Predict

$ python predict.py \
--model_dir model \
--test_file <data dir>/aio_02_dev_unlabeled_v1.0.jsonl \
--prediction_file <output dir>/predictions.jsonl

Building Docker image

$ docker build -t aio2-tfidf-baseline .

Test locally:

$ docker run --rm -v "<data dir>:/app/input" -v "<output dir>:/app/output" aio2-tfidf-baseline bash ./submission.sh input/aio_02_dev_unlabeled_v1.0.jsonl output/predictions.jsonl

Save the Docker image to a file:

$ docker save aio2-tfidf-baseline | gzip > aio2-tfidf-baseline.tar.gz

License

The code in this repository is open-sourced under the MIT License.

Owner

Masatoshi Suzuki
