Code for Discovering Topics in Long-tailed Corpora with Causal Intervention.

Last update: Dec 16, 2022

Overview

Code for Discovering Topics in Long-tailed Corpora with Causal Intervention

ACL2021 Findings

Usage

0. Prepare environment

Requirements:

python==3.6
tensorflow-gpu==1.13.1
scipy==1.5.2
scikit-learn==0.23.2

1. Prepare data

Download preprocessed datasets from Google Drive and extract files to the path ./data.

2. Run the model

python main.py --data_dir ./data/{dataset} --output_dir ./output

3. Evaluation

topic coherence: coherence score.

topic diversity:

python utils/TU.py --data_path {path of topic word file}

Citation

If you are interested in our work, please cite as

@inproceedings{wu2021discovering,
    title = "Discovering Topics in Long-tailed Corpora with Causal Intervention",
    author = "Wu, Xiaobao  and
    Li, Chunping  and
    Miao, Yishu",
    booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.findings-acl.15",
    doi = "10.18653/v1/2021.findings-acl.15",
    pages = "175--185",
}

Other related works

EMNLP2020 Short Text Topic Modeling with Topic Distribution Quantization and Negative Sampling Decoder

NLPCC2020 Learning Multilingual Topics with Neural Variational Inference

Code for Discovering Topics in Long-tailed Corpora with Causal Intervention.

Related tags

Overview

Code for Discovering Topics in Long-tailed Corpora with Causal Intervention

Usage

0. Prepare environment

1. Prepare data

2. Run the model

3. Evaluation

Citation

Other related works

Owner

Xiaobao Wu

In this repository we have tested 3 VQA models on the ImageCLEF-2019 dataset.

Transformer training code for sequential tasks

Official PyTorch implementation of SegFormer

This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.

Official source for spanish Language Models and resources made @ BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL).

This simple Python program calculates a love score based on your and your crush's full names in English

Blazing fast language detection using fastText model

multi-label，classifier，text classification，多标签文本分类，文本分类，BERT，ALBERT，multi-label-classification，seq2seq，attention，beam search

A BERT-based reverse-dictionary of Korean proverbs

Free and Open Source Machine Translation API. 100% self-hosted, offline capable and easy to setup.

scikit-learn wrappers for Python fastText.

Maha is a text processing library specially developed to deal with Arabic text.

Explore different way to mix speech model(wav2vec2, hubert) and nlp model(BART,T5,GPT) together

Accurately generate all possible forms of an English word e.g "election" --> "elect", "electoral", "electorate" etc.

A simple Speech Emotion Recognition (SER) API created using Flask and running in a Docker container.

Augmenty is an augmentation library based on spaCy for augmenting texts.

Code to reprudece NeurIPS paper: Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks

Contains descriptions and code of the mini-projects developed in various programming languages

DaCy: The State of the Art Danish NLP pipeline using SpaCy

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022