Code for Discovering Topics in Long-tailed Corpora with Causal Intervention.

Last update: Dec 16, 2022

Overview

Code for the paper "Discovering Topics in Long-tailed Corpora with Causal Intervention" (Findings of ACL 2021).

Usage

0. Prepare environment

Requirements:

python==3.6
tensorflow-gpu==1.13.1
scipy==1.5.2
scikit-learn==0.23.2

1. Prepare data

Download the preprocessed datasets from Google Drive and extract the files to ./data.

2. Run the model

python main.py --data_dir ./data/{dataset} --output_dir ./output

3. Evaluation

topic coherence: coherence score.

topic diversity:

python utils/TU.py --data_path {path of topic word file}
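
For orientation, here is a minimal sketch of how a topic diversity score of this kind can be computed. It assumes that TU stands for topic uniqueness in the sense of Nan et al. (2019), where each top word of a topic contributes the reciprocal of the number of topics that contain it, and that the topic word file holds one topic per line with whitespace-separated words. Both points are assumptions, and the function names below are hypothetical; utils/TU.py in this repository may differ in detail.

```python
from collections import Counter
from typing import List


def read_topics(path: str) -> List[List[str]]:
    """Read topics from a file (assumed format: one topic per line, words separated by whitespace)."""
    with open(path, encoding="utf-8") as f:
        return [line.split() for line in f if line.strip()]


def topic_uniqueness(topics: List[List[str]]) -> float:
    """Return the average topic uniqueness over all topics (hypothetical helper, not the repo's code)."""
    # Count how many topics contain each word; a word counts at most once per topic.
    counts = Counter(word for topic in topics for word in set(topic))
    scores = []
    for topic in topics:
        words = set(topic)
        # A word found only in this topic contributes 1; a word shared by n topics contributes 1/n.
        scores.append(sum(1.0 / counts[w] for w in words) / len(words))
    return sum(scores) / len(scores)
```

Under this definition the score lies in (0, 1]: it equals 1 when no top word is shared between topics and drops toward 1/K as the K topics become identical.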

Citation

If you are interested in our work, please cite it as:

@inproceedings{wu2021discovering,
    title = "Discovering Topics in Long-tailed Corpora with Causal Intervention",
    author = "Wu, Xiaobao  and
    Li, Chunping  and
    Miao, Yishu",
    booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.findings-acl.15",
    doi = "10.18653/v1/2021.findings-acl.15",
    pages = "175--185",
}

Other related works

EMNLP2020 Short Text Topic Modeling with Topic Distribution Quantization and Negative Sampling Decoder

NLPCC2020 Learning Multilingual Topics with Neural Variational Inference

Owner

Xiaobao Wu
