One implementation of the paper "DMRST: A Joint Framework for Document-Level Multilingual RST Discourse Segmentation and Parsing".

Last update: Dec 11, 2022

Overview

Introduction

This repository is one implementation of the paper "DMRST: A Joint Framework for Document-Level Multilingual RST Discourse Segmentation and Parsing".
Users can apply it to parse raw input text from scratch and obtain both the EDU segmentation and the parsed tree structure.
The model supports both sentence-level and document-level RST discourse parsing.
This repo and the pre-trained model are for research use only.

Package Requirements

pytorch==1.7.1
transformers==4.8.2
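
The pinned versions above can be checked before running the parser. The snippet below is a minimal sketch rather than part of this repo: the helper name `find_mismatches` is hypothetical, and it assumes the PyTorch package is installed under its PyPI distribution name `torch`.

```python
from importlib import metadata

# Pins taken from the list above; "torch" is the PyPI distribution name
# for PyTorch (an assumption about how the package was installed).
REQUIRED = {"torch": "1.7.1", "transformers": "4.8.2"}


def find_mismatches(required=REQUIRED, get_version=metadata.version):
    """Return {package: installed_version_or_None} for every unmet pin."""
    problems = {}
    for name, wanted in required.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # torch wheels may carry a local build suffix such as "1.7.1+cu110",
        # so only the part before "+" is compared.
        if found is None or found.split("+")[0] != wanted:
            problems[name] = found
    return problems


if __name__ == "__main__":
    for name, found in find_mismatches().items():
        print(f"{name}: expected {REQUIRED[name]}, found {found}")
```

An empty result means both pins are satisfied. Other versions may also work, but that is not stated here.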

Supported Languages

We trained and evaluated the model with the multilingual collection of RST discourse treebanks, and it natively supports 6 languages: English, Portuguese, Spanish, German, Dutch, Basque. Interested users can also try other languages.

Data Format

[Input] InputSentence: The input document or sentence. The raw text is tokenized and encoded by the xlm-roberta-base language backbone. '||' denotes the EDU boundary positions.
- Although the report, || which has released || before the stock market opened, || didn't trigger the 190.58 point drop in the Dow Jones Industrial Average, || analysts said || it did play a role in the market's decline. ||
[Output] EDU_Breaks: The indices of the EDU boundary tokens, including the last word of the sentence.
- [2, 5, 10, 22, 24, 33]
[Output] tree_parsing_output: The model outputs of the discourse parsing tree follow this format.
- (1:Satellite=Contrast:4,5:Nucleus=span:6) (1:Nucleus=Same-Unit:3,4:Nucleus=Same-Unit:4) (5:Satellite=Attribution:5,6:Nucleus=span:6) (1:Satellite=span:1,2:Nucleus=Elaboration:3) (2:Nucleus=span:2,3:Satellite=Temporal:3)
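
For downstream processing, the formats above can be turned into Python objects. The following is a minimal sketch and not part of this repo: `split_edus`, `parse_tree_output`, and `Span` are hypothetical names. It assumes each parenthesized group describes one split into a left and a right child, each written as `start:Nuclearity=Relation:end` with 1-based EDU indices, which is how the example output reads.

```python
import re
from typing import List, NamedTuple, Tuple


class Span(NamedTuple):
    start: int        # first EDU covered by this child (1-based)
    nuclearity: str   # "Nucleus" or "Satellite"
    relation: str     # relation label, e.g. "Contrast" or "span"
    end: int          # last EDU covered by this child (1-based)


# One "(left,right)" group; labels may contain hyphens (e.g. Same-Unit).
_GROUP = re.compile(
    r"\((\d+):([^=:,()]+)=([^:,()]+):(\d+),"
    r"(\d+):([^=:,()]+)=([^:,()]+):(\d+)\)"
)


def split_edus(marked_text: str) -> List[str]:
    """Split text annotated with '||' boundary markers into EDU strings."""
    return [part.strip() for part in marked_text.split("||") if part.strip()]


def parse_tree_output(tree_text: str) -> List[Tuple[Span, Span]]:
    """Parse a tree_parsing_output string into (left, right) span pairs."""
    pairs = []
    for m in _GROUP.finditer(tree_text):
        ls, ln, lr, le, rs, rn, rr, r_end = m.groups()
        left = Span(int(ls), ln, lr, int(le))
        right = Span(int(rs), rn, rr, int(r_end))
        pairs.append((left, right))
    return pairs


if __name__ == "__main__":
    sample = ("(1:Satellite=Contrast:4,5:Nucleus=span:6) "
              "(5:Satellite=Attribution:5,6:Nucleus=span:6)")
    for left, right in parse_tree_output(sample):
        print(left, right)
```

Note that `split_edus` only recovers the EDU text. It does not reproduce EDU_Breaks, because those indices depend on the model's tokenization.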

How to use it for parsing

Put the text to be parsed into the file ./data/text_for_inference.txt.
Run the script MUL_main_Infer.py to obtain the RST parsing result. See the script for details of the model output.
We recommend running the parser in a GPU-equipped environment.
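
The two steps above can also be driven from Python. This is a minimal sketch under stated assumptions: the helper names `write_input` and `run_parser` are hypothetical, and it assumes MUL_main_Infer.py is launched from the repository root without required command-line arguments. Check the script itself before relying on either assumption.

```python
import subprocess
import sys
from pathlib import Path

# Paths taken from the instructions above, relative to the repository root.
INPUT_FILE = Path("data") / "text_for_inference.txt"
SCRIPT = "MUL_main_Infer.py"


def write_input(text, path=INPUT_FILE):
    """Write the text to the inference input file and return its path."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text.strip() + "\n", encoding="utf-8")
    return path


def run_parser(repo_root="."):
    """Run the inference script with the current Python interpreter.

    Assumption: the script takes no required arguments.
    """
    return subprocess.run([sys.executable, SCRIPT], cwd=repo_root, check=True)


if __name__ == "__main__":
    write_input("Although the report didn't trigger the drop, "
                "analysts said it did play a role.")
    # Call run_parser() from inside a checkout of the repository.
```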

Citation

@article{liu2021dmrst,
  title={DMRST: A Joint Framework for Document-Level Multilingual RST Discourse Segmentation and Parsing},
  author={Liu, Zhengyuan and Shi, Ke and Chen, Nancy F},
  journal={arXiv preprint arXiv:2110.04518},
  year={2021}
}

@inproceedings{liu2020multilingual,
  title={Multilingual Neural RST Discourse Parsing},
  author={Liu, Zhengyuan and Shi, Ke and Chen, Nancy},
  booktitle={Proceedings of the 28th International Conference on Computational Linguistics},
  pages={6730--6738},
  year={2020}
}

Owner

seq-to-mind
