Korean extractive summarization. Code from team 화성갈끄니까 for the 2021 AI Text Summarization Online Hackathon

Last update: Aug 10, 2022

Overview

Korean extractive summarization

Code from team 화성갈끄니까 for the 2021 AI Text Summarization Online Hackathon

Leaderboard

Notice

The model follows bertsumext from "Text Summarization with Pretrained Encoders" (BERT with additional inter-sentence layers stacked on top for extractive summarization), with BERT replaced by klue/roberta-large.
Built on uoneway's KoBertSum repository.
Change: updated from pytorch 1.1 to support pytorch 1.7.1.
Change: updated to support transformers 4.0 and ported klue/roberta-large.
Change: removed or revised unnecessary parts.
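
To make the "inter-sentence layers on top of the encoder" idea concrete, here is a minimal PyTorch sketch. It is not the repository's implementation: the class name `InterSentenceScorer`, the layer sizes, and the use of `nn.TransformerEncoder` are assumptions for illustration. It takes one vector per sentence (assumed to come from the pretrained encoder, e.g. klue/roberta-large with hidden size 1024) and outputs one inclusion score per sentence.

```python
import torch
import torch.nn as nn


class InterSentenceScorer(nn.Module):
    """Hypothetical inter-sentence head: scores each sentence vector in [0, 1]."""

    def __init__(self, hidden_size=1024, num_layers=2, num_heads=8,
                 ff_size=2048, max_sents=512):
        super().__init__()
        # Learned position embedding over sentence positions (an assumption).
        self.pos_emb = nn.Embedding(max_sents, hidden_size)
        layer = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=num_heads,
            dim_feedforward=ff_size, dropout=0.1,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.classifier = nn.Linear(hidden_size, 1)

    def forward(self, sent_vecs, sent_mask):
        # sent_vecs: (batch, n_sents, hidden) sentence vectors from the encoder
        # sent_mask: (batch, n_sents) bool, True for real sentences
        positions = torch.arange(sent_vecs.size(1), device=sent_vecs.device)
        x = sent_vecs + self.pos_emb(positions)   # broadcast over the batch
        x = x.transpose(0, 1)                     # (n_sents, batch, hidden)
        x = self.encoder(x, src_key_padding_mask=~sent_mask)
        x = x.transpose(0, 1)                     # back to (batch, n_sents, hidden)
        scores = torch.sigmoid(self.classifier(x)).squeeze(-1)
        return scores * sent_mask.float()         # zero out padded sentences
```

In this sketch the highest-scoring sentences would then be selected as the summary; the actual selection logic lives in the repository code.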

Process

Environment Setting

pip install -r requirements.txt
python src/others/install_mecab.py # install mecab

Preprocess (requires ./ext/data/raw/train.jsonl and ./ext/data/raw/test.jsonl)

python main.py -task make_data -n_cpus 5
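
Since preprocessing depends on the two raw files being in place, a quick check before running `make_data` can save a failed run. The helper below is a hypothetical convenience, not part of the repository; only the directory and file names are taken from the note above.

```python
from pathlib import Path

# File names from the preprocessing requirement above.
REQUIRED_RAW_FILES = ("train.jsonl", "test.jsonl")


def missing_raw_files(raw_dir="./ext/data/raw"):
    """Return the names of required raw files that are not present in raw_dir."""
    raw_dir = Path(raw_dir)
    return [name for name in REQUIRED_RAW_FILES if not (raw_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_raw_files()
    if missing:
        print("Missing raw files:", ", ".join(missing))
    else:
        print("All raw files found; ready for make_data.")
```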

Train

python main.py -task train -target_summary_sent abs -visible_gpus 0

Validation (runs validation on every model file in the given path)

python main.py -task valid -model_path 1209_1236
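
To see which checkpoints a validation run would cover, one can list the model files in a run directory ordered by training step. This is a sketch under assumptions: the `model_step_<N>.pt` naming is inferred from the test command in this README, and `list_checkpoints` is a hypothetical helper rather than a function from the repository.

```python
import re
from pathlib import Path

# Assumed checkpoint naming, e.g. model_step_500.pt
STEP_PATTERN = re.compile(r"model_step_(\d+)\.pt$")


def list_checkpoints(model_dir):
    """Return checkpoint paths in model_dir sorted by their step number."""
    found = []
    for path in Path(model_dir).iterdir():
        match = STEP_PATTERN.match(path.name)
        if match:
            found.append((int(match.group(1)), path))
    # Sort numerically by step so that step 500 comes before step 1000.
    found.sort(key=lambda item: item[0])
    return [path for _, path in found]
```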

Test and generate the submission file

python main.py -task test -test_from 1209_1236/model_step_500.pt -visible_gpus 0
cd ext/results/
python get_submission.py -filename result_1209_1236_step_500.candidate.jsonl

Not included

In the competition, an ensemble raised ROUGE-L from 53.15 to 53.5. It is simple to implement, so anyone who needs it can add it and should see a performance gain.
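
The ensemble code is not included and the README does not describe how it was done, so the following is only one plausible approach: average the per-sentence scores from several models and keep the top three sentences. The function name `ensemble_top_k`, the score-averaging scheme, and returning indices in document order are all assumptions.

```python
def ensemble_top_k(score_lists, k=3):
    """Average per-sentence scores across models and return the k best indices.

    score_lists: one list of sentence scores per model, all for the same article.
    """
    if not score_lists:
        raise ValueError("need scores from at least one model")
    n_sents = len(score_lists[0])
    if any(len(scores) != n_sents for scores in score_lists):
        raise ValueError("all models must score the same number of sentences")

    # Element-wise mean over models.
    avg = [sum(column) / len(score_lists) for column in zip(*score_lists)]

    # Rank by averaged score (ties go to the earlier sentence), keep k,
    # then restore document order.
    ranked = sorted(range(n_sents), key=lambda i: (-avg[i], i))
    return sorted(ranked[:k])


if __name__ == "__main__":
    model_a = [0.1, 0.9, 0.4, 0.8, 0.2]
    model_b = [0.3, 0.7, 0.6, 0.5, 0.1]
    print(ensemble_top_k([model_a, model_b]))  # [1, 2, 3]
```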
In addition, each line of the jsonl dataset is structured as follows (a three-sentence summary dataset):

{"category": "none", "id": 0, "article_original": ["","","","",""], "extractive": [2, 3, 4], "abstractive": "", "extractive_sents": ["", "", ""]}
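
As an illustration of this format, the sketch below reads jsonl lines and turns the `extractive` indices into a 0/1 label per sentence. It assumes the indices are 0-based positions into `article_original`, which the sample line is consistent with but does not confirm; `load_examples` is a hypothetical helper, not repository code.

```python
import json


def load_examples(lines):
    """Parse jsonl lines into dicts with per-sentence 0/1 extractive labels."""
    examples = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        sentences = record["article_original"]
        # Assumption: "extractive" holds 0-based sentence indices.
        selected = set(record["extractive"])
        if any(not 0 <= i < len(sentences) for i in selected):
            raise ValueError(f"line {line_no}: extractive index out of range")
        labels = [1 if i in selected else 0 for i in range(len(sentences))]
        examples.append({"id": record["id"], "sentences": sentences, "labels": labels})
    return examples
```

With a real file this could be called as `load_examples(open(path, encoding="utf-8"))`, where `path` points at a file such as the train.jsonl used in preprocessing.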

