Converting Tagtog annotations into a CSV dataset

Last update: Dec 28, 2021

Overview

tagtog_relation_extraction

A tool for converting Tagtog annotations into a CSV dataset.

How to Use

On Tagtog

1. Go to Project > Downloads
2. Download all documents using the download button on that page

On Local

1. Place folders and files according to the structure specified below:

tagtog_relation_extraction
├──main.py
├──util.py
├──.gitignore
├──README.md
├──requirements.txt
└──Your_download_file_Name
   ├──annotations-legend.json
   ├──ann.json
   │  └──master
   │     └──pool/
   ├──plain.html
   │  └──pool/
   ├──guidelines.md
   └──README.md

2. Install the required packages (listed in requirements.txt):

tqdm==4.62.3
pandas==1.1.5
beautifulsoup4==4.10.0

$ pip install -r $ROOT/tagtog_relation_extraction/requirements.txt

3. Run

$ python main.py --path Your_download_file_Name
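
Before step 3, it can help to confirm that the download folder matches the layout from step 1. The following is a minimal sketch and not part of the repository: the helper name missing_entries and the choice of which entries to check are assumptions, and main.py may actually read more or fewer of them.

```python
from pathlib import Path

# Entries taken from the directory tree in step 1. Which of them main.py
# actually requires is an assumption made for this sketch.
EXPECTED_ENTRIES = (
    "annotations-legend.json",
    "ann.json/master/pool",
    "plain.html/pool",
)


def missing_entries(download_dir):
    """Return the expected entries that do not exist under download_dir."""
    root = Path(download_dir)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


# Replace the folder name with the name of your own Tagtog download.
missing = missing_entries("Your_download_file_Name")
if missing:
    print("Missing entries:", ", ".join(missing))
else:
    print("Layout looks complete.")
```

An empty result only means that the paths exist; it says nothing about whether the annotation files inside them are valid.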

Result

1. Dataset file (dataset.csv)

A CSV file whose rows follow the KLUE dataset format.
Example:

sentence: 가장 가능성이 높은 새 대안은 플랑크 상수를 통해 질량을 정의하는 방안이다.질량의 단위는 킬로그램 외에도 여러가지가 있는데, 그중 대표적인 단위가 바로 원자질량단위이다
sub_tag: {'word': '원자질량단위', 'start_idx': 85, 'end_idx': 90, 'type': 'POH'}
obj_tag: {'word': '플랑크 상수', 'start_idx': 17, 'end_idx': 22, 'type': 'POH'}
label: POH:no_relation
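
The example above suggests that sub_tag and obj_tag are stored as Python dict literals and that end_idx is inclusive. The sketch below reads such rows back and checks that each tag's indices really point at its word. It assumes the column names sentence, sub_tag, obj_tag, and label, which are inferred from the example rather than confirmed against main.py. The helper names load_rows and span_matches are hypothetical, and the row is rebuilt in memory so that the code runs without a real dataset.csv.

```python
import ast
import csv
import io

FIELDS = ["sentence", "sub_tag", "obj_tag", "label"]  # assumed column names


def load_rows(csv_file):
    """Read rows and turn the tag columns from dict literals into dicts."""
    rows = []
    for row in csv.DictReader(csv_file):
        row["sub_tag"] = ast.literal_eval(row["sub_tag"])
        row["obj_tag"] = ast.literal_eval(row["obj_tag"])
        rows.append(row)
    return rows


def span_matches(sentence, tag):
    """Check that sentence[start_idx..end_idx] (inclusive) equals tag['word']."""
    return sentence[tag["start_idx"]:tag["end_idx"] + 1] == tag["word"]


# Rebuild the example row in memory, standing in for dataset.csv.
example = {
    "sentence": "가장 가능성이 높은 새 대안은 플랑크 상수를 통해 질량을 정의하는 방안이다."
                "질량의 단위는 킬로그램 외에도 여러가지가 있는데, "
                "그중 대표적인 단위가 바로 원자질량단위이다",
    "sub_tag": str({"word": "원자질량단위", "start_idx": 85, "end_idx": 90, "type": "POH"}),
    "obj_tag": str({"word": "플랑크 상수", "start_idx": 17, "end_idx": 22, "type": "POH"}),
    "label": "POH:no_relation",
}
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerow(example)
buffer.seek(0)

rows = load_rows(buffer)
for row in rows:
    print(span_matches(row["sentence"], row["sub_tag"]),
          span_matches(row["sentence"], row["obj_tag"]))
```

With a real file, the in-memory buffer would be replaced by open("dataset.csv", newline="", encoding="utf-8").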

2. File for checking answers (answer_check.csv)

A CSV file designed for checking entity tags and labels.
Example:

sentence: 가장 가능성이 높은 새 대안은 플랑크 상수를 통해 질량을 정의하는 방안이다.질량의 단위는 킬로그램 외에도 여러가지가 있는데, 그중 대표적인 단위가 바로 원자질량단위이다
sub_tag: POH
obj_tag: POH
label: POH:no_relation

Restrictions

Entity labels must use the following form:

SUBJ-{ENT_TYPE}-{RELATION_NAME}
OBJ-{ENT_TYPE}-{RELATION_NAME}

If your labels use a different form, you may need to modify util.py.
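
To check labels against this form before running the conversion, a small parser can help. This is a hedged sketch and not the logic in util.py: the names EntityLabel and parse_entity_label are hypothetical, the sample label is made up to fit the form, and the sketch assumes that ENT_TYPE contains no hyphen (any further hyphens are kept as part of RELATION_NAME).

```python
from typing import NamedTuple


class EntityLabel(NamedTuple):
    role: str       # "SUBJ" or "OBJ"
    ent_type: str   # e.g. "POH"
    relation: str   # e.g. "no_relation"


def parse_entity_label(label):
    """Split a label of the form ROLE-ENT_TYPE-RELATION_NAME into its parts."""
    # Split at most twice so hyphens inside the relation name survive.
    parts = label.split("-", 2)
    if len(parts) != 3 or parts[0] not in ("SUBJ", "OBJ") or not all(parts):
        raise ValueError(f"Unexpected entity label: {label!r}")
    return EntityLabel(*parts)


# Hypothetical label that follows the required form.
print(parse_entity_label("SUBJ-POH-no_relation"))
```

Labels that raise ValueError here are the ones most likely to need changes on the Tagtog side or in util.py.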

Owner

hyeong
