Convert Tagtog annotations into a CSV dataset

Last update: Dec 28, 2021

Overview

tagtog_relation_extraction

A tool for converting Tagtog annotations into a CSV dataset.

How to Use

On Tagtog

1. Go to Project > Downloads
2. Download all documents, using the button below

On Local

1. Place folders and files according to the structure specified below:

tagtog_relation_extraction
├──main.py
├──util.py
├──.gitignore
├──README.md
├──requirements.txt
└──Your_download_file_Name
   ├──annotations-legend.json
   ├──ann.json
   │  └──master
   │     └──pool/
   ├──plain.html
   │  └──pool/
   ├──guidelines.md
   └──README.md

2. Install the required packages (listed in requirements.txt)

tqdm==4.62.3
pandas==1.1.5
beautifulsoup4==4.10.0

$ pip install -r $ROOT/tagtog_relation_extraction/requirements.txt

3. Run main.py, passing the name of the downloaded folder

$ python main.py --path Your_download_file_Name
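
Before running main.py, the folder layout from step 1 can be sanity-checked. The following is a minimal sketch, not part of this repository: the function name `missing_entries` and the constant `EXPECTED_ENTRIES` are hypothetical, and the choice of which entries to check is an assumption based only on the tree shown above.

```python
from pathlib import Path

# Entries that the layout in step 1 shows inside the downloaded folder.
# guidelines.md and README.md are left out on the assumption that the
# conversion does not read them.
EXPECTED_ENTRIES = (
    "annotations-legend.json",
    "ann.json/master/pool",
    "plain.html/pool",
)


def missing_entries(download_dir):
    """Return the expected entries that do not exist under download_dir."""
    root = Path(download_dir)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


# Replace the folder name with the name of your own download.
missing = missing_entries("Your_download_file_Name")
print("missing entries:", missing if missing else "none")
```

An empty result only means the folders and the legend file are where the tree says they should be; it does not validate their contents.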

Result

1. Dataset file (dataset.csv)

A CSV file whose rows follow the KLUE dataset format.
Example:

sentence: 가장 가능성이 높은 새 대안은 플랑크 상수를 통해 질량을 정의하는 방안이다.질량의 단위는 킬로그램 외에도 여러가지가 있는데, 그중 대표적인 단위가 바로 원자질량단위이다
sub_tag: {'word': '원자질량단위', 'start_idx': 85, 'end_idx': 90, 'type': 'POH'}
obj_tag: {'word': '플랑크 상수', 'start_idx': 17, 'end_idx': 22, 'type': 'POH'}
label: POH:no_relation

2. File for checking answers (answer_check.csv)

A CSV file designed for checking entity tags and labels.
Example:

sentence: 가장 가능성이 높은 새 대안은 플랑크 상수를 통해 질량을 정의하는 방안이다.질량의 단위는 킬로그램 외에도 여러가지가 있는데, 그중 대표적인 단위가 바로 원자질량단위이다
sub_tag: POH
obj_tag: POH
label: POH:no_relation
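
The sub_tag and obj_tag values in the dataset.csv example can be checked against the sentence. The sketch below assumes the tag columns are stored as Python-dict-style strings, as the example suggests, and that end_idx is inclusive, which holds for the example row. The helper names `parse_tag` and `span_matches` are hypothetical and are not taken from util.py.

```python
import ast

# Values copied from the dataset.csv example row.
sentence = (
    "가장 가능성이 높은 새 대안은 플랑크 상수를 통해 질량을 정의하는 방안이다."
    "질량의 단위는 킬로그램 외에도 여러가지가 있는데, 그중 대표적인 단위가 바로 원자질량단위이다"
)
sub_tag = "{'word': '원자질량단위', 'start_idx': 85, 'end_idx': 90, 'type': 'POH'}"
obj_tag = "{'word': '플랑크 상수', 'start_idx': 17, 'end_idx': 22, 'type': 'POH'}"


def parse_tag(raw):
    """Turn a dict-style tag string into a real dict without using eval."""
    return ast.literal_eval(raw)


def span_matches(text, tag):
    """Check that the inclusive [start_idx, end_idx] slice equals the tagged word."""
    return text[tag["start_idx"]: tag["end_idx"] + 1] == tag["word"]


for raw in (sub_tag, obj_tag):
    tag = parse_tag(raw)
    print(tag["word"], tag["type"], span_matches(sentence, tag))
```

Running the same check over every row of dataset.csv is a quick way to catch offsets that drifted during conversion.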

Restrictions

Entity labels in Tagtog must follow this form:

SUBJ-{ENT_TYPE}-{RELATION_NAME}
OBJ-{ENT_TYPE}-{RELATION_NAME}

If your labels use a different form, you may need to modify util.py.
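
To make the expected label form concrete, here is a minimal sketch of how such a label could be split into its three parts. It illustrates the naming convention only and is not the logic in util.py; the function name `parse_entity_label` is hypothetical.

```python
def parse_entity_label(label):
    """Split a label such as 'SUBJ-POH-no_relation' into (role, ent_type, relation)."""
    # maxsplit=2 keeps any further hyphens inside the relation name.
    parts = label.split("-", 2)
    if len(parts) != 3 or parts[0] not in ("SUBJ", "OBJ") or not all(parts):
        raise ValueError(f"unexpected entity label: {label!r}")
    role, ent_type, relation = parts
    return role, ent_type, relation


role, ent_type, relation = parse_entity_label("SUBJ-POH-no_relation")
print(role, ent_type, relation)
```

A label that raises `ValueError` here is a sign that the project uses a different naming scheme and that util.py would need adjusting.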

Owner

hyeong
