open-information-extraction-system, build open-knowledge-graph(SPO, subject-predicate-object) by pyltp(version==3.4.0)

Overview

Open-Information-Extraction-System

中文开放信息抽取系统, open-information-extraction-system, build open-knowledge-graph(SPO, subject-predicate-object) by pyltp(version==3.4.0)

码源分析

基于LTP依存句法分析(DP, dependency parsing)的中文开放信息抽取系统(rule-based)。

  • 增加并列关系、左附加关系、右附加关系等(递归实现);
  • 这里的依存句法分析只适合简单短句,过长句子、口语化句子dp效果不好会很影响下游抽取。

结果展示(部分)

{
    "ques": "郑州是那个省的",
    "answer": [
        "河南"
    ],
    "desc": "郑州是河南省省会城市,周边有洛阳、开封、新郑、新密、许昌等城市",
    "SPO": [
        [
            "郑州",
            "",
            "那个省"
        ]
    ]
},
{
    "ques": "格林童话《灰姑娘》中,灰姑娘参加舞会时所做的车是由哪种植物变成的?",
    "answer": [
        "南瓜"
    ],
    "desc": "这时,有一位仙女出现了,帮助她摇身一变成为高贵的千金小姐,并将老鼠变成马夫,南瓜变成马车,又变了一套漂亮的衣服和一双水晶(玻璃)鞋给灰姑娘穿上。",
    "SPO": [
        [
            "灰姑娘",
            "参加",
            "舞会"
        ],
        [
            "灰姑娘",
            "参加",
            "舞会"
        ],
        [
            "做车",
            "",
            "变成"
        ]
    ]
 },
 {
    "ques": "中国农历的哪个节气有着北方吃饺子、南方吃汤圆的习俗?",
    "answer": [
        "冬至"
    ],
    "desc": "在冬至节,中国北方有冬至日吃饺子的习俗,南方某些地方有冬至日吃汤圆、粉糍粑的习俗,传说在汉朝的医圣张仲景体念家乡乡民在寒冬中工作的辛苦,在冬至那天利用羊肉等祛寒的药材包在面皮中,作成耳朵的样子,给乡民们治病补身,这个药方的名字...",
    "SPO": [
        [
            "中国农历哪个节气",
            "有着",
            "吃饺子习俗"
        ],
        [
            "北方",
            "",
            "饺子"
        ],
        [
            "南方",
            "",
            "汤圆"
        ]
    ]
}

资源&依赖

Owner
macropodus
A Semi-Intelligent ChatBot filled with statistical and economical data for the Premier League.

MONEYBALL - ChatBot Module: 4006CEM, Class: B, Group: 5 Contributors: Jonas Djondo Roshan Kc Cole Samson Daniel Rodrigues Ihteshaam Naseer Kind remind

Jonas Djondo 1 Nov 18, 2021
Code for paper "Which Training Methods for GANs do actually Converge? (ICML 2018)"

GAN stability This repository contains the experiments in the supplementary material for the paper Which Training Methods for GANs do actually Converg

Lars Mescheder 884 Nov 11, 2022
Knowledge Management for Humans using Machine Learning & Tags

HyperTag helps humans intuitively express how they think about their files using tags and machine learning. Represent how you think using tags. Find what you look for using semantic search for your t

Ravn Tech, Inc. 166 Jan 07, 2023
Pipelines de datos, 2021.

Este repo ilustra un proceso sencillo de automatización de transformación y modelado de datos, a través de un pipeline utilizando Luigi. Stack princip

Rodolfo Ferro 8 May 19, 2022
TunBERT is the first release of a pre-trained BERT model for the Tunisian dialect using a Tunisian Common-Crawl-based dataset.

TunBERT is the first release of a pre-trained BERT model for the Tunisian dialect using a Tunisian Common-Crawl-based dataset. TunBERT was applied to three NLP downstream tasks: Sentiment Analysis (S

InstaDeep Ltd 72 Dec 09, 2022
Mastering Transformers, published by Packt

Mastering Transformers This is the code repository for Mastering Transformers, published by Packt. Build state-of-the-art models from scratch with adv

Packt 195 Jan 01, 2023
Unet-TTS: Improving Unseen Speaker and Style Transfer in One-shot Voice Cloning

Unet-TTS: Improving Unseen Speaker and Style Transfer in One-shot Voice Cloning English | 中文 ❗ Now we provide inferencing code and pre-training models

164 Jan 02, 2023
FactSumm: Factual Consistency Scorer for Abstractive Summarization

FactSumm: Factual Consistency Scorer for Abstractive Summarization FactSumm is a toolkit that scores Factualy Consistency for Abstract Summarization W

devfon 83 Jan 09, 2023
Simple program that translates the name of files into English

Simple program that translates the name of files into English. Useful for when editing/inspecting programs that were developed in a foreign language.

0 Dec 22, 2021
Python utility library for compositing PDF documents with reportlab.

pdfdoc-py Python utility library for compositing PDF documents with reportlab. Installation The pdfdoc-py package can be installed directly from the s

Michael Gale 1 Jan 06, 2022
Build Text Rerankers with Deep Language Models

Reranker is a lightweight, effective and efficient package for training and deploying deep languge model reranker in information retrieval (IR), question answering (QA) and many other natural languag

Luyu Gao 140 Dec 06, 2022
A toolkit for document-level event extraction, containing some SOTA model implementations

Document-level Event Extraction via Heterogeneous Graph-based Interaction Model with a Tracker Source code for ACL-IJCNLP 2021 Long paper: Document-le

84 Dec 15, 2022
Script and models for clustering LAION-400m CLIP embeddings.

clustering-laion400m Script and models for clustering LAION-400m CLIP embeddings. Models were fit on the first million or so image embeddings. A subje

Peter Baylies 22 Oct 04, 2022
An implementation of WaveNet with fast generation

pytorch-wavenet This is an implementation of the WaveNet architecture, as described in the original paper. Features Automatic creation of a dataset (t

Vincent Herrmann 858 Dec 27, 2022
A sentence aligner for comparable corpora

About Yalign is a tool for extracting parallel sentences from comparable corpora. Statistical Machine Translation relies on parallel corpora (eg.. eur

Machinalis 128 Aug 24, 2022
PyTorch implementation of the paper: Text is no more Enough! A Benchmark for Profile-based Spoken Language Understanding

Text is no more Enough! A Benchmark for Profile-based Spoken Language Understanding This repository contains the official PyTorch implementation of th

Xiao Xu 26 Dec 14, 2022
NLP-SentimentAnalysis - Coursera Course ( Duration : 5 weeks ) offered by DeepLearning.AI

Coursera Natural Language Processing Specialization This repository contains material related to Coursera Natural Language Processing Specialization.

Nishant Sharma 1 Jun 05, 2022
Official source for spanish Language Models and resources made @ BSC-TEMU within the "Plan de las Tecnologías del Lenguaje" (Plan-TL).

Spanish Language Models 💃🏻 Corpora 📃 Corpora Number of documents Size (GB) BNE 201,080,084 570GB Models 🤖 RoBERTa-base BNE: https://huggingface.co

PlanTL-SANIDAD 203 Dec 20, 2022
RoNER is a Named Entity Recognition model based on a pre-trained BERT transformer model trained on RONECv2

RoNER RoNER is a Named Entity Recognition model based on a pre-trained BERT transformer model trained on RONECv2. It is meant to be an easy to use, hi

Stefan Dumitrescu 9 Nov 07, 2022
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis Jungil Kong, Jaehyeon Kim, Jaekyoung Bae In our paper, we p

Jungil Kong 1.1k Jan 02, 2023