End-to-end long-text summarization model (CAIL 2020 (法研杯) judicial summarization track)

Last update: Jan 08, 2023

SPACES

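An end-to-end long-text summarization model (CAIL 2020 (法研杯) judicial summarization track).

Blog post: https://kexue.fm/archives/8046

Meaning

We call our model SPACES, which happens to be one of the domain names of 科学空间 (Scientific Spaces, https://spaces.ac.cn). The letters stand for:

S: Sparse Softmax;
P: Pretrained Language Model;
A: Abstractive;
C: Copy Mechanism;
E: Extractive;
S: Special Words.

As the name suggests, this is a word-level "extract-then-generate" summarization model that uses pretraining and a copy mechanism. It incorporates some of our latest research results on text generation.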

Running

Environment: tensorflow 1.14 + keras 2.3.1 + bert4keras 0.9.7

(On Windows, use bert4keras>=0.9.8.)
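
The version requirements above can be checked before training. The following is only a minimal sketch, not part of the repository: the helper names (`version_tuple`, `installed_versions`, `check_versions`) are hypothetical, and it assumes each package exposes a `__version__` attribute.

```python
import importlib

# Versions taken from the environment line above.
README_VERSIONS = {"tensorflow": "1.14", "keras": "2.3.1", "bert4keras": "0.9.7"}
# Minimum bert4keras version requested for Windows.
WINDOWS_MIN_BERT4KERAS = "0.9.8"


def version_tuple(version):
    """Turn '0.9.8' into (0, 9, 8); non-numeric parts are skipped."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def installed_versions(names=README_VERSIONS):
    """Map each package name to its __version__, or None if unavailable."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            found[name] = getattr(module, "__version__", None)
        except ImportError:
            found[name] = None
    return found


def check_versions(found, on_windows):
    """Return a list of warnings; an empty list means nothing was flagged."""
    warnings = []
    for name, wanted in README_VERSIONS.items():
        have = found.get(name)
        if have is None:
            warnings.append("%s: version not found (README uses %s)" % (name, wanted))
        elif name == "bert4keras" and on_windows:
            # On Windows only the lower bound matters.
            if version_tuple(have) < version_tuple(WINDOWS_MIN_BERT4KERAS):
                warnings.append("bert4keras: %s is older than %s (Windows)"
                                % (have, WINDOWS_MIN_BERT4KERAS))
        else:
            # Compare only as many components as the README specifies,
            # so that 1.14.0 matches 1.14.
            wanted_t = version_tuple(wanted)
            if version_tuple(have)[:len(wanted_t)] != wanted_t:
                warnings.append("%s: %s differs from README version %s"
                                % (name, have, wanted))
    return warnings
```

Calling `check_versions(installed_versions(), platform.system() == "Windows")` (with `import platform`) returns the warnings for the current machine. A flagged version only means it differs from the one listed above, not that it will necessarily fail.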

First edit the path settings in snippets.py, then run the code below.

Training code:

#!/bin/bash

python extract_convert.py
python extract_vectorize.py

# Run extract_model.py 15 times, passing the indices 0 through 14
for ((i=0; i<15; i++)); do
    python extract_model.py $i
done

python seq2seq_convert.py
python seq2seq_model.py
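
Where bash is not available (for example on Windows), the same sequence could be driven from Python. This is a sketch under assumptions rather than a script shipped with the project: `build_commands` and `run_pipeline` are hypothetical names, and it assumes it is run from the directory containing the scripts. Unlike the shell script, it stops at the first step that fails.

```python
import subprocess
import sys

N_RUNS = 15  # the shell loop calls extract_model.py with 0..14


def build_commands(python=sys.executable, n_runs=N_RUNS):
    """List the training commands in the same order as the shell script."""
    commands = [
        [python, "extract_convert.py"],
        [python, "extract_vectorize.py"],
    ]
    commands += [[python, "extract_model.py", str(i)] for i in range(n_runs)]
    commands += [
        [python, "seq2seq_convert.py"],
        [python, "seq2seq_model.py"],
    ]
    return commands


def run_pipeline(commands, runner=subprocess.run):
    """Run each command in turn; check=True raises CalledProcessError on failure."""
    for command in commands:
        print("Running:", " ".join(command))
        runner(command, check=True)
```

`run_pipeline(build_commands())` starts the full training run. The `runner` parameter exists only so the order of commands can be inspected without launching anything.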

Prediction code:

from final import *

# text must be defined beforehand: the document to summarize
summary = predict(text, topk=3)
print(summary)
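
The prediction snippet leaves `text` undefined. One way to supply it is to read the document from a file. The sketch below assumes that `predict` accepts a single string and a `topk` keyword, as the call above suggests. The helper `summarize_file` and the file name `case.txt` are hypothetical.

```python
def summarize_file(path, predict_fn, topk=3, encoding="utf-8"):
    """Read one document from `path` and return predict_fn's summary of it."""
    with open(path, encoding=encoding) as f:
        text = f.read().strip()
    if not text:
        raise ValueError("%s is empty" % path)
    return predict_fn(text, topk=topk)


# Example usage (requires the trained models and final.py from this project):
# from final import predict
# print(summarize_file("case.txt", predict))
```

Passing `predict` in as an argument keeps the helper importable without loading the models.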

Contact

QQ group: 808623966. For the WeChat group, add the bot's WeChat ID spaces_ac_cn.

Links

Blog: https://kexue.fm
Zhuiyi (追一): https://zhuiyi.ai/
Pretrained models: https://github.com/ZhuiyiTechnology/pretrained-models
WoBERT: https://github.com/ZhuiyiTechnology/WoBERT

Owner

苏剑林(Jianlin Su)
