Baseline for the 2021 Sohu Campus Text Matching Algorithm Competition

Last update: Sep 06, 2022

Overview

sohu2021-baseline

Introduction

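This repository shares a baseline for the Sohu text matching task. Its main idea is to use Conditional LayerNorm to add diversity to the model, so that a single model can process different types of data and produce different outputs.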
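The idea behind Conditional LayerNorm can be illustrated with a minimal sketch. This is not the repository's code: it is a plain-Python illustration that assumes the common formulation in which the scale and shift of LayerNorm are offset by linear projections of a condition vector. All names here (`conditional_layer_norm`, `W_gamma`, `W_beta`, the one-hot type encoding, and the toy weights) are hypothetical.

```python
import math


def layer_norm(x, eps=1e-12):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def matvec(W, c):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * ci for w, ci in zip(row, c)) for row in W]


def conditional_layer_norm(x, cond, gamma, beta, W_gamma, W_beta):
    """LayerNorm whose scale and shift depend on a condition vector.

    Assumed formulation: out = norm(x) * (gamma + W_gamma @ cond)
                               + (beta + W_beta @ cond)
    """
    normed = layer_norm(x)
    d_gamma = matvec(W_gamma, cond)  # condition-dependent scale offset
    d_beta = matvec(W_beta, cond)    # condition-dependent shift offset
    return [
        n * (g + dg) + (b + db)
        for n, g, dg, b, db in zip(normed, gamma, d_gamma, beta, d_beta)
    ]


# Hypothetical toy setup: hidden size 4, two data types encoded one-hot.
x = [1.0, 2.0, 3.0, 4.0]
gamma, beta = [1.0] * 4, [0.0] * 4
W_gamma = [[0.5, -0.5] for _ in range(4)]  # shape (hidden, cond_dim)
W_beta = [[0.1, -0.1] for _ in range(4)]

# The same input and the same weights give different outputs per data type.
out_a = conditional_layer_norm(x, [1.0, 0.0], gamma, beta, W_gamma, W_beta)
out_b = conditional_layer_norm(x, [0.0, 1.0], gamma, beta, W_gamma, W_beta)
print(out_a)
print(out_b)
```

With the projection matrices set to zero, the layer reduces to an ordinary LayerNorm, so the condition only adds behavior on top of the shared model.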

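The F1 score is about 0.74 on the offline validation set and about 0.73 on the online test set. The pre-trained model is RoFormer; comparisons with other pre-trained models are welcome.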

Test environment: tensorflow 1.14 + keras 2.3.1 + bert4keras 0.10.5. If errors occur with other environment combinations, adjust the code yourself according to the error message.

For details, see: https://kexue.fm/archives/8337

Contact

QQ discussion group: 808623966. For the WeChat group, add the bot's WeChat ID spaces_ac_cn.

Owner

苏剑林(Jianlin Su)

Science enthusiast

GitHub Repository

A natural language modeling framework based on PyTorch

Overview PyText is a deep-learning based NLP modeling framework built on PyTorch. PyText addresses the often-conflicting requirements of enabling rapi

6.4k Dec 27, 2022

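End-to-end long-text summarization model (CAIL 2020 judicial summarization track)

End-to-end long-text summarization model (CAIL 2020 judicial summarization track)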

334 Jan 08, 2023

Blue Brain text mining toolbox for semantic search and structured information extraction

Blue Brain Search Source Code DOI Data & Models DOI Documentation Latest Release Python Versions License Build Status Static Typing Code Style Securit

29 Dec 01, 2022

An assignment on creating a minimalist neural network toolkit for CS11-747

minnn by Graham Neubig, Zhisong Zhang, and Divyansh Kaushik This is an exercise in developing a minimalist neural network toolkit for NLP, part of Car

63 Dec 29, 2022

Honor's thesis project analyzing whether the GPT-2 model can more effectively generate free-verse or structured poetry.

gpt2-poetry The following code is for my senior honor's thesis project, under the guidance of Dr. Keith Holyoak at the University of California, Los A

2 Jan 09, 2022

Generate custom detailed survey paper with topic clustered sections and proper citations, from just a single query in just under 30 mins !!

Auto-Research A no-code utility to generate a detailed well-cited survey with topic clustered sections (draft paper format) and other interesting arti

20 Dec 14, 2022

This repo contains simple to use, pretrained/training-less models for speaker diarization.

PyDiar This repo contains simple to use, pretrained/training-less models for speaker diarization. Supported Models Binary Key Speaker Modeling Based o

12 Jan 20, 2022

Repository to hold code for the cap-bot variant that is being presented at the SIIC Defence Hackathon 2021.

capbot-siic Repository to hold code for the cap-bot variant that is being presented at the SIIC Defence Hackathon 2021. Problem Inspiration A plethora

19 Feb 17, 2022

BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese

Table of contents Introduction Using BARTpho with fairseq Using BARTpho with transformers Notes BARTpho: Pre-trained Sequence-to-Sequence Models for V

58 Dec 23, 2022

Community and sentiment analysis based on tweets

The project has set itself the goal of analyzing the thoughts and interaction of Italian users through the social posts expressed through the Twitter platform on the day of the entry into force of th

3 Nov 17, 2022

source code for paper: WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach.

WhiteningBERT Source code and data for paper WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach. Preparation git clone https://github.com

49 Dec 17, 2022

Datasets of Automatic Keyphrase Extraction

This repository contains 20 annotated datasets of Automatic Keyphrase Extraction made available by the research community. Following are the datasets and the original papers that proposed them. If yo

163 Dec 23, 2022

An NLP program: tokenize method, PoS Tagging with deep learning

IRIS NLP SYSTEM An NLP program: tokenize method, PoS Tagging with deep learning Report Bug · Request Feature Table of Contents About The Project Built

7 Dec 13, 2022

Code for lyric-section-to-comment generation based on huggingface transformers.

CommentGeneration Code for lyric-section-to-comment generation based on huggingface transformers. Migrate Guyu model and code (both 12-layers and 24-l

8 Sep 04, 2021

Understand Text Summarization and create your own summarizer in python

Automatic summarization is the process of shortening a text document with software, in order to create a summary with the major points of the original document. Technologies that can make a coherent

1 Oct 18, 2022

A machine learning model for analyzing text for user sentiment and determining whether it's a positive, neutral, or negative review.

Sentiment Analysis on Yelp's Dataset Author: Roberto Sanchez, Talent Path: D1 Group Docker Deployment: Deployment of this application can be found her

0 Aug 04, 2021

Pervasive Attention: 2D Convolutional Networks for Sequence-to-Sequence Prediction

PG-19 Language Modelling Benchmark

Fastseq: a text generation acceleration framework based on ONNXRUNTIME

Code for ACL 2022 main conference paper "STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation".