CoSENT、STS、SentenceBERT

Last update: Dec 07, 2022

Related tags

Text Data & NLP CoSENT_Pytorch

Overview

CoSENT_Pytorch

比Sentence-BERT更有效的句向量方案

参考: https://github.com/bojone/CoSENT
对应博客：https://kexue.fm/archives/8847
孟子预训练模型: https://github.com/Langboat/Mengzi

实验结果

实验效果来了。预训练模型用的是孟子, 学习率2e-5,batch_size=64,等价苏神代码中的batch_size=32. 只用了训练集训练，然后在测试集上做测试。分别训练了5个epoch，使用斯皮尔曼系数评价

BQ数据集上的效果:

Epoch:0 | corr: 0.710114
Epoch:1 | corr: 0.722789
Epoch:2 | corr: 0.714183
Epoch:3 | corr: 0.713727
Epoch:4 | corr: 0.712173

LCQMC数据集上的效果:

Epoch:0 | corr: 0.779130
Epoch:1 | corr: 0.785519
Epoch:2 | corr: 0.786981
Epoch:3 | corr: 0.785071
Epoch:4 | corr: 0.784286

苏神的结果: train训练、test测试：

	ATEC	BQ	LCQMC	PAWSX	STS-B	Avg
BERT+CoSENT	49.74	72.38	78.69	60.00	80.14	68.19
Sentence-BERT	46.36	70.36	78.72	46.86	66.41	61.74
RoBERTa+CoSENT	50.81	71.45	79.31	61.56	81.13	68.85
Sentence-RoBERTa	48.29	69.99	79.22	44.10	72.42	62.80

最终结果比苏神略低。在BQ上增幅明显，在LCQMC数据集上略低。其他数据的实验随后补上。

Owner

肖路微信公众号: AI炼丹师

GitHub Repository

Simple Python script to scrape youtube channles of "Parity Technologies and Web3 Foundation" and translate them to well-known braille language or any language

Simple Python script to scrape youtube channles of "Parity Technologies and Web3 Foundation" and translate them to well-known braille language or any

1 Apr 28, 2022

Tool to check whether a GCP bucket is public or not.

Tool to check publicly accessible GCP bucket. Blog https://justm0rph3u5.medium.com/gcp-inspector-auditing-publicly-exposed-gcp-bucket-ac6cad55618c Wha

7 Nov 24, 2022

NLP-SentimentAnalysis - Coursera Course ( Duration : 5 weeks ) offered by DeepLearning.AI

Coursera Natural Language Processing Specialization This repository contains material related to Coursera Natural Language Processing Specialization.

1 Jun 05, 2022

A python framework to transform natural language questions to queries in a database query language.

__ _ _ _ ___ _ __ _ _ / _` | | | |/ _ \ '_ \| | | | | (_| | |_| | __/ |_) | |_| | \__, |\__,_|\___| .__/ \__, | |_| |_| |___/

1.2k Dec 18, 2022

Arabic-Phonetic-Output - You can input the phonetic version of any Arabic text here. This software will show you output in Arabic (with vowels)

Arabic-Phonetic-Output You can input the phonetic version of any Arabic text her

1 Dec 30, 2021

State of the art faster Natural Language Processing in Tensorflow 2.0 .

tf-transformers: faster and easier state-of-the-art NLP in TensorFlow 2.0 ****************************************************************************

74 Dec 05, 2022

Jarvis is a simple Chatbot with a GUI capable of chatting and retrieving information and daily news from the internet for it's user.

J.A.R.V.I.S Kindly consider starring this repository if you like the program :-) What/Who is J.A.R.V.I.S? J.A.R.V.I.S is an chatbot written that is bu

50 Dec 31, 2022

Beyond Accuracy: Behavioral Testing of NLP models with CheckList

CheckList This repository contains code for testing NLP Models as described in the following paper: Beyond Accuracy: Behavioral Testing of NLP models

1.8k Dec 28, 2022

DeepPavlov Tutorials

DeepPavlov tutorials DeepPavlov: Sentence Classification with Word Embeddings DeepPavlov: Transfer Learning with BERT. Classification, Tagging, QA, Ze

28 Sep 13, 2022

GSoC'2021 | TensorFlow implementation of Wav2Vec2

GSoC'2021 | TensorFlow implementation of Wav2Vec2

73 Nov 28, 2022

Findings of ACL 2021

Assessing Dialogue Systems with Distribution Distances [arXiv][code] We propose to measure the performance of a dialogue system by computing the distr

16 Feb 24, 2022

nlpcommon is a python Open Source Toolkit for text classification.

nlpcommon nlpcommon, Python Text Tool. Guide Feature Install Usage Dataset Contact Cite Reference Feature nlpcommon is a python Open Source

3 May 29, 2022

SimCSE: Simple Contrastive Learning of Sentence Embeddings

SimCSE: Simple Contrastive Learning of Sentence Embeddings This repository contains the code and pre-trained models for our paper SimCSE: Simple Contr

2.5k Jan 07, 2023

🤗Transformers: State-of-the-art Natural Language Processing for Pytorch and TensorFlow 2.0.

State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0 🤗 Transformers provides thousands of pretrained models to perform tasks o

77.3k Jan 03, 2023

Ray-based parallel data preprocessing for NLP and ML.

Wrangl Ray-based parallel data preprocessing for NLP and ML. pip install wrangl # for latest pip install git+https://github.com/vzhong/wrangl See exa

33 Dec 27, 2022

Weaviate demo with the text2vec-openai module

Weaviate demo with the text2vec-openai module This repository contains an example of how to use the Weaviate text2vec-openai module. When using this d

11 Nov 11, 2022

Tools to download and cleanup Common Crawl data

cc_net Tools to download and clean Common Crawl as introduced in our paper CCNet. If you found these resources useful, please consider citing: @inproc

483 Jan 02, 2023

2021语言与智能技术竞赛：机器阅读理解任务

LICS2021 MRC 1. 项目&任务介绍本项目基于官方给定的baseline（DuReader-Checklist-BASELINE）进行二次改造，对整个代码框架做了简单的重构，对核心网络结构添加了注释，解耦了数据读取的模块，并添加了阈值确认的功能，一些小的细节也做了改进。本次任务为202

29 Dec 05, 2022

🐍💯pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection that works out-of-the-box.

pySBD: Python Sentence Boundary Disambiguation (SBD) pySBD - python Sentence Boundary Disambiguation (SBD) - is a rule-based sentence boundary detecti

549 Jan 06, 2023

Data and code to support "Applied Natural Language Processing" (INFO 256, Fall 2021, UC Berkeley)

anlp21 Course materials for "Applied Natural Language Processing" (INFO 256, Fall 2021, UC Berkeley) Syllabus: http://people.ischool.berkeley.edu/~dba

48 Dec 06, 2022