SimCSE在中文任务上的简单实验

Related tags

MiscellaneousSimCSE
Overview

SimCSE 中文测试

SimCSE在常见中文数据集上的测试,包含ATECBQLCQMCPAWSXSTS-B共5个任务。

介绍

文件

- utils.py  工具函数
- eval.py  评测主文件

评测

命令格式:

python eval.py [model_type] [pooling] [task_name] [dropout_rate]

使用例子:

python eval.py BERT cls ATEC 0.3

其中四个参数必须传入,含义分别如下:

- model_type: 模型,必须是['BERT', 'RoBERTa', 'WoBERT', 'RoFormer', 'BERT-large', 'RoBERTa-large', 'SimBERT', 'SimBERT-tiny', 'SimBERT-small']之一;
- pooling: 池化方式,必须是['first-last-avg', 'last-avg', 'cls', 'pooler']之一;
- task_name: 评测数据集,必须是['ATEC', 'BQ', 'LCQMC', 'PAWSX', 'STS-B']之一;
- dropout_rate: 浮点数,dropout的比例,如果为0则不dropout;

环境

测试环境:tensorflow 1.14 + keras 2.3.1 + bert4keras 0.10.5,如果在其他环境组合下报错,请根据错误信息自行调整代码。

下载

Google官方的两个BERT模型:

关于语义相似度数据集,可以从数据集对应的链接自行下载,也可以从作者提供的百度云链接下载。

其中senteval_cn目录是评测数据集汇总,senteval_cn.zip是senteval目录的打包,两者下其一就好。

相关

交流

QQ交流群:808623966,微信群请加机器人微信号spaces_ac_cn

Owner
苏剑林(Jianlin Su)
科学爱好者
苏剑林(Jianlin Su)
The-White-Noise-Project - The project creates noise intentionally

The-White-Noise-Project High quality audio matters everywhere, even in noise. Be

Ali Hakim Taşkıran 1 Jan 02, 2022
This speeds up PyCharm's package index processes and avoids CPU & memory overloading

This speeds up PyCharm's package index processes and avoids CPU & memory overloading

1 Feb 09, 2022
Tool to automate the enumeration of a website (CTF)

had4ctf Tool to automate the enumeration of a website (CTF) DISCLAIMER: THE TOOL HAS BEEN DEVELOPED SOLELY FOR EDUCATIONAL PURPOSE ,I WILL NOT BE LIAB

Had 2 Oct 24, 2021
SciPy library main repository

SciPy SciPy (pronounced "Sigh Pie") is an open-source software for mathematics, science, and engineering. It includes modules for statistics, optimiza

SciPy 10.7k Jan 09, 2023
Prototype application for GCM bias-correction and downscaling

dodola Prototype application for GCM bias-correction and downscaling This is an unstable prototype. This is under heavy development. Features Nothing!

Climate Impact Lab 9 Dec 27, 2022
The repository is about 100+ python programming exercise problem discussed, explained, and solved in different ways

Break The Ice With Python A journey of 100+ simple yet interesting problems which are explained, solved, discussed in different pythonic ways Introduc

Abdullah Al Masud Tushar 2.2k Jan 04, 2023
Package pyVHR is a comprehensive framework for studying methods of pulse rate estimation relying on remote photoplethysmography (rPPG)

Package pyVHR (short for Python framework for Virtual Heart Rate) is a comprehensive framework for studying methods of pulse rate estimation relying on remote photoplethysmography (rPPG)

PHUSE Lab 261 Jan 03, 2023
Build Xmas cards with user inputs

Automatically build Xmas cards with user inputs

Anand 9 Jan 25, 2022
A program that analyzes data from inertia measurement units installeed in aircraft and generates g-exceedance curves

A program that analyzes data from inertia measurement units installeed in aircraft and generates g-exceedance curves

Pooya 1 Nov 23, 2021
An app about keyboards, originating from the design of u/Sonnenschirm

keebapp-backend An app about keyboards, originating from the design of u/Sonnenschirm Setup Firstly, ensure that the environment for python is install

8 Sep 04, 2022
ColabFold / AlphaFold2_advanced on your local PC (or macOS)

LocalColabFold ColabFold / AlphaFold2_advanced on your local PC (or macOS) Installation For Linux Make sure curl and wget commands are already install

Yoshitaka Moriwaki 207 Dec 22, 2022
Curses frontend for Canto daemon

Canto Curses The curses (text) client for canto-daemon. Canto-daemon is required to work and is found at: http://github.com/themoken/canto-next Requir

Jack Miller 86 Dec 28, 2022
Modern robots.txt Parser for Python

Robots Exclusion Protocol Parser for Python Robots.txt parsing in Python. Goals Fetching -- helper utilities for fetching and parsing robots.txts, inc

Moz 176 Dec 16, 2022
A simple and easy to use Python's PIP configuration manager, similar to the Arch Linux's Java manager.

PIPCONF - The PIP configuration manager If you need to manage multiple configurations containing indexes and trusted hosts for PIP, this project was m

João Paulo Carvalho 11 Nov 30, 2022
A simple spyware in python.

Spyware-Python- Dependencies: Python 3.x OpenCV PyAutoGUI PyMongo (for mongodb connection) Flask (Web Server) Ngrok (helps us push our fla

Abubakar 3 Sep 07, 2022
Collection of Beginner to Intermediate level Python scripts contributed by members and participants.

Hacktoberfest2021-Python Hello there! This repository contains a 'Collection of Beginner to Intermediate level Python projects', created specially for

12 May 25, 2022
Completed task 1 and task 2 at LetsGrowMore as a data science intern.

LetsGrowMore-Internship Completed task 1 and task 2 at LetsGrowMore as a data science intern. Task 1- Task 2- Creating a Decision Tree classifier and

Sanjyot Panure 1 Jan 16, 2022
A Web app to Cross-Seed torrents in Deluge/qBittorrent/Transmission

SeedCross A Web app to Cross-Seed torrents in Deluge/qBittorrent/Transmission based on CrossSeedAutoDL Require Jackett Deluge/qBittorrent/Transmission

ccf2012 76 Dec 19, 2022
What Do Deep Nets Learn? Class-wise Patterns Revealed in the Input Space

What Do Deep Nets Learn? Class-wise Patterns Revealed in the Input Space Introduction: Environment: Python3.6.5, PyTorch1.5.0 Dataset: CIFAR-10, Image

8 Mar 23, 2022
A compiler for ARM, X86, MSP430, xtensa and more implemented in pure Python

A compiler for ARM, X86, MSP430, xtensa and more implemented in pure Python

Windel Bouwman 277 Dec 26, 2022