A PyTorch implementation of unsupervised SimCSE

Last update: Dec 23, 2022

SimCSE: Simple Contrastive Learning of Sentence Embeddings

1. Usage

Unsupervised training

python train_unsup.py ./data/news_title.txt ./path/to/huggingface_pretrained_model

Full argument list

python train_unsup.py -h
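
The internals of train_unsup.py are not shown here, so the following is only a minimal sketch of the unsupervised SimCSE objective as the paper describes it: each sentence is encoded twice with different dropout masks, the two encodings of the same sentence form the positive pair, and the other sentences in the batch act as negatives. It is written in plain Python so the arithmetic is visible; a training script would compute the same quantity on batched tensors. The function names and the temperature of 0.05 (the value used in the paper) are assumptions, not taken from this repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def simcse_unsup_loss(view_a, view_b, temperature=0.05):
    """Mean in-batch contrastive (InfoNCE) loss.

    view_a[i] and view_b[i] are assumed to be two encodings of sentence i
    obtained with different dropout masks (hypothetical inputs).
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        # Similarity of sentence i (first view) to every second-view encoding.
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the matching index i as the correct class.
        total += log_denominator - logits[i]
    return total / n


if __name__ == "__main__":
    first_pass = [[1.0, 0.0], [0.0, 1.0]]
    second_pass = [[0.9, 0.1], [0.1, 0.9]]
    print(simcse_unsup_loss(first_pass, second_pass))
```

The loss is near zero when each sentence is closest to its own second encoding and grows when another sentence in the batch is closer.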

Similar-text retrieval test

python test_unsup.py

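query title:
基金亏损路未尽 后市看法仍偏谨慎 (Fund losses are not over; outlook for the market remains cautious)

sim title:
基金亏损路未尽 后市看法仍偏谨慎 (Fund losses are not over; outlook for the market remains cautious)
海通证券：私募对后市看法偏谨慎 (Haitong Securities: private funds are cautious about the market outlook)
连塑基本面不容乐观 后市仍有下行空间 (Dalian plastics futures fundamentals offer little optimism; further downside remains)
基金谨慎看待后市行情 (Funds view the market outlook with caution)
稳健投资者继续保持观望 市场走势还未明朗 (Conservative investors stay on the sidelines; market direction still unclear)
下半年基金投资谨慎乐观 (Fund investment cautiously optimistic for the second half of the year)
华安基金许之彦：下半年谨慎乐观 (Hua An Funds' Xu Zhiyan: cautiously optimistic about the second half)
楼市主导 期指后市不容乐观 (Property market leads; outlook for index futures is not optimistic)
基金公司谨慎看多明年市 (Fund companies cautiously bullish on next year's market)
前期乐观预期被否 基金重归谨慎 (Earlier optimism rejected; funds return to caution)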
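
How test_unsup.py ranks titles is not shown here. A common approach, sketched below under that assumption, is to embed every title, score each one against the query by cosine similarity, and return the top k. The helper name `top_k_similar` and the toy vectors are hypothetical; real sentence embeddings would come from the trained encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_similar(query_vec, corpus_vecs, k=10):
    """Return up to k (index, score) pairs, most similar first."""
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(corpus_vecs)]
    # Sort by score only, highest first; ties keep their corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:k]]


if __name__ == "__main__":
    corpus = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]  # toy stand-ins for title embeddings
    for idx, score in top_k_similar([1.0, 0.0], corpus, k=2):
        print(idx, round(score, 4))
```

When the query is itself part of the corpus, it comes back as the first hit with similarity 1, which matches the sample output above where the query title heads the list.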

Training and evaluation on the STS-B dataset

The Chinese STS-B dataset is used; see here for details.

# Train
python train_unsup.py ./data/STS-B/cnsd-sts-train_unsup.txt

# Evaluate
python eval_unsup.py

Model	STS-B dev	STS-B test
hfl/chinese-bert-wwm-ext	0.3326	0.3209
simcse	0.7499	0.6909

These results are close to Su Jianlin's experiments, which report 0.3465 for BERT-P1 and 0.6904 for SimCSE.
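
The metric behind these scores is not named here. STS-B results for sentence embeddings are conventionally reported as the Spearman correlation between predicted cosine similarities and the gold similarity labels, so the sketch below assumes that metric. It is not taken from eval_unsup.py, and the function names are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared_rank = (pos + end) / 2 + 1
        for i in order[pos:end + 1]:
            ranks[i] = shared_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(predicted, gold):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(predicted), average_ranks(gold))


if __name__ == "__main__":
    predicted_similarity = [0.12, 0.45, 0.40, 0.93]  # toy cosine scores
    gold_label = [0, 2, 3, 5]                        # toy gold similarity labels
    print(spearman(predicted_similarity, gold_label))
```

Because only the rank order matters, the score does not depend on the scale of the cosine similarities, which makes it suitable for comparing an untuned encoder such as hfl/chinese-bert-wwm-ext with a SimCSE-trained one.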
