QQ Browser 2021 AI Algorithm Competition Track 1 1st Place Solution

Overview

2021_QQ_AIAC_Tack1_1st

1st-place solution for Track 1 of the QQ Browser 2021 AI Algorithm Competition

Paper:

Environment

python==3.7.10
torch==1.7.1
transformers==4.5.1
Pretraining requires >= 24 GB of GPU memory and >= 100 GB of RAM

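Data download

(1) Video dataset
Download the video dataset from the official website: https://algo.browser.qq.com/
The organizers are expected to open-source the dataset; the link will be added here once it is released.
After downloading, put the data in the ./input/data folder.
tag_list contains the 10,000 most frequent tags; it is provided with the official baseline and goes in the same folder.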

(2) Pretrained model
The pretrained model is https://huggingface.co/hfl/chinese-roberta-wwm-ext-large
Download it with: python3 -u download_pretrain_model.py

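Steps and code

(1) Pretraining + finetuning
Command: sh train.sh
Time and compute: each single model needs roughly 2 days of pretraining and 2 hours of finetuning on one A100.
Output: each single model's checkpoint is saved to jobN/model_finetune_1.pth
Note: the single models are independent of one another and each job needs one GPU, so with multiple GPUs the models can be trained in parallel.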

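(2) Code structure
download_pretrain_model.py: script that downloads the pretrained model
ensemble.py: ensembling script
job1-job6: six model-training jobs with identical file structure; they differ mainly in their pretraining settings
Note: extra code comments were added to job1 after the competition
jobN/pretrain.py: pretraining script
jobN/finetune.py: finetuning script
jobN/data: data preprocessing, including the dataset and token masking
jobN/config: hyperparameter configs for pretraining and finetuning
jobN/qqmodel/qq_uni_model.py: model definition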

Introduction

For a short slide deck introducing the solution, see Introduction.pdf

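Model overview

The multimodal model has the same architecture and parameter count as BERT-large:
layer=24, hidden_size=1024, num_attention_heads=16.
Its input is [CLS] Video_frame [SEP] Video_title [SEP].
Each frame_feature is projected down to 1024 dimensions by an fc layer and concatenated with the text embeddings.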
Input_emb -> TransformerEncoder * 24 -> Pooling -> Fc -> Video_emb
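A minimal NumPy sketch of how the input sequence described above could be assembled. It is an illustration under assumptions, not the repo's code: the function and argument names are hypothetical, the fc projection is a plain matrix multiply, and position/segment embeddings, LayerNorm and the 24-layer encoder itself are omitted.

```python
import numpy as np


def build_input_embeddings(frame_feats, title_emb, cls_emb, sep_emb, w_fc, b_fc):
    """Lay out [CLS] Video_frame [SEP] Video_title [SEP] as one embedding sequence.

    frame_feats:      (n_frames, frame_dim) raw frame features
    title_emb:        (n_tokens, hidden) word embeddings of the title tokens
    cls_emb, sep_emb: (hidden,) embeddings of the [CLS] and [SEP] tokens
    w_fc, b_fc:       (frame_dim, hidden), (hidden,) hypothetical fc layer
                      mapping frames to the text hidden size
    """
    frame_emb = frame_feats @ w_fc + b_fc  # (n_frames, hidden); hidden = 1024 in the README
    return np.concatenate(
        [cls_emb[None, :], frame_emb, sep_emb[None, :], title_emb, sep_emb[None, :]],
        axis=0,
    )  # (n_frames + n_tokens + 3, hidden)
```

With hidden = 1024, the returned sequence is the Input_emb that the stack of TransformerEncoder layers consumes.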

Pretraining

Pretraining uses three tasks: video tag classification, masked language model (MLM) and masked frame model (MFM).

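(1) Video tag classification task
Tags are human-annotated video labels provided in the pointwise and pairwise datasets.
As in the official baseline, we use the 10,000 most frequent tags as a multi-label classification target.
The last-layer [CLS] output of BERT goes through an fc layer to predict the tags, and a BCE loss is computed against the true labels.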
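The tag objective can be sketched as follows, assuming NumPy arrays, mean reduction over all batch-by-tag entries (the default of PyTorch's BCEWithLogitsLoss), and hypothetical names; the repo itself is written in PyTorch and may differ in details.

```python
import numpy as np


def tag_bce_loss(cls_vec, w_fc, b_fc, tag_targets):
    """Multi-label tag classification loss on the last-layer [CLS] vector.

    cls_vec:     (batch, hidden) last-layer [CLS] outputs
    w_fc, b_fc:  (hidden, num_tags), (num_tags,) fc layer (num_tags = 10000 in the README)
    tag_targets: (batch, num_tags) multi-hot 0/1 labels
    """
    logits = cls_vec @ w_fc + b_fc
    # numerically stable binary cross-entropy with logits:
    # max(z, 0) - z * y + log(1 + exp(-|z|)), averaged over all entries
    per_entry = np.maximum(logits, 0) - logits * tag_targets + np.log1p(np.exp(-np.abs(logits)))
    return float(per_entry.mean())
```

With all-zero logits every entry contributes log 2, which is a quick sanity check for any implementation.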

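(2) Masked language model task
As in standard MLM pretraining for NLP, 15% of the text tokens are randomly masked and the model predicts the masked tokens.
In the multimodal setting the model can draw on the video information to predict the masked words, which helps fuse the two modalities.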
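A simplified sketch of the 15% masking step. Assumptions: exactly round(0.15 * n) positions are chosen, every chosen token is replaced by the [MASK] id (the README does not say whether BERT's 80/10/10 replacement scheme is used), special tokens are not excluded, and -100 marks positions ignored by the loss, following the common PyTorch convention. The function name is hypothetical.

```python
import numpy as np


def mask_tokens(token_ids, mask_id, mask_prob=0.15, rng=None, ignore_index=-100):
    """Randomly mask a fraction of text tokens for MLM.

    Returns (masked_ids, labels): labels hold the original id at masked
    positions and ignore_index elsewhere, so only masked tokens are scored.
    Simplification: every chosen token becomes [MASK].
    """
    rng = np.random.default_rng() if rng is None else rng
    token_ids = np.asarray(token_ids)
    n = token_ids.shape[0]
    n_mask = max(1, int(round(mask_prob * n)))
    positions = rng.choice(n, size=n_mask, replace=False)

    masked_ids = token_ids.copy()
    labels = np.full(n, ignore_index, dtype=token_ids.dtype)
    labels[positions] = token_ids[positions]  # targets: the original tokens
    masked_ids[positions] = mask_id           # inputs: [MASK] at chosen positions
    return masked_ids, labels
```

The mask_prob parameter is also where a variant like job6 (25% masking) would differ.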

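(3) Masked frame model task
15% of the frames are randomly masked by replacing them with all-zero vectors.
Because frames are continuous vectors, they are hard to turn into a classification task the way MLM does with tokens.
Borrowing the idea of contrastive learning, the prediction at each masked position should be more similar to the original masked frame than to any other frame in the whole batch.
An NCE loss is used to maximize the mutual information between the masked frames and the predicted frames.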
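The zero-filling and the contrastive objective can be sketched as below. This is one standard InfoNCE formulation chosen for illustration: raw dot-product scores, no temperature, no projection head, and every frame in the batch (padding included) serving as a candidate. The repo's actual NCE loss may normalize or score differently, and all names are hypothetical.

```python
import numpy as np


def mask_frames(frames, mask_prob=0.15, rng=None):
    """Fill a random ~mask_prob fraction of frames with all-zero vectors.

    frames: (batch, n_frames, dim). Returns (masked copy, boolean (batch, n_frames) mask).
    """
    rng = np.random.default_rng() if rng is None else rng
    is_masked = rng.random(frames.shape[:2]) < mask_prob
    masked = frames.copy()
    masked[is_masked] = 0.0
    return masked, is_masked


def mfm_nce_loss(pred, frames, is_masked):
    """InfoNCE-style loss: each masked prediction must identify its own original
    frame among all frames in the batch (every other frame is a negative).

    pred:      (batch, n_frames, dim) model outputs at the frame positions
    frames:    (batch, n_frames, dim) original, unmasked frame features
    is_masked: (batch, n_frames) boolean mask from mask_frames
    """
    dim = frames.shape[-1]
    flat_mask = is_masked.reshape(-1)
    if not flat_mask.any():
        return 0.0                                   # nothing was masked in this batch
    candidates = frames.reshape(-1, dim)             # (batch * n_frames, dim)
    queries = pred.reshape(-1, dim)[flat_mask]       # (n_masked, dim)
    positives = np.flatnonzero(flat_mask)            # candidate index of each true frame
    logits = queries @ candidates.T                  # raw dot-product scores
    logits = logits - logits.max(axis=1, keepdims=True)  # for a stable log-softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(positives)), positives].mean())
```

A prediction that carries no information (all zeros) scores log(N) for N candidate frames; a prediction aligned with its own frame drives the loss toward 0.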

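(4) Multi-task joint training
The pretraining loss is a weighted sum of the three task losses:
L = L(tag) * 1250 / 3 + L(mlm) / 3.75 + L(mfm) / 9
The tag loss produces gradients of relatively small magnitude, so it gets a much larger weight.
Note: choosing suitable weights for each task has a large effect on downstream finetuning performance.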
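Working through the weights shows how strongly the tag loss is boosted: the tag weight is 1250/3 ≈ 416.7, the MLM weight 1/3.75 ≈ 0.267 and the MFM weight 1/9 ≈ 0.111, so the tag loss counts 1562.5 times as much as the MLM loss and 3750 times as much as the MFM loss. A one-line helper (the name is hypothetical) makes the arithmetic checkable:

```python
def pretrain_loss(loss_tag, loss_mlm, loss_mfm):
    """Weighted sum of the three pretraining losses, using the README's weights."""
    return loss_tag * 1250 / 3 + loss_mlm / 3.75 + loss_mfm / 9
```

For example, raw losses of 3/1250, 3.75 and 9 each contribute exactly 1 to the total.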

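(5) Pretraining settings
Initialization: BERT weights are initialized from an open-source model pretrained on Chinese corpora, https://huggingface.co/hfl/chinese-roberta-wwm-ext-large
Data: pretraining uses the pointwise and pairwise sets; some of the ensembled models also add the test set (MLM and MFM tasks only)
Hyperparameters: batch_size=128, epoch=40, learning_rate=5e-5, scheduler=warmup_with_cos_decay, warmup_ratio=0.06
Note: more pretraining epochs help considerably; going from 10 to 20 epochs significantly improved downstream finetuning.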
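The README names the scheduler only as warmup_with_cos_decay with warmup_ratio=0.06. A common shape for such a schedule is linear warmup to the peak learning rate followed by half-cosine decay to 0; this sketch assumes that shape (it matches transformers' get_cosine_schedule_with_warmup with its default num_cycles=0.5), but the repo's own implementation is not shown here and the function name is hypothetical.

```python
import math


def warmup_cosine_lr(step, total_steps, peak_lr=5e-5, warmup_ratio=0.06):
    """Learning rate at `step`: linear warmup, then cosine decay to 0 at total_steps."""
    warmup_steps = int(round(total_steps * warmup_ratio))
    if step < warmup_steps:
        # linear ramp from 0 up to peak_lr
        return peak_lr * step / max(1, warmup_steps)
    # fraction of the decay phase completed, in [0, 1]
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

In practice total_steps would be 40 epochs times the number of batches of 128 per epoch; the per-epoch batch count depends on the dataset size, which the README does not give.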

Finetune

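(1) Downstream task
Each video in a pair is passed through the model to get a 256-dim embedding; the loss is the MSE between the cosine similarity of the two embeddings and the human-annotated label.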
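A NumPy sketch of this objective, with hypothetical names; the small epsilon guarding against zero-length embeddings is an addition for numerical safety.

```python
import numpy as np


def pair_similarity_loss(emb1, emb2, labels, eps=1e-12):
    """MSE between the cosine similarity of paired embeddings and the labels.

    emb1, emb2: (batch, 256) embeddings of the two videos in each pair
    labels:     (batch,) human-annotated similarity targets
    """
    n1 = emb1 / np.maximum(np.linalg.norm(emb1, axis=1, keepdims=True), eps)
    n2 = emb2 / np.maximum(np.linalg.norm(emb2, axis=1, keepdims=True), eps)
    cos = (n1 * n2).sum(axis=1)  # row-wise cosine similarity
    return float(np.mean((cos - labels) ** 2))
```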

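(2) Finetune head
In experiments on the similarity task, aggregating the last-layer embeddings with mean_pooling or attention_pooling, followed by an fc layer for dimensionality reduction, worked well.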
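The two pooling options can be sketched as follows. mean_pooling averages only over real (non-padding) tokens; the attention_pooling shown is one common form (a learned scoring vector followed by a masked softmax) and is an assumption, since the README does not define it. All names are hypothetical, and attention_pooling assumes every row has at least one real token.

```python
import numpy as np


def mean_pooling(hidden, attn_mask):
    """Average last-layer embeddings over non-padding positions.
    hidden: (batch, seq, hidden_size); attn_mask: (batch, seq) with 1 = real token."""
    m = attn_mask[..., None].astype(hidden.dtype)
    return (hidden * m).sum(axis=1) / np.maximum(m.sum(axis=1), 1.0)


def attention_pooling(hidden, attn_mask, w_score):
    """Score each position with a learned vector w_score (hidden_size,),
    softmax over non-padding positions, then take the weighted sum."""
    scores = hidden @ w_score                          # (batch, seq)
    scores = np.where(attn_mask > 0, scores, -np.inf)  # padding gets zero weight
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return (weights[..., None] * hidden).sum(axis=1)


def finetune_head(pooled, w_fc, b_fc):
    """fc layer reducing the pooled vector to the 256-dim video embedding."""
    return pooled @ w_fc + b_fc
```

With a zero scoring vector, attention_pooling gives uniform weights over real tokens and reduces to mean_pooling, which makes mean pooling a natural initialization point for the attention variant.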

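(3) Label normalization
The evaluation metric is Spearman correlation, which measures the correlation between the ranks of the predicted and true values, so the human-annotated labels are rank-normalized,
i.e. target = scipy.stats.rankdata(target, 'average')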
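Note that rankdata returns ranks 1..N rather than values in a fixed range. The sketch below reimplements the 'average' tie rule in NumPy (so it runs without SciPy and should agree with scipy.stats.rankdata(x, 'average')) and then divides by N to map the ranks into (0, 1]. That final rescaling is an assumption for illustration; the README does not state how, or whether, the ranks are rescaled before the MSE. Both function names are hypothetical.

```python
import numpy as np


def rankdata_average(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    x = np.asarray(values, dtype=float)
    order = np.argsort(x, kind="mergesort")  # stable sort
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        # extend j over the run of values equal to sorted_x[i]
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        # sorted positions i..j hold 1-based ranks i+1..j+1; assign their mean
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_normalize(target):
    """Rank-transform labels, then (assumption) scale the ranks into (0, 1]."""
    ranks = rankdata_average(target)
    return ranks / len(ranks)
```

For labels [0.5, 0.1, 0.5, 0.9], the two tied 0.5 values share rank 2.5, giving ranks [2.5, 1, 2.5, 4].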

(4) Finetune settings
Data: the training set is the part of pairwise with (id1%5!=0) | (id2%5!=0), about 65k pairs; the validation set is the part with (id1%5==0) & (id2%5==0), about 2.5k pairs.
Hyperparameters: batch_size=32, epoch=10, learning_rate=1e-5, scheduler=warmup_with_cos_decay, warmup_ratio=0.06
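The two conditions are complements of each other by De Morgan's law, since (id1%5!=0) | (id2%5!=0) is the same as not ((id1%5==0) & (id2%5==0)), so every pair lands in exactly one split. A minimal sketch, assuming rows of (id1, id2, label) with integer ids; the function name is hypothetical.

```python
def split_pairs(pairs):
    """Split (id1, id2, label) rows into train / validation.

    Validation: both ids divisible by 5.
    Train: everything else, i.e. (id1 % 5 != 0) or (id2 % 5 != 0).
    Assumes integer ids (cast them first if they are stored as strings).
    """
    train, val = [], []
    for id1, id2, label in pairs:
        if id1 % 5 == 0 and id2 % 5 == 0:
            val.append((id1, id2, label))
        else:
            train.append((id1, id2, label))
    return train, val
```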

Ensemble

(1) Ensemble method
The models are fused by weighted concatenation followed by SVD dimensionality reduction. In experiments, this reduction lost little performance.
concat_vec = [np.sqrt(w1) * emb1, np.sqrt(w2) * emb2, np.sqrt(w3) * emb3 ...]
svd_vec = SVD(concat_vec, 256)
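A runnable NumPy version of the pseudocode above, under assumptions: SVD(concat_vec, 256) is read as projecting onto the top-256 right singular vectors of the uncentered matrix (U_k Σ_k, the same quantity sklearn's TruncatedSVD returns up to the sign of each component), and the SVD is fitted on the same embeddings it transforms. Scaling each block by sqrt(w_i) is what makes the dot product of two concatenated vectors equal the w_i-weighted sum of the per-model dot products. The function name is hypothetical.

```python
import numpy as np


def weighted_concat_svd(embs, weights, dim=256):
    """Fuse per-model embeddings by weighted concatenation + truncated SVD.

    embs:    list of (n_videos, d_i) arrays, one per model
    weights: list of ensemble weights w_i
    Returns (n_videos, min(dim, rank)) fused embeddings.
    """
    # sqrt(w_i) scaling: dot products become w-weighted sums of per-model dot products
    concat_vec = np.concatenate([np.sqrt(w) * e for w, e in zip(weights, embs)], axis=1)
    # project onto the top-`dim` right singular vectors (no mean-centering)
    _, _, vt = np.linalg.svd(concat_vec, full_matrices=False)
    return concat_vec @ vt[:dim].T
```

For the final submission this would correspond to weighted_concat_svd([emb1, ..., emb6], [1/6] * 6, dim=256). Keeping all components preserves every pairwise dot product exactly; truncating to 256 discards only the directions with the smallest singular values.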

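(2) Ensembled models
The final submission ensembles six models. All use the BERT-large architecture and were produced at different points during iteration, so they differ only slightly; each model has an ensemble weight of 1/6.
The table below lists what differs in each model, together with its validation Spearman and validation MSE.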

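| Job ID | Ensemble weight | Difference | Val Spearman | Val MSE |
|---|---|---|---|---|
| job1 | 1/6 | base | 0.886031 | 0.028813 |
| job2 | 1/6 | Pretraining tag classification uses mean_pooling+fc | 0.884257 | 0.029493 |
| job3 | 1/6 | No MFM task in pretraining | 0.883843 | 0.029248 |
| job4 | 1/6 | Pretraining data: (point + pair) shuffled, 40 epochs => pair, 5 epochs | 0.885397 | 0.029059 |
| job5 | 1/6 | Pretraining data: (point-shuf => pair-shuf => test-shuf), 32 epochs | 0.885795 | 0.028866 |
| job6 | 1/6 | Pretraining MLM/MFM mask probability raised to 25% | 0.886289 | 0.029039 |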

(3) Single-model vs. ensemble results
Single model: about 0.836 on the test set
Ensemble of two models: 0.845
Ensemble of three models: 0.849
Ensemble of five models: 0.852
