Multi-Modal Machine Learning toolkit based on PyTorch.

Last update: Jan 05, 2022

Overview

TorchMM

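Introduction

TorchMM is a multi-modal learning toolkit that provides a library of algorithm models for joint modal learning and cross-modal learning. It aims to offer efficient solutions for processing multi-modal data such as images and text, and to help bring multi-modal learning applications into practice.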

Recent updates

2022.1.5: released the initial version of TorchMM, v1.0

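Features

Rich task scenarios: the toolkit provides algorithm model libraries for a variety of multi-modal learning tasks, such as multi-modal fusion, cross-modal retrieval, and image captioning, and supports user-defined data and training.
Proven in practice: algorithms from the toolkit are already used in real applications, such as sneaker authenticity identification, sneaker style transfer, automatic furniture image description, and public opinion monitoring.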

Applications

Sneaker authenticity identification

For more information, visit our website, Ysneaker!

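Framework

TorchMM includes the following modules:

Data processing: provides a unified data interface and multiple data processing formats
Model library: includes multi-modal fusion, cross-modal retrieval, image captioning, and multi-task algorithms
Trainer: sets up a unified training process and the related metric calculations for each task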
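
To make the division of labour between the three modules more concrete, here is a minimal, framework-free sketch. It is not TorchMM's actual code: the names Sample, MODEL_REGISTRY, register, EarlyAddModel, and Trainer are all hypothetical, and the toy model only loosely mirrors the idea of early fusion by addition.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Data processing (hypothetical): one unified sample format for every task.
# Each sample is (image features, text features, label).
Sample = Tuple[List[float], List[float], int]

# Model library (hypothetical): models are looked up by name.
MODEL_REGISTRY: Dict[str, type] = {}


def register(name: str):
    """Decorator that stores a model class in the registry under `name`."""
    def wrap(cls: type) -> type:
        MODEL_REGISTRY[name] = cls
        return cls
    return wrap


@register("early_add")
class EarlyAddModel:
    """Toy early-fusion model: adds both feature vectors, then thresholds the sum."""

    def predict(self, image_feat: List[float], text_feat: List[float]) -> int:
        fused = [i + t for i, t in zip(image_feat, text_feat)]
        return int(sum(fused) > 0)


# Trainer (hypothetical): one shared evaluation routine and metric per task.
@dataclass
class Trainer:
    model_name: str

    def evaluate(self, data: Iterable[Sample]) -> float:
        """Return the accuracy of the named model on `data`."""
        model = MODEL_REGISTRY[self.model_name]()
        samples = list(data)
        correct = sum(model.predict(img, txt) == label for img, txt, label in samples)
        return correct / len(samples)
```

In this sketch, swapping in another model only requires registering a new class; the data format and the evaluation routine stay unchanged.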

Usage

Download the toolkit

git clone https://github.com/njustkmg/TorchMM.git

Usage example:

from torchmm import TorchMM

# config: Model running parameters, see configs/
# data_root: Path to dataset
# image_root: Path to images
# cuda: Which GPU to use

runner = TorchMM(config='configs/cmml.yml',
                 data_root='data/COCO',
                 image_root='data/COCO/images',
                 cuda=0)

Or

python run.py --config configs/cmml.yml --data_root data/COCO --image_root data/COCO/images --cuda 0
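
The command above takes the same four settings as the Python call. As a rough sketch of how a launcher script could turn those flags into keyword arguments, the following uses only the standard library. It makes no claim about the real contents of run.py: the helpers build_parser and parse_cli are hypothetical, the default of 0 for --cuda is an assumption, and only the flag names come from the command shown above.

```python
import argparse
from typing import Dict, List, Union


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the four flags used in the command above."""
    parser = argparse.ArgumentParser(description="Hypothetical TorchMM-style launcher")
    parser.add_argument("--config", required=True, help="Model running parameters (YAML file)")
    parser.add_argument("--data_root", required=True, help="Path to dataset")
    parser.add_argument("--image_root", required=True, help="Path to images")
    # Assumed default: use GPU 0 when the flag is omitted.
    parser.add_argument("--cuda", type=int, default=0, help="Which GPU to use")
    return parser


def parse_cli(argv: List[str]) -> Dict[str, Union[str, int]]:
    """Parse `argv` and return a dict that could be passed on as keyword arguments."""
    return vars(build_parser().parse_args(argv))
```

The returned dictionary has the keys config, data_root, image_root, and cuda, so a script could forward it with the ** unpacking syntax to a constructor that accepts those names.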

Model library (updating)

[1] Comprehensive Semi-Supervised Multi-Modal Learning

[2] Stacked Cross Attention for Image-Text Matching

[4] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

[5] Attention on Attention for Image Captioning

[6] VQA: Visual Question Answering

[7] ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

Experimental results

Multi-modal fusion

Model	Average_Precision	Coverage	Example_AUC	Macro_AUC	Micro_AUC	Ranking_loss
CMML	0.682	18.827	0.948	0.927	0.950	0.052
Early(add)							ResNet+LSTM
Early(concat)							ResNet+GRU
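
For readers unfamiliar with the Coverage and Ranking_loss columns, the sketch below computes both under the common multi-label definitions. This is an assumption: the exact formulas used to produce the numbers above are not stated here, and some definitions of coverage subtract 1 from the value. The function names are illustrative.

```python
from typing import Sequence


def coverage(y_true: Sequence[Sequence[int]], y_score: Sequence[Sequence[float]]) -> float:
    """Average number of top-scored labels needed to include every true label."""
    total = 0.0
    for truth, scores in zip(y_true, y_score):
        relevant = [s for t, s in zip(truth, scores) if t == 1]
        if not relevant:
            continue  # a sample without true labels contributes 0
        lowest = min(relevant)
        # Count labels scored at least as high as the lowest-scored true label.
        total += sum(1 for s in scores if s >= lowest)
    return total / len(y_true)


def ranking_loss(y_true: Sequence[Sequence[int]], y_score: Sequence[Sequence[float]]) -> float:
    """Average fraction of (true, false) label pairs that are ordered incorrectly."""
    total = 0.0
    for truth, scores in zip(y_true, y_score):
        pos = [s for t, s in zip(truth, scores) if t == 1]
        neg = [s for t, s in zip(truth, scores) if t == 0]
        if not pos or not neg:
            continue  # no pairs to compare for this sample
        # A pair is wrong when the false label scores at least as high as the true one.
        wrong = sum(1 for p in pos for n in neg if n >= p)
        total += wrong / (len(pos) * len(neg))
    return total / len(y_true)


y_true = [[1, 0, 0], [0, 1, 1]]
y_score = [[0.9, 0.2, 0.1], [0.8, 0.3, 0.6]]
print(coverage(y_true, y_score))      # 2.0
print(ranking_loss(y_true, y_score))  # 0.5
```

Lower is better for both metrics, whereas higher is better for Average_Precision and the AUC columns.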

License

This project is released under the Apache 2.0 license.

Owner

njustkmg
