ColossalAI-Benchmark - Performance benchmarking with ColossalAI

Overview

Benchmark for Tuning Accuracy and Efficiency

Overview

The benchmark includes our efforts in using Colossal-AI to train different tasks to achieve SOTA results. We are interested in both validataion accuracy and training speed, and prefer larger batch size to take advantage of more GPU devices. For example, we trained vision transformer with batch size 512 on CIFAR10 and 4096 on ImageNet1k, which are basically not used in existing works. Some of the results in the benchmark trained with 8x A100 are shown below.

Task Model Training Time Top-1 Accuracy
CIFAR10 ViT-Lite-7/4 ~ 16 min ~ 90.5%
ImageNet1k ViT-S/16 ~ 16.5 h ~ 74.5%

The train.py script in each task runs training with the specific configuration script in configs/ for different parallelisms. Supported parallelisms include data parallel only (ends with vanilla), 1D (ends with 1d), 2D (ends with 2d), 2.5D (ends with 2p5d), 3D (ends with 3d).

Each configuration scripts basically includes the following elements, taking ImageNet1k task as example:

TOTAL_BATCH_SIZE = 4096
LEARNING_RATE = 3e-3
WEIGHT_DECAY = 0.3

NUM_EPOCHS = 300
WARMUP_EPOCHS = 32

# data parallel only
TENSOR_PARALLEL_SIZE = 1    
TENSOR_PARALLEL_MODE = None

# parallelism setting
parallel = dict(
    pipeline=1,
    tensor=dict(mode=TENSOR_PARALLEL_MODE, size=TENSOR_PARALLEL_SIZE),
)

fp16 = dict(mode=AMP_TYPE.TORCH, ) # amp setting

gradient_accumulation = 2 # accumulate 2 steps for gradient update

BATCH_SIZE = TOTAL_BATCH_SIZE // gradient_accumulation # actual batch size for dataloader

clip_grad_norm = 1.0 # clip gradient with norm 1.0

Upper case elements are basically what train.py needs, and lower case elements are what Colossal-AI needs to initialize the training.

Usage

To start training, use the following command to run each worker:

$ DATA=/path/to/dataset python train.py --world_size=WORLD_SIZE \
                                        --rank=RANK \
                                        --local_rank=LOCAL_RANK \
                                        --host=MASTER_IP_ADDRESS \
                                        --port=MASTER_PORT \
                                        --config=CONFIG_FILE

It is also recommended to start training with torchrun as:

$ DATA=/path/to/dataset torchrun --nproc_per_node=NUM_GPUS_PER_NODE \
                                 --nnodes=NUM_NODES \
                                 --node_rank=NODE_RANK \
                                 --master_addr=MASTER_IP_ADDRESS \
                                 --master_port=MASTER_PORT \
                                 train.py --config=CONFIG_FILE
Owner
HPC-AI Tech
We are a global team to help you train and deploy your AI models
HPC-AI Tech
Feedback is important: response-aware feedback mechanism for background based conversation

RFM The code for the paper: "Feedback is important: response-aware feedback mechanism for background based conversation." Requirements python 3.7 pyto

Jiatao Chen 2 Sep 29, 2022
Time should be taken seer-iously

TimeSeers seers - (Noun) plural form of seer - A person who foretells future events by or as if by supernatural means TimeSeers is an hierarchical Bay

279 Dec 26, 2022
Yolov5 + Deep Sort with PyTorch

딥소트 수정중 Yolov5 + Deep Sort with PyTorch Introduction This repository contains a two-stage-tracker. The detections generated by YOLOv5, a family of obj

1 Nov 26, 2021
Pytorch implementation of DeepMind's differentiable neural computer paper.

DNC pytorch This is a Pytorch implementation of DeepMind's Differentiable Neural Computer (DNC) architecture introduced in their recent Nature paper:

Yuanpu Xie 91 Nov 21, 2022
Implements VQGAN+CLIP for image and video generation, and style transfers, based on text and image prompts. Emphasis on ease-of-use, documentation, and smooth video creation.

VQGAN-CLIP-GENERATOR Overview This is a package (with available notebook) for running VQGAN+CLIP locally, with a focus on ease of use, good documentat

Ryan Hamilton 98 Dec 30, 2022
Facilitates implementing deep neural-network backbones, data augmentations

Introduction Nowadays, the training of Deep Learning models is fragmented and unified. When AI engineers face up with one specific task, the common wa

40 Dec 29, 2022
This repo is about implementing different approaches of pose estimation and also is a sub-task of the smart hospital bed project :smile:

Pose-Estimation This repo is a sub-task of the smart hospital bed project which is about implementing the task of pose estimation 😄 Many thanks to th

Max 11 Oct 17, 2022
This repo holds codes of the ICCV21 paper: Visual Alignment Constraint for Continuous Sign Language Recognition.

VAC_CSLR This repo holds codes of the paper: Visual Alignment Constraint for Continuous Sign Language Recognition.(ICCV 2021) [paper] Prerequisites Th

Yuecong Min 64 Dec 19, 2022
This is the official released code for our paper, The Emergence of Objectness: Learning Zero-Shot Segmentation from Videos

The-Emergence-of-Objectness This is the official released code for our paper, The Emergence of Objectness: Learning Zero-Shot Segmentation from Videos

44 Oct 08, 2022
The source codes for ACL 2021 paper 'BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data'

BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data This repository provides the implementation details for

124 Dec 27, 2022
A high performance implementation of HDBSCAN clustering.

HDBSCAN HDBSCAN - Hierarchical Density-Based Spatial Clustering of Applications with Noise. Performs DBSCAN over varying epsilon values and integrates

2.3k Jan 02, 2023
An AI Assistant More Than a Toolkit

tymon An AI Assistant More Than a Toolkit The reason for creating framework tymon is simple. making AI more like an assistant, helping us to complete

TymonXie 46 Oct 24, 2022
UFPR-ADMR-v2 Dataset

UFPR-ADMR-v2 Dataset The UFPR-ADMRv2 dataset contains 5,000 dial meter images obtained on-site by employees of the Energy Company of Paraná (Copel), w

Gabriel Salomon 8 Sep 29, 2022
Tensorflow implementation of DeepLabv2

TF-deeplab This is a Tensorflow implementation of DeepLab, compatible with Tensorflow 1.2.1. Currently it supports both training and testing the ResNe

Chenxi Liu 21 Sep 27, 2022
Repository for the COLING 2020 paper "Explainable Automated Fact-Checking: A Survey."

Explainable Fact Checking: A Survey This repository and the accompanying webpage contain resources for the paper "Explainable Fact Checking: A Survey"

Neema Kotonya 42 Nov 17, 2022
PyTorch implementation of Wide Residual Networks with 1-bit weights by McDonnell (ICLR 2018)

1-bit Wide ResNet PyTorch implementation of training 1-bit Wide ResNets from this paper: Training wide residual networks for deployment using a single

Sergey Zagoruyko 122 Dec 07, 2022
Get started learning C# with C# notebooks powered by .NET Interactive and VS Code.

.NET Interactive Notebooks for C# Welcome to the home of .NET interactive notebooks for C#! How to Install Download the .NET Coding Pack for VS Code f

.NET Platform 425 Dec 25, 2022
An Unsupervised Graph-based Toolbox for Fraud Detection

An Unsupervised Graph-based Toolbox for Fraud Detection Introduction: UGFraud is an unsupervised graph-based fraud detection toolbox that integrates s

SafeGraph 99 Dec 11, 2022
A PyTorch implementation of the architecture of Mask RCNN

EDIT (AS OF 4th NOVEMBER 2019): This implementation has multiple errors and as of the date 4th, November 2019 is insufficient to be utilized as a reso

Sai Himal Allu 975 Dec 30, 2022
Weakly Supervised Learning of Rigid 3D Scene Flow

Weakly Supervised Learning of Rigid 3D Scene Flow This repository provides code and data to train and evaluate a weakly supervised method for rigid 3D

Zan Gojcic 124 Dec 27, 2022