Single machine, multiple cards training; mix-precision training; DALI data loader.

Overview

Template

Script Category Description

Category script
comparison script train.py, loader.py
for single-machine-multiple-cards training train_DP.py, train_DDP.py
for mixed-precision training train_amp.py
for DALI data loading loader_DALI.py

Note: The comment # new # in script represents newly added code block (compare to comparison script, e.g., train.py)

Environment

  • CPU: Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz
  • GPU: RTX 2080Ti
  • OS: Ubuntu 18.04.3 LTS
  • DL framework: Pytorch 1.6.0, Torchvision 0.7.0

Single-machine-multiple-cards training (two cards for example)

train_DP.py -- Parallel computing using nn.DataParallel

Usage:

cd Template/src
python train_DP.py

Superiority:
- Easy to use
- Accelerate training (inconspicuous)
Weakness:
- Unbalanced load
Description:
DataParallel is very convenient to use, we just need to use DataParallel to package the model:

model = ...
model = nn.DataParallel(model)

train_DDP.py -- Parallel computing using torch.distributed

Usage:

cd Template/src
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 train_DDP.py

Superiority:
- balanced load
- Accelerate training (conspicuous)
Weakness:
- Hard to use
Description:
Unlike DataParallel who control multiple GPUs via single-process, distributed creates multiple process. we just need to accomplish one code and torch will automatically assign it to n processes, each running on corresponding GPU.
To config distributed model via torch.distributed, the following steps needed to be performed:

  1. Get current process index:
parser = argparse.ArgumentParser()
parser.add_argument('--local_rank', default=-1, type=int, help='node rank for distributed training')
opt = parser.parse_args()
# print(opt.local_rank)
  1. Set the backend and port used for communication between GPUs:
dist.init_process_group(backend='nccl')
  1. Config current device according to the local_rank:
torch.cuda.set_device(opt.local_rank)
  1. Config data sampler:
dataset = ...
sampler = distributed.DistributedSampler(dataset)
dataloader = DataLoader(dataset=dataset, ..., sampler=sampler)
  1. Package the model:
model = ...
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
model = nn.parallel.DistributedDataParallel(model.cuda(), device_ids=[opt.local_rank])

Mixed-precision training

train_amp.py -- Mixed-precision training using torch.cuda.amp

Usage:

cd Template/src
python train_amp.py

Superiority:
- Easy to use
- Accelerate training (conspicuous for heavy model)
Weakness:
- Accelerate training (inconspicuous for light model)
Description:
Mixed-precision training is a set of techniques that allows us to use fp16 without causing our model training to diverge.
To config mixed-precision training via torch.cuda.amp, the following steps needed to be performed:

  1. Instantiate GradScaler object:
scaler = torch.cuda.amp.GradScaler()
  1. Modify the traditional optimization process:
# Before:
optimizer.zero_grad()
preds = model(imgs)
loss = loss_func(preds, labels)
loss.backward()
optimizer.step()

# After:
optimizer.zero_grad()
with torch.cuda.amp.autocast():
    preds = model(imgs)
    loss = loss_func(preds, labels)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()

DALI data loading

loader_DALI.py -- Data loading using nvidia.dali

Prerequisite:
- NVIDIA Driver supporting CUDA 10.0 or later (i.e., 410.48 or later driver releases)
- PyTorch 0.4 or later
- Data organization format that matches the code, the format that matches the loader_DALI.py is as follows:
 /dataset / train or test / img or gt / sub_dirs / imgs [View]
Usage:

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist --upgrade nvidia-dali-cuda102
cd Template/src
python loader_DALI.py --data_source /path/to/dataset

Superiority:
- Easy to use
- Accelerate data loading
Weakness:
- Occupy video memory
Description:
NVIDIA Data Loading Library (DALI) is a collection of highly optimized building blocks and an execution engine that accelerates the data pipeline for computer vision and audio deep learning applications.
To load dataset using DALI, the following steps needed to be performed:

  1. Config external input iterator:
eii = ExternalInputIterator(data_source=opt.data_source, batch_size=opt.batch_size, shuffle=True)
# A demo of external input iterator
class ExternalInputIterator(object):
    def __init__(self, data_source, batch_size, shuffle):
        self.batch_size = batch_size
        
        img_paths = sorted(glob.glob(data_source + '/train' + '/blurry' + '/*/*.*'))
        gt_paths = sorted(glob.glob(data_source + '/train' + '/sharp' + '/*/*.*'))
        self.paths = list(zip(*(img_paths,gt_paths)))
        if shuffle:
            random.shuffle(self.paths)

    def __iter__(self):
        self.i = 0
        return self

    def __next__(self):
        imgs = []
        gts = []

        if self.i >= len(self.paths):
            self.__iter__()
            raise StopIteration

        for _ in range(self.batch_size):
            img_path, gt_path = self.paths[self.i % len(self.paths)]
            imgs.append(np.fromfile(img_path, dtype = np.uint8))
            gts.append(np.fromfile(gt_path, dtype = np.uint8))
            self.i += 1
        return (imgs, gts)

    def __len__(self):
        return len(self.paths)

    next = __next__
  1. Config pipeline:
pipe = externalSourcePipeline(batch_size=opt.batch_size, num_threads=opt.num_workers, device_id=0, seed=opt.seed, external_data = eii, resize=opt.resize, crop=opt.crop)
# A demo of pipeline
@pipeline_def
def externalSourcePipeline(external_data, resize, crop):
    imgs, gts = fn.external_source(source=external_data, num_outputs=2)
    
    crop_pos = (fn.random.uniform(range=(0., 1.)), fn.random.uniform(range=(0., 1.)))
    flip_p = (fn.random.coin_flip(), fn.random.coin_flip())
    
    imgs = transform(imgs, resize, crop, crop_pos, flip_p)
    gts = transform(gts, resize, crop, crop_pos, flip_p)
    return imgs, gts

def transform(imgs, resize, crop, crop_pos, flip_p):
    imgs = fn.decoders.image(imgs, device='mixed')
    imgs = fn.resize(imgs, resize_y=resize)
    imgs = fn.crop(imgs, crop=(crop,crop), crop_pos_x=crop_pos[0], crop_pos_y=crop_pos[1])
    imgs = fn.flip(imgs, horizontal=flip_p[0], vertical=flip_p[1])
    imgs = fn.transpose(imgs, perm=[2, 0, 1])
    imgs = imgs/127.5-1
    
    return imgs
  1. Instantiate DALIGenericIterator object:
dgi = DALIGenericIterator(pipe, output_map=["imgs", "gts"], last_batch_padded=True, last_batch_policy=LastBatchPolicy.PARTIAL, auto_reset=True)
  1. Read data:
for i, data in enumerate(dgi):
    imgs = data[0]['imgs']
    gts = data[0]['gts']
A lightweight, hub-and-spoke dashboard for multi-account Data Science projects

A lightweight, hub-and-spoke dashboard for cross-account Data Science Projects Introduction Modern Data Science environments often involve many indepe

AWS Samples 3 Oct 30, 2021
Integrate bus data from a variety of sources (batch processing and real time processing).

Purpose: This is integrate bus data from a variety of sources such as: csv, json api, sensor data ... into Relational Database (batch processing and r

1 Nov 25, 2021
Hue Editor: Open source SQL Query Assistant for Databases/Warehouses

Hue Editor: Open source SQL Query Assistant for Databases/Warehouses

Cloudera 759 Jan 07, 2023
Fast, flexible and easy to use probabilistic modelling in Python.

Please consider citing the JMLR-MLOSS Manuscript if you've used pomegranate in your academic work! pomegranate is a package for building probabilistic

Jacob Schreiber 3k Jan 02, 2023
📊 Python Flask game that consolidates data from Nasdaq, allowing the user to practice buying and selling stocks.

Web Trader Web Trader is a trading website that consolidates data from Nasdaq, allowing the user to search up the ticker symbol and price of any stock

Paulina Khew 21 Aug 30, 2022
A tool to compare differences between dataframes and create a differences report in Excel

similarpanda A module to check for differences between pandas Dataframes, and generate a report in Excel format. This is helpful in a workplace settin

Andre Pretorius 9 Sep 15, 2022
bigdata_analyse 大数据分析项目

bigdata_analyse 大数据分析项目 wish 采用不同的技术栈,通过对不同行业的数据集进行分析,期望达到以下目标: 了解不同领域的业务分析指标 深化数据处理、数据分析、数据可视化能力 增加大数据批处理、流处理的实践经验 增加数据挖掘的实践经验

Way 2.4k Dec 30, 2022
t-SNE and hierarchical clustering are popular methods of exploratory data analysis, particularly in biology.

tree-SNE t-SNE and hierarchical clustering are popular methods of exploratory data analysis, particularly in biology. Building on recent advances in s

Isaac Robinson 61 Nov 21, 2022
Desafio proposto pela IGTI em seu bootcamp de Cloud Data Engineer

Desafio Modulo 4 - Cloud Data Engineer Bootcamp - IGTI Objetivos Criar infraestrutura como código Utuilizando um cluster Kubernetes na Azure Ingestão

Otacilio Filho 4 Jan 23, 2022
Produces a summary CSV report of an Amber Electric customer's energy consumption and cost data.

Amber Electric Usage Summary This is a command line tool that produces a summary CSV report of an Amber Electric customer's energy consumption and cos

Graham Lea 12 May 26, 2022
Binance Kline Data With Python

Binance Kline Data by seunghan(gingerthorp) reference https://github.com/binance/binance-public-data/ All intervals are supported: 1m, 3m, 5m, 15m, 30

shquant 5 Jul 13, 2022
Feature engineering and machine learning: together at last

Feature engineering and machine learning: together at last! Lambdo is a workflow engine which significantly simplifies data analysis by unifying featu

Alexandr Savinov 14 Sep 15, 2022
Fitting thermodynamic models with pycalphad

ESPEI ESPEI, or Extensible Self-optimizing Phase Equilibria Infrastructure, is a tool for thermodynamic database development within the CALPHAD method

Phases Research Lab 42 Sep 12, 2022
Finds, downloads, parses, and standardizes public bikeshare data into a standard pandas dataframe format

Finds, downloads, parses, and standardizes public bikeshare data into a standard pandas dataframe format.

Brady Law 2 Dec 01, 2021
In this project, ETL pipeline is build on data warehouse hosted on AWS Redshift.

ETL Pipeline for AWS Project Description In this project, ETL pipeline is build on data warehouse hosted on AWS Redshift. The data is loaded from S3 t

Mobeen Ahmed 1 Nov 01, 2021
Stochastic Gradient Trees implementation in Python

Stochastic Gradient Trees - Python Stochastic Gradient Trees1 by Henry Gouk, Bernhard Pfahringer, and Eibe Frank implementation in Python. Based on th

John Koumentis 2 Nov 18, 2022
DaCe is a parallel programming framework that takes code in Python/NumPy and other programming languages

aCe - Data-Centric Parallel Programming Decoupling domain science from performance optimization. DaCe is a parallel programming framework that takes c

SPCL 330 Dec 30, 2022
Aggregating gridded data (xarray) to polygons

A package to aggregate gridded data in xarray to polygons in geopandas using area-weighting from the relative area overlaps between pixels and polygons. Check out the binder link above for a sample c

Kevin Schwarzwald 42 Nov 09, 2022
DaDRA (day-druh) is a Python library for Data-Driven Reachability Analysis.

DaDRA (day-druh) is a Python library for Data-Driven Reachability Analysis. The main goal of the package is to accelerate the process of computing estimates of forward reachable sets for nonlinear dy

2 Nov 08, 2021
Projects that implement various aspects of Data Engineering.

DATAWAREHOUSE ON AWS The purpose of this project is to build a datawarehouse to accomodate data of active user activity for music streaming applicatio

2 Oct 14, 2021