OSLO: Open Source framework for Large-scale transformer Optimization

Last update: Nov 24, 2022

Overview

O S L O

Open Source framework for Large-scale transformer Optimization

What's New:

December 21, 2021 Released OSLO 1.0.

What is OSLO about?

OSLO is a framework that provides various GPU-based optimization features for large-scale modeling. As of 2021, Hugging Face Transformers is considered the de facto standard library for transformer models. However, it is not yet well suited to large-scale modeling. This is where OSLO comes in. OSLO is designed to make it easier to train large models with Transformers. For example, with OSLO you can fine-tune GPTJ from the Hugging Face Model Hub without much extra effort. Currently, GPT2, GPTNeo, and GPTJ are supported, and we plan to support more models soon.

Installation

OSLO can be easily installed with the pip package manager. All dependencies, such as torch, transformers, dacite, ninja and pybind11, should be installed automatically by the following command. Note that the PyPI project name is oslo-core, not oslo.

pip install oslo-core

Some features rely on C++, so we provide an option, CPP_AVAILABLE, to decide whether or not to install them.

If C++ is available:

CPP_AVAILABLE=1 pip install oslo-core

If C++ is not available:

CPP_AVAILABLE=0 pip install oslo-core

Note that the default value of CPP_AVAILABLE is 0 on Windows and 1 on Linux.
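The CPP_AVAILABLE rule above can be summarized as a small decision function. This is a hypothetical sketch, not OSLO's setup code: `resolve_cpp_available` is an illustrative name, an explicit CPP_AVAILABLE value is assumed to override the platform default, and platforms other than Windows and Linux (which the README does not mention) are conservatively treated as 0.

```python
import os
import sys
from typing import Mapping, Optional


def resolve_cpp_available(
    environ: Optional[Mapping[str, str]] = None,
    platform: Optional[str] = None,
) -> bool:
    """Decide whether C++ features should be built (hypothetical helper).

    An explicit CPP_AVAILABLE=0/1 wins; otherwise fall back to the documented
    defaults: 0 on Windows, 1 on Linux. Other platforms are assumed to be 0.
    """
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform
    value = environ.get("CPP_AVAILABLE")
    if value is not None:
        return value.strip() == "1"
    return platform.startswith("linux")
```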

Key Features

import deepspeed 
from oslo import GPTJForCausalLM

# 1. 3D Parallelism
model = GPTJForCausalLM.from_pretrained_with_parallel(
    "EleutherAI/gpt-j-6B", tensor_parallel_size=2, pipeline_parallel_size=2,
)

# 2. Kernel Fusion
model = model.fuse()

# 3. DeepSpeed Support
engines = deepspeed.initialize(
    model=model.gpu_modules(), model_parameters=model.gpu_parameters(), ...,
)

# 4. Data Processing
from oslo import (
    DatasetPreprocessor, 
    DatasetBlender, 
    DatasetForCausalLM, 
    ...    
)

OSLO offers the following features.

3D Parallelism: A state-of-the-art technique that combines data, tensor, and pipeline parallelism to train large-scale models on multiple GPUs.
Kernel Fusion: A GPU optimization method to increase training and inference speed.
DeepSpeed Support: We support DeepSpeed, which provides ZeRO data parallelism.
Data Processing: Various utilities for efficient large-scale data processing.

See USAGE.md to learn how to use them.
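With tensor_parallel_size=2 and pipeline_parallel_size=2 as in the example above, each group of 4 GPUs holds one full model copy, and the remaining factor of the world size becomes data parallelism. The sketch below shows one way a global rank could be mapped to its three coordinates. The ordering (tensor-parallel ranks adjacent, then pipeline stages, then data-parallel replicas) is an assumption for illustration; OSLO's actual process-group layout may differ.

```python
from typing import Dict


def rank_to_coords(
    rank: int, tensor_parallel_size: int, pipeline_parallel_size: int, world_size: int
) -> Dict[str, int]:
    """Map a global rank to (tensor, pipeline, data) parallel indices.

    Assumed layout: tensor-parallel ranks are adjacent, then pipeline stages,
    then data-parallel replicas. This is an illustrative convention only.
    """
    model_parallel_size = tensor_parallel_size * pipeline_parallel_size
    if world_size % model_parallel_size != 0:
        raise ValueError("world_size must be divisible by tp_size * pp_size")
    if not 0 <= rank < world_size:
        raise ValueError("rank out of range")
    return {
        "tensor": rank % tensor_parallel_size,
        "pipeline": (rank // tensor_parallel_size) % pipeline_parallel_size,
        "data": rank // model_parallel_size,
    }
```

On 8 GPUs this layout yields a data-parallel size of 8 / (2 * 2) = 2, and every rank gets a distinct coordinate triple.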

Administrative Notes

Citing OSLO

If you find our work useful, please consider citing:

@misc{oslo,
  author       = {Ko, Hyunwoong and Kim, Soohwan and Park, Kyubyong},
  title        = {OSLO: Open Source framework for Large-scale transformer Optimization},
  howpublished = {\url{https://github.com/tunib-ai/oslo}},
  year         = {2021},
}

Licensing

The code of the OSLO project is licensed under the terms of the Apache License 2.0.

Acknowledgements

The OSLO project is built with GPU support from the AICA (Artificial Intelligence Industry Cluster Agency).

Comments

[WIP] Implement ZeRO Stage 3 (FSDP)
Title

Implement ZeRO Stage 3 (FullyShardedDataParallel)

Description

[x] Add reduce_scatter_bucketer.py

[x] Add test_reduce_scatter_bucketer.py

[x] Add flatten_params_wrapper.py

[x] Add test_flatten_params_wrapper.py

[x] Add containers.py

[x] Add test_containers.py

[x] Add parallel.py

[x] Add test_parallel.py

[x] Add fsdp_optim_utils.py

[x] Update fsdp.py

[x] Add auto_wrap.py

[x] Add test_wrap.py
opened by jinok2im 9
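ZeRO stage 3 / FSDP keeps only a shard of each gradient on every rank, which is what a reduce-scatter collective provides; the reduce_scatter_bucketer.py in this PR presumably batches many such calls. The single-process simulation below only illustrates the collective's semantics (sum across ranks, then rank r keeps shard r). It is not the PR's code; a real implementation would call a collective such as torch.distributed.reduce_scatter.

```python
from typing import List


def reduce_scatter(per_rank_grads: List[List[float]]) -> List[List[float]]:
    """Simulate reduce-scatter: sum gradients across ranks, give each rank one shard."""
    world_size = len(per_rank_grads)
    numel = len(per_rank_grads[0])
    if any(len(g) != numel for g in per_rank_grads):
        raise ValueError("all ranks must hold gradients of the same size")
    if numel % world_size != 0:
        raise ValueError("gradient size must be divisible by world size (pad first)")
    shard = numel // world_size
    # Element-wise sum over ranks (the "reduce" part).
    summed = [sum(values) for values in zip(*per_rank_grads)]
    # Rank r keeps only its contiguous shard (the "scatter" part).
    return [summed[r * shard:(r + 1) * shard] for r in range(world_size)]
```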
FusedAdam & CPUAdam
Title

FusedAdam & CPUAdam

Description

Implement FusedAdam & CPUAdam

Tasks

[x] Implement FusedAdam

[x] Implement CPUAdam

[x] Test FusedAdam

[x] Test CPUAdam

[x] Test FusedScaleMaskSoftmax (name changed)
opened by cozytk 6
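FusedAdam and CPUAdam typically compute the standard Adam update; the difference is where it runs (one fused GPU kernel versus an optimized CPU loop, as used for optimizer offloading). For reference, the per-element math these optimizers presumably accelerate looks like the plain-Python sketch below. It follows the bias-corrected formulation used by torch.optim.Adam, without weight decay; whether OSLO's versions add AdamW-style decay or other options is not covered here.

```python
import math
from typing import List


def adam_step(
    params: List[float],
    grads: List[float],
    exp_avg: List[float],
    exp_avg_sq: List[float],
    step: int,
    lr: float = 1e-3,
    betas: tuple = (0.9, 0.999),
    eps: float = 1e-8,
) -> List[float]:
    """One unfused Adam update; exp_avg and exp_avg_sq are updated in place."""
    beta1, beta2 = betas
    bias_correction1 = 1 - beta1 ** step
    bias_correction2 = 1 - beta2 ** step
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        exp_avg[i] = beta1 * exp_avg[i] + (1 - beta1) * g          # first moment
        exp_avg_sq[i] = beta2 * exp_avg_sq[i] + (1 - beta2) * g * g  # second moment
        denom = math.sqrt(exp_avg_sq[i] / bias_correction2) + eps
        new_params.append(p - lr * (exp_avg[i] / bias_correction1) / denom)
    return new_params
```

On the first step the bias-corrected ratio is g / |g|, so each parameter moves by roughly lr against the sign of its gradient.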
[WIP] Add data processing modules based on lassl
Title

Add data processing modules based on lassl

Description

Brought over data processing functions suited to GPT-2, with reference to lassl

Linked Issues

None
opened by gimmaru 6
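Causal LM preprocessing for GPT-2 commonly concatenates tokenized documents and slices the stream into fixed-length blocks, as the group_texts function in Hugging Face's run_clm.py example does. Whether the lassl-derived modules in this PR use exactly this scheme is an assumption; the sketch only shows the idea on lists of token ids and does not insert end-of-text tokens.

```python
from itertools import chain
from typing import List


def group_texts(tokenized: List[List[int]], block_size: int) -> List[List[int]]:
    """Concatenate tokenized documents and cut the stream into equal-length blocks.

    The trailing remainder shorter than block_size is dropped.
    """
    flat = list(chain.from_iterable(tokenized))
    usable = (len(flat) // block_size) * block_size
    return [flat[i:i + block_size] for i in range(0, usable, block_size)]
```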
Implementation of Sequential Parallelism
SP with DP implementation

Implemented an SP wrapper with DP

Description

SequenceDataParallel works like native torch DDP, with SP applied

You can find details in the file oslo/tests/torch/nn/parallal/data_parallel/test_sp.py
opened by ohwi 5
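In sequence parallelism (SP), each rank of an SP group holds only a slice of every sequence, while data parallelism replicates that arrangement across groups. The helper below sketches how a sequence might be partitioned; contiguous, equal-sized chunks are an assumption for illustration, not a description of SequenceDataParallel's internals.

```python
from typing import List, Sequence


def split_sequence(tokens: Sequence[int], sp_size: int, sp_rank: int) -> List[int]:
    """Return the contiguous slice of `tokens` owned by one sequence-parallel rank."""
    if len(tokens) % sp_size != 0:
        raise ValueError("sequence length must be divisible by sp_size")
    if not 0 <= sp_rank < sp_size:
        raise ValueError("sp_rank out of range")
    chunk = len(tokens) // sp_size
    return list(tokens[sp_rank * chunk:(sp_rank + 1) * chunk])


tokens = list(range(8))
shards = [split_sequence(tokens, 4, rank) for rank in range(4)]
```

Concatenating the shards of all SP ranks in rank order recovers the original sequence.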
Update data collators and Add models
Title

Update data collators and Add models

Description

Update data collators to support sequence parallelism in the OSLO trainer

Add models by referring to the Transformers library
opened by gimmaru 3
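Splitting sequences evenly across sequence-parallel ranks is easiest when the padded length is a multiple of the SP size, which is one plausible reason a collator would need updating. The sketch below right-pads a batch to such a multiple and builds matching attention masks; it is an illustration under that assumption, not the PR's collator. Hugging Face's DataCollatorWithPadding exposes a similar pad_to_multiple_of option.

```python
from typing import List, Tuple


def pad_to_multiple(
    input_ids: List[List[int]], multiple: int, pad_token_id: int
) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad a batch to a length divisible by `multiple` and build attention masks."""
    max_len = max(len(seq) for seq in input_ids)
    target = -(-max_len // multiple) * multiple  # round up to the next multiple
    padded, masks = [], []
    for seq in input_ids:
        pad = target - len(seq)
        padded.append(seq + [pad_token_id] * pad)
        masks.append([1] * len(seq) + [0] * pad)
    return padded, masks
```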
Implement Expert Parallel and Test for Initialization and Forward Pass
Title

Implement Expert Parallel and Test for Initialization and Forward Pass

Description

Implement the wrapper, modules, and features for expert parallelism

Implement mapping_utils._ParallelMappingForHuggingFace as the superclass of _TensorParallelMappingForHuggingFace and _ExpertParallelMappingForHuggingFace

Test initialization and forward pass for expert parallelism
opened by scsc0511 3
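Expert parallelism places different experts of a mixture-of-experts layer on different ranks, so each token must be routed to the rank that owns its expert. The sketch below shows generic top-1 routing and a contiguous expert-to-rank assignment; the gating rule, the placement, and the function names are assumptions, and none of them is taken from this PR.

```python
from typing import Dict, List


def top1_route(gate_logits: List[List[float]]) -> Dict[int, List[int]]:
    """Group token indices by the expert with the highest gate logit."""
    routes: Dict[int, List[int]] = {}
    for token_idx, logits in enumerate(gate_logits):
        expert = max(range(len(logits)), key=logits.__getitem__)
        routes.setdefault(expert, []).append(token_idx)
    return routes


def expert_owner(expert: int, num_experts: int, ep_size: int) -> int:
    """Rank holding `expert` when experts are split into equal contiguous groups."""
    if num_experts % ep_size != 0:
        raise ValueError("num_experts must be divisible by ep_size")
    return expert // (num_experts // ep_size)
```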
Integrate Sequence Parallelism branches
Title

Sequence parallelism (feat. @reniew, @ohwi, @l-yohai)

Description

This PR integrates the current version of SP, but it still has bugs.

We will fix the bugs over the coming week and write test modules according to the SP design.

It does not include the contents of the branch used for testing.
opened by l-yohai 3
implement tp-3d layers, wrapper, test codes and refactor all tp test codes and layers
implement tp-3d wrapper

fix the rank transpose problem (tensor_3d_input_rank <-> tensor_3d_output_rank) by implementing a rank transpose function.

revise tp-3d layers for Hugging Face compatibility

implement tp-3d test codes

refactor all tp test codes

unify format across all tensor parallel modules.
opened by bzantium 2
Refactoring MultiheadAttention with todo anchors
Title

Refactoring MultiheadAttention with todo anchors

Description

Refactoring oslo/torch/nn/modules/functional/multi_head_attention_forward.py.

Remove unnecessary or unintended code and clean up annotations.

Unify return format and the variable name with native torch.

Additionally, attention_mask still needs to be tested. However, it seems this part can only proceed after FusedScaleMaskSoftmax is integrated.

cc. @hyunwoongko @ohwi
opened by l-yohai 2
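The name FusedScaleMaskSoftmax describes three steps fused into one kernel: scale the attention scores, apply the mask, and take the softmax. An unfused reference version, useful when testing attention_mask handling, could look like the sketch below. The mask convention (True means masked out) and the use of -inf are assumptions; implementations such as Megatron-LM's instead fill masked scores with a large negative constant.

```python
import math
from typing import List


def scale_mask_softmax(scores: List[float], mask: List[bool], scale: float) -> List[float]:
    """Reference (unfused) softmax(scale * scores) over one row of attention scores.

    Positions where mask is True are excluded and receive probability 0.
    """
    scaled = [float("-inf") if m else s * scale for s, m in zip(scores, mask)]
    peak = max(scaled)
    if peak == float("-inf"):
        return [0.0] * len(scores)  # every position is masked
    exps = [math.exp(x - peak) for x in scaled]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]
```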
Add tp-1d layers testing
Add testing for tp-1d layers: col_linear, row_linear, vocab_embedding_1d

Replace literal numbers with integer variables such as summa_dim and world_size. cc: @hyunwoongko
opened by bzantium 2
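col_linear and row_linear presumably follow the usual 1D tensor-parallel split popularized by Megatron-LM: a column-parallel layer shards the weight by output features and gathers the partial outputs, while a row-parallel layer shards it by input features and sums the partial results. The pure-Python sketch below simulates both on one process to show that each reproduces the full matrix product; the function names are illustrative, and the communication steps are replaced by list concatenation (all-gather) and summation (all-reduce).

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Plain matrix product x @ w."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def column_parallel(x: Matrix, w: Matrix, world_size: int) -> Matrix:
    """Shard w by output columns, compute each shard, then concatenate per row."""
    if len(w[0]) % world_size != 0:
        raise ValueError("output features must be divisible by world_size")
    cols = len(w[0]) // world_size
    shards = [[row[r * cols:(r + 1) * cols] for row in w] for r in range(world_size)]
    partial = [matmul(x, shard) for shard in shards]
    return [sum((p[i] for p in partial), []) for i in range(len(x))]


def row_parallel(x: Matrix, w: Matrix, world_size: int) -> Matrix:
    """Shard w by input rows (and x by columns), then sum the partial products."""
    if len(w) % world_size != 0:
        raise ValueError("input features must be divisible by world_size")
    rows = len(w) // world_size
    partial = [
        matmul([xr[r * rows:(r + 1) * rows] for xr in x], w[r * rows:(r + 1) * rows])
        for r in range(world_size)
    ]
    return [[sum(p[i][j] for p in partial) for j in range(len(w[0]))] for i in range(len(x))]
```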
[WIP] add test code of sp training
Title

SP Model Test Code

Description

Write test code to verify that the model's gradients and loss values stay the same when sequence parallelism is applied.

WIP: merging @ohwi's test code that compares ColossalAI's SP with a simple learning model.
opened by l-yohai 2
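A common pitfall when checking that loss values match with and without sequence parallelism is averaging per-rank mean losses, which is only correct when every rank holds the same number of tokens. The sketch below reconstructs the global mean from per-rank (loss sum, token count) pairs, as an all-reduce would; the chunking and the helper names are assumptions for illustration, not the PR's test code.

```python
import math
from typing import List


def _chunks(token_losses: List[float], sp_size: int) -> List[List[float]]:
    """Split per-token losses into contiguous chunks, one per simulated SP rank."""
    size = math.ceil(len(token_losses) / sp_size)
    return [token_losses[r * size:(r + 1) * size] for r in range(sp_size)]


def full_mean_loss(token_losses: List[float]) -> float:
    """Reference: mean loss over the whole, unsplit sequence."""
    return sum(token_losses) / len(token_losses)


def sp_mean_loss(token_losses: List[float], sp_size: int) -> float:
    """Global mean from per-rank (loss sum, token count) pairs."""
    parts = _chunks(token_losses, sp_size)
    return sum(sum(p) for p in parts) / sum(len(p) for p in parts)


def naive_sp_mean_loss(token_losses: List[float], sp_size: int) -> float:
    """Pitfall: averaging per-rank means, which is biased when chunk lengths differ."""
    parts = [p for p in _chunks(token_losses, sp_size) if p]
    return sum(sum(p) / len(p) for p in parts) / len(parts)
```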

Releases (v2.0.2)

v2.0.2 (Aug 25, 2022)
Revert OSLO to 1.1.2.

v2.0.1 (Feb 20, 2022)
Merge changes from functorch upstream.

Fix documents and tutorials

v2.0.0 (Feb 14, 2022)
Official release of OSLO 2.0.0 🎉🎉

This version of OSLO provides the following features:

Tensor model parallelism

Efficient activation checkpointing

Kernel fusion

We plan to add pipeline model parallelism and ZeRO optimization in future versions.

New feature: Kernel Fusion

{
  "kernel_fusion": {
    "enable": "bool",
    "memory_efficient_fusion": "bool",
    "custom_cuda_kernels": "list"
  }
}

For more information, please check the kernel fusion tutorial.
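The v2.0.0 notes give only the expected type of each kernel_fusion option. A small validator that enforces those types before a config is handed to OSLO might look like the sketch below. It is a hypothetical helper, not part of OSLO, and the example leaves custom_cuda_kernels empty because the notes do not list valid kernel names.

```python
from typing import Any, Dict

# Expected value types for the "kernel_fusion" section, as listed in the v2.0.0 notes.
KERNEL_FUSION_SCHEMA = {
    "enable": bool,
    "memory_efficient_fusion": bool,
    "custom_cuda_kernels": list,
}


def validate_kernel_fusion(config: Dict[str, Any]) -> None:
    """Raise ValueError if the kernel_fusion section has unknown keys or wrong types."""
    section = config.get("kernel_fusion", {})
    for key, value in section.items():
        if key not in KERNEL_FUSION_SCHEMA:
            raise ValueError(f"unknown kernel_fusion option: {key}")
        expected = KERNEL_FUSION_SCHEMA[key]
        if not isinstance(value, expected):
            raise ValueError(
                f"{key} must be {expected.__name__}, got {type(value).__name__}"
            )


config = {
    "kernel_fusion": {
        "enable": True,
        "memory_efficient_fusion": False,
        "custom_cuda_kernels": [],  # kernel names are not listed in the notes
    }
}
validate_kernel_fusion(config)
```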
v2.0.0a2 (Feb 2, 2022)

Quick fix for the CUDA RNG state tracker.

v2.0.0a1 (Feb 2, 2022)

Add activation checkpointing

You can enable efficient activation checkpointing in OSLO with the following configuration.

import oslo

# `model` is assumed to be an already-loaded Transformers model.
model = oslo.initialize(
    model,
    config={
        "model_parallelism": {
            "enable": True,
            "tensor_parallel_size": YOUR_TENSOR_PARALLEL_SIZE,
        },
        "activation_checkpointing": {
            "enable": True,
            "cpu_checkpointing": True,
            "partitioned_checkpointing": True,
            "contiguous_checkpointing": True,
        },
    },
)

Tutorial: https://tunib-ai.github.io/oslo/TUTORIALS/activation_checkpointing.html


v2.0.0a0 (Jan 30, 2022)
New API

Inspired by DeepSpeed, the new API is easier and simpler to use.

import oslo

model = oslo.initialize(model, config="oslo-config.json")
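Since oslo.initialize also accepts a path such as "oslo-config.json", the dictionary from the v2.0.0a1 example could be saved to that file. The sketch assumes the JSON file uses the same structure as the inline config dict; write_oslo_config is a hypothetical helper name, not an OSLO function.

```python
import json
from pathlib import Path
from typing import Any, Dict


def write_oslo_config(path: str, tensor_parallel_size: int) -> Dict[str, Any]:
    """Save the keys shown in the v2.0.0a1 example to a JSON file (hypothetical helper)."""
    config = {
        "model_parallelism": {
            "enable": True,
            "tensor_parallel_size": tensor_parallel_size,
        },
        "activation_checkpointing": {
            "enable": True,
            "cpu_checkpointing": True,
            "partitioned_checkpointing": True,
            "contiguous_checkpointing": True,
        },
    }
    Path(path).write_text(json.dumps(config, indent=2))
    return config
```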

Add new models

Albert

Bert

Bart

T5

GPT2

GPTNeo

GPTJ

Electra

Roberta

Add document

https://tunib-ai.github.io/oslo

Remove old pipeline parallelism and kernel fusion code

We'll refurbish them using the latest methods:

Kernel fusion: AOTAutograd

Pipeline parallelism: SageMaker PP

v.1.1.2 (Jan 15, 2022)
Updates

[#7] Selective Kernel Fusion
[#9] Fix argument bug

New Feature: Selective Kernel Fusion

Since version 1.1.2, you can fuse only selected kernels instead of all of them. Currently, only the Attention and MLP classes are supported.

from oslo import GPT2MLP, GPT2Attention

# MLP only fusion
model.fuse([GPT2MLP])

# Attention only fusion
model.fuse([GPT2Attention])

# MLP + Attention fusion
model.fuse([GPT2MLP, GPT2Attention])
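Selective fusion amounts to choosing modules by class. The sketch below illustrates that filtering step with stand-in classes (AttentionStub and MLPStub are hypothetical placeholders for GPT2Attention and GPT2MLP); how model.fuse applies the selection internally is not documented here and is not implied.

```python
from typing import Iterable, List, Sequence, Type


class AttentionStub:
    """Hypothetical stand-in for an attention module class such as GPT2Attention."""


class MLPStub:
    """Hypothetical stand-in for an MLP module class such as GPT2MLP."""


def select_modules_to_fuse(modules: Iterable[object], targets: Sequence[Type]) -> List[object]:
    """Keep only modules whose class (or a subclass) appears in `targets`."""
    wanted = tuple(targets)
    return [module for module in modules if isinstance(module, wanted)]


layers = [AttentionStub(), MLPStub(), AttentionStub(), MLPStub()]
mlp_only = select_modules_to_fuse(layers, [MLPStub])
both = select_modules_to_fuse(layers, [MLPStub, AttentionStub])
```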

v1.1 (Dec 29, 2021)

[#3] Add the Parallelformers deployment launcher to OSLO.

from oslo import GPTNeoForCausalLM

model = GPTNeoForCausalLM.from_pretrained_with_parallel(
    "EleutherAI/gpt-neo-2.7B",
    tensor_parallel_size=2,
    pipeline_parallel_size=2,
    deployment=True  # <-- new feature !
)

You can enable the deployment launcher by passing deployment=True. Please refer to USAGE.md for more details.


v1.0.1 (Dec 22, 2021)
Quick Fix

Support Megatron-LM style (.jsonl) file preprocessing.

v1.0 (Dec 21, 2021)
Copyright 2021 TUNiB Inc. http://www.tunib.ai All Rights Reserved.


Owner

TUNiB

TUNiB Inc.
