OneFlow is a performance-centered and open-source deep learning framework.

Last update: Jan 07, 2023

Overview

OneFlow

Latest News

Version 0.5.0 is out!
- First-class support for eager execution. The deprecated APIs have moved to oneflow.compatible.single_client.
- Drop-in replacement of import torch for existing PyTorch projects. You can test it by interchanging import oneflow as torch and import torch as flow.
- Full changelog
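
As a rough illustration of the drop-in idea, the sketch below binds the name torch to whichever backend can be imported. The helper load_tensor_backend is hypothetical (it is not part of OneFlow or PyTorch), and the example assumes only that the chosen backend exposes a PyTorch-style ones() function.

```python
import importlib


def load_tensor_backend(candidates=("oneflow", "torch")):
    """Return the first importable module from `candidates`, or None.

    Hypothetical helper: it mimics `import oneflow as torch` while letting
    the same script fall back to PyTorch when OneFlow is not installed.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


# Plays the role of `import oneflow as torch`.
torch = load_tensor_backend()
if torch is not None:
    x = torch.ones(2, 3)  # same call under either backend
    print(type(x).__module__, tuple(x.shape))
```

Swapping the order of the candidates is one way to compare both frameworks on the same script.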

Install OneFlow

System Requirements

Python 3.6, 3.7, 3.8, 3.9

(Highly recommended) Upgrade pip

python3 -m pip install --upgrade pip #--user

CUDA Toolkit Linux x86_64 Driver
- The CUDA runtime is statically linked into OneFlow. OneFlow works with the minimum supported driver and with any newer driver. For more information, please refer to the CUDA compatibility documentation.
- Please upgrade your NVIDIA driver to version 440.33 or above and install OneFlow for CUDA 10.2 if possible.

Install with Pip Package

To install the latest stable release of OneFlow with CUDA support:

python3 -m pip install -f https://release.oneflow.info oneflow==0.5.0+cu102

To install the nightly release of OneFlow with CUDA support:

python3 -m pip install oneflow -f https://staging.oneflow.info/branch/master/cu102

To install other available builds for different variants:

Stable

python3 -m pip install --find-links https://release.oneflow.info oneflow==0.5.0+[PLATFORM]

Nightly

python3 -m pip install oneflow -f https://staging.oneflow.info/branch/master/[PLATFORM]

All available [PLATFORM]:

Platform	CUDA Driver Version	Supported GPUs
cu112	>= 450.80.02	GTX 10xx, RTX 20xx, A100, RTX 30xx
cu111	>= 450.80.02	GTX 10xx, RTX 20xx, A100, RTX 30xx
cu110, cu110_xla	>= 450.36.06	GTX 10xx, RTX 20xx, A100
cu102, cu102_xla	>= 440.33	GTX 10xx, RTX 20xx
cu101, cu101_xla	>= 418.39	GTX 10xx, RTX 20xx
cu100, cu100_xla	>= 410.48	GTX 10xx, RTX 20xx
cpu	N/A	N/A
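
The table can be turned into a small lookup. The sketch below picks the newest platform tag whose minimum driver version is met and builds the stable install command shown above. The function names are hypothetical; the versions and URL are copied from this README, and the _xla variants are left out for brevity.

```python
# (platform tag, minimum driver version), newest first, from the table above.
PLATFORMS = [
    ("cu112", (450, 80, 2)),
    ("cu111", (450, 80, 2)),
    ("cu110", (450, 36, 6)),
    ("cu102", (440, 33)),
    ("cu101", (418, 39)),
    ("cu100", (410, 48)),
]


def parse_version(text):
    """Turn '450.80.02' into (450, 80, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def pick_platform(driver_version=None):
    """Newest platform whose minimum driver is satisfied, else 'cpu'."""
    if driver_version is None:
        return "cpu"
    installed = parse_version(driver_version)
    for tag, minimum in PLATFORMS:
        if installed >= minimum:
            return tag
    return "cpu"


def stable_install_command(platform, version="0.5.0"):
    """Build the stable-release pip command for a platform tag."""
    return (
        "python3 -m pip install --find-links https://release.oneflow.info "
        f"oneflow=={version}+{platform}"
    )


print(stable_install_command(pick_platform("440.33.01")))
```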

If you are in China, you can run this to have pip download packages from a domestic PyPI mirror:
```
python3 -m pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
```
For more information on this, please refer to the PyPI mirror usage help.

Use docker image

docker pull oneflowinc/oneflow:nightly-cuda10.2
docker pull oneflowinc/oneflow:nightly-cuda11.1

Build from Source

Clone Source Code

Option 1: Clone source code from GitHub

git clone https://github.com/Oneflow-Inc/oneflow --depth=1

Option 2: Download from Aliyun

If you are in China, please download OneFlow source code from: https://oneflow-public.oss-cn-beijing.aliyuncs.com/oneflow-src.zip
```
curl https://oneflow-public.oss-cn-beijing.aliyuncs.com/oneflow-src.zip -o oneflow-src.zip
unzip oneflow-src.zip
```

Build OneFlow

Option 1: Build with Conda (recommended)

Please refer to this repo
Option 2: Build in docker container (recommended)
- Pull a docker image:
```
docker pull oneflowinc/oneflow-manylinux2014-cuda10.2:0.1
```
  All available images: https://hub.docker.com/u/oneflowinc
- In the root directory of OneFlow source code, run:
```
python3 docker/package/manylinux/build_wheel.py --python_version=3.6
```
  This should produce .whl files in the directory wheelhouse
- If you are in China, you might need to add these flags:
```
--use_tuna --use_system_proxy --use_aliyun_mirror
```
- You can choose CUDA/Python versions of wheel by adding:
```
--cuda_version=10.1 --python_version=3.6,3.7
```
- For more useful flags, please run the script with the --help flag or refer to the source code of the script.
Option 3: Build on bare metal
- Install dependencies
  - on Ubuntu 20.04, run:
```
sudo apt install -y libopenblas-dev nasm g++ gcc python3-pip cmake autoconf libtool
```
  - on macOS, run:
```
brew install nasm
```
- In the root directory of OneFlow source code, run:
```
mkdir build
cd build
```
- Configure the project inside the build directory:
  - If you are in China
    
    run this to config for CUDA:
```
cmake .. -C ../cmake/caches/cn/cuda.cmake
```
    run this to config for CPU-only:
```
cmake .. -C ../cmake/caches/cn/cpu.cmake
```
  - If you are not in China
    
    run this to config for CUDA:
```
cmake .. -C ../cmake/caches/international/cuda.cmake
```
    run this to config for CPU-only:
```
cmake .. -C ../cmake/caches/international/cpu.cmake
```
- Build the project, inside build directory, run:
```
make -j$(nproc)
```
- Add oneflow to your PYTHONPATH, inside build directory, run:
```
source source.sh
```
  Please note that this change is not permanent.
- Simple validation
```
python3 -m oneflow --doctor
```

Troubleshooting

Please refer to troubleshooting for common issues you might encounter when compiling and running OneFlow.

Advanced features

XRT

You can check this doc to obtain more details about how to use XLA and TensorRT with OneFlow.

Getting Started

3 minutes to run MNIST.

Clone the demo code from OneFlow documentation

git clone https://github.com/Oneflow-Inc/oneflow-documentation.git
cd oneflow-documentation/cn/docs/single_client/code/quick_start/

Run it in Python
```
python mlp_mnist.py
```

OneFlow is now running, and the training loss is printed.

For more information on this demo, please refer to the quick start documentation.

Documentation

Model Zoo and Benchmark

Communication

GitHub issues: installation problems, bug reports, and feature requests.
www.oneflow.org: brand related information.
Chinese
- QQ group: 331883
- WeChat ID (add as a friend to join the discussion group): OneFlowXZS
- Zhihu
International
- Discord
- Twitter
- LinkedIn
- Medium

The Team

OneFlow was originally developed by OneFlow Inc and Zhejiang Lab.

License

Apache License 2.0

Comments

source op support s and fixed generator bug
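Purpose of this PR

[x] randperm op supports S0; unit tests added

[x] Fix the bug where local tensors were always identical across ranks, found while adding S support to random ops. Background is recorded at https://github.com/Oneflow-Inc/oneflow/pull/7434#issuecomment-1033306931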

Global tensor consistency for random ops

For randint op and rand op, global tensor consistency under B/S is built on the utility function GetOpKernelRandomSeed(ctx). When the op runs with S, GetOpKernelRandomSeed(ctx) returns a different seed on each rank, so generator->set_current_seed(ctx->Attr<int64_t>("seed") + GetOpKernelRandomSeed(ctx)) gives every rank its own seed. This ensures that uniform-style kernels under S produce local tensors with the same distribution but different values. When the op runs with B, the kernel on every rank gets the same seed from GetOpKernelRandomSeed(ctx), so the same set_current_seed call gives every rank an identical seed, which keeps the global tensor consistent.
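
The seeding rule for rand/randint can be sketched in plain Python. Here kernel_seed_offset is a hypothetical stand-in for GetOpKernelRandomSeed(ctx): this sketch simply assumes it returns the rank under S and 0 under B, which may differ from the real function's values. Python's random.Random stands in for the generator.

```python
import random


def kernel_seed_offset(rank, sbp):
    """Hypothetical stand-in for GetOpKernelRandomSeed(ctx).

    Assumption: differs per rank under split ("S"), identical under
    broadcast ("B").
    """
    return rank if sbp == "S" else 0


def local_uniform(seed_attr, rank, sbp, n=4):
    """Local tensor of one rank, seeded with seed attr + kernel offset."""
    gen = random.Random(seed_attr + kernel_seed_offset(rank, sbp))
    return [gen.random() for _ in range(n)]


split = [local_uniform(42, rank, "S") for rank in range(2)]
broadcast = [local_uniform(42, rank, "B") for rank in range(2)]

print("S: ranks differ ->", split[0] != split[1])
print("B: ranks agree  ->", broadcast[0] == broadcast[1])
```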

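For randperm op and arange op, the current plan for keeping the global tensor consistent under B/S is to have all ranks share one seed and first generate the complete tensor on every rank. Then, based on the inferred physical shape, the utility function GetTensorSliceView4ParallelId(parallel_hierarchy, nd_sbp, logical_shape, parallel_id) yields the index range of the tensor that corresponds to this rank's id and physical shape, and the data at those positions is copied into this rank's local tensor.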
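
A minimal sketch of the randperm plan, for a 1-D tensor split along dim 0. The helper local_slice is a hypothetical replacement for GetTensorSliceView4ParallelId and assumes a balanced split; the real slicing rule may differ.

```python
import random


def full_randperm(n, shared_seed):
    """Every rank builds the same full permutation from the shared seed."""
    perm = list(range(n))
    random.Random(shared_seed).shuffle(perm)
    return perm


def local_slice(n, world_size, rank):
    """Hypothetical balanced split along dim 0: [start, stop) for one rank."""
    base, extra = divmod(n, world_size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def local_randperm(n, shared_seed, world_size, rank):
    """Generate the full tensor, then copy out this rank's slice."""
    start, stop = local_slice(n, world_size, rank)
    return full_randperm(n, shared_seed)[start:stop]


pieces = [local_randperm(10, 1234, world_size=3, rank=r) for r in range(3)]
print(pieces)
```

Because every rank derives its piece from the same full permutation, concatenating the pieces along dim 0 reproduces that permutation exactly.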

The plan above was worked out in a meeting with xiaoyu and yinggang.

fixed: https://github.com/Oneflow-Inc/OneTeam/issues/1167
enhancement automerge op test graph global
opened by grybd 82
Dev non-contiguous view ops
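Split out from the https://github.com/Oneflow-Inc/oneflow/tree/dev_contiguous_view_ops branch, this PR does the following:

1. When an op is registered in ODS, a SupportNonContiguous attribute can be added to mark whether the op accepts non-contiguous input tensors. If it does not, tensor->contiguous() is applied uniformly in the interpreter.
2. ~~Export the interface flow._oneflow_internal.has_same_tensor_storage to check whether the original tensor and the view tensor share storage~~
3. Support the following non-contiguous view ops: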

[x] transpose

[x] permute

[x] narrow

[x] expand/expand_as

[x] split

[x] chunk

[x] unfold_tensor

[x] movedim

[x] as_strided

[x] select

[x] swapaxes

[x] T/t

[x] hsplit/vsplit/tensor_split

[ ] ~~TODO (to be done in other PRs): slice/slice_update~~
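
To make the idea of a non-contiguous view concrete, here is a small pure-Python sketch. StridedView is a hypothetical class, not OneFlow's tensor type. It shows transpose as a stride swap over shared storage, and contiguous() as the copy an interpreter would make for ops that do not support non-contiguous inputs.

```python
from itertools import product


class StridedView:
    """Hypothetical N-d view over a flat list, addressed by strides."""

    def __init__(self, storage, shape, strides, offset=0):
        self.storage, self.shape = storage, tuple(shape)
        self.strides, self.offset = tuple(strides), offset

    def __getitem__(self, index):
        # Flat position = offset + sum(index_i * stride_i).
        flat = self.offset + sum(i * s for i, s in zip(index, self.strides))
        return self.storage[flat]

    def transpose(self, d0, d1):
        # A view op: swap shape and stride entries, share the storage.
        shape, strides = list(self.shape), list(self.strides)
        shape[d0], shape[d1] = shape[d1], shape[d0]
        strides[d0], strides[d1] = strides[d1], strides[d0]
        return StridedView(self.storage, shape, strides, self.offset)

    def is_contiguous(self):
        # Row-major check: innermost stride 1, each outer stride = product
        # of the inner sizes (size-1 dims are ignored).
        expected = 1
        for size, stride in zip(reversed(self.shape), reversed(self.strides)):
            if size != 1 and stride != expected:
                return False
            expected *= size
        return True

    def contiguous(self):
        # Copy into fresh row-major storage only when needed.
        if self.is_contiguous():
            return self
        data = [self[idx] for idx in product(*(range(n) for n in self.shape))]
        strides, acc = [], 1
        for size in reversed(self.shape):
            strides.append(acc)
            acc *= size
        return StridedView(data, self.shape, reversed(strides))


base = StridedView(list(range(6)), (2, 3), (3, 1))
t = base.transpose(0, 1)
print(t.shape, t.strides, t.is_contiguous())
```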

enhancement automerge eager op api
opened by Flowingsun007 67
Fix fill_
Fixes the slow oneflow.Tensor.fill_ reported in https://github.com/Oneflow-Inc/oneflow/issues/8278.

The fill_ kernel is implemented in two ways:

If value is a Scalar, it uses the fill primitive.

If value is a Tensor, the op's GPU and CPU logic are implemented separately.

Performance results:

| OP | Args | Library | Kernel Time (us, GPU) | Kernel Time (us, 1 CPU) | End-to-end Time (us, 1 CPU) | Kernel Time (us, 32 CPUs) | End-to-end Time (us, 32 CPUs) |
| ------------ | ----------------------------- | ------- | --------------------- | ----------------------- | --------------------------- | ------------------------- | ----------------------------- |
| Tensor.fill_ | ones(1, 8, 16, 16), 2 | OneFlow | 7 | 2.5 | 10.5 | 2.4 | 9.8 |
| Tensor.fill_ | ones(1, 8, 16, 16), 2 | PyTorch | 1.1 | 2.4 | 7 | 1.2 | 3.7 |
| Tensor.fill_ | ones(1000, 1000), 2 | OneFlow | 21.6 | 187.6 | 189.2 | 183 | 184.6 |
| Tensor.fill_ | ones(1000, 1000), 2 | PyTorch | 11 | 186.4 | 191.3 | 26.4 | 30.7 |
| Tensor.fill_ | ones(1, 8, 16, 16), tensor(2) | OneFlow | 20.4 | 3.1 | 21.5 | 3.1 | 21.8 |
| Tensor.fill_ | ones(1, 8, 16, 16), tensor(2) | PyTorch | 1.2 | 7.8 | 9.3 | 3.6 | 5.7 |
| Tensor.fill_ | ones(1000, 1000), tensor(2) | OneFlow | 26.7 | 180.4 | 184.4 | 175.9 | 179.8 |
| Tensor.fill_ | ones(1000, 1000), tensor(2) | PyTorch | 11 | 184.2 | 187.8 | 23.8 | 25.9 |
enhancement automerge eager
opened by zhongshsh 64
Graph rename v2
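This PR removes attribute and config from Block

1. Avoid name clashes completely;

2. Remove block config;

Design:

| | Eager original | Proxy (base class: Proxy) | GraphBlock (base class: GraphBlock) |
|--------|-----------------|-------------------------------------------------------------------------------|--------------------------------------------------------------------|
| Role | Gives access to the original eager type | Proxies execution. The execution interface is used the same way as for Module and Tensor, but the behavior has changed: for example it is lazy, and the ops executed may have been rewritten | A GraphBlock corresponds to one block of Graph code and stores what graph execution needs, such as name/scope/lazy op or tensor, and per-module optimization switches on the graph |
| Module | Module | ProxyModule, which holds a Module member and a GraphModule member | GraphModule |
| Tensor | Tensor | ProxyTensor, which holds a Tensor member and a GraphTensor member | GraphTensor |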

Example

    from oneflow.nn.graph import GraphModule
    import oneflow.nn as nn

    class AGraph(nn.Graph):
        def __init__(self, module: nn.Module):
            super().__init__()
            self.m = module
            # self.m is a ProxyModule
            # A ProxyModule has two parts: the original module and a GraphModule

            self.m.name  # defaults to the eager module's name
            self.m.to(GraphModule).name  # the GraphModule's name
            self.m.to(nn.Module)  # the original nn.Module

            # How to reach the config on the GraphModule
            self.m.to(GraphModule).set_stage(id, placement)
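
The split between the eager side and the graph side can be imitated without OneFlow. Every class in the sketch below is a hypothetical stand-in (EagerModule for nn.Module, GraphModuleInfo for GraphModule, ProxyModuleSketch for ProxyModule); it only illustrates why keeping two members avoids name clashes, not how OneFlow implements it.

```python
class EagerModule:
    """Hypothetical stand-in for the user's nn.Module."""

    def __init__(self, name):
        self.name = name


class GraphModuleInfo:
    """Hypothetical stand-in for GraphModule: graph-only state lives here."""

    def __init__(self, name):
        self.name = name
        self.stage = None

    def set_stage(self, stage_id, placement):
        self.stage = (stage_id, placement)


class ProxyModuleSketch:
    """Holds both parts; `.to(kind)` selects which one the caller wants."""

    def __init__(self, eager, graph_name):
        self._eager = eager
        self._graph = GraphModuleInfo(graph_name)

    @property
    def name(self):
        # Plain attribute access resolves against the eager module.
        return self._eager.name

    def to(self, kind):
        if kind is GraphModuleInfo:
            return self._graph
        if kind is EagerModule:
            return self._eager
        raise TypeError(f"unsupported target: {kind!r}")


proxy = ProxyModuleSketch(EagerModule("linear"), graph_name="m")
proxy.to(GraphModuleInfo).set_stage(0, "cuda:0")
print(proxy.name, proxy.to(GraphModuleInfo).name, proxy.to(GraphModuleInfo).stage)
```

Since graph-only settings live on a separate object, an eager module that happens to define its own name or config attribute does not collide with them.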

Fix issue: https://github.com/Oneflow-Inc/oneflow/issues/9193

Also supports property lookup when nn.Module is used with multiple inheritance

Fix issue: https://github.com/Oneflow-Inc/oneflow/issues/9345 and https://github.com/Oneflow-Inc/oneflow/issues/9186
enhancement automerge bug api python
opened by strint 60
add searchsorted op

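Background: the NeRF network needs this operator. Operator description: follows the PyTorch implementation at https://pytorch.org/docs/stable/generated/torch.searchsorted.html?highlight=searchsorted#torch.searchsorted. The interface is fully aligned with the PyTorch 1.10 implementation.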
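
For readers unfamiliar with the operator, its 1-D semantics can be sketched with Python's bisect module. searchsorted_1d is a hypothetical helper, not the OneFlow kernel; it only mirrors the meaning of the right flag in torch.searchsorted.

```python
from bisect import bisect_left, bisect_right


def searchsorted_1d(sorted_sequence, values, right=False):
    """Insertion index for each value that keeps the sequence sorted.

    right=False returns the leftmost valid position, right=True the
    rightmost one, matching the `right` flag of torch.searchsorted.
    """
    find = bisect_right if right else bisect_left
    return [find(sorted_sequence, v) for v in values]


boundaries = [1, 3, 5, 7, 9]
print(searchsorted_1d(boundaries, [3, 6, 9]))              # [1, 3, 4]
print(searchsorted_1d(boundaries, [3, 6, 9], right=True))  # [2, 3, 5]
```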
enhancement automerge op

opened by yoonlee888 59
Optimize slice and tensor getitem
[x] Optimizes tensor getitem as discussed in issue https://github.com/Oneflow-Inc/OneTeam/issues/1268#issuecomment-1085433728; this benefits every network that uses the eager dataloader.

[x] test case

enhancement feature automerge eager
opened by Flowingsun007 57
Decouple vm mem and compute
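Lets the vm worker thread focus on running OpKernel::Compute. If everything else is well optimized, eager can in theory reach its peak performance.

Instruction execution is now split into two steps:

Infer. Covers memory allocation and release, plus preparation of the opkernel state and cache.

Compute. Only runs the user_op::OpKernel::Compute function.

The Infer phase always runs on the scheduler thread. The Compute phase runs on the Worker thread by default; setting ONEFLOW_VM_WORKLOAD_ON_SCHEDULER_THREAD=1 makes it run on the scheduler thread instead.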
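
The two-step split can be modelled with a queue and one worker thread. This is only a toy sketch of the scheduling rule described above: run_instructions and its log format are hypothetical, and the Infer and Compute bodies are placeholders.

```python
import os
import queue
import threading


def run_instructions(instructions, compute_on_scheduler=None):
    """Infer on the calling (scheduler) thread; Compute on a worker by default."""
    if compute_on_scheduler is None:
        flag = os.environ.get("ONEFLOW_VM_WORKLOAD_ON_SCHEDULER_THREAD")
        compute_on_scheduler = flag == "1"

    log = []
    pending = queue.Queue()

    def worker():
        while True:
            item = pending.get()
            if item is None:  # sentinel: no more work
                break
            name, _buffer = item
            # Compute: only the kernel body runs here.
            log.append(("compute", name, threading.current_thread().name))

    worker_thread = threading.Thread(target=worker, name="worker")
    worker_thread.start()

    for name in instructions:
        # Infer: allocate memory and prepare state before Compute starts.
        buffer = bytearray(8)
        log.append(("infer", name, threading.current_thread().name))
        if compute_on_scheduler:
            log.append(("compute", name, threading.current_thread().name))
        else:
            pending.put((name, buffer))

    pending.put(None)
    worker_thread.join()
    return log


for entry in run_instructions(["add", "relu"], compute_on_scheduler=False):
    print(entry)
```

Because Infer has already prepared the buffer when an instruction is queued, the worker never has to wait on allocation, which is the point of the decoupling.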

This PR depends on several other PRs or branches:

vm optimization PRs:

https://github.com/Oneflow-Inc/oneflow/pull/7923 migrates the instruction implementations to ep

https://github.com/Oneflow-Inc/oneflow/pull/7623 merges InstructionMsg and Instruction

Call instruction optimization PRs:

https://github.com/Oneflow-Inc/oneflow/pull/7617 makes StatefullOpKernel thread-safe.

https://github.com/Oneflow-Inc/oneflow/tree/refactor_eager_tmp_buffer_x_merge_instruction_msg_to_instruction completely refactors how instructions handle temp storage, so that Infer/Compute can work asynchronously.

enhancement eager system
opened by lixinqi 55
Refactor MemoryCase to eliminate determine statements of device_type

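Refactors the MemoryCase struct to remove device-specific special cases from the code.

MemoryCase becomes an open structure, so it no longer has to be modified every time a DeviceType enum value is added.

With MemoryCase open, many special-case checks such as if (device_type == DeviceType::kGPU) or if (mem_case.has_device_cuda_mem()) can also be removed.

After the refactor, the only check that should remain in theory is whether the device mem is host mem, because some code paths need to treat host mem specially.

The refactor cannot remove every GPU-specific special case. Some of them are unrelated to mem_case; for now they are probably concentrated in the memory reuse logic, and the task graph may have some leftovers too. These are left for follow-up PRs.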
enhancement graph need-test-distributed

opened by leaves-zwx 53
Implement oneflow.embedding op
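Overview

This PR completes the implementation of oneflow.nn.Embedding. The previous implementation did not take the four parameters padding_idx, max_norm, norm_type, and scale_grad_by_freq into account, so it used oneflow.gather directly. With these parameters, the gather op can no longer be reused as is, and a custom Embedding op is needed.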
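
A plain gather covers none of these four parameters, which a pure-Python sketch makes visible. The functions below are hypothetical and only approximate the PyTorch-style semantics (rows renormalized to max_norm on lookup, no gradient for padding_idx, gradients optionally divided by index frequency); they are not the kernels added in this PR.

```python
from collections import Counter


def renorm_rows(weight, indices, max_norm, norm_type=2.0):
    """Scale each looked-up row in place so its p-norm is at most max_norm."""
    for i in set(indices):
        norm = sum(abs(v) ** norm_type for v in weight[i]) ** (1.0 / norm_type)
        if norm > max_norm:
            scale = max_norm / norm
            weight[i] = [v * scale for v in weight[i]]


def embedding_forward(weight, indices, max_norm=None, norm_type=2.0):
    """Gather rows; the renorm step is what a bare gather cannot express."""
    if max_norm is not None:
        renorm_rows(weight, indices, max_norm, norm_type)
    return [list(weight[i]) for i in indices]


def embedding_backward(num_rows, dim, indices, grad_out,
                       padding_idx=None, scale_grad_by_freq=False):
    """Scatter-add output gradients into a weight gradient."""
    counts = Counter(indices)
    grad_weight = [[0.0] * dim for _ in range(num_rows)]
    for i, grad in zip(indices, grad_out):
        if i == padding_idx:
            continue  # the padding row never receives gradient
        scale = 1.0 / counts[i] if scale_grad_by_freq else 1.0
        for d in range(dim):
            grad_weight[i][d] += grad[d] * scale
    return grad_weight


weight = [[3.0, 4.0], [0.0, 0.0], [0.3, 0.4]]
print(embedding_forward(weight, [0, 2], max_norm=1.0))
print(embedding_backward(3, 2, [0, 0, 1], [[1.0, 1.0]] * 3,
                         padding_idx=1, scale_grad_by_freq=True))
```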

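PyTorch API link

Feature checklist

Note: every feature checkbox is optional; if one is left unchecked, just explain why. For example: this Op is composed from Python interfaces, so no Op SetBatchAxisInferFn is registered; or: this Op has no input, so there is no SetInputArgModifyFn.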

Op

[ ] Op SetBatchAxisInferFn

[x] Op SetGetSbpFn

[x] Op SetInputArgModifyFn

[x] Op backward gradient registration

CPU Kernel

[x] CPU in:float32

[x] CPU in:float64

[ ] CPU in:int32

[ ] CPU in:int64

[ ] CPU in:int8

GPU Kernel

[x] GPU in:float32

[x] GPU in:float64

[ ] GPU in:int32

[ ] GPU in:int64

[x] GPU in:float16

[ ] GPU in:int8

Python Wrapper

[x] Python API argument checking and error messages

[x] API docstrings

[x] Example

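Tests

[x] Single-node single-device CPU Test Case

[x] Single-node single-device GPU Test Case

[ ] Single-node multi-device CPU Test Case

[ ] Single-node multi-device GPU Test Case

[ ] Distributed CPU Test Case

[ ] Distributed GPU Test Case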

GPU effective bandwidth

For Ops with a GPU kernel, please follow https://github.com/Oneflow-Inc/OneTeam/issues/167 to measure the effective bandwidth and attach a test report. A sample report follows:

Theoretical bandwidth:

Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes)	Bandwidth(MB/s)
33554432	250798.5

Measured bandwidth:

PROFILER::KERNEL::CUDA_MEMORY_BANDWIDTH op_name: sqrt_2 elapsed(ms): 0.196064 memory_size(Byte): 50331648 bandwidth(GB/s): 239.08
PROFILER::KERNEL::CUDA_MEMORY_BANDWIDTH op_name: sqrt_2_grad elapsed(ms): 0.29072 memory_size(Byte): 75497472 bandwidth(GB/s): 241.856
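
The bandwidth figures in the sample report can be recomputed from the memory size and elapsed time. The function below is a hypothetical helper, not part of the profiler. The sample numbers only match when 1 GB is taken as 2^30 bytes, so the sketch uses that convention.

```python
def effective_bandwidth(memory_bytes, elapsed_ms):
    """Bytes moved per second, expressed in units of 2**30 bytes."""
    return (memory_bytes / 2**30) / (elapsed_ms / 1000.0)


# Values taken from the sample profiler lines above.
print(round(effective_bandwidth(50331648, 0.196064), 2))  # sqrt_2
print(round(effective_bandwidth(75497472, 0.29072), 3))   # sqrt_2_grad
```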

PR Checklist

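[x] The PR title reads smoothly, clearly states what the PR does, and can be used directly as a changelog entry for a new release

[x] Code is formatted

[x] Compiles locally

[x] Changes have been tested locally

[x] type label added: (fill in the type label, e.g. bug, enhancement, purge, feature, documentation)

[x] component label added: (fill in the component label, e.g. op, system, eager, build, xla, python, ci, test, tooling)

[x] Reviewed by someone before converting from Draft to a regular PR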

enhancement automerge op
opened by EsdeathYZH 46
check graph op global test
This PR is done:

[x] Run Graph Global tests for some ops (CUDA only).

Some global ops still have their graph tests disabled; see https://github.com/Oneflow-Inc/oneflow/pull/8614#issuecomment-1185097594 for details.
enhancement automerge test graph global
opened by lixiang007666 39
Implement exponential_ and multinomial
Origin of the request: https://github.com/Oneflow-Inc/OneTeam/issues/1184#issuecomment-1232440993

Todo lists

[x] Implement the exponential_ operator

[x] functor logic

[x] cpu kernel

[x] cuda kernel

[x] tests

[x] Implement the multinomial operator

[x] functor logic

[x] cpu kernel

[x] cuda kernel

[x] tests

[x] Add the Distribution module

[x] Implement Categorical

feature automerge op api python need-clean-ccache
opened by Ldpe2G 37
dev add spectral_norm
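@BBuf Fixed some bugs encountered while implementing spectral_norm

[x] Fix the bug where dot on CPU does not support int32 and int64 computation (because of matmul)

[x] Add the basic spectral_norm functionality

[x] Fix the division-by-zero bug in kaiming_uniform_ and kaiming_normal_ when the input is a 0-size tensor

[x] Add oneflow.linalg.multi_dot()

[ ] oneflow.contiguous_format

[ ] load_state_dict tests and global tests for spectral_norm, load and hook

[ ] Documentation for spectral_norm and multi_dot

There seem to be some redundant header includes; I will check them later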

feature bug op
opened by hhhfccz 0

Oneflow fails in einops CI, likely due to conflict with new numpy

Summary

___________________ ERROR collecting tests/test_examples.py ____________________
tests/test_examples.py:5: in <module>
    from tests.test_ops import imp_op_backends
<frozen importlib._bootstrap>:1007: in _find_and_load
    ???
<frozen importlib._bootstrap>:986: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:680: in _load_unlocked
    ???
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/_pytest/assertion/rewrite.py:168: in exec_module
    exec(co, module.__dict__)
tests/test_ops.py:10: in <module>
    imp_op_backends = collect_test_backends(symbolic=False, layers=False)
tests/__init__.py:64: in collect_test_backends
    result.append(backend_type())
einops/_backends.py:554: in __init__
    import oneflow as flow
../../../.local/lib/python3.9/site-packages/oneflow/__init__.py:199: in <module>
    import oneflow.framework.register_class_method_util as register_class_method_util
../../../.local/lib/python3.9/site-packages/oneflow/framework/register_class_method_util.py:17: in <module>
    import oneflow.framework.check_point_v2 as check_point_v2
../../../.local/lib/python3.9/site-packages/oneflow/framework/check_point_v2.py:30: in <module>
    import oneflow.framework.dtype as dtype_util
../../../.local/lib/python3.9/site-packages/oneflow/framework/dtype.py:49: in <module>
    oneflow.bool: np.bool,
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/numpy/__init__.py:284: in __getattr__
    raise AttributeError("module {!r} has no attribute "
E   AttributeError: module 'numpy' has no attribute 'bool'
------------------------------- Captured stderr --------------------------------
2022-12-27 07:50:33.696556: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.9.16/x64/lib
2022-12-27 07:50:33.696647: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.9.16/x64/lib
2022-12-27 07:50:33.696656: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
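
For context on the AttributeError: np.bool was a deprecated alias of the builtin bool (deprecated in NumPy 1.20, removed in 1.24), so oneflow/framework/dtype.py fails at import under newer NumPy. A possible workaround, sketched below, is to refer to np.bool_ instead. The helper bool_dtype is hypothetical, and the NumPy import is guarded only so the sketch runs without NumPy installed.

```python
try:
    import numpy as np
except ImportError:  # keep the sketch runnable without NumPy
    np = None


def bool_dtype():
    """Boolean scalar type that exists on both old and new NumPy releases."""
    if np is None:
        return bool
    # np.bool_ is NumPy's boolean scalar type; the bare np.bool alias
    # was removed in NumPy 1.24.
    return np.bool_


print(bool_dtype())
```

Until the mapping is fixed upstream, pinning NumPy below 1.24 in the CI environment would presumably avoid the failure as well.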

Code to reproduce bug

See CI job for full detailed messages and configuration:

https://github.com/arogozhnikov/einops/actions/runs/3785978910/jobs/6436456017

System Information

What is your OneFlow installation (pip, source, dockerhub): pip
OS: linux
OneFlow version (run python3 -m oneflow --doctor):
Python version: 3.9
CUDA driver version: None
GPU models: None
Other info:

bug community

opened by arogozhnikov 8

Releases(v0.8.0)

Owner

OneFlow

GitHub Repository http://www.oneflow.org

