DLFlow is a deep learning framework.

Last update: Oct 27, 2022


DLFlow - A Deep Learning WorkFlow

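DLFlow Overview

DLFlow is a deep learning pipeline that combines Spark's large-scale feature processing with TensorFlow's model-building capabilities. With DLFlow you can quickly process raw features, train models, and run large-scale distributed prediction, which makes it well suited to production tasks in offline environments. Users only need to focus on model development; raw feature processing, pipeline construction, and production deployment are handled by the framework.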

Features

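Configuration-driven: DLFlow is driven by configuration. Editing the configuration is enough to swap features, model hyperparameters, task flows, and so on, which greatly improves productivity.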

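Modular structure: tasks and models exist as plugins, which makes them easy to use and develop. Users can easily register their own custom tasks and models with the framework.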

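Self-organizing tasks: the built-in Workflow framework automatically resolves task dependencies from the output markers each task produces, making it easy to build deep learning pipelines.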

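Best practices: DLFlow incorporates the best practices of DiDi's user-profiling team for offline deep learning tasks and handles many of the problems that arise in offline production. It combines TensorFlow and Spark in a way that is better suited to offline deep learning tasks.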
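The plugin registration and output-marker dependency resolution described above can be illustrated with a short Python sketch. It is not DLFlow's actual API: `register_task`, `TASK_REGISTRY`, `resolve_order`, the `requires`/`produces` attributes, and marker names such as `raw_features` are all hypothetical. Only the task names are borrowed from the predefined tasks listed further down.

```python
from typing import Dict, List, Set

# Hypothetical plugin registry: task classes register themselves by name.
TASK_REGISTRY: Dict[str, type] = {}


def register_task(cls: type) -> type:
    """Class decorator that makes a task available to the workflow by name."""
    TASK_REGISTRY[cls.__name__] = cls
    return cls


class Task:
    requires: Set[str] = set()  # output markers this task consumes (hypothetical)
    produces: Set[str] = set()  # output markers this task emits (hypothetical)


@register_task
class Merge(Task):
    requires = {"raw_features"}
    produces = {"merged_features"}


@register_task
class Encode(Task):
    requires = {"merged_features"}
    produces = {"encoded_features"}


@register_task
class Train(Task):
    requires = {"encoded_features"}
    produces = {"model"}


@register_task
class Predict(Task):
    requires = {"encoded_features", "model"}
    produces = {"predictions"}


def resolve_order(names: List[str], available: Set[str]) -> List[str]:
    """Order tasks so that each one runs only after its required markers exist."""
    pending = [TASK_REGISTRY[n] for n in names]
    have = set(available)
    order: List[str] = []
    while pending:
        ready = [t for t in pending if t.requires <= have]
        if not ready:
            missing = {m for t in pending for m in t.requires - have}
            raise ValueError(f"unresolvable dependencies: {sorted(missing)}")
        for t in ready:
            order.append(t.__name__)
            have |= t.produces
            pending.remove(t)
    return order


if __name__ == "__main__":
    # Tasks can be listed in any order; markers determine the execution order.
    print(resolve_order(["Predict", "Train", "Encode", "Merge"], {"raw_features"}))
```

Because the order is derived from markers rather than hard-coded, a configuration file could simply list the task names to run and leave the sorting to the resolver.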

Quick Start

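Environment Setup

First, make sure Hadoop and Spark are installed and configured in your environment and that the basic environment variables are set.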

TensorFlow Access to HDFS

For TensorFlow to access HDFS, make sure the following environment variables are in effect:

# Make sure the directory containing libjvm.so is on LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${JAVA_HOME}/jre/lib/amd64/server

# Make sure the Hadoop jars are on CLASSPATH
export CLASSPATH=${CLASSPATH}:$(hadoop classpath --glob)

For more on accessing HDFS from TensorFlow, see TensorFlow on Hadoop.
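Misconfigured variables usually surface only when TensorFlow first touches an `hdfs://` path, so a quick preflight check can save a failed job. The sketch below is a hypothetical helper (`check_tf_hdfs_env` is not part of DLFlow) that inspects the two variables set above. It assumes a Linux host, and it also accepts `$JAVA_HOME/lib/server`, where JDK 9 and later keep `libjvm.so`.

```python
import os
from typing import List, Mapping


def check_tf_hdfs_env(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return a list of problems with the TensorFlow-on-HDFS environment (empty if OK)."""
    problems: List[str] = []
    java_home = env.get("JAVA_HOME", "").rstrip("/")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    else:
        # JDK 8 keeps libjvm.so under jre/lib/amd64/server (the path used above);
        # JDK 9+ moved it to lib/server, so either directory is accepted here.
        candidates = {
            f"{java_home}/jre/lib/amd64/server",
            f"{java_home}/lib/server",
        }
        ld_dirs = {d.rstrip("/") for d in env.get("LD_LIBRARY_PATH", "").split(":") if d}
        if not candidates & ld_dirs:
            problems.append("LD_LIBRARY_PATH does not include the libjvm.so directory")
    # `hadoop classpath --glob` expands wildcards into individual jar paths,
    # so a correctly set CLASSPATH should contain explicit .jar entries.
    cp_entries = [p for p in env.get("CLASSPATH", "").split(":") if p]
    if not any(p.endswith(".jar") for p in cp_entries):
        problems.append("CLASSPATH has no .jar entries; append `hadoop classpath --glob`")
    return problems


if __name__ == "__main__":
    for problem in check_tf_hdfs_env():
        print("WARNING:", problem)
```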

Reading and Writing TFRecords with Spark

# Clone the tensorflow/ecosystem project
git clone https://github.com/tensorflow/ecosystem.git

cd ecosystem/spark/spark-tensorflow-connector/

# Build spark-tensorflow-connector
mvn versions:set -DnewVersion=1.14.0
mvn clean install

The build produces target/spark-tensorflow-connector_2.11-1.14.0.jar. Make sure this jar is added to spark.jars before running DLFlow. For more on reading and writing TFRecords with Spark, see spark-tensorflow-connector.
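`spark.jars` takes a comma-separated list of jar paths, so the connector jar should be appended to whatever is already configured rather than replacing it. A minimal Python sketch of that merge, using a hypothetical helper name `add_spark_jar`:

```python
def add_spark_jar(current: str, jar_path: str) -> str:
    """Append jar_path to a comma-separated spark.jars value, skipping duplicates."""
    # Drop empty entries so that an unset value ("") does not yield a leading comma.
    jars = [j.strip() for j in current.split(",") if j.strip()]
    if jar_path not in jars:
        jars.append(jar_path)
    return ",".join(jars)


if __name__ == "__main__":
    print(add_spark_jar("", "target/spark-tensorflow-connector_2.11-1.14.0.jar"))
```

The resulting string can be passed with `spark-submit --conf spark.jars=...` (or `--jars`) or set through `SparkSession.builder.config("spark.jars", ...)` in PySpark. With the jar loaded, the connector's README reads TFRecords via `spark.read.format("tfrecords").option("recordType", "Example").load(path)`.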

Installation

Install via pip:

pip install dlflow

Install from source:

git clone https://github.com/didi/dlflow.git
cd dlflow
python setup.py install

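Usage

Configuration Files

For runtime configuration, refer to the examples in the conf directory. For configuration details, see the Configuration Guide.

Running as a Module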

python -m dlflow.main --config <CONFIGURATION FILE>.conf

Running as a Script

Make sure the bin directory of your Python environment is on PATH:

export PATH=$PATH:/usr/local/python/bin

Then run:

dlflow --config <CONFIGURATION FILE>.conf

For more detailed usage, see the Usage Guide.

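Predefined Tasks

Predefined task	Description
Merge	Feature merging task; see Feature Merging
Encode	Parses raw features, encodes and preprocesses them, and produces features that can be fed directly into the model
Train	Model training task
Evaluate	Model evaluation task
Predict	Model prediction task; runs distributed prediction on Spark and can handle large-scale data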
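To make the Encode step more concrete, here is a minimal Python sketch of two common feature-encoding techniques: mapping categorical values to integer indices (with 0 reserved for values not seen during fitting) and min-max scaling of numeric values. This is an assumed illustration, not DLFlow's Encode implementation, and all function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def fit_vocab(values: Sequence[str]) -> Dict[str, int]:
    """Assign each distinct category an index starting at 1; 0 means 'unseen'."""
    vocab: Dict[str, int] = {}
    for v in values:
        if v not in vocab:
            vocab[v] = len(vocab) + 1
    return vocab


def encode_category(values: Sequence[str], vocab: Dict[str, int]) -> List[int]:
    """Replace categories with their indices, mapping unknown values to 0."""
    return [vocab.get(v, 0) for v in values]


def fit_minmax(values: Sequence[float]) -> Tuple[float, float]:
    """Record the range of a numeric feature on the fitting data."""
    return min(values), max(values)


def scale_minmax(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """Scale values to [0, 1] using a previously fitted range."""
    span = hi - lo
    return [0.0 if span == 0 else (x - lo) / span for x in values]


if __name__ == "__main__":
    vocab = fit_vocab(["a", "b", "a", "c"])
    print(encode_category(["c", "z"], vocab))
    lo, hi = fit_minmax([1.0, 3.0, 5.0])
    print(scale_minmax([1.0, 3.0, 5.0], lo, hi))
```

The fit/apply split is the key design point: the vocabulary and numeric ranges are computed once on training data and reused unchanged at prediction time, so training and prediction see identically encoded features.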

Manual Contents

Technical Design

DLFlow Overall Architecture

DLFlow Pipeline

Contributing

You are welcome to use this project and contribute to it. For details, see the Contribution Guide.

License

DLFlow is distributed under the Apache-2.0 license. For more information, see LICENSE.

Owner

DiDi