AI Flow is an open source framework that bridges big data and artificial intelligence.


Flink AI Flow

Introduction

Flink AI Flow is an open source framework that bridges big data and artificial intelligence. It manages the entire machine learning project lifecycle as a unified workflow, including feature engineering, model training, model evaluation, model service, model inference, and monitoring. Throughout the workflow, Flink is used as the general-purpose computing engine.

In addition to orchestrating groups of batch jobs, Flink AI Flow supports workflows that contain streaming jobs by leveraging an event-based scheduler (an enhanced version of Airflow). This capability is useful for complicated real-time machine learning systems, as well as for real-time workflows in general.

Features

You can use Flink AI Flow to do the following:

  1. Define the machine learning workflow including batch/stream jobs.

  2. Manage the metadata (generated by the machine learning workflow) of data sets, models, artifacts, metrics, jobs, etc.

  3. Schedule and run the machine learning workflow.

  4. Publish and subscribe to events.

To support online machine learning scenarios, a notification service and an event-based scheduler are introduced. Flink AI Flow's current components are:

  1. SDK: It defines how to build a machine learning workflow and includes the API of Flink AI Flow.

  2. Notification Service: It provides event listening and notification functions.

  3. Meta Service: It saves the metadata of the machine learning workflow.

  4. Event-Based Scheduler: It is a scheduler that triggers jobs when certain events occur.
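
The interplay between the Notification Service and the Event-Based Scheduler can be pictured with a toy model. The sketch below is not Flink AI Flow code: `ToyNotificationService`, `ToyEventScheduler`, the `Event` dataclass, and the event keys are hypothetical names invented for illustration. It only shows the general idea of a job being started when a subscribed event is published.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List


@dataclass(frozen=True)
class Event:
    """A minimal event: a key to match on and a free-form message."""
    key: str
    message: str


class ToyNotificationService:
    """Hypothetical in-memory stand-in for a notification service."""

    def __init__(self) -> None:
        self._listeners: DefaultDict[str, List[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, key: str, listener: Callable[[Event], None]) -> None:
        self._listeners[key].append(listener)

    def publish(self, event: Event) -> None:
        # Deliver the event to every listener registered for its key.
        for listener in list(self._listeners[event.key]):
            listener(event)


class ToyEventScheduler:
    """Hypothetical scheduler that starts a job when its trigger event arrives."""

    def __init__(self, service: ToyNotificationService) -> None:
        self._service = service
        self.started_jobs: List[str] = []

    def run_on_event(self, job_name: str, trigger_key: str) -> None:
        def on_event(event: Event) -> None:
            # A real scheduler would submit the job; here we only record it.
            self.started_jobs.append(job_name)

        self._service.subscribe(trigger_key, on_event)


service = ToyNotificationService()
scheduler = ToyEventScheduler(service)
scheduler.run_on_event("evaluate_model", trigger_key="model_trained")
scheduler.run_on_event("deploy_model", trigger_key="model_validated")

service.publish(Event(key="model_trained", message="version 1"))
print(scheduler.started_jobs)
```

Only the job subscribed to the published key is started. A streaming job can publish such events repeatedly while it keeps running, which is why an event-driven trigger suits workflows that never "finish" in the batch sense.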

Documentation

QuickStart

To get started with Flink AI Flow, follow the Quick Start guide. You can also read our Tutorial to learn how to write your own workflow.

API

Please refer to the API Documentation for the APIs supported by Flink AI Flow.

Design

If you are interested in the design principles of Flink AI Flow, please see the Design Documentation for more details.

Examples

For a better understanding of how to write a workflow, see the examples in the Examples directory.

Reporting bugs

If you encounter any issues, please open an issue on GitHub. We also encourage you to provide a patch through a GitHub pull request.

Contributing

We happily welcome contributions to Flink AI Flow. Please see our contribution guide for details.

Contact Us

For more information, you can join the Flink AI Flow Users Group on DingTalk to contact us. The DingTalk group number is 35876083.


Comments
  • [Notification] Support CLI tool for notification server

    What is the purpose of the change

    Support CLI tool for notification server

    Brief change log

    • Support CLI tool for notification server

    Verifying this change

    This change added tests.

    opened by jiangxin369 2
  • [Notification] Add notification cli server command

    What is the purpose of the change

    Add notification cli server command #170

    Brief change log

    • Add notification cli server command

    Verifying this change

    This change added tests.

    opened by Sxnan 2
  • [Airflow]Fix remote log handler bug

    What is the purpose of the change

    Fix #172

    Brief change log

    • Change the log_relative_path in remote log handlers (S3, GCS, Azure) when they initialize the context.
    • Add get_provider_info.py to pass the Airflow test cases.
    • Modify the old test case in providers.

    Verifying this change

    This change modifies the old test case to fit the new remote log handlers.

    opened by aqua7regia 2
  • [Airflow] The remote logging supports the current log mechanism

    FileTaskHandler, a Python log handler that handles and reads task instance logs, supports the current log mechanism, which uses seq_num and try_number to name the log file. Remote log handlers such as S3TaskHandler should support this mechanism as well.

    bug Airflow 
    opened by SteNicholas 2
  • [hotfix] fix typo of spark operator

    opened by jiangxin369 1
  • Daily uploading packages to PyPI

    opened by jiangxin369 1
  • Add Tutorial Example

    Brief change log

    • Add tutorial example and docs
    • Each workflow execution should have its own working directory to avoid file overriding

    Verifying this change

    This change is a trivial rework / code cleanup without any test coverage.

    opened by jiangxin369 1
  • Add documentations

    opened by jiangxin369 1
  • Refactor Notification Event

    What is the purpose of the change

    Refactor Notification Event

    Verifying this change

    This change is a trivial rework / code cleanup without any test coverage.

    opened by jiangxin369 1
  • Fix unittests about existed namespace

    What is the purpose of the change

    Fix unittests about existed namespace

    Verifying this change

    This change is a trivial rework / code cleanup without any test coverage.

    opened by jiangxin369 1
  • Add docs about concepts

    What is the purpose of the change

    Add docs about concepts

    Verifying this change

    This change is a trivial rework / code cleanup without any test coverage.

    opened by jiangxin369 1
  • Unifying TaskStatusEvent and TaskStatusChangedEvent

    opened by jiangxin369 0
  • Workflow Execution Status Incorrect

    Describe the bug

    Given task1 action_on_task_status(task2, success): when task1 fails, task2 is not scheduled, but the status of the workflow execution is still running. I think the status should be failed.

    opened by jiangxin369 0
  • AIFlow support python3.6

    Describe the feature

    AIFlow support python3.6

    Additional context

    When running aiflow with Python 3.6, the notification server blocks with the following logs:

    [2022-07-25 14:07:07,682 - server.py:68 [MainThread] - INFO: Notification server started.
    [2022-07-25 14:07:18,780 - server.py:189 [Thread-1] - ERROR: Lock is not acquired.
    Traceback (most recent call last):
      File "/root/venv_for_aiflow/lib64/python3.6/site-packages/notification_service/server.py", line 185, in _call_behavior_async
        return await behavior(argument, context), True
      File "/usr/lib64/python3.6/asyncio/coroutines.py", line 225, in coro
        res = yield from await_meth()
      File "/root/venv_for_aiflow/lib64/python3.6/site-packages/notification_service/service.py", line 221, in _list_all_events
        pass
      File "/usr/lib64/python3.6/asyncio/coroutines.py", line 212, in coro
        res = func(*args, **kw)
      File "/usr/lib64/python3.6/asyncio/locks.py", line 86, in __aexit__
        self.release()
      File "/usr/lib64/python3.6/asyncio/locks.py", line 207, in release
        raise RuntimeError('Lock is not acquired.')
    RuntimeError: Lock is not acquired.
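
The final error in this traceback is the one asyncio raises whenever `release()` is called on a lock that is not currently held. The sketch below does not reproduce the Python 3.6 problem reported here, and it needs Python 3.7 or later because it uses `asyncio.run`. It only illustrates the condition behind the message, next to the balanced `async with` usage that avoids it. The function names are made up for this example.

```python
import asyncio


async def release_without_acquire() -> str:
    lock = asyncio.Lock()
    try:
        # The lock was never acquired, so release() raises RuntimeError.
        lock.release()
    except RuntimeError as exc:
        return str(exc)
    return "no error"


async def acquire_then_release() -> bool:
    lock = asyncio.Lock()
    async with lock:
        # Inside the block the lock is held.
        held = lock.locked()
    # After the block the lock has been released exactly once.
    return held and not lock.locked()


error_message = asyncio.run(release_without_acquire())
balanced = asyncio.run(acquire_then_release())
print(error_message)
print(balanced)
```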
    
    opened by jiangxin369 0
  • action_on_event_received only process events sent by current workflow execution

    Describe the feature

    Currently, events are broadcast: once the expected event is sent, all workflow executions of the workflow are triggered.

    Describe the solution you'd like

    Because the scheduler dispatcher inspects the context of each event to check whether it contains a workflow execution id, we can inject the runtime context of each task execution into the user-defined event so that the event only affects the current workflow execution.

    We need the following changes:

    • Add a global variable _CURRENT_TASK_CONTEXT in context.py to store the runtime context of each task execution.
    • Add get_runtime_task_context and set_runtime_task_context functions to read and write _CURRENT_TASK_CONTEXT.
    • Add a public API called wrap_execution_info_to_context to inject the runtime info into the context of the event before sending it.
    _CURRENT_TASK_CONTEXT: TaskExecutionContext = None


    def set_runtime_task_context(context: TaskExecutionContext):
        global _CURRENT_TASK_CONTEXT
        _CURRENT_TASK_CONTEXT = context


    def get_runtime_task_context():
        return _CURRENT_TASK_CONTEXT


    def wrap_execution_info_to_context(event: Event):
        """
        An event whose context is wrapped with workflow execution info
        is processed only by that specific workflow execution.
        """
        pass
    

    How to use it?

    def func():
        notification_client = get_notification_client()
        event = Event(event_key=EVENT_KEY, message='This is a custom message.')

        # wrap event with context
        wrap_execution_info_to_context(event)
        # send event
        notification_client.send_event(event)

    with Workflow(name='workflow') as w1:
        task = PythonOperator(name='task', python_callable=func)
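
One way the proposed functions could fit together is sketched below. This is a self-contained illustration, not the project's implementation: the `TaskExecutionContext` and `Event` dataclasses are simplified stand-ins, the JSON encoding of the context is an assumption, and `is_for_execution` is a hypothetical helper that mimics the dispatcher-side check described above.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskExecutionContext:
    # Hypothetical stand-in; the real class would carry more runtime info.
    workflow_execution_id: int


@dataclass
class Event:
    # Hypothetical stand-in for the notification event type.
    event_key: str
    message: str
    context: Optional[str] = None


_CURRENT_TASK_CONTEXT: Optional[TaskExecutionContext] = None


def set_runtime_task_context(context: Optional[TaskExecutionContext]) -> None:
    global _CURRENT_TASK_CONTEXT
    _CURRENT_TASK_CONTEXT = context


def get_runtime_task_context() -> Optional[TaskExecutionContext]:
    return _CURRENT_TASK_CONTEXT


def wrap_execution_info_to_context(event: Event) -> Event:
    """Attach the current workflow execution id to the event context."""
    task_context = get_runtime_task_context()
    if task_context is None:
        # Outside a task execution there is nothing to inject.
        return event
    # Assumption: the context is a JSON string; the real encoding may differ.
    event.context = json.dumps(
        {"workflow_execution_id": task_context.workflow_execution_id}
    )
    return event


def is_for_execution(event: Event, workflow_execution_id: int) -> bool:
    """Dispatcher-side check: events without a context are still broadcast."""
    if event.context is None:
        return True
    return json.loads(event.context)["workflow_execution_id"] == workflow_execution_id


set_runtime_task_context(TaskExecutionContext(workflow_execution_id=7))
event = wrap_execution_info_to_context(Event("custom_key", "This is a custom message."))
print(is_for_execution(event, 7), is_for_execution(event, 8))
```

In this sketch an event that was never wrapped keeps the current broadcast behavior, so existing workflows would not change unless they opt in.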
    

    opened by jiangxin369 0
  • [NotificationService] Cannot use one EmbeddedNotificationClient instance in multiple threads

    Describe the bug

    Cannot use one EmbeddedNotificationClient instance in multiple threads.

    To Reproduce

        def test_create_client_multiple_threads(self):
            import threading
            from notification_service.embedded_notification_client import EmbeddedNotificationClient
            from notification_service.event import Event, EventKey
    
            client = EmbeddedNotificationClient(server_uri="localhost:50052", namespace='default', sender='sender')
    
            def send_event():
                event = Event(event_key=EventKey('key1'), message='a')
                client.send_event(event)
                print(client.sequence_number)
            threads = []
            for i in range(100):
                thread = threading.Thread(target=send_event)
                threads.append(thread)
            for t in threads:
                t.start()
            for t in threads:
                t.join()
    

    Expected behavior

    100 events sent.

    Actual behavior

    Fewer than 100 events are sent.
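
The report does not state the root cause. A common reason for lost sends when one client object is shared across threads is an unsynchronized read-modify-write on shared state such as a sequence number. Assuming that is the cause here, the sketch below shows the usual remedy with a hypothetical `ToyClient` (not the real `EmbeddedNotificationClient`): guard the shared state with a `threading.Lock`.

```python
import threading


class ToyClient:
    """Hypothetical client that numbers each event it sends."""

    def __init__(self) -> None:
        self.sequence_number = 0
        self.sent = []
        self._lock = threading.Lock()

    def send_event(self, message: str) -> None:
        # The lock makes "increment the number, record the event" one atomic
        # step, so no two threads can reuse the same sequence number.
        with self._lock:
            self.sequence_number += 1
            self.sent.append((self.sequence_number, message))


client = ToyClient()
threads = [threading.Thread(target=client.send_event, args=("a",)) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Number of recorded events and number of distinct sequence numbers.
print(len(client.sent), len({number for number, _ in client.sent}))
```

An alternative with the same effect is to create one client per thread, which avoids sharing state at the cost of more connections.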

    opened by jiangxin369 0
Releases(release-0.3.1)
  • release-0.3.1(Feb 22, 2022)

    Features

    1. Flink job plugin supports multiple Flink versions #236
    2. Support stopping/resuming job scheduling #241
    3. Support log view of job execution for frontend #251
    4. Improve documentation

    Bug Fixes

    1. Airflow operator context is not set correctly #250
    2. Failed to trigger workflow execution #260
    3. Fix update the deployed model version multiple times #269
    4. Fix register_model_version sending wrong event type bug #290

    You are welcome to try this release and send us your feedback on AIFlow.

    Source code(tar.gz)
    Source code(zip)
  • release-0.3.0(Dec 16, 2021)

    Features

    AIFlow

    1. Introduces the command-line interface to help operation.
    2. Support database version migration.
    3. Supports workflow development in Jupyter Notebook #140

    Notification Service

    1. Introduces the command-line interface to help operation.
    2. Support database version migration.

    Bug Fixes

    1. The remote logging supports the current log mechanism #172
    2. Unit test failure caused by init_ai_flow_context #185
    3. AIFlow Webserver cannot sort model version by version #207

    You are welcome to try this release and send us your feedback on AIFlow.

    Source code(tar.gz)
    Source code(zip)
    examples.tar.gz(32.95 MB)
  • release-0.2.2(Nov 19, 2021)

    Features

    AIFlow

    1. Add the documents of the AIFlow. #117

    Airflow

    1. Allow LocalExecutor to run with SQLite. #41
    2. Airflow webserver supports the Airflow and Notification databases. #49

    Notification Service

    1. Introduces the countEvents interface. #11

    Bug Fixes

    1. Fix oss blob manager download concurrently. #3
    2. Deepcopy tasks in celery executor to avoid race condition. #12
    3. Fix periodic workflow cannot run. #77
    4. Fix HDFSBlobManager failed to download existed file. #87
    5. Removes the uncompleted api action_on_dataset_event. #104
    6. The workflow directory is set incorrectly in AIFlow runtime. #111

    You are welcome to try this release and send us your feedback on AIFlow.

    Source code(tar.gz)
    Source code(zip)
    examples.tar.gz(32.93 MB)
  • release-0.2.0(Feb 9, 2022)

    Features

    AIFlow

    1. Add workflow execution on event and ContextExtractor API #476
    2. Add task execution restful api #478
    3. AIFlow add WorkflowEventManager to listen and handle events #492
    4. Support start new workflow execution with context #479
    5. Introduce the workflow frontend of the AIFlow UI #509
    6. Add FlinkSqlProcessor #527
    7. Support job execution label #529
    8. Frontend support metadata ui #533
    9. Notification service supports the idempotence #553
    10. Add read-only job plugin #555

    Airflow

    1. Support celery executor on event based scheduler #482
    2. Add AirFlowScheduler with airflow restful api #486

    Bug Fixes

    1. Make AI Flow be able to use Notification Service with HA enabled #510
    2. Duplicated entry when create dagrun #570
    3. EventBaseScheduler catches and prints exceptions #586
    4. Scheduler should find schedulable tasks once dagrun finished #587
    5. EventBaseScheduler would trigger task multiple times incorrectly #598
    Source code(tar.gz)
    Source code(zip)
    ai-flow-examples.tar.gz(32.93 MB)
  • release-0.1.0(Feb 9, 2022)

Owner
A neutral organization to host ecosystem projects for Apache Flink