SAMO: Streaming Architecture Mapping Optimisation

Last update: Dec 10, 2022

Overview

SAMO: Streaming Architecture Mapping Optimiser

The SAMO framework provides a method of optimising the mapping of a Convolutional Neural Network model onto an FPGA platform for Streaming Architecture frameworks. Both a Simulated Annealing and Brute Force optimiser are implemented. We currently support the following frameworks:

Installation

You can install this package using:

python -m pip install samo

Usage

The general usage of the SAMO tool can be seen by running python -m samo --help.

Example platform configurations are given in the platforms directory and example CNN models can be generated by running python scripts/generate_networks.py.

FINN

In order to run the optimiser with the FINN toolflow, the first step is to download the following fork

git clone https://github.com/Yu-Zhewen/finn.git
cd finn
git checkout 4cc0b6fdae2f5c06f0b5bcc6fa45fba4d8b69111

As FINN requires docker, set SAMO_DIR to the path of SAMO in run_docker.sh, before entering the docker.

bash run_docker.sh

Within the docker, generate the FINN-ONNX through the following steps.

cd ../samo
cp models/${network}.onnx outputs/saved/finn/${network}.onnx
cp ../finn/notebooks/samo/config/${network}.json ../finn/notebooks/samo/config.json
jupyter nbconvert --to notebook --execute ../finn/notebooks/samo/pre_optimiser_steps.ipynb
mv ../finn/notebooks/samo/pre_optimiser_steps.nbconvert.ipynb outputs/saved/finn/${network}_pre_optimiser_steps.nbconvert.ipynb

To optimise the CNN model in the FINN-ONNX format, you need to do:

python -m samo --optimiser annealing --model outputs/saved/finn/${network}_pre_optimiser.onnx  \
    --backend finn --platform platforms/zedboard.json \
    --output-path outputs/saved/finn/${network}_post_optimiser.onnx

Finally, the following command is used to generate the hardware.

jupyter nbconvert --to notebook --execute ../finn/notebooks/samo/post_optimiser_steps.ipynb

HLS4ML

This tool can be used to generate optimised designs for the HLS4ML framework. SAMO tunes the reuse-factor for layers of the CNN model, and generates a Resource driven design.

To optimise a keras model for a given platform, run the following:

python -m samo --optimiser annealing --model models/model.keras \
    --backend hls4ml --platform platforms/zedboard.json \
    --output-path outputs/model_hls4ml.json

The previous command generates a configuration file (outputs/model_hls4ml.json), which can be used by the HLS4ML to generate hardware. To do this, you will need to use the HLS4ML API to convert this configuration file into a HLS project.

import hls4ml
from tensorflow import keras

# load the configuration
with open("outputs/model_hls4ml.json", "r") as f:
    config = json.load(f)

# load the platform
with open("platforms/zedboard.json", "r") as f:
    platform = json.load(f)

# load the keras model
model = keras.models.load_model("models/model.keras")

# create the hls model
hls_model = hls4ml.converters.convert_from_keras_model(model, hls_config=config,
        output_dir="outputs/hls4ml_prj",  io_type="io_stream", fpga_part=platform["part"])

# build the HLS project
hls_model.build(csim=True, cosim=True)

Feel free to post an issue if you have any questions or problems!

SAMO: Streaming Architecture Mapping Optimisation

Related tags

Overview

SAMO: Streaming Architecture Mapping Optimiser

Installation

Usage

FINN

HLS4ML

Owner

Alexander Montgomerie-Corcoran

一个多模态内容理解算法框架，其中包含数据处理、预训练模型、常见模型以及模型加速等模块。

Repository containing detailed experiments related to the paper "Memotion Analysis through the Lens of Joint Embedding".

Pytorch implementation for "Distribution-Balanced Loss for Multi-Label Classification in Long-Tailed Datasets" (ECCV 2020 Spotlight)

OCR Post Correction for Endangered Language Texts

NAS-FCOS: Fast Neural Architecture Search for Object Detection (CVPR 2020)

Job Assignment System by Real-time Emotion Detection

We are More than Our JOints: Predicting How 3D Bodies Move

A new framework, collaborative cascade prediction based on graph neural networks (CCasGNN) to jointly utilize the structural characteristics, sequence features, and user profiles.

Using Clinical Drug Representations for Improving Mortality and Length of Stay Predictions

Official implementation of Self-supervised Image-to-text and Text-to-image Synthesis

A vanilla 3D face modeling on pose-invariant and multi-lightning image data

Algebraic effect handlers in Python

Dataset Condensation with Contrastive Signals

Evaluating different engineering tricks that make RL work

Making Structure-from-Motion (COLMAP) more robust to symmetries and duplicated structures

Single-step adversarial training (AT) has received wide attention as it proved to be both efficient and robust.

Code and data for ImageCoDe, a contextual vison-and-language benchmark

Image-based Navigation in Real-World Environments via Multiple Mid-level Representations: Fusion Models Benchmark and Efficient Evaluation

PyTorch implementation of the paper Ultra Fast Structure-aware Deep Lane Detection