An energy estimator for eyeriss-like DNN hardware accelerator

Overview

Energy-Estimator-for-Eyeriss-like-Architecture-

An energy estimator for eyeriss-like DNN hardware accelerator

This is an energy estimator for eyeriss-like architecture utilizing Row-Stationary dataflow which is a DNN hardware accelerator created by works from Vivienne Sze’s group in MIT. You can refer to their original works in github, Y. N. Wu, V. Sze, J. S. Emer, “An Architecture-Level Energy and Area Estimator for Processing-In-Memory Accelerator Designs,” IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), April 2020, http://eyeriss.mit.edu/, etc. Thanks to their contribution in DNN accelerator and energy efficient design.

image

Eyeriss-like architecture utilizes row-stationary dataflow in order to fully explore data reuse including convolutional reuse, ifmap reuse and filter reuse. In general, the energy breakdown in each DNN layer can be separated in terms of computation and memory access (or data transfer). image

Computation Energy : Performing MAC operations. Data Energy : The number of bits accessed at each memory level is calculated based on the dataflow and scaled by the hardware energy cost of accessing one bit at that memory level. The data energy is the summation of each memory hierarchy (DRAM, NoC, Global Buffer, RF) or each data type (ifmap, weight, partial sum). image

  1. Quantization Bitwidth Energy scaling in computation : linear for single operand scaling. Quadratic for two operands scaling. Energy scaling in data access : Linear scaling for any data type in any memory hierarchy.
  2. Pruning on filters (weights) Energy scaling in computation : Skip MAC operations according to pruning ratio. (Linear scaling) Energy scaling in data access : Linear scaling for weight access. image

Assumptions: Initial image input and weights in each layer should be first read from DRAM (external off-chip memory). Global Buffer is big enough to store any amount of datum and intermediate numbers. NoC has high-performance and high throughput with non-blocking broadcasting and inter-PE forwarding capability which supports multiple information transactions simultaneously. No data compression technique is considered in estimator design. Each PE is able to recognize information transferred among NoCs so that only those in need could receive data. Sparsity of weights and activations aren’t considered. Register File inside each PE only has the capacity to store one row of weights, one row of ifmap and one partial sum which means that we won’t take the capacity of RF into account. (A pity in this energy estimator) Ifmap and ofmap of each layer should be read from or written back into DRAM for external read operations. Once a data value is read from one memory level and then written into another memory level, the energy consumption of this transaction is always decided by the higher-cost level and only regarded as a single operation. Data transfer could happen directly between any 2 memory levels. This estimator is only applied to 2D systolic PE arrays. Partial sum and ofmap of one layer have the same bitwidth as activations. Maxpooling, Relu and LRN are not taken into account with respect to energy estimation. (little impact on total estimation) In order to make full use of data reuse (convolutional reuse and ifmap reuse), apart from row-stationary dataflow, scheduling algorithm will try to avoid reading ifmaps as much as possible. Once a channel of ifmap is kept inside the RF, the computation will be executed across the corresponding channel of entire filters in each layer.

Example analysis : Hardware Architecture : Eyeriss PE size : 12*14, 2D Dataflow : Row Stationary DNN Model : AlexNet (5 conv layers, 3 FC layers) Initial Input : single image from ImageNet Additional Attributes : Pruning and Quantization (You can revise your own pruning ratio and bitwidth of weight and activation in my source code) Everything is not hard-coded !

A pity ! (future works to do) 3D PE arrays. Memory size is considered in scheduling algorithm to accommodate more intermediate datum in low-cost level without writing back to high-cost level. Possible I/O data compression. (encoder, decoder) Possible sparsity optimization. (zero-gated MAC) Elaborate operation with specific arguments like random read, repeated write, constant read, etc. The impact of memory size, spatial distribution, location can be taken into account when we try to improve precision of our energy estimator. For example, the spatial distribution between two PEs can be characterized by Manhattan distance which can be used to scale the energy consumption of data forwarding in NoC.

If you have any questions or troubles please contact me. I'd also like to listen to your advice and opinions!

Owner
HEXIN BAO
UESTC Bachelor EE NUS Master ECE Future unknown
HEXIN BAO
Rainbow: Combining Improvements in Deep Reinforcement Learning

Rainbow Rainbow: Combining Improvements in Deep Reinforcement Learning [1]. Results and pretrained models can be found in the releases. DQN [2] Double

Kai Arulkumaran 1.4k Dec 29, 2022
The author's officially unofficial PyTorch BigGAN implementation.

BigGAN-PyTorch The author's officially unofficial PyTorch BigGAN implementation. This repo contains code for 4-8 GPU training of BigGANs from Large Sc

Andy Brock 2.6k Jan 02, 2023
Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization

Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization Official PyTorch implementation for our URST (Ultra-Resolution Sty

czczup 148 Dec 27, 2022
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.

DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.

Microsoft 8.4k Jan 01, 2023
[ICML 2020] "When Does Self-Supervision Help Graph Convolutional Networks?" by Yuning You, Tianlong Chen, Zhangyang Wang, Yang Shen

When Does Self-Supervision Help Graph Convolutional Networks? PyTorch implementation for When Does Self-Supervision Help Graph Convolutional Networks?

Shen Lab at Texas A&M University 106 Nov 11, 2022
Robotic Process Automation in Windows and Linux by using Driagrams.net BPMN diagrams.

BPMN_RPA Robotic Process Automation in Windows and Linux by using BPMN diagrams. With this Framework you can draw Business Process Model Notation base

23 Dec 14, 2022
Scrutinizing XAI with linear ground-truth data

This repository contains all the experiments presented in the corresponding paper: "Scrutinizing XAI using linear ground-truth data with suppressor va

braindata lab 2 Oct 04, 2022
For encoding a text longer than 512 tokens, for example 800. Set max_pos to 800 during both preprocessing and training.

LongScientificFormer For encoding a text longer than 512 tokens, for example 800. Set max_pos to 800 during both preprocessing and training. Some code

Athar Sefid 6 Nov 02, 2022
Code for Recurrent Mask Refinement for Few-Shot Medical Image Segmentation (ICCV 2021).

Recurrent Mask Refinement for Few-Shot Medical Image Segmentation Steps Install any missing packages using pip or conda Preprocess each dataset using

XIE LAB @ UCI 39 Dec 08, 2022
[AAAI2022] Source code for our paper《Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning》

SSVC The source code for paper [Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning] samples of the

7 Oct 26, 2022
[AI6122] Text Data Management & Processing

[AI6122] Text Data Management & Processing is an elective course of MSAI, SCSE, NTU, Singapore. The repository corresponds to the AI6122 of Semester 1, AY2021-2022, starting from 08/2021. The instruc

HT. Li 1 Jan 17, 2022
Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network)

Deep Daze mist over green hills shattered plates on the grass cosmic love and attention a time traveler in the crowd life during the plague meditative

Phil Wang 4.4k Jan 03, 2023
[SIGGRAPH 2022 Journal Track] AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars

AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars Fangzhou Hong1*  Mingyuan Zhang1*  Liang Pan1  Zhongang Cai1,2,3  Lei Yang2 

Fangzhou Hong 749 Jan 04, 2023
Neural Style and MSG-Net

PyTorch-Style-Transfer This repo provides PyTorch Implementation of MSG-Net (ours) and Neural Style (Gatys et al. CVPR 2016), which has been included

Hang Zhang 904 Dec 21, 2022
Source code for the paper "PLOME: Pre-training with Misspelled Knowledge for Chinese Spelling Correction" in ACL2021

PLOME:Pre-training with Misspelled Knowledge for Chinese Spelling Correction (ACL2021) This repository provides the code and data of the work in ACL20

197 Nov 26, 2022
A model that attempts to learn and benefit from data collected on card counting.

A model that attempts to learn and benefit from data collected on card counting. A decision tree like model is built to win more often than loose and increase the bet of the player appropriately to c

1 Dec 17, 2021
Learning and Building Convolutional Neural Networks using PyTorch

Image Classification Using Deep Learning Learning and Building Convolutional Neural Networks using PyTorch. Models, selected are based on number of ci

Mayur 126 Dec 22, 2022
YOLOX is a high-performance anchor-free YOLO, exceeding yolov3~v5 with ONNX, TensorRT, ncnn, and OpenVINO supported.

Introduction YOLOX is an anchor-free version of YOLO, with a simpler design but better performance! It aims to bridge the gap between research and ind

7.7k Jan 03, 2023
Fast, accurate and reliable software for algebraic CT reconstruction

KCT CBCT Fast, accurate and reliable software for algebraic CT reconstruction. This set of software tools includes OpenCL implementation of modern CT

Vojtěch Kulvait 4 Dec 14, 2022
A Python package to process & model ChEMBL data.

insilico: A Python package to process & model ChEMBL data. ChEMBL is a manually curated chemical database of bioactive molecules with drug-like proper

Steven Newton 0 Dec 09, 2021