Circuit Training: An open-source framework for generating chip floor plans with distributed deep reinforcement learning

Overview

Circuit Training: An open-source framework for generating chip floor plans with distributed deep reinforcement learning.

Circuit Training is an open-source framework for generating chip floor plans with distributed deep reinforcement learning. This framework reproduces the methodology published in the Nature 2021 paper:

A graph placement methodology for fast chip design. Azalia Mirhoseini, Anna Goldie, Mustafa Yazgan, Joe Wenjie Jiang, Ebrahim Songhori, Shen Wang, Young-Joon Lee, Eric Johnson, Omkar Pathak, Azade Nazi, Jiwoo Pak, Andy Tong, Kavya Srinivasa, William Hang, Emre Tuncer, Quoc V. Le, James Laudon, Richard Ho, Roger Carpenter & Jeff Dean, 2021. Nature, 594(7862), pp.207-212. [PDF]

Our hope is that Circuit Training will foster further collaborations between academia and industry, and enable advances in deep reinforcement learning for Electronic Design Automation, as well as, general combinatorial and decision making optimization problems. Capable of optimizing chip blocks with hundreds of macros, Circuit Training automatically generates floor plans in hours, whereas baseline methods often require human experts in the loop and can take months.

Circuit training is built on top of TF-Agents and TensorFlow 2.x with support for eager execution, distributed training across multiple GPUs, and distributed data collection scaling to 100s of actors.

Table of contents

Features
Installation
Quick start
Results
Testing
Releases
How to contribute
AI Principles
Contributors
How to cite
Disclaimer

Features

  • Places netlists with hundreds of macros and millions of stdcells (in clustered format).
  • Computes both macro location and orientation (flipping).
  • Optimizes multiple objectives including wirelength, congestion, and density.
  • Supports alignment of blocks to the grid, to model clock strap or macro blockage.
  • Supports macro-to-macro, macro-to-boundary spacing constraints.
  • Allows users to specify their own technology parameters, e.g. and routing resources (in routes per micron) and macro routing allocation.
  • Coming soon: Tools for generating a clustered netlist given a netlist in common formats (Bookshelf and LEF/DEF).
  • Coming soon: Generates macro placement tcl command compatible with major EDA tools (Innovus, ICC2).

Installation

Circuit Training requires:

  • Installing TF-Agents which includes Reverb and TensorFlow.
  • Downloading the placement cost binary into your system path.
  • Downloading the circuit-training code.

Using the code at HEAD with the nightly release of TF-Agents is recommended.

# Installs TF-Agents with nightly versions of Reverb and TensorFlow 2.x
$  pip install tf-agents-nightly[reverb]
# Copies the placement cost binary to /usr/local/bin and makes it executable.
$  sudo curl https://storage.googleapis.com/rl-infra-public/circuit-training/placement_cost/plc_wrapper_main \
     -o  /usr/local/bin/plc_wrapper_main
$  sudo chmod 555 /usr/local/bin/plc_wrapper_main
# Clones the circuit-training repo.
$  git clone https://github.com/google-research/circuit-training.git

Quick start

This quick start places the Ariane RISC-V CPU macros by training the deep reinforcement policy from scratch. The num_episodes_per_iteration and global_batch_size used below were picked to work on a single machine training on CPU. The purpose is to illustrate a running system, not optimize the result. The result of a few thousand steps is shown in this tensorboard. The full scale Ariane RISC-V experiment matching the paper is detailed in Circuit training for Ariane RISC-V.

The following jobs will be created by the steps below:

  • 1 Replay Buffer (Reverb) job
  • 1-3 Collect jobs
  • 1 Train job
  • 1 Eval job

Each job is started in a tmux session. To switch between sessions use ctrl + b followed by s and then select the specified session.

: Starts 2 more collect jobs to speed up training. # Change to the tmux session `collect_job_01`. # `ctrl + b` followed by `s` $ python3 -m circuit_training.learning.ppo_collect \ --root_dir=${ROOT_DIR} \ --replay_buffer_server_address=${REVERB_SERVER} \ --variable_container_server_address=${REVERB_SERVER} \ --task_id=1 \ --netlist_file=${NETLIST_FILE} \ --init_placement=${INIT_PLACEMENT} # Change to the tmux session `collect_job_02`. # `ctrl + b` followed by `s` $ python3 -m circuit_training.learning.ppo_collect \ --root_dir=${ROOT_DIR} \ --replay_buffer_server_address=${REVERB_SERVER} \ --variable_container_server_address=${REVERB_SERVER} \ --task_id=2 \ --netlist_file=${NETLIST_FILE} \ --init_placement=${INIT_PLACEMENT} ">
# Sets the environment variables needed by each job. These variables are
# inherited by the tmux sessions created in the next step.
$  export ROOT_DIR=./logs/run_00
$  export REVERB_PORT=8008
$  export REVERB_SERVER="127.0.0.1:${REVERB_PORT}"
$  export NETLIST_FILE=./circuit_training/environment/test_data/ariane/netlist.pb.txt
$  export INIT_PLACEMENT=./circuit_training/environment/test_data/ariane/initial.plc

# Creates all the tmux sessions that will be used.
$  tmux new-session -d -s reverb_server && \
   tmux new-session -d -s collect_job_00 && \
   tmux new-session -d -s collect_job_01 && \
   tmux new-session -d -s collect_job_02 && \
   tmux new-session -d -s train_job && \
   tmux new-session -d -s eval_job && \
   tmux new-session -d -s tb_job

# Starts the Replay Buffer (Reverb) Job
$  tmux attach -t reverb_server
$  python3 -m circuit_training.learning.ppo_reverb_server \
   --root_dir=${ROOT_DIR}  --port=${REVERB_PORT}

# Starts the Training job
# Change to the tmux session `train_job`.
# `ctrl + b` followed by `s`
$  python3 -m circuit_training.learning.train_ppo \
  --root_dir=${ROOT_DIR} \
  --replay_buffer_server_address=${REVERB_SERVER} \
  --variable_container_server_address=${REVERB_SERVER} \
  --num_episodes_per_iteration=16 \
  --global_batch_size=64 \
  --netlist_file=${NETLIST_FILE} \
  --init_placement=${INIT_PLACEMENT}

# Starts the Collect job
# Change to the tmux session `collect_job_00`.
# `ctrl + b` followed by `s`
$  python3 -m circuit_training.learning.ppo_collect \
  --root_dir=${ROOT_DIR} \
  --replay_buffer_server_address=${REVERB_SERVER} \
  --variable_container_server_address=${REVERB_SERVER} \
  --task_id=0 \
  --netlist_file=${NETLIST_FILE} \
  --init_placement=${INIT_PLACEMENT}

# Starts the Eval job
# Change to the tmux session `eval_job`.
# `ctrl + b` followed by `s`
$  python3 -m circuit_training.learning.eval \
  --root_dir=${ROOT_DIR} \
  --variable_container_server_address=${REVERB_SERVER} \
  --netlist_file=${NETLIST_FILE} \
  --init_placement=${INIT_PLACEMENT}

# Start Tensorboard.
# Change to the tmux session `tb_job`.
# `ctrl + b` followed by `s`
$  tensorboard dev upload --logdir ./logs

# 
   
    : Starts 2 more collect jobs to speed up training.
   
# Change to the tmux session `collect_job_01`.
# `ctrl + b` followed by `s`
$  python3 -m circuit_training.learning.ppo_collect \
  --root_dir=${ROOT_DIR} \
  --replay_buffer_server_address=${REVERB_SERVER} \
  --variable_container_server_address=${REVERB_SERVER} \
  --task_id=1 \
  --netlist_file=${NETLIST_FILE} \
  --init_placement=${INIT_PLACEMENT}

# Change to the tmux session `collect_job_02`.
# `ctrl + b` followed by `s`
$  python3 -m circuit_training.learning.ppo_collect \
  --root_dir=${ROOT_DIR} \
  --replay_buffer_server_address=${REVERB_SERVER} \
  --variable_container_server_address=${REVERB_SERVER} \
  --task_id=2 \
  --netlist_file=${NETLIST_FILE} \
  --init_placement=${INIT_PLACEMENT}

Results

The results below are reported for training from scratch, since the pre-trained model cannot be shared at this time.

Ariane RISC-V CPU

View the full details of the Ariane experiment on our details page. With this code we are able to get comparable or better results training from scratch as fine-tuning a pre-trained model. At the time the paper was published, training from a pre-trained model resulted in better results than training from scratch for the Ariane RISC-V. Improvements to the code have also resulted in 50% less GPU resources needed and a 2x walltime speedup even in training from scratch. Below are the mean and standard deviation for 3 different seeds run 3 times each. This is slightly different than what was used in the paper (8 runs each with a different seed), but better captures the different sources of variability.

Proxy Wirelength Proxy Congestion Proxy Density
mean 0.1013 0.9174 0.5502
std 0.0036 0.0647 0.0568

The table below summarizes the paper result for fine-tuning from a pre-trained model over 8 runs with each one using a different seed.

Proxy Wirelength Proxy Congestion Proxy Density
mean 0.1198 0.9718 0.5729
std 0.0019 0.0346 0.0086

Testing

# Runs tests with nightly TF-Agents.
$  tox -e py37,py38,py39
# Runs with latest stable TF-Agents.
$  tox -e py37-nightly,py38-nightly,py39-nightly

# Using our Docker for CI.
## Build the docker
$  docker build --tag circuit_training:ci -f tools/docker/ubuntu_ci tools/docker/
## Runs tests with nightly TF-Agents.
$  docker run -it --rm -v $(pwd):/workspace --workdir /workspace circuit_training:ci \
     tox -e py37-nightly,py38-nightly,py39-nightly
## Runs tests with latest stable TF-Agents.
$  docker run -it --rm -v $(pwd):/workspace --workdir /workspace circuit_training:ci \
     tox -e py37,py38,py39

Releases

While we recommend running at HEAD, we have tagged the code base to mark compatibility with stable releases of the underlying libraries.

Release Branch / Tag TF-Agents
HEAD main tf-agents-nightly
0.0.1 v0.0.1 tf-agents==0.11.0

Follow this pattern to utilize the tagged releases:

$  git clone https://github.com/google-research/circuit-training.git
$  cd circuit-training
# Checks out the tagged version listed in the table in the releases section.
$  git checkout v0.0.1
# Installs the corresponding version of TF-Agents along with Reverb and
# Tensorflow from the table.
$  pip install tf-agents[reverb]==x.x.x
# Copies the placement cost binary to /usr/local/bin and makes it executable.
$  sudo curl https://storage.googleapis.com/rl-infra-public/circuit-training/placement_cost/plc_wrapper_main \
     -o  /usr/local/bin/plc_wrapper_main
$  sudo chmod 555 /usr/local/bin/plc_wrapper_main

How to contribute

We're eager to collaborate with you! See CONTRIBUTING for a guide on how to contribute. This project adheres to TensorFlow's code of conduct. By participating, you are expected to uphold this code of conduct.

Principles

This project adheres to Google's AI principles. By participating, using or contributing to this project you are expected to adhere to these principles.

Main Contributors

We would like to recognize the following individuals for their code contributions, discussions, and other work to make the release of the Circuit Training library possible.

  • Sergio Guadarrama
  • Summer Yue
  • Ebrahim Songhori
  • Joe Jiang
  • Toby Boyd
  • Azalia Mirhoseini
  • Anna Goldie
  • Mustafa Yazgan
  • Shen Wang
  • Terence Tam
  • Young-Joon Lee
  • Roger Carpenter
  • Quoc Le
  • Ed Chi

How to cite

If you use this code, please cite both:

@article{mirhoseini2021graph,
  title={A graph placement methodology for fast chip design},
  author={Mirhoseini, Azalia and Goldie, Anna and Yazgan, Mustafa and Jiang, Joe
  Wenjie and Songhori, Ebrahim and Wang, Shen and Lee, Young-Joon and Johnson,
  Eric and Pathak, Omkar and Nazi, Azade and Pak, Jiwoo and Tong, Andy and
  Srinivasa, Kavya and Hang, William and Tuncer, Emre and V. Le, Quoc and
  Laudon, James and Ho, Richard and Carpenter, Roger and Dean, Jeff},
  journal={Nature},
  volume={594},
  number={7862},
  pages={207--212},
  year={2021},
  publisher={Nature Publishing Group}
}
@misc{CircuitTraining2021,
  title = {{Circuit Training}: An open-source framework for generating chip
  floor plans with distributed deep reinforcement learning.},
  author = {Guadarrama, Sergio and Yue, Summer and Boyd, Toby and Jiang, Joe
  Wenjie and Songhori, Ebrahim and Tam, Terence and Mirhoseini, Azalia},
  howpublished = {\url{https://github.com/google_research/circuit_training}},
  url = "https://github.com/google_research/circuit_training",
  year = 2021,
  note = "[Online; accessed 21-December-2021]"
}

Disclaimer

This is not an official Google product.

Owner
Google Research
Google Research
Transfer Learning for Pose Estimation of Illustrated Characters

bizarre-pose-estimator Transfer Learning for Pose Estimation of Illustrated Characters Shuhong Chen *, Matthias Zwicker * WACV2022 [arxiv] [video] [po

Shuhong Chen 142 Dec 28, 2022
Apply Graph Self-Supervised Learning methods to graph-level task(TUDataset, MolculeNet Datset)

Graphlevel-SSL Overview Apply Graph Self-Supervised Learning methods to graph-level task(TUDataset, MolculeNet Dataset). It is unified framework to co

JunSeok 8 Oct 15, 2021
A mini lib that implements several useful functions binding to PyTorch in C++.

Torch-gather A mini library that implements several useful functions binding to PyTorch in C++. What does gather do? Why do we need it? When dealing w

maxwellzh 8 Sep 07, 2022
mmdetection version of TinyBenchmark.

introduction This project is an mmdetection version of TinyBenchmark. TODO list: add TinyPerson dataset and evaluation add crop and merge for image du

34 Aug 27, 2022
Adversarial vulnerability of powerful near out-of-distribution detection

Adversarial vulnerability of powerful near out-of-distribution detection by Stanislav Fort In this repository we're collecting replications for the ke

Stanislav Fort 9 Aug 30, 2022
Implementation of fast algorithms for Maximum Spanning Tree (MST) parsing that includes fast ArcMax+Reweighting+Tarjan algorithm for single-root dependency parsing.

Fast MST Algorithm Implementation of fast algorithms for (Maximum Spanning Tree) MST parsing that includes fast ArcMax+Reweighting+Tarjan algorithm fo

Miloš Stanojević 11 Oct 14, 2022
An open-source outlier detection package by Getcontact Data Team

pyfbad The pyfbad library supports anomaly detection projects. An end-to-end anomaly detection application can be written using the source codes of th

Teknasyon Tech 41 Dec 27, 2022
Mining-the-Social-Web-3rd-Edition - The official online compendium for Mining the Social Web, 3rd Edition (O'Reilly, 2018)

Mining the Social Web, 3rd Edition The official code repository for Mining the Social Web, 3rd Edition (O'Reilly, 2019). The book is available from Am

Mikhail Klassen 838 Jan 01, 2023
A PyTorch implementation of EfficientNet and EfficientNetV2 (coming soon!)

EfficientNet PyTorch Quickstart Install with pip install efficientnet_pytorch and load a pretrained EfficientNet with: from efficientnet_pytorch impor

Luke Melas-Kyriazi 7.2k Jan 06, 2023
Quantized tflite models for ailia TFLite Runtime

ailia-models-tflite Quantized tflite models for ailia TFLite Runtime About ailia TFLite Runtime ailia TF Lite Runtime is a TensorFlow Lite compatible

ax Inc. 13 Dec 23, 2022
Official Chainer implementation of GP-GAN: Towards Realistic High-Resolution Image Blending (ACMMM 2019, oral)

GP-GAN: Towards Realistic High-Resolution Image Blending (ACMMM 2019, oral) [Project] [Paper] [Demo] [Related Work: A2RL (for Auto Image Cropping)] [C

Wu Huikai 402 Dec 27, 2022
Auto-updating data to assist in investment to NEPSE

Symbol Ratios Summary Sector LTP Undervalued Bonus % MEGA Strong Commercial Banks 368 5 10 JBBL Strong Development Banks 568 5 10 SIFC Strong Finance

Amit Chaudhary 16 Nov 01, 2022
Project Tugas Besar pertama Pengenalan Komputasi Institut Teknologi Bandung

Vending_Machine_(Mesin_Penjual_Minuman) Project Tugas Besar pertama Pengenalan Komputasi Institut Teknologi Bandung Raw Sketch untuk Essay Ringkasan P

QueenLy 1 Nov 08, 2021
An Unsupervised Graph-based Toolbox for Fraud Detection

An Unsupervised Graph-based Toolbox for Fraud Detection Introduction: UGFraud is an unsupervised graph-based fraud detection toolbox that integrates s

SafeGraph 99 Dec 11, 2022
Code for Low-Cost Algorithmic Recourse for Users With Uncertain Cost Functions

EMS-COLS-recourse Initial Code for Low-Cost Algorithmic Recourse for Users With Uncertain Cost Functions Folder structure: data folder contains raw an

Prateek Yadav 1 Nov 25, 2022
A Pytorch implementation of "LegoNet: Efficient Convolutional Neural Networks with Lego Filters" (ICML 2019).

LegoNet This code is the implementation of ICML2019 paper LegoNet: Efficient Convolutional Neural Networks with Lego Filters Run python train.py You c

YangZhaohui 140 Sep 26, 2022
A Python package for faster, safer, and simpler ML processes

Bender 🤖 A Python package for faster, safer, and simpler ML processes. Why use bender? Bender will make your machine learning processes, faster, safe

Otovo 6 Dec 13, 2022
Official PyTorch implementation of "Improving Face Recognition with Large AgeGaps by Learning to Distinguish Children" (BMVC 2021)

Inter-Prototype (BMVC 2021): Official Project Webpage This repository provides the official PyTorch implementation of the following paper: Improving F

Jungsoo Lee 16 Jun 30, 2022
PyTorch implementation of ICLR 2022 paper PiCO: Contrastive Label Disambiguation for Partial Label Learning

PiCO: Contrastive Label Disambiguation for Partial Label Learning This is a PyTorch implementation of ICLR 2022 Oral paper PiCO; also see our Project

王皓波 147 Jan 07, 2023
Fuwa-http - The http client implementation for the fuwa eco-system

Fuwa HTTP The HTTP client implementation for the fuwa eco-system Example import

Fuwa 2 Feb 16, 2022