BinTuner is a cost-efficient auto-tuning framework, which can deliver a near-optimal binary code that reveals much more differences than -Ox settings.

Related tags

Data AnalysisDev
Overview

BinTuner

BinTuner is a cost-efficient auto-tuning framework, which can deliver a near-optimal binary code that reveals much more differences than -Ox settings. it also can assist the binary code analysis research in generating more diversified datasets for training and testing. The BinTuner framework is based on OpenTuner, thanks to all contributors for their contributions.

The architecture of BinTuner:

image

The core on the server-side is a metaheuristic search engine (e.g., the genetic algorithm), which directs iterative compilation towards maximizing the effect of binary code differences.

The client-side runs different compilers (GCC, LLVM ...) and the calculation of the fitness function.

Both sides communicate valid optimization options, fitness function scores, and compiled binaries to each other, and these data are stored in a database for future exploration. When BinTuner reaches a termination condition, we select the iterations showing the highest fitness function score and output the corresponding binary code as the final outcomes.

System dependencies

A list of system dependencies can be found in packages-deps which are primarily python 2.6+ (not 3.x) and sqlite3.

On Ubuntu/Debian there can be installed with:

sudo apt-get update
sudo apt-get upgrade
sudo apt-get install `cat packages-deps | tr '\n' ' '`

Installation

Running it out of a git checkout, a list of python dependencies can be found in requirements.txt these can be installed system-wide with pip.

sudo apt-get install python-pip
sudo pip install -r requirements.txt

If you encounter an error message like this:

Could not find a version that satisfies the requirement fn>=0.2.12 (from -r requirements.txt (line 2)) (from versions:)
No matching distribution found for fn>=0.2.12 (from -r requirements.tet (line 2))

Please try again or install each manually

pip install fn>=0.2.12
...
pip install numpy>=1.8.0
...

If you encounter an error message like this:

ImportError: No module named lzma

Please install lzma

sudo apt-get install python-lzma

If you encounter an error message like this:

assert compile_result['returncode'] == 0
AssertionError

Please confirm how to use the compiler in your terminal, such as GCC or gcc-10.2.0 it needs to be modified in your .Py file

If you encounter an error message like this:

sqlalchemy.exc.OperationalError: (pysqlite2.dbapi2.OperationalError) database is locked [SQL: u'INSERT INTO tuning_run (uuid, program_version_id, machine_class_id, input_class_id, name, args, objective, state, start_date, end_date, final_config_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)'] [parameters: ('b3311f3609ff4ce9aa40c0f9bb291d26', 1, None, None, 'unnamed', 
   
   
    
    , 
    
    
     
     , 'QUEUED', '2021-xx-xx 03:42:04.145932', None, None)] (Background on this error at: http://sqlalche.me/e/e3q8)

    
    
   
   

Just delete the DB file saved before (PATH:/examples/gccflags/opentuner.db/Your PC's Name.db).

Install Compiler

GCC

Check to see if the compiler is installed

e.g.

gcc -v  shows that
gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)

Please note that there have different optimization options in different versions of compilers.

If you use the optimization options that are not included in this version of the compiler, the program can not run and report an error.

It is strongly recommended to confirm that the optimization options are in the official instructions of GCC or LLVM before using them.

e.g. GCC version 10.2.0.

You can also use the command to display all options in terminal

gcc --help=optimizers


The following options control optimizations:
  -O
   
   
    
                      Set optimization level to 
    
    
     
     .
  -Ofast                      Optimize for speed disregarding exact standards
                              compliance.
  -Og                         Optimize for debugging experience rather than
                              speed or size.
  -Os                         Optimize for space rather than speed.
  -faggressive-loop-optimizations Aggressively optimize loops using language
                              constraints.
  -falign-functions           Align the start of functions.
  -falign-jumps               Align labels which are only reached by jumping.
  -falign-labels              Align all labels.
  -falign-loops               Align the start of loops.
  ...


    
    
   
   

LLVM

clang -v

Check how to install LLVM here

https://apt.llvm.org/

https://clang.llvm.org/get_started.html

Checking Installation

Enter the following command in terminal to test:

[email protected]:~/BinTuner/examples/gccflags$ python main.py 2

You will see some info like this:

Program Start
************************ Z3 ************************
5- Result--> Unavailable
3- Result--> Available
[ Z3 return Results = first second True four False]
[ Changed "shrink-wrap" value ]
...
-------------------------------------------------

--- BinTuner ---
--- Command lines and compiler optimization options ---:
gcc benchmarks/bzip2.c -lm -o ./tmp0.bin -O3 -fauto-inc-dec -fbranch-count-reg -fno-combine-stack-adjustments 
-fcompare-elim -fcprop-registers -fno-dce -fdefer-pop -fdelayed-branch -fno-dse -fforward-propagate -fguess-branch-probability 
-fno-if-conversion2 -fno-if-conversion -finline-functions-called-once -fipa-pure-const -fno-ipa-profile -fipa-reference 
-fno-merge-constants -fmove-loop-invariants -freorder-blocks -fshrink-wrap -fsplit-wide-types -fno-tree-bit-ccp -fno-tree-ccp 
-ftree-ch -fno-tree-coalesce-vars -ftree-copy-prop -ftree-dce -fno-tree-dse -ftree-forwprop -fno-tree-fre -ftree-sink -fno-tree-slsr 
-fno-tree-sra -ftree-pta -ftree-ter -fno-unit-at-a-time -fno-omit-frame-pointer -ftree-phiprop -fno-tree-dominator-opts -fno-ssa-backprop 
-fno-ssa-phiopt -fshrink-wrap-separate -fthread-jumps -falign-functions -fno-align-labels -fno-align-labels -falign-loops -fno-caller-saves 
-fno-crossjumping -fcse-follow-jumps -fno-cse-skip-blocks -fno-delete-null-pointer-checks -fno-devirtualize -fdevirtualize-speculatively 
-fexpensive-optimizations -fno-gcse -fno-gcse-lm -fno-hoist-adjacent-loads -finline-small-functions -fno-indirect-inlining -fipa-cp 
-fipa-sra -fipa-icf -fno-isolate-erroneous-paths-dereference -fno-lra-remat -foptimize-sibling-calls -foptimize-strlen 
-fpartial-inlining -fno-peephole2 -fno-reorder-blocks-and-partition -fno-reorder-functions -frerun-cse-after-loop -fno-sched-interblock 
-fno-sched-spec -fno-schedule-insns -fno-strict-aliasing -fstrict-overflow -fno-tree-builtin-call-dce -fno-tree-switch-conversion 
-ftree-tail-merge -ftree-pre -fno-tree-vrp -fno-ipa-ra -freorder-blocks -fno-schedule-insns2 -fcode-hoisting -fstore-merging 
-freorder-blocks-algorithm=simple -fipa-bit-cp -fipa-vrp -fno-inline-functions -fno-unswitch-loops -fpredictive-commoning 
-fno-gcse-after-reload -fno-tree-loop-vectorize -ftree-loop-distribute-patterns -fno-tree-slp-vectorize -fvect-cost-model 
-ftree-partial-pre -fpeel-loops -fipa-cp-clone -fno-split-paths -ftree-vectorize --param early-inlining-insns=526 
--param gcse-cost-distance-ratio=12 --param iv-max-considered-uses=762
 -O3
--NCD:0.807842390787
---Test----
--Max:0
--Current:0
--Count:0
...

Results

The DB file saved in the PATH:/examples/gccflags/opentuner.db/Your PC's Name.db

Each sequence of compilation flags and the corresponding ncd value are saved in the db file.

Set up how many times to run

Please refer to the settings in main.py There are two strategies The default setting runs 100 times, if you want to modify it according to your own wishes this is ok. For example, by monitoring the change of NCD value in 100 times, if the cumulative change of 100 times increase is less than 5%, let's terminte it.

First-order formulas

We manually generate first-order formulas after understanding the compiler manual. The knowledge we learned is easy to move between the same compiler series---we only need to consider the different optimization options introduced by the new version.

We use Z3 Prover to analyze all generated optimization option sequences for conflicts and make changes to conflicting options for greater compiling success.

For more details, please refer Z3Prover.

Setting for Genetic Algorithm

The genetic algorithm is a metaheuristic inspired by the process of natural selection that belongs to the larger class of evolutionary algorithms. Genetic algorithms are commonly used to generate high-quality solutions to optimization and search problems by relying on biologically inspired operators such as mutation, crossover, and selection.

We tune four parameters for the genetic algorithm, including mutation_rate, crossover_rate, must_mutate_count, crossover_strength.

For more details, please refer globalGA.

Future Work

We are studying constructing custom optimization sequences that present the best tradeoffs between multiple objective functions (e.g., execution speed & NCD). To further reduce the total iterations of BinTuner, an exciting direction is to develop machine learning methods that correlate C language features with particular optimization options. In this way, we can predict program-specific optimization strategies that achieve the expected binary code differences.

Owner
BinTuner
BinTuner is a cost-efficient auto-tuning framework, which can deliver a near-optimal binary code that reveals much more differences than -Ox settings.
BinTuner
Gaussian processes in TensorFlow

Website | Documentation (release) | Documentation (develop) | Glossary Table of Contents What does GPflow do? Installation Getting Started with GPflow

GPflow 1.7k Jan 06, 2023
BinTuner is a cost-efficient auto-tuning framework, which can deliver a near-optimal binary code that reveals much more differences than -Ox settings.

BinTuner is a cost-efficient auto-tuning framework, which can deliver a near-optimal binary code that reveals much more differences than -Ox settings. it also can assist the binary code analysis rese

BinTuner 42 Dec 16, 2022
Generates a simple report about the current Covid-19 cases and deaths in Malaysia

Generates a simple report about the current Covid-19 cases and deaths in Malaysia. Results are delay one day, data provided by the Ministry of Health Malaysia Covid-19 public data.

Yap Khai Chuen 7 Dec 15, 2022
ETL pipeline on movie data using Python and postgreSQL

Movies-ETL ETL pipeline on movie data using Python and postgreSQL Overview This project consisted on a automated Extraction, Transformation and Load p

Juan Nicolas Serrano 0 Jul 07, 2021
A set of tools to analyse the output from TraDIS analyses

QuaTradis (Quadram TraDis) A set of tools to analyse the output from TraDIS analyses Contents Introduction Installation Required dependencies Bioconda

Quadram Institute Bioscience 2 Feb 16, 2022
Includes all files needed to satisfy hw02 requirements

HW 02 Data Sets Mean Scale Score for Asian and Hispanic Students, Grades 3 - 8 This dataset provides insights into the New York City education system

7 Oct 28, 2021
nrgpy is the Python package for processing NRG Data Files

nrgpy nrgpy is the Python package for processing NRG Data Files Website and source: https://github.com/nrgpy/nrgpy Documentation: https://nrgpy.github

NRG Tech Services 23 Dec 08, 2022
Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code

Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tuplex has similar Python APIs to Apache Spark or Dask, but rather

Tuplex 791 Jan 04, 2023
Python scripts aim to use a Random Forest machine learning algorithm to predict the water affinity of Metal-Organic Frameworks

The following Python scripts aim to use a Random Forest machine learning algorithm to predict the water affinity of Metal-Organic Frameworks (MOFs). The training set is extracted from the Cambridge S

1 Jan 09, 2022
Important dataframe statistics with a single command

quick_eda Receiving dataframe statistics with one command Project description A python package for Data Scientists, Students, ML Engineers and anyone

Sven Eschlbeck 2 Dec 19, 2021
Modular analysis tools for neurophysiology data

Neuroanalysis Modular and interactive tools for analysis of neurophysiology data, with emphasis on patch-clamp electrophysiology. Functions for runnin

Allen Institute 5 Dec 22, 2021
Galvanalyser is a system for automatically storing data generated by battery cycling machines in a database

Galvanalyser is a system for automatically storing data generated by battery cycling machines in a database, using a set of "harvesters", whose job it

Battery Intelligence Lab 20 Sep 28, 2022
Pip install minimal-pandas-api-for-polars

Minimal Pandas API for Polars Install From PyPI: pip install minimal-pandas-api-for-polars Example Usage (see tests/test_minimal_pandas_api_for_polars

Austin Ray 6 Oct 16, 2022
Data cleaning tools for Business analysis

Datacleaning datacleaning tools for Business analysis This program is made for Vicky's work. You can use it, too. 数据清洗 该数据清洗工具是为了商业分析 这个程序是为了Vicky的工作而

Lin Jian 3 Nov 16, 2021
Program that predicts the NBA mvp based on data from previous years.

NBA MVP Predictor A machine learning model using RandomForest Regression that predicts NBA MVP's using player data. Explore the docs » View Demo · Rep

Muhammad Rabee 1 Jan 21, 2022
Churn prediction with PySpark

It is expected to develop a machine learning model that can predict customers who will leave the company.

3 Aug 13, 2021
peptides.py is a pure-Python package to compute common descriptors for protein sequences

peptides.py Physicochemical properties and indices for amino-acid sequences. 🗺️ Overview peptides.py is a pure-Python package to compute common descr

Martin Larralde 32 Dec 31, 2022
BErt-like Neurophysiological Data Representation

BENDR BErt-like Neurophysiological Data Representation This repository contains the source code for reproducing, or extending the BERT-like self-super

114 Dec 23, 2022
Nobel Data Analysis

Nobel_Data_Analysis This project is for analyzing a set of data about people who have won the Nobel Prize in different fields and different countries

Mohammed Hassan El Sayed 1 Jan 24, 2022
Cleaning and analysing aggregated UK political polling data.

Analysing aggregated UK polling data The tweet collection & storage pipeline used in email-service is used to also collect tweets from @britainelects.

Ajay Pethani 0 Dec 22, 2021