QSynthesis is a Python3 API to perform I/O based program synthesis of bitvector expressions.

Overview

Qsynthesis

QSynthesis is a Python3 API to perform I/O based program synthesis of bitvector expressions. It aims at facilitating code deobfuscation. The algorithm is greybox approach combining both a blackbox I/O based synthesis and a whitebox AST search to synthesize sub-expressions (if the root node cannot be synthesized).

This algorithm as originaly been described at the BAR academic workshop:

The code has been release as part of the following Black Hat talk:

Disclaimer: This framework is experimental, and shall only be used for experimentation purposes. It mainly aims at stimulating research in this area.

Documentation

The installation, examples, and API documentation is available on the dedicated documentation: Documentation

Functionalities

The core synthesis is based on Triton symbolic engine on which is built the whole framework. It provides the following functionalities:

  • synthesis of bitvector expressions
  • ability to check through SMT the semantic equivalence of synthesized expressions
  • ability to synthesize constants (if the expression encode a constant)
  • ability to improve oracles (pre-computed tables) overtime through a learning mechanism
  • ability to reassemble synthesized expression back to assembly
  • ability to serve oracles through a REST API to facilitate the synthesis usage
  • an IDA plugin providing an integration of the synthesis

Quick start

Installation

In order to work Triton first has to be installed: install documentation. Triton does not automatically install itself in a virtualenv, copy it in your venv or use --system-site-packages when configuring your venv.

Then:

$ git clone [email protected]:synthesis/qsynthesis.git
$ cd qsynthesis
$ pip3 install '.[all]'

The [all] will installed all dependencies (see the documentation for a light install).

Table generation

The synthesis algorithm requires generating oracle tables derived from a grammar (a set of variables and operators). Qsynthesis installation provides the utility qsynthesis-table-manager enabling manipulating tables. The following command generate a table with 3 variables of 64 bits, 5 operators using a vector of 16 inputs. We limit the generation to 5 million entries.

$ qsynthesis-table-manager generate -bs 64 --var-num 3 --input-num 16 --random-level 5 --ops AND,NEG,MUL,XOR,NOT --watchdog 80 --limit 5000000 my_oracle_table
Generate Table
Watchdog value: 80.0
Depth 2 (size:3) (Time:0m0.23120s)
Depth 3 (size:21) (Time:0m0.23198s)
Depth 4 (size:574) (Time:0m0.26068s)
Depth 5 (size:400858) (Time:0m21.23231s)
Threshold reached, generation interrupted
Stop required
Depth 5 (size:5000002) (Time:4m52.56009s) [RAM:9.52Gb]

Note: The generation process is RAM consuming the --watchdog enables setting a percentage of the RAM above which the generation is interrupted.

Synthesizing a bitvector expression

We then can try simplifying a seemingly obfuscated expression with:

from qsynthesis import SimpleSymExec, TopDownSynthesizer, InputOutputOracleLevelDB

blob = b'UH\x89\xe5H\x89}\xf8H\x89u\xf0H\x89U\xe8H\x89M\xe0L\x89E\xd8H\x8bE' \
       b'\xe0H\xf7\xd0H\x0bE\xf8H\x89\xc2H\x8bE\xe0H\x01\xd0H\x8dH\x01H\x8b' \
       b'E\xf8H+E\xe8H\x8bU\xe8H\xf7\xd2H\x0bU\xf8H\x01\xd2H)\xd0H\x83\xe8' \
       b'\x02H!\xc1H\x8bE\xe0H\xf7\xd0H\x0bE\xf8H\x89\xc2H\x8bE\xe0H\x01\xd0' \
       b'H\x8dp\x01H\x8bE\xf8H+E\xe8H\x8bU\xe8H\xf7\xd2H\x0bU\xf8H\x01\xd2' \
       b'H)\xd0H\x83\xe8\x02H\t\xf0H)\xc1H\x89\xc8H\x83\xe8\x01]\xc3'

# Perform symbolic execution of the instructions
symexec = SimpleSymExec("x86_64")
symexec.initialize_register('rip', 0x40B160)  # arbitrary address
symexec.initialize_register('rsp', 0x800000)  # arbitrary stack
symexec.execute_blob(blob, 0x40B160)
rax = symexec.get_register_ast("rax")  # retrieve rax register expressions

# Load lookup tables
ltm = InputOutputOracleLevelDB.load("my_oracle_table")

# Perform Synthesis of the expression
synthesizer = TopDownSynthesizer(ltm)
synt_rax, simp = synthesizer.synthesize(rax)

print(f"expression: {rax.pp_str}")
print(f"synthesized expression: {synt_rax.pp_str} [{simp}]")

Limitations

  • synthesis accuracy limited by pre-computed tables exhaustivness
  • table generation limited by RAM consumption
  • reassembly cannot involve memory variable, destination is necessarily a register and architecture depends on llvmlite (thus mostly x86_64)
  • the code references trace-based synthesis which is disabled (as the underlying framework is not yet open-source)

Authors

  • Robin David (@RobinDavid), Quarkslab

Contributors

Huge thanks to contributors to this research:

  • Luigi Coniglio
  • Jonathan Salwan
You might also like...
Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation.

============================================================================================================ `MILA will stop developing Theano https:

Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation.

============================================================================================================ `MILA will stop developing Theano https:

Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation.

============================================================================================================ `MILA will stop developing Theano https:

A stack-based systems language that supports structures, functions, expressions, and user-defined operator behaviour

A stack-based systems language that supports structures, functions, expressions, and user-defined operator behaviour. Currently compiles to URCL with plans to add additional formats in the future.

a BTC mining program based on python3

BTC-Miner a BTC mining program based on python3 Our project refers to the nightminer project by ricmoo, which is written in Python2 (https://github.co

texlive expressions for documents

tex2nix Generate Texlive environment containing all dependencies for your document rather than downloading gigabytes of texlive packages. Installation

Custom Python linting through AST expressions
Custom Python linting through AST expressions

bellybutton bellybutton is a customizable, easy-to-configure linting engine for Python. What is this good for? Tools like pylint and flake8 provide, o

Graphical Python debugger which lets you easily view the values of all evaluated expressions
Graphical Python debugger which lets you easily view the values of all evaluated expressions

birdseye birdseye is a Python debugger which records the values of expressions in a function call and lets you easily view them after the function exi

A collection of common regular expressions bundled with an easy to use interface.

CommonRegex Find all times, dates, links, phone numbers, emails, ip addresses, prices, hex colors, and credit card numbers in a string. We did the har

Turning SymPy expressions into PyTorch modules.

sympytorch A micro-library as a convenience for turning SymPy expressions into PyTorch Modules. All SymPy floats become trainable parameters. All SymP

Turning SymPy expressions into JAX functions

sympy2jax Turn SymPy expressions into parametrized, differentiable, vectorizable, JAX functions. All SymPy floats become trainable input parameters. S

The goal of this library is to generate more helpful exception messages for numpy/pytorch matrix algebra expressions.
The goal of this library is to generate more helpful exception messages for numpy/pytorch matrix algebra expressions.

Tensor Sensor See article Clarifying exceptions and visualizing tensor operations in deep learning code. One of the biggest challenges when writing co

A collection of common regular expressions bundled with an easy to use interface.

CommonRegex Find all times, dates, links, phone numbers, emails, ip addresses, prices, hex colors, and credit card numbers in a string. We did the har

Enable ++x and --x expressions in Python

By default, Python supports neither pre-increments (like ++x) nor post-increments (like x++). However, the first ones are syntactically correct since Python parses them as two subsequent +x operations, where + is the unary plus operator (same with --x and the unary minus). They both have no effect, since in practice -(-x) == +(+x) == x.

Karen is a Discord Bot that will check for a list of forbidden words/expressions, removing the message that contains them and replying with another message.

Karen is a Discord Bot that will check for a list of forbidden words/expressions, removing the message that contains them and replying with another message. Everything is highly customizable.

MacroTools provides a library of tools for working with Julia code and expressions.

MacroTools.jl MacroTools provides a library of tools for working with Julia code and expressions. This includes a powerful template-matching system an

A library for pattern matching on symbolic expressions in Python.
A library for pattern matching on symbolic expressions in Python.

MatchPy is a library for pattern matching on symbolic expressions in Python. Work in progress Installation MatchPy is available via PyPI, and

poetry2nix turns Poetry projects into Nix derivations without the need to actually write Nix expressions

poetry2nix poetry2nix turns Poetry projects into Nix derivations without the need to actually write Nix expressions. It does so by parsing pyproject.t

Comments
  • unrolling issue with PlaceHolderSynthesizer

    unrolling issue with PlaceHolderSynthesizer

    The PlaceHolderSynthesizer seem's not to properly replace placeholder variable during the synthesis process, on some use-cases. The resulting synthesized expression is thus completely wrong. I need to dig deeper.

    opened by RobinDavid 7
Owner
Quarkslab
Quarkslab
Small Python script to generate a calendar (.ics) file from SIMASTER courses schedule.

simaster.ics Small Python script to generate a calendar (.ics) file from SIMASTER courses schedule. Usage Getting the events.json file from SIMASTER O

Faiz Jazadi 8 Nov 02, 2022
An object-oriented approach to Python file/directory operations.

Unipath An object-oriented approach to file/directory operations Version: 1.1 Home page: https://github.com/mikeorr/Unipath Docs: https://github.com/m

Mike Orr 506 Dec 29, 2022
Simple Python File Manager

This script lets you automatically relocate files based on their extensions. Very useful from the downloads folder !

Aimé Risson 22 Dec 27, 2022
CSV To VCF (Multiples en un archivo)

CSV To VCF Convierte archivo CSV a Tarjeta VCF (varias en una) How to use En main.py debes reemplazar CONTACTOS.csv por tu archivo csv, y debes respet

Jorge Ivaldi 2 Jan 12, 2022
Python library for reading and writing tabular data via streams.

tabulator-py A library for reading and writing tabular data (csv/xls/json/etc). [Important Notice] We have released Frictionless Framework. This frame

Frictionless Data 231 Dec 09, 2022
A Python script to backup your favorite Discord gifs

About the project Discord recently felt like it would be a good idea to limit the favorites to 250, which made me lose most of my gifs... Luckily for

4 Aug 03, 2022
Annotate your Python requirements.txt file with summaries of each package.

Summarize Requirements 🐍 📜 Annotate your Python requirements.txt file with a short summary of each package. This tool: takes a Python requirements.t

Zeke Sikelianos 8 Apr 22, 2022
BOOTH宛先印刷用CSVから色々な便利なリストを作成してCSVで出力するプログラムです。

BOOTH注文リスト作成スクリプト このPythonスクリプトは、BOOTHの「宛名印刷用CSV」から、 未発送の注文 今月の注文 特定期間の注文 を抽出した上で、各注文を商品毎に一覧化したCSVとして出力するスクリプトです。 簡単な使い方 ダウンロード 通常は、Relaseから、booth_ord

hinananoha 1 Nov 28, 2021
RMfuse provides access to your reMarkable Cloud files in the form of a FUSE filesystem

RMfuse provides access to your reMarkable Cloud files in the form of a FUSE filesystem. These files are exposed either in their original format, or as PDF files that contain your annotations. This le

Robert Schroll 82 Nov 24, 2022
Kartothek - a Python library to manage large amounts of tabular data in a blob store

Kartothek - a Python library to manage (create, read, update, delete) large amounts of tabular data in a blob store

15 Dec 25, 2022
Provides a convenient way to append numpy arrays to a file.

Provides a convenient way to append numpy arrays to a file. The NpendWriter and NpendReader classes are used to write and read numpy arrays respective

3 May 14, 2022
This simple python script pcopy reads a list of file names and copies them to a separate folder

pCopy This simple python script pcopy reads a list of file names and copies them to a separate folder. Pre-requisites Python 3 (ver. 3.6) How to use

Madhuranga Rathnayake 0 Sep 03, 2021
Convert All TXT Files To One File.

AllToOne Convert All TXT Files To One File. Hi 👋 , I'm Alireza A Python Developer Boy 🔭 I’m currently working on my C# projects 🌱 I’m currently Lea

4 Jun 07, 2022
A simple file sharing tool written in python

Share it A simple file sharing tool written in python Installation If you are using Windows os you can directly Run .exe file -- download If you are

Sachit Yadav 7 Dec 16, 2022
A python script generate password files in plain text

KeePass (or any desktop pw manager?) Helper WARNING: This script will generate password files in plain text. ITS NOT SECURE. I needed help remembering

Eric Thomas 1 Nov 21, 2021
Uproot is a library for reading and writing ROOT files in pure Python and NumPy.

Uproot is a library for reading and writing ROOT files in pure Python and NumPy. Unlike the standard C++ ROOT implementation, Uproot is only an I/O li

Scikit-HEP Project 164 Dec 31, 2022
A simple bulk file renamer, written in python.

Python File Editor A simple bulk file renamer, written in python. There are two functions, the bulk rename and the bulk file extention change. Bulk Fi

Sam Bloomfield 2 Dec 22, 2021
File-manager - A basic file manager, written in Python

File Manager A basic file manager, written in Python. Installation Install Pytho

Samuel Ko 1 Feb 05, 2022
PyDeleter - delete a specifically formatted file in a directory or delete all other files

PyDeleter If you want to delete a specifically formatted file in a directory or delete all other files, PyDeleter does it for you. How to use? 1- Down

Amirabbas Motamedi 1 Jan 30, 2022
Test app for importing contact information in CSV files.

Contact Import TestApp Test app for importing contact information in CSV files. Explore the docs » · Report Bug · Request Feature Table of Contents Ab

1 Feb 06, 2022