Sound and Cost-effective Fuzzing of Stripped Binaries by Incremental and Stochastic Rewriting

Last update: Dec 05, 2022

Overview

StochFuzz: A New Solution for Binary-only Fuzzing

StochFuzz is a (probabilistically) sound and cost-effective fuzzing technique for stripped binaries. It is facilitated by a novel incremental and stochastic rewriting technique that is particularly suitable for binary-only fuzzing. Any AFL-based fuzzer, which takes edge coverage (defined by AFL) as runtime feedback, can acquire benefits from StochFuzz to directly fuzz stripped binaries.

More data and the results of the experiments can be found here. Example cases of leveraging StochFuzz to improve advanced AFL-based fuzzers (AFL++ and Polyglot) can be found in system.md.

Clarifications

We adopt a new system design than the one from the paper. Details can be found at system.md.
In the paper, when we are talking about e9patch, we are actually talking about the binary-only fuzzing tool built upon e9patch, namely e9tool. Please refer to its website for more details.
StochFuzz provides sound rewriting for binaries without inlined data, and probabilistically sound rewriting for the rest.

Building StochFuzz

StochFuzz is built upon Keystone, Capstone, GLib, and libunwind.

These dependences can be built by build.sh. If you are trying to build StochFuzz in a clean container, make sure some standard tools like autoreconf and libtool are installed.

$ git clone https://github.com/ZhangZhuoSJTU/StochFuzz.git
$ cd StochFuzz
$ ./build.sh

StochFuzz itself can be built by GNU Make.

$ cd src
$ make release

We have tested StochFuzz on Ubuntu 18.04. If you have any issue when running StochFuzz on other systems, please kindly let us know.

How to Use

StochFuzz provides multiple rewriting options, which follows the AFL's style of passing arguments.

$ ./stoch-fuzz -h
stoch-fuzz 1.0.0 by <[email protected]>

./stoch-fuzz [ options ] -- target_binary [ ... ]

Mode settings:

  -S            - start a background daemon and wait for a fuzzer to attach (defualt mode)
  -R            - dry run target_binary with given arguments without an attached fuzzer
  -P            - patch target_binary without incremental rewriting
  -D            - probabilistic disassembly without rewriting
  -V            - show currently observed breakpoints

Rewriting settings:

  -g            - trace previous PC
  -c            - count the number of basic blocks with conflicting hash values
  -d            - disable instrumentation optimization
  -r            - assume the return addresses are only used by RET instructions
  -e            - install the fork server at the entrypoint instead of the main function
  -f            - forcedly assume there is data interleaving with code
  -i            - ignore the call-fallthrough edges to defense RET-misusing obfuscation

Other stuff:

  -h            - print this help
  -x execs      - set the number of executions after which a checking run will be triggered
                  set it as zero to disable checking runs (default: 200000)
  -t msec       - set the timeout for each daemon-triggering execution
                  set it as zero to ignore the timeout (default: 2000 ms)
  -l level      - set the log level, including INFO, WARN, ERROR, and FATAL (default: INFO)

Basic Usage

- It is worth first trying the advanced strategy (see below) because that is much more cost-effective.

To fuzz a stripped binary, namely example.out, we need to cd to the directory of the target binary. For example, if the full path of example.out is /root/example.out, we need to first cd /root/. Furthermore, it is dangerous to run two StochFuzz instances under the same directory. These restrictions are caused by some design faults and we will try to relax them in the future.

Assuming StochFuzz is located at /root/StochFuzz/src/stoch-fuzz, execute the following command to start rewriting the target binary.

$ cd /root/
$ /root/StochFuzz/src/stoch-fuzz -- example.out # do not use ./example.out here

After the initial rewriting, we will get a phantom file named example.out.phantom. This phantom file can be directly fuzzed by AFL or any AFL-based fuzzer. Note that the StochFuzz process would not stop during fuzzing, so please make sure the process is alive during fuzzing.

Here is a demo that shows how StochFuzz works.

Advanced Usage

Compared with the compiler-based instrumentation (e.g., afl-clang-fast), StochFuzz has additional runtime overhead because it needs to emulate each CALL instruction to support stack unwinding.

Inspired by a recent work, we provide an advanced rewriting strategy where we do not emulate CALL instructions but wrap the _ULx86_64_step function from libunwind to support stack unwinding. This strategy works for most binaries but may fail in some cases like fuzzing statically linked binaries.

To enable such strategy, simply provide a -r option to StochFuzz.

$ cd /root/
$ /root/StochFuzz/src/stoch-fuzz -r -- example.out # do not use ./example.out here

Addtionally, before fuzzing, we need to prepare the AFL_PRELOAD environment variable for AFL.

$ export STOCHFUZZ_PRELOAD=$(/root/StochFuzz/scritps/stochfuzz_env.sh)
$ AFL_PRELOAD=$STOCHFUZZ_PRELOAD afl-fuzz -i seeds -o output -t 2000 -- example.out.phantom @@

Following demo shows how to apply this advanced strategy.

Troubleshootings

Common issues can be referred to trouble.md. If it cannot help solve your problem, please kindly open a Github issue.

Besides, we provide some tips on using StochFuzz, which can be found at tips.md

Development

Currently, we have many todo items. We present them in todo.md.

We also present many pending decisions which we are hesitating to take, in todo.md. If you have any thought/suggestion, do not hesitate to let us know. It would be very appreciated if you can help us improve StochFuzz.

StochFuzz should be considered an alpha-quality software and it is likely to contain bugs.

I will try my best to maintain StochFuzz timely, but sometimes it may take me more time to respond. Thanks for your understanding in advance.

Cite

Zhang, Zhuo, et al. "STOCHFUZZ: Sound and Cost-effective Fuzzing of Stripped Binaries by Incremental and Stochastic Rewriting." 2021 IEEE Symposium on Security and Privacy (SP). IEEE, 2021.

References

Duck, Gregory J., Xiang Gao, and Abhik Roychoudhury. "Binary rewriting without control flow recovery." Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation. 2020.
Meng, Xiaozhu, and Weijie Liu. "Incremental CFG patching for binary rewriting." Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. 2021.
Aschermann, Cornelius, et al. "Ijon: Exploring deep state spaces via fuzzing." 2020 IEEE Symposium on Security and Privacy (SP). IEEE, 2020.
Google. “Google/AFL.” GitHub, github.com/google/AFL.

Sound and Cost-effective Fuzzing of Stripped Binaries by Incremental and Stochastic Rewriting

Related tags

Overview

StochFuzz: A New Solution for Binary-only Fuzzing

Clarifications

Building StochFuzz

How to Use

Basic Usage

Advanced Usage

Troubleshootings

Development

Cite

References

Owner

Zhuo Zhang

Official PyTorch implementation for "Low Precision Decentralized Distributed Training with Heterogenous Data"

Project repo for Learning Category-Specific Mesh Reconstruction from Image Collections

MDETR: Modulated Detection for End-to-End Multi-Modal Understanding

Pynomial - a lightweight python library for implementing the many confidence intervals for the risk parameter of a binomial model

This is the official PyTorch implementation of our paper: "Artistic Style Transfer with Internal-external Learning and Contrastive Learning".

Code for the SIGIR 2022 paper "Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion"

How the Deep Q-learning method works and discuss the new ideas that makes the algorithm work

JupyterNotebook - C/C++, Javascript, HTML, LaTex, Shell scripts in Jupyter Notebook Also run them on remote computer

CoMoGAN: continuous model-guided image-to-image translation. CVPR 2021 oral.

SeqAttack: a framework for adversarial attacks on token classification models

Unified file system operation experience for different backend

Easily pull telemetry data and create beautiful visualizations for analysis.

Texture mapping with variational auto-encoders

《K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters》(2020)

(CVPR 2021) Lifting 2D StyleGAN for 3D-Aware Face Generation

Implementation for Shape from Polarization for Complex Scenes in the Wild

Code for the paper "Benchmarking and Analyzing Point Cloud Classification under Corruptions"

QuakeLabeler is a Python package to create and manage your seismic training data, processes, and visualization in a single place — so you can focus on building the next big thing.

PyTorch implementation of Lip to Speech Synthesis with Visual Context Attentional GAN (NeurIPS2021)

A simple AI that will give you si ple task and this is made with python