PerfFuzz: Automatically Generate Pathological Inputs for C/C++ programs

Related tags

Deep Learningperffuzz
Overview

PerfFuzz

Performance problems in software can arise unexpectedly when programs are provided with inputs that exhibit pathological behavior. But how can we find these inputs in the first place? PerfFuzz can generate such inputs automatically: given a program and at least one seed input, PerfFuzz automatically generates inputs that exercise pathological behavior across program locations, without any domain knowledge.

PerfFuzz uses multi-dimensional performance feedback and independently maximizes execution counts for all program locations. This enables PerfFuzz to find a variety of inputs that exercise distinct hot spots in a program.

Read the ISSTA paper for more details.

Built by Caroline Lemieux ([email protected]) and Rohan Padhye ([email protected]) on top of Michal Zalewski's ([email protected]) AFL.

Building PerfFuzz

To build on *nix machines, run

make

in the perffuzz directory. Since PerfFuzz is built on AFL, it will not build on Windows machines. You will also need to build PerfFuzz's instrumenting compiler, which can be done by running

cd llvm_mode
make
cd ..

in the perffuzz directory, after having built PerfFuzz.

  • Q: What version of clang should I use?

  • A: PerfFuzz was evaluated with clang-3.8.0 on Linux and works with verison 8 on Mac. To experiment with different clang/LLVM version, add the bin/ directory from the pre-build clang archives to the front of your PATH when compiling.

  • Q: I'm getting an error involving the -fno-rtti option.

  • A: If you're on Redhat Linux, this may be a gcc/clang compatibility issue. Apparently gcc-4.7 fixes the issue.

Test PerfFuzz on Insertion Sort

To check whether PerfFuzz is working correctly, try running it on the insertion sort benchmark provided. The following commands assume you are in the PerfFuzz directory.

Build

First, compile the benchmark:

./afl-clang-fast insertion-sort.c -o isort

Run PerfFuzz

Let's make some seeds for PerfFuzz to start with:

mkdir isort-seeds
head -c 64 /dev/zero > isort-seeds/zeroes

Now we can run PerfFuzz:

./afl-fuzz -p -i isort-seeds -o isort_perf_test/ -N 64 ./isort @@

You should see the number of total paths (this is a misnomer; it's just the number of saved inputs) increase consistently. You can also check to see if the saved inputs are heading towards a worst-case by running

for i in isort_perf_test/queue/id*; do ./isort $i | grep comps; done

(which, for each saved input, plots the number of comparisons insertion sort performed while sorting that input)

For comparison with the performance compared to regular afl, you can run: ./afl-fuzz -i isort-seeds -o isort_afl_test/ -N 64 ./isort @@ without the -p option, this should just run regular AFL. You should see total_paths quickly topping out around ~20 or so, and the number of cycles increase a lot. There will probably be much fewer comparisons performed for the saved inputs as well. The highest number of comparisons printed when you run:

for i in isort_afl_test/queue/id*; do ./isort $i | grep comps; done

should be smaller than what you saw for the inputs in isort_perf_test/queue.

Running PerfFuzz on a program of your choice

Compile your program with PerfFuzz

To compile your C/C++ program with perffuzz, replace CC (resp. CXX) with path/to/perffuzz/afl-clang-fast (resp. path/to/perffuzz/afl-clang-fast++) in your build process. See section (3) of README (not README.md) for more details, replacing references of path/to/afl/afl-gcc with path/to/perffuzz/afl-clang-fast.

  • Q: afl-clang-fast doesn't exist!
  • A: make sure you ran make in the llvm_mode directory (see "Building PerfFuzz")

Run PerfFuzz on your program.

In short, follow the instructions in README (regular AFL readme) section 6, but add the -p option to enable PerfFuzz, and the -N num option to restrict the size of produced inputs to a maximum file size of num. Make sure your initial seed inputs (in the input directory) are of smaller size than num bytes!

On many programs (including the benchmarks in the paper), the -d option (Fidgety mode) offers better performance.

Let PerfFuzz run for as long as you like: we ran for a few hours on larger benchmarks.

Interpret PerfFuzz results.

In the queue directory of the ouput directory, inputs postfixed with +max were saved because the maximized a performance key.

We provide some tools to help analyze the results. Notably, afl-showmax can print:

  1. The total path length (default)
  2. The maximum hotspot (-x option)
  3. The entire performance map in a key:value format (-a option)

To build afl-showmax, run

make afl-showmax

in the PerfFuzz directory.

You might also like...
This repository contains the code for the paper
This repository contains the code for the paper "Hierarchical Motion Understanding via Motion Programs"

Hierarchical Motion Understanding via Motion Programs (CVPR 2021) This repository contains the official implementation of: Hierarchical Motion Underst

TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.

TensorFlowOnSpark TensorFlowOnSpark brings scalable deep learning to Apache Hadoop and Apache Spark clusters. By combining salient features from the T

A testcase generation tool for Persistent Memory Programs.

PMFuzz PMFuzz is a testcase generation tool to generate high-value tests cases for PM testing tools (XFDetector, PMDebugger, PMTest and Pmemcheck) If

Composable transformations of Python+NumPy programsComposable transformations of Python+NumPy programs

Chex Chex is a library of utilities for helping to write reliable JAX code. This includes utils to help: Instrument your code (e.g. assertions) Debug

Prototypical python implementation of the trust-region algorithm presented in Sequential Linearization Method for Bound-Constrained Mathematical Programs with Complementarity Constraints by Larson, Leyffer, Kirches, and Manns.

Prototypical python implementation of the trust-region algorithm presented in Sequential Linearization Method for Bound-Constrained Mathematical Programs with Complementarity Constraints by Larson, Leyffer, Kirches, and Manns.

In this project, two programs can help you take full agvantage of time on the model training with a remote server

In this project, two programs can help you take full agvantage of time on the model training with a remote server, which can push notification to your phone about the information during model training, like the model indices and unexpected interrupts. Then you can do something in time for your work.

An NVDA add-on to split screen reader and audio from other programs to different sound channels

An NVDA add-on to split screen reader and audio from other programs to different sound channels (add-on idea credit: Tony Malykh)

Some simple programs built in Python: webcam with cv2 that detects eyes and face, with grayscale filter
Some simple programs built in Python: webcam with cv2 that detects eyes and face, with grayscale filter

Programas en Python Algunos programas simples creados en Python: 📹 Webcam con c

Automatically Build Multiple ML Models with a Single Line of Code. Created by Ram Seshadri. Collaborators Welcome. Permission Granted upon Request.
Automatically Build Multiple ML Models with a Single Line of Code. Created by Ram Seshadri. Collaborators Welcome. Permission Granted upon Request.

Auto-ViML Automatically Build Variant Interpretable ML models fast! Auto_ViML is pronounced "auto vimal" (autovimal logo created by Sanket Ghanmare) N

Comments
  • test of llvm_mode fails

    test of llvm_mode fails

    Hi,

    On a recent Arch Linux, when building llvm_mode, I'm getting:

    [email protected]:llvm_mode$ make
    [*] Checking for working 'llvm-config'...
    [*] Checking for working 'clang'...
    [*] Checking for '../afl-showmap'...
    [+] All set and ready to build.
    clang -O3 -funroll-loops -Wall -D_FORTIFY_SOURCE=2 -g -Wno-pointer-sign -DAFL_PATH=\"/usr/local/lib/afl\" -DBIN_PATH=\"/usr/local/bin\" -DVERSION=\"2.52b\"  afl-clang-fast.c -o ../afl-clang-fast 
    ln -sf afl-clang-fast ../afl-clang-fast++
    clang++ `llvm-config --cxxflags` -fno-rtti -fpic -O3 -funroll-loops -Wall -D_FORTIFY_SOURCE=2 -g -Wno-pointer-sign -DVERSION=\"2.52b\" -Wno-variadic-macros -shared afl-llvm-pass.so.cc -o ../afl-llvm-pass.so `llvm-config --ldflags` 
    clang -O3 -funroll-loops -Wall -D_FORTIFY_SOURCE=2 -g -Wno-pointer-sign -DAFL_PATH=\"/usr/local/lib/afl\" -DBIN_PATH=\"/usr/local/bin\" -DVERSION=\"2.52b\"  -fPIC -shared afl-catch-dlclose.so.c -o ../afl-catch-dlclose.so
    clang -O3 -funroll-loops -Wall -D_FORTIFY_SOURCE=2 -g -Wno-pointer-sign -DAFL_PATH=\"/usr/local/lib/afl\" -DBIN_PATH=\"/usr/local/bin\" -DVERSION=\"2.52b\"  -fPIC -c afl-llvm-rt.o.c -o ../afl-llvm-rt.o
    afl-llvm-rt.o.c:99:20: warning: incompatible pointer types assigning to 'u32 *' (aka 'unsigned int *') from 'u8 *' (aka 'unsigned char *') [-Wincompatible-pointer-types]
        __afl_perf_ptr = &__afl_area_ptr[MAP_SIZE];
                       ^ ~~~~~~~~~~~~~~~~~~~~~~~~~
    1 warning generated.
    [*] Building 32-bit variant of the runtime (-m32)... success!
    [*] Building 64-bit variant of the runtime (-m64)... success!
    [*] Testing the CC wrapper and instrumentation output...
    unset AFL_USE_ASAN AFL_USE_MSAN AFL_INST_RATIO; AFL_QUIET=1 AFL_PATH=. AFL_CC=clang ../afl-clang-fast -O3 -funroll-loops -Wall -D_FORTIFY_SOURCE=2 -g -Wno-pointer-sign -DAFL_PATH=\"/usr/local/lib/afl\" -DBIN_PATH=\"/usr/local/bin\" -DVERSION=\"2.52b\"  ../test-instr.c -o test-instr 
    echo 0 | ../afl-showmap -m none -q -o .test-instr0 ./test-instr
    echo 1 | ../afl-showmap -m none -q -o .test-instr1 ./test-instr
    
    Oops, the instrumentation does not seem to be behaving correctly!
    
    Please ping <[email protected]> to troubleshoot the issue.
    
    make: *** [Makefile:105: test_build] Error 1**
    

    It was a full normal compile, so I'm a bit confused. Is the test incorrectly set up for perffuzz and hasn't been changed/fixed?

    opened by msoos 7
  • Prioritize maximizing values with more granularity

    Prioritize maximizing values with more granularity

    Some values in the key: value map may be more worth increasing than others (either more interesteing, or others may just not increase). Two ideas:

    1. Favour based on the key achieving maximum value (similar to afl-rb's minimizing branch hits)
    2. Favour based on whether value is actually increasing.
    opened by carolemieux 3
  • What is Perf_Mask in the instrumentation pass?

    What is Perf_Mask in the instrumentation pass?

    Hey, I am trying to do some thing new on PerfFuzz. But there is one thing in the code I am confused.

    What is the purpose of this Perf_Mask? https://github.com/carolemieux/perffuzz/blob/f937f370555d0c54f2109e3b1aa5763f8defe337/llvm_mode/afl-llvm-pass.so.cc#L129

    I don't think it is correct to add Perf_Mask to Edge_Id to create a GEP instruction in PerfBranchPtr https://github.com/carolemieux/perffuzz/blob/f937f370555d0c54f2109e3b1aa5763f8defe337/llvm_mode/afl-llvm-pass.so.cc#L176 https://github.com/carolemieux/perffuzz/blob/f937f370555d0c54f2109e3b1aa5763f8defe337/llvm_mode/afl-llvm-pass.so.cc#L177

    However, EdgeId % PERF_SIZE is acctually needed to index the perf map.

    Looking forward to your reply, thanks.

    opened by zhanggenex 1
  • Rename staleness

    Rename staleness

    Find a new name for staleness which is either (1) more intuitive or (2) involves the use of the word "gradient".

    Suggestions What we currently use as staleness is really the inverse of what all these things could be...

    • magnitude-agnostic gradient
    • increase gradient
    • binary gradient
    opened by carolemieux 0
Releases(1.0)
Owner
Caroline Lemieux
Caroline Lemieux
PyTorch implementation of paper A Fast Knowledge Distillation Framework for Visual Recognition.

FKD: A Fast Knowledge Distillation Framework for Visual Recognition Official PyTorch implementation of paper A Fast Knowledge Distillation Framework f

Zhiqiang Shen 129 Dec 24, 2022
BC3407-Group-5-Project - BC3407 Group Project With Python

BC3407-Group-5-Project As the world struggles to contain the ever-changing varia

1 Jan 26, 2022
offical implement of our Lifelong Person Re-Identification via Adaptive Knowledge Accumulation in CVPR2021

LifelongReID Offical implementation of our Lifelong Person Re-Identification via Adaptive Knowledge Accumulation in CVPR2021 by Nan Pu, Wei Chen, Yu L

PeterPu 76 Dec 08, 2022
Python library for computer vision labeling tasks. The core functionality is to translate bounding box annotations between different formats-for example, from coco to yolo.

PyLabel pip install pylabel PyLabel is a Python package to help you prepare image datasets for computer vision models including PyTorch and YOLOv5. I

PyLabel Project 176 Jan 01, 2023
DLFlow is a deep learning framework.

DLFlow是一套深度学习pipeline,它结合了Spark的大规模特征处理能力和Tensorflow模型构建能力。利用DLFlow可以快速处理原始特征、训练模型并进行大规模分布式预测,十分适合离线环境下的生产任务。利用DLFlow,用户只需专注于模型开发,而无需关心原始特征处理、pipeline构建、生产部署等工作。

DiDi 152 Oct 27, 2022
Unified MultiWOZ evaluation scripts for the context-to-response task.

MultiWOZ Context-to-Response Evaluation Standardized and easy to use Inform, Success, BLEU ~ See the paper ~ Easy-to-use scripts for standardized eval

Tomáš Nekvinda 38 Dec 13, 2022
CONditionals for Ordinal Regression and classification in tensorflow

Condor Ordinal regression in Tensorflow Keras Tensorflow Keras implementation of CONDOR Ordinal Regression (aka ordinal classification) by Garrett Jen

9 Jul 31, 2022
Cross View SLAM

Cross View SLAM This is the associated code and dataset repository for our paper I. D. Miller et al., "Any Way You Look at It: Semantic Crossview Loca

Ian D. Miller 99 Dec 09, 2022
Kalidokit is a blendshape and kinematics solver for Mediapipe/Tensorflow.js face, eyes, pose, and hand tracking models

Blendshape and kinematics solver for Mediapipe/Tensorflow.js face, eyes, pose, and hand tracking models.

Rich 4.5k Jan 07, 2023
git《Learning Pairwise Inter-Plane Relations for Piecewise Planar Reconstruction》(ECCV 2020) GitHub:

Learning Pairwise Inter-Plane Relations for Piecewise Planar Reconstruction Code for the ECCV 2020 paper by Yiming Qian and Yasutaka Furukawa Getting

37 Dec 04, 2022
OpenPCDet Toolbox for LiDAR-based 3D Object Detection.

OpenPCDet OpenPCDet is a clear, simple, self-contained open source project for LiDAR-based 3D object detection. It is also the official code release o

OpenMMLab 3.2k Dec 31, 2022
Weight initialization schemes for PyTorch nn.Modules

nninit Weight initialization schemes for PyTorch nn.Modules. This is a port of the popular nninit for Torch7 by @kaixhin. ##Update This repo has been

Alykhan Tejani 69 Jan 26, 2021
ZSL-KG is a general-purpose zero-shot learning framework with a novel transformer graph convolutional network (TrGCN) to learn class representation from common sense knowledge graphs.

ZSL-KG is a general-purpose zero-shot learning framework with a novel transformer graph convolutional network (TrGCN) to learn class representa

Bats Research 94 Nov 21, 2022
GULAG: GUessing LAnGuages with neural networks

GULAG: GUessing LAnGuages with neural networks Classify languages in text via neural networks. Привет! My name is Egor. Was für ein herrliches Frühl

Egor Spirin 12 Sep 02, 2022
RIFE - Real-Time Intermediate Flow Estimation for Video Frame Interpolation

RIFE - Real-Time Intermediate Flow Estimation for Video Frame Interpolation YouTube | BiliBili 16X interpolation results from two input images: Introd

旷视天元 MegEngine 28 Dec 09, 2022
SMIS - Semantically Multi-modal Image Synthesis(CVPR 2020)

Semantically Multi-modal Image Synthesis Project page / Paper / Demo Semantically Multi-modal Image Synthesis(CVPR2020). Zhen Zhu, Zhiliang Xu, Anshen

316 Dec 01, 2022
Trajectory Variational Autoencder baseline for Multi-Agent Behavior challenge 2022

MABe_2022_TVAE: a Trajectory Variational Autoencoder baseline for the 2022 Multi-Agent Behavior challenge This repository contains jupyter notebooks t

Andrew Ulmer 15 Nov 08, 2022
Self-supervised learning optimally robust representations for domain generalization.

OptDom: Learning Optimal Representations for Domain Generalization This repository contains the official implementation for Optimal Representations fo

Yangjun Ruan 18 Aug 25, 2022
Liecasadi - liecasadi implements Lie groups operation written in CasADi

liecasadi liecasadi implements Lie groups operation written in CasADi, mainly di

Artificial and Mechanical Intelligence 14 Nov 05, 2022
This a classic fintech problem that introduces real life difficulties such as data imbalance. Check out the notebook to find out more!

Credit Card Fraud Detection Introduction Online transactions have become a crucial part of any business over the years. Many of those transactions use

Jonathan Hasbani 0 Jan 20, 2022