PerfFuzz: Automatically Generating Pathological Inputs for C/C++ Programs

Performance problems in software can arise unexpectedly when programs are provided with inputs that trigger pathological behavior. But how can we find such inputs in the first place? PerfFuzz can generate them automatically: given a program and at least one seed input, PerfFuzz generates inputs that exercise pathological behavior across program locations, without any domain knowledge.

PerfFuzz uses multi-dimensional performance feedback and independently maximizes execution counts for all program locations. This enables PerfFuzz to find a variety of inputs that exercise distinct hot spots in a program.

Read the ISSTA paper for more details.

Built by Caroline Lemieux and Rohan Padhye on top of Michal Zalewski's AFL.

Building PerfFuzz

To build on *nix machines, run

make

in the perffuzz directory. Since PerfFuzz is built on AFL, it will not build on Windows machines. You will also need to build PerfFuzz's instrumenting compiler, which can be done by running

cd llvm_mode
make
cd ..

in the perffuzz directory, after having built PerfFuzz.

  • Q: What version of clang should I use?

  • A: PerfFuzz was evaluated with clang-3.8.0 on Linux and works with version 8 on Mac. To experiment with a different clang/LLVM version, add the bin/ directory from the pre-built clang archive to the front of your PATH when compiling (see the example below this list).

  • Q: I'm getting an error involving the -fno-rtti option.

  • A: If you're on Redhat Linux, this may be a gcc/clang compatibility issue. Apparently gcc-4.7 fixes the issue.
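
For instance, if you downloaded one of the pre-built clang+llvm release archives, something along these lines (the archive path below is illustrative) puts that clang first on your PATH before building llvm_mode:

export PATH=/path/to/clang+llvm-3.8.0-x86_64-linux-gnu/bin:$PATH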

Test PerfFuzz on Insertion Sort

To check whether PerfFuzz is working correctly, try running it on the insertion sort benchmark provided. The following commands assume you are in the PerfFuzz directory.

Build

First, compile the benchmark:

./afl-clang-fast insertion-sort.c -o isort

Run PerfFuzz

Let's make some seeds for PerfFuzz to start with:

mkdir isort-seeds
head -c 64 /dev/zero > isort-seeds/zeroes

Now we can run PerfFuzz:

./afl-fuzz -p -i isort-seeds -o isort_perf_test/ -N 64 ./isort @@

You should see the number of total paths (this is a misnomer; it's just the number of saved inputs) increase consistently. You can also check to see if the saved inputs are heading towards a worst-case by running

for i in isort_perf_test/queue/id*; do ./isort $i | grep comps; done

(this prints, for each saved input, the number of comparisons insertion sort performed while sorting that input)
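
To pick out the largest comparison count found so far, you can sort those lines numerically. The one-liner below assumes the count is the second whitespace-separated field of the line containing "comps"; adjust the -k argument if the benchmark's output format differs:

for i in isort_perf_test/queue/id*; do ./isort $i | grep comps; done | sort -n -k 2 | tail -n 1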

To compare against regular AFL, run the same command without the -p option:

./afl-fuzz -i isort-seeds -o isort_afl_test/ -N 64 ./isort @@

This runs plain AFL. You should see total paths quickly top out at around 20 or so, while the number of cycles climbs much more quickly. The saved inputs will also likely perform far fewer comparisons. The highest number of comparisons printed when you run

for i in isort_afl_test/queue/id*; do ./isort $i | grep comps; done

should be smaller than what you saw for the inputs in isort_perf_test/queue.

Running PerfFuzz on a program of your choice

Compile your program with PerfFuzz

To compile your C/C++ program with PerfFuzz, set CC (resp. CXX) to path/to/perffuzz/afl-clang-fast (resp. path/to/perffuzz/afl-clang-fast++) in your build process. See section (3) of README (not README.md) for more details, replacing references to path/to/afl/afl-gcc with path/to/perffuzz/afl-clang-fast.
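
For a typical autoconf-style project, this usually looks something like the following (the paths are illustrative; --disable-shared is commonly suggested in the AFL docs so that the instrumented code is linked statically rather than loaded from uninstrumented system libraries):

CC=/path/to/perffuzz/afl-clang-fast CXX=/path/to/perffuzz/afl-clang-fast++ ./configure --disable-shared
make clean all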

  • Q: afl-clang-fast doesn't exist!
  • A: make sure you ran make in the llvm_mode directory (see "Building PerfFuzz")

Run PerfFuzz on your program.

In short, follow the instructions in README (the regular AFL readme) section 6, but add the -p option to enable PerfFuzz and the -N num option to cap produced inputs at a maximum file size of num bytes. Make sure your initial seed inputs (in the input directory) are smaller than num bytes!

On many programs (including the benchmarks in the paper), the -d option (Fidgety mode) offers better performance.

Let PerfFuzz run for as long as you like; we ran it for a few hours on the larger benchmarks.
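
Putting these options together, a typical invocation looks something like this (the directories, size limit, and target name are placeholders for your own setup):

./afl-fuzz -p -d -i seeds/ -o perffuzz_out/ -N 500 ./my_program @@

As in regular AFL, @@ is replaced by the path of the generated input file; drop it if your program reads from standard input instead.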

Interpret PerfFuzz results.

In the queue directory of the output directory, inputs suffixed with +max were saved because they maximized a performance key.

We provide some tools to help analyze the results. Notably, afl-showmax can print:

  1. The total path length (default)
  2. The maximum hotspot (-x option)
  3. The entire performance map in a key:value format (-a option)

To build afl-showmax, run

make afl-showmax

in the PerfFuzz directory.
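
For example, to print the maximum hotspot for one of the saved inputs, an invocation roughly like the following should work (this is a sketch that assumes afl-showmax takes the target command line the same way afl-showmap does; the input file name is a placeholder):

./afl-showmax -x ./isort isort_perf_test/queue/some_saved_input+max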

Comments
  • test of llvm_mode fails

    Hi,

    On a recent Arch Linux, when building llvm_mode, I'm getting:

    [email protected]:llvm_mode$ make
    [*] Checking for working 'llvm-config'...
    [*] Checking for working 'clang'...
    [*] Checking for '../afl-showmap'...
    [+] All set and ready to build.
    clang -O3 -funroll-loops -Wall -D_FORTIFY_SOURCE=2 -g -Wno-pointer-sign -DAFL_PATH=\"/usr/local/lib/afl\" -DBIN_PATH=\"/usr/local/bin\" -DVERSION=\"2.52b\"  afl-clang-fast.c -o ../afl-clang-fast 
    ln -sf afl-clang-fast ../afl-clang-fast++
    clang++ `llvm-config --cxxflags` -fno-rtti -fpic -O3 -funroll-loops -Wall -D_FORTIFY_SOURCE=2 -g -Wno-pointer-sign -DVERSION=\"2.52b\" -Wno-variadic-macros -shared afl-llvm-pass.so.cc -o ../afl-llvm-pass.so `llvm-config --ldflags` 
    clang -O3 -funroll-loops -Wall -D_FORTIFY_SOURCE=2 -g -Wno-pointer-sign -DAFL_PATH=\"/usr/local/lib/afl\" -DBIN_PATH=\"/usr/local/bin\" -DVERSION=\"2.52b\"  -fPIC -shared afl-catch-dlclose.so.c -o ../afl-catch-dlclose.so
    clang -O3 -funroll-loops -Wall -D_FORTIFY_SOURCE=2 -g -Wno-pointer-sign -DAFL_PATH=\"/usr/local/lib/afl\" -DBIN_PATH=\"/usr/local/bin\" -DVERSION=\"2.52b\"  -fPIC -c afl-llvm-rt.o.c -o ../afl-llvm-rt.o
    afl-llvm-rt.o.c:99:20: warning: incompatible pointer types assigning to 'u32 *' (aka 'unsigned int *') from 'u8 *' (aka 'unsigned char *') [-Wincompatible-pointer-types]
        __afl_perf_ptr = &__afl_area_ptr[MAP_SIZE];
                       ^ ~~~~~~~~~~~~~~~~~~~~~~~~~
    1 warning generated.
    [*] Building 32-bit variant of the runtime (-m32)... success!
    [*] Building 64-bit variant of the runtime (-m64)... success!
    [*] Testing the CC wrapper and instrumentation output...
    unset AFL_USE_ASAN AFL_USE_MSAN AFL_INST_RATIO; AFL_QUIET=1 AFL_PATH=. AFL_CC=clang ../afl-clang-fast -O3 -funroll-loops -Wall -D_FORTIFY_SOURCE=2 -g -Wno-pointer-sign -DAFL_PATH=\"/usr/local/lib/afl\" -DBIN_PATH=\"/usr/local/bin\" -DVERSION=\"2.52b\"  ../test-instr.c -o test-instr 
    echo 0 | ../afl-showmap -m none -q -o .test-instr0 ./test-instr
    echo 1 | ../afl-showmap -m none -q -o .test-instr1 ./test-instr
    
    Oops, the instrumentation does not seem to be behaving correctly!
    
    Please ping <[email protected]> to troubleshoot the issue.
    
    make: *** [Makefile:105: test_build] Error 1
    

    It was a full normal compile, so I'm a bit confused. Is the test incorrectly set up for perffuzz and hasn't been changed/fixed?

    opened by msoos 7
  • Prioritize maximizing values with more granularity

    Some values in the key:value map may be more worth increasing than others (either more interesting, or others may just not increase). Two ideas:

    1. Favour based on the key achieving maximum value (similar to afl-rb's minimizing branch hits)
    2. Favour based on whether value is actually increasing.
    opened by carolemieux 3
  • What is Perf_Mask in the instrumentation pass?

    Hey, I am trying to do something new on PerfFuzz, but there is one thing in the code I am confused about.

    What is the purpose of this Perf_Mask? https://github.com/carolemieux/perffuzz/blob/f937f370555d0c54f2109e3b1aa5763f8defe337/llvm_mode/afl-llvm-pass.so.cc#L129

    I don't think it is correct to add Perf_Mask to Edge_Id to create a GEP instruction in PerfBranchPtr https://github.com/carolemieux/perffuzz/blob/f937f370555d0c54f2109e3b1aa5763f8defe337/llvm_mode/afl-llvm-pass.so.cc#L176 https://github.com/carolemieux/perffuzz/blob/f937f370555d0c54f2109e3b1aa5763f8defe337/llvm_mode/afl-llvm-pass.so.cc#L177

    However, EdgeId % PERF_SIZE is actually needed to index the perf map.

    Looking forward to your reply, thanks.

    opened by zhanggenex 1
  • Rename staleness

    Find a new name for staleness which is either (1) more intuitive or (2) involves the use of the word "gradient".

    Suggestions: what we currently use as staleness is really the inverse of what all these things could be...

    • magnitude-agnostic gradient
    • increase gradient
    • binary gradient
    opened by carolemieux 0