A tool that automatically creates fuzzing harnesses based on a library

Overview

AutoHarness

Created by Akshat Parikh

What is this tool?

AutoHarness is a tool that automatically generates fuzzing harnesses for you. This idea stems from a concurrent problem in fuzzing codebases today: large codebases have thousands of functions and pieces of code that can be embedded fairly deep into the library. It is very hard or sometimes even impossible for smart fuzzers to reach that codepath. Even for large fuzzing projects such as oss-fuzz, there are still parts of the codebase that are not covered in fuzzing. Hence, this program tries to alleviate this problem in some capacity as well as provide a tool that security researchers can use to initially test a code base. This program only supports code bases which are coded in C and C++.

Setup/Demonstration

This program utilizes llvm and clang for libfuzzer, Codeql for finding functions, and python for the general program. This program was tested on Ubuntu 20.04 with llvm 12 and python 3. Here is the initial setup.

sudo apt-get update;
sudo apt-get install python3 python3-pip llvm-12* clang-12 git;
pip3 install pandas lief subprocess os argparse ast;

Follow the installation procedure for Codeql on https://github.com/github/codeql. Make sure to install the CLI tools and the libraries. For my testing, I have stored both the tools and libraries under one folder. Finally, clone this repository or download a release. Here is the program's output after running on nginx with the multiple argument mode set. This is the command I used.

python3 harness.py -L /home/akshat/nginx-1.21.0/objs/ -C /home/akshat/codeql-h/ -M 1 -O /home/akshat/autoharness/ -D nginx -G 1 -Y 1 -F "-I /home/akshat/nginx-1.21.0/objs -I /home/akshat/nginx-1.21.0/src/core -I /home/akshat/nginx-1.21.0/src/event -I /home/akshat/nginx-1.21.0/src/http -I /home/akshat/nginx-1.21.0/src/mail -I /home/akshat/nginx-1.21.0/src/misc -I /home/akshat/nginx-1.21.0/src/os -I /home/akshat/nginx-1.21.0/src/stream -I /home/akshat/nginx-1.21.0/src/os/unix" -X ngx_config.h,ngx_core.h

Results: image It is definitely possible to raise the success by further debugging the compilation and adding more header files and more. Note the nginx project does not have any shared objects after compiling. However, this program does have a feature that can convert PIE executables into shared libraries.

Planned Features (in order of progress)

  1. Struct Fuzzing

The current way implemented in the program to fuzz functions with multiple arguments is by using fuzzing data provider. There are some improvements to make in this integration; however, I believe I can incorporate this feature with data structures. A problem which I come across when coding this is with codeql and nested structs. It becomes especially hard without writing multiple queries which vary for every function. In short, this feature needs more work. I was also thinking about a simple solution using protobufs.

  1. Implementation Based Harness Creation

Using codeql, it is possible to use to generate a control flow graph that maps how the parameters in a function are initialized. Using that information, we can create a better harness. Another way is to look for implementations for the function that exist in the library and use that information to make an educated guess on an implementation of the function as a harness. The problems I currently have with this are generating the control flow graphs with codeql.

  1. Parallelized fuzzing/False Positive Detection

I can create a simple program that runs all the harnesses and picks up on any of the common false positives using ASAN. Also, I can create a new interface that runs all the harnesses at once and displays their statistics.

Contribution/Bugs

If you find any bugs with this program, please create an issue. I will try to come up with a fix. Also, if you have any ideas on any new features or how to implement performance upgrades or the current planned features, please create a pull request or an issue with the tag (contribution).

PSA

This tool generates some false positives. Please first analyze the crashes and see if it is valid bug or if it is just an implementation bug. Also, you can enable the debug mode if some functions are not compiling. This will help you understand if there are some header files that you are missing or any linkage issues. If the project you are working on does not have shared libraries but an executable, make sure to compile the executable in PIE form so that this program can convert it into a shared library.

References

  1. https://lief.quarkslab.com/doc/latest/tutorials/08_elf_bin2lib.html
Comments
  • Fuzzers never compile

    Fuzzers never compile

    Hello, I made it to the final step in your README:

    python3 harness.py -L /root/nginx/ -C /root/codeql/ -M 1 -O /root/autoharness/ -D nginx -G 1 -Y 1 -F "-I /root/nginx/objs -I /root/nginx/src/core -I /root/nginx/src/event -I /root/nginx/src/http -I /root/nginx/src/mail -I /root/nginx/src/misc -I /root/nginx/src/os -I /root/nginx/src/stream -I /root/nginx/src/os/unix" -X ngx_config.h,ngx_core.h

    and it appears to be doing something:

    Compiling query plan for /root/codeql/multiargfunc.ql.
    [1/1 comp 19.4s] Compiled /root/codeql/multiargfunc.ql.
    Starting evaluation of codeql-cpp/multiargfunc.ql.
    [1/1 eval 4.9s] Evaluation done; writing results to /root/autoharness/multiarg.bqrs.
    Shutting down query evaluator.
    

    however, it never builds any fuzzers, this is all that appears on screen:

                      f             g                                                  t
    0             _Exit          void                                              [int]
    1        __asprintf           int    [const char *__restrict__, char **__restrict__]
    2    __asprintf_chk           int  [int, const char *__restrict__, char **__restr...
    3        __bswap_16    __uint16_t                                       [__uint16_t]
    4        __bswap_32    __uint32_t                                       [__uint32_t]
    ..              ...           ...                                                ...
    812         waitpid       __pid_t                              [int, __pid_t, int *]
    813        wcstombs        size_t  [size_t, char *__restrict__, const wchar_t *__...
    814          wctomb           int                                  [char *, wchar_t]
    815           write       ssize_t                        [int, size_t, const void *]
    816          zError  const char *                                              [int]
    
    [817 rows x 3 columns]****
    

    Am using Ubuntu clang version 12.0.0-3ubuntu1~21.04.2 and have installed all of the requirements as mentioned in the README. Any tips would be much appreciated.

    opened by geeknik 11
  • Could not resolve type ...

    Could not resolve type ...

    Hi, The following command failed to generate harness.

    python3 harness.py -L /local/codeql-home/nginx-1.21.1/objs/ -C /local/codeql-home/codeql/ -M 1 -O /local/nginx/autoharness/ -D nginx -G 1 -Y 1 -F "-I /local/codeql-home/nginx-1.21.1/objs -I /local/codeql-home/nginx-1.21.1/src/core -I /local/codeql-home/nginx-1.21.1/src/event -I /local/codeql-home/nginx-1.21.1/src/http -I /local/codeql-home/nginx-1.21.1/src/mail -I /local/codeql-home/nginx-1.21.1/src/misc -I /local/codeql-home/nginx-1.21.1/src/os -I /local/codeql-home/nginx-1.21.1/src/stream -I /local/codeql-home/nginx-1.21.1/src/os/unix" -X ngx_config.h,ngx_core.h
    

    Error messages:

    Compiling query plan for /local/codeql-home/codeql/multiargfunc.ql.
    ERROR: Could not resolve module cpp. There should probably be a qlpack.yml file declaring dependencies in /local/codeql-home/codeql or an enclosing directory. (/local/codeql-home/codeql/multiargfunc.ql:1,8-11)
    ERROR: Could not resolve type Type (/local/codeql-home/codeql/multiargfunc.ql:3,1-5)
    ERROR: Could not resolve type Parameter (/local/codeql-home/codeql/multiargfunc.ql:3,30-39)
    ERROR: Could not resolve type PointerType (/local/codeql-home/codeql/multiargfunc.ql:6,40-51)
    ERROR: Could not resolve type Type (/local/codeql-home/codeql/multiargfunc.ql:9,1-5)
    ERROR: Could not resolve type Parameter (/local/codeql-home/codeql/multiargfunc.ql:9,27-36)
    ERROR: Could not resolve type PointerType (/local/codeql-home/codeql/multiargfunc.ql:10,65-76)
    ERROR: Could not resolve type Function (/local/codeql-home/codeql/multiargfunc.ql:13,6-14)
    ERROR: Could not resolve type Type (/local/codeql-home/codeql/multiargfunc.ql:13,18-22)
    ERROR: Could not resolve type Parameter (/local/codeql-home/codeql/multiargfunc.ql:14,18-27)
    ERROR: Could not resolve type Struct (/local/codeql-home/codeql/multiargfunc.ql:14,91-97)
    ERROR: 'result' cannot be used in this context (/local/codeql-home/codeql/multiargfunc.ql:4,3-9)
    ERROR: 'result' cannot be used in this context (/local/codeql-home/codeql/multiargfunc.ql:6,3-9)
    ERROR: 'result' cannot be used in this context (/local/codeql-home/codeql/multiargfunc.ql:10,3-9)
    ERROR: 'result' cannot be used in this context (/local/codeql-home/codeql/multiargfunc.ql:10,47-53)
    
    opened by JerryWang304 10
  • Master

    Master

    Lots of changes, some that I can remember:

    • Factoring out code to create bash commands into command_builder.py.
    • Change from readelf to nm to only harness dynamic, defined, exported function symbols.
    • Ignore functions with void * or array arguments. Arrays may be supported later.
    • Get parameters in the right order.
    • Handle const.

    Note: only tested using mode=1 and detect=1 on libsodium, with other settings you'll likely still get lots of errors but let's fix those by creating and resolving issues.

    opened by Jelle-Nauta 0
  • Support array parameters

    Support array parameters

    Functions with array parameters are currently ignored because they would lead to code like:

    auto data1= provider.ConsumeIntegral<char[16]>();
    

    Proposal: handle array-parameters as a special case, e.g. generating code like:

    char data1[16];
    for (size_t idx = 0; idx < 16; ++idx) {
        data1[idx] = provider.ConsumeIntegral<char>();
    }
    

    There may be a more elegant way to do this, but I don't see it at the moment.

    opened by Jelle-Nauta 0
  • Create test suite

    Create test suite

    Currently there are many different combinations of settings (mode, detect) and library-properties (exported or not, defined or not, function or other symbols, etc.) and many of these probably lead to errors, e.g. through uncompilable harness code.

    Proposal: create a minimal set of libraries with a comprehensive set of symbols and properties, and a test workflow to verify that autoharness works - or find bugs that can then be addressed.

    opened by Jelle-Nauta 0
Releases(1.0)
  • 1.0(Jul 10, 2021)

    Initial Release of AutoHarness -added executable to shared object functionality -added automatic header detection or function reconstruction -added automatic fuzzing harness creation for one argument and multiple arguments

    Source code(tar.gz)
    Source code(zip)
Explore-bikeshare-data - GitHub project as part of the Programming for Data Science with Python Nanodegree from Udacity

Date created February 10, 2022 Project Title Explore US Bikeshare Data Descripti

Thárcyla 1 Feb 14, 2022
Pipenv-local-deps-repro - Reproduction of a local transitive dependency on pipenv

Reproduction of the pipenv bug with transitive local dependencies. Clone this re

Lucas Duailibe 2 Jan 11, 2022
Meower a social media platform written in Scratch 3.0 and Python

Meower Meower is a social media platform written in Scratch 3.0 and Python, ported to HTML for self-hosting. Try Beta 4.6 Changelog for 4.6 Start impl

Meower Media Co. 23 Dec 02, 2022
Back-end API for the reternal framework

RE:TERNAL RE:TERNAL is a centralised purple team simulation platform. Reternal uses agents installed on a simulation network to execute various known

Joey Dreijer 7 Apr 15, 2022
A simple countdown timer in eazy code to show timer with python

Countdown_Timer The simple CLI countdown timer in eazy code to show timer How Work First you fill the input by int-- (Enter the time in Seconds:) for

Yasin Rezvani 3 Nov 15, 2022
Synchrosqueezing, wavelet transforms, and time-frequency analysis in Python

Synchrosqueezing is a powerful reassignment method that focuses time-frequency representations, and allows extraction of instantaneous amplitudes and frequencies

John Muradeli 382 Jan 06, 2023
A common, beautiful interface to tabular data, no matter the format

rows No matter in which format your tabular data is: rows will import it, automatically detect types and give you high-level Python objects so you can

Álvaro Justen 834 Jan 03, 2023
A variant caller for the GBA gene using WGS data

Gauchian: WGS-based GBA variant caller Gauchian is a targeted variant caller for the GBA gene based on a whole-genome sequencing (WGS) BAM file. Gauch

Illumina 16 Oct 13, 2022
An example of Connecting a MySQL Database with Python Code

An example of Connecting a MySQL Database with Python Code And How to install Table of contents General info Technologies Setup General info In this p

Mohammad Hosseinzadeh 1 Nov 23, 2021
Cross-platform .NET Core pre-commit hooks

dotnet-core-pre-commit Cross-platform .NET Core pre-commit hooks How to use Add this to your .pre-commit-config.yaml - repo: https://github.com/juan

Juan Odicio 5 Jul 20, 2021
A dashboard for your code. A build system.

NOTICE: THIS REPO IS NO LONGER UPDATED Changes Changes is a build coordinator and reporting solution written in Python. The project is primarily built

Dropbox 763 Sep 09, 2022
Tomador de ramos UC automatico para Windows, Linux y macOS

auto-ramos v2.0 Tomador de ramos UC automatico para Windows, Linux y macOS Funcion Este script de Python tiene como principal objetivo hacer que la to

Open Source eUC 13 Jun 29, 2022
A Tandy Color Computer 1, 2, and 3 assembler written in Python

CoCo Assembler and File Utility Table of Contents What is it? Requirements License Installing Assembler Assembler Usage Input File Format Print Symbol

Craig Thomas 16 Nov 03, 2022
because rico hates uuid's

terrible-uuid-lambda because rico hates uuid's sub 200ms response times! Try it out here: https://api.mathisvaneetvelde.com/uuid https://api.mathisvan

Mathis Van Eetvelde 2 Feb 15, 2022
Cylc: a workflow engine for cycling systems

Cylc: a workflow engine for cycling systems. Repository master branch: core meta-scheduler component of cylc-8 (in development); Repository 7.8.x branch: full cylc-7 system.

The Cylc Workflow Engine 205 Dec 20, 2022
A Microsoft reward automator, designed to work headless on a raspberry pi

MsReward A Microsoft reward automator, designed to work headless on a raspberry pi. Tested with a pi 3b+ and a pi 4 2Gb . Using a discord bot to log e

10 Dec 21, 2022
Coderslab Workshop Projects

Workshop Coderslab workshop projects that include: Guessing Game Lotto simulator Guessing Game vol.2 Guessing Game vol.3 Dice 2001 Game Technologies P

Szymon Połczyński 1 Nov 06, 2021
RELATE is an Environment for Learning And TEaching

RELATE Relate is an Environment for Learning And TEaching RELATE is a web-based courseware package. It is set apart by the following features: Focus o

Andreas Klöckner 311 Dec 25, 2022
Usos Semester average helper

Usos Semester average helper Dzieki temu skryptowi mozesz sprawdzic srednia ocen na kazdy odbyty przez ciebie semestr PARAMETERS required: '--username

2 Jan 17, 2022
A one place destination to check whatever is trending on the top social and news websites at present.

UpTrend A one place destination to check whatever is trending on the top social and news websites at present. Explore the docs » View Demo · Report Bu

Google Developer Student Clubs - JGEC 10 Oct 03, 2021