A plugin to introduce a generic API for Decompiler support in GEF

Overview

decomp2gef

A plugin to introduce a generic API for Decompiler support in GEF. Like GEF, the plugin is battery-included and requires no external dependencies other than Python.

decomp2gef Demo viewable here.

Quick Start

First, install the decomp2gef plugin into gef:

cp decomp2gef.py ~/.decomp2gef.py && echo "source ~/.decomp2gef.py" >> ~/.gdbinit

Alternatively, you can load it for one-time-use inside gdb with:

source /path/to/decomp2gef.py

Now import the relevant script for you decompiler:

IDA

  • open IDA on your binary and press Alt-F7
  • popup "Run Script" will appear, load the decomp2gef_ida.py script from this repo

Now use the decompiler connect command in GDB. Note: you must be in a current session of debugging something.

Usage

In gdb, run:

decompiler connect ida

If all is well, you should see:

[+] Connected to decompiler!

Now just use GEF like normal and enjoy decompilation and decompiler symbol mapping! When you change a symbol in ida, like a function name, if will be automatically reflected in gdb after just 2 steps!

Features

  • Auto-updating decompilation context view
  • Auto-syncing function names
  • Breakable/Inspectable symbols
  • Auto-syncing stack variable names
  • Auto-syncing structs

Abstract

The reverse engineering process often involves a decompiler making it fundamental to support in a debugger since context switching knowledge between the two is hard. Decompilers have a lot in common. During the reversing process there are reverse engineering artifacts (REA). These REAs are common across all decompilers:

  • stack variables
  • global variables
  • structs
  • enums
  • function headers (name and prototype)
  • comments

Knowledge of REAs can be used to do lots of things, like sync REAs across decompilers or create a common interface for a debugger to display decompilation information. GEF is currently one of the best gdb upgrades making it a perfect place to first implement this idea. In the future, it should be easily transferable to any debugger supporting python3.

Adding your decompiler

To add your decompiler, simply make a Python XMLRPC server that implements the 4 server functions found in the decomp2gef Decompiler class. Follow the code for how to return correct types.

Comments
  • Missing or invalid attribute 'comment' on Windows IDA 7.6

    Missing or invalid attribute 'comment' on Windows IDA 7.6

    IDA version 7.6 Python 3.10

    decomp2gef ida script seems to fail at Z:\...\IDA\IDA Pro 7.6\plugins\decomp2gef_ida.py: Missing or invalid attribute 'comment' and I seem to be unable to see the decomp2gef plugin being loaded successfully. any ideas?

      bytes   pages size description
    --------- ----- ---- --------------------------------------------
       532480    65 8192 allocating memory for b-tree...
       507904    62 8192 allocating memory for virtual array...
       262144    32 8192 allocating memory for name pointers...
    -----------------------------------------------------------------
      1302528            total memory allocated
    
    Loading processor module Z:\...\IDA\IDA Pro 7.6\procs\pc.dll for metapc...Initializing processor module metapc...OK
    Loading type libraries...
    Autoanalysis subsystem has been initialized.
    Z:\...\IDA\IDA Pro 7.6\plugins\decomp2gef_ida.py: Missing or invalid attribute 'comment'
    Database for file 'redacted' has been loaded.
    Hex-Rays Decompiler plugin has been loaded (v7.6.0.210427)
      License: 57-631C-7A2B-72 IDA PRO 7.6 SP1 (99 users)
      The hotkeys are F5: decompile, Ctrl-F5: decompile all.
    
      Please check the Edit/Plugins menu for more informaton.
    Z:\...\IDA\IDA Pro 7.6\plugins\decomp2gef_ida.py: Missing or invalid attribute 'comment'
    805EF50: restored microcode from idb
    805EF50: restored pseudocode from idb
    -----------------------------------------------------------------------------------------
    Python 3.10.1 (tags/v3.10.1:2cd268a, Dec  6 2021, 19:10:37) [MSC v.1929 64 bit (AMD64)] 
    IDAPython v7.4.0 final (serial 0) (c) The IDAPython Team <[email protected]>
    -----------------------------------------------------------------------------------------
    
    bug 
    opened by caprinux 10
  • ELF failes to build with objcopy on some binaries

    ELF failes to build with objcopy on some binaries

    Found out about this decomp2gef script and was eager to try it out :)

    When trying it out, I encountered this error: image

    I'm currently using GDB 11.1, IDA 7.6 together with the latest gef.py script freshly pulled from the gef github. Any clue how I could fix this?

    bug 
    opened by caprinux 8
  • Improve code logic and fix errors

    Improve code logic and fix errors

    This PR aims to address a few issues in decomp2gef

    First issue addressed

    As of now, if you try to connect decompiler to debug a binary over a remote server via gdbserver, chances are you will encounter the error min() arg is an empty sequence which arises due to the following lines

    base_address = min([x.page_start for x in vmmap if x.path == get_filepath()])
    ...
    text_base = min([x.page_start for x in vmmap if x.path == get_filepath()])
    

    This arises due to inconsistency in the exact path of the binary between the remote server and the local binary.

    For example, if I try to debug a binary on remote with file path /bin/program with the local binary in /tmp/program, x.path will return /bin/program and get_filepath() will return /tmp/program which does not match and causes the array to be empty.\

    Hence a slight modification to compare the file name rather than the absolute path will solve this issue.

    base_address = min([x.page_start for x in vmmap if x.path.split('/')[-1] == get_filename()])
    ...
    text_base = min([x.page_start for x in vmmap if x.path.split('/')[-1] == get_filename()])
    

    Second issue addressed

    def update_function_data(self, addr):
        ...
        for idx, arg in args.item():
            idx = int(idx, 0)
            expr = f"""(({arg['type']}) {current_arch.function_parameters[idx]}"""
        ...
    

    Within the update_function_data(), we use current_arch.function_parameters to get the registers in which our function arguments are stored and use IDX to match the argument to the respective register/place in memory where the argument is stored.

    In x86_64, current_arch.function_parameters = ['$rdi', '$rsi', '$rdx', '$rcx', '$r8', '$r9'] and this works for functions with 6 arguments as current_arch.function_parameters[0] will match argument 1 to $rdi and so on.

    The flaw comes when we fail to consider functions with 7 or more arguments where arguments will then be found in the stack.

    However we can set this aside, for now, we don't usually encounter more than 7 arguments, right?

    The more urgent flaw is when we bring in the X86 architecture.

    In x86, current_arch.function_parameters = ['$esp']. This means that beyond the first argument, decomp2gef will break with an IndexError as it tries to access current_arch.function_parameters[1] for the 2nd argument and so on.

    I'm not sure if there's a nice way to do this but I essentially redefined current_arch.function_parameters for X86 architectures.

    current_arch.function_parameters = [f'$esp+{x}' for x in range(0, 28, 4)]
    

    This allows decomp2gef to work, but I haven't considered implications yet as it may get pretty complicated(?)

    I welcome any ideas!!

    opened by caprinux 7
  • invalid string offset for section `.strtab'

    invalid string offset for section `.strtab'

    When connecting GEF to the decompiler, gdb fails to add-symbol-file and throws an error.

    BFD: /tmp/tmp3pmay68f.c.debug: invalid string offset 16777215 >= 282 for section '.strtab'

    Although this does not break decomp2gef, it causes the debugger to be without a symbol file and hence renders some features unusable.

    Sample binary with this behavior: sample_program.zip

    bug 
    opened by caprinux 6
  • Add proper support for attaching

    Add proper support for attaching

    Currently support for using the attach command is shaky with PIE binaries, and requires the strict process of launching a new gdb instances, attaching to the target process id, and then connecting to decomp2dbg. Attempting to attach again after this will cause the attached-to binary to have a new base address, and I assume this prevents decomp2dbg from functioning properly as I lose symbols after that. On my local system, disconnecting and reconnecting doesn't fix this (if done before or after the binary is attached to for the second time).

    enhancement 
    opened by frqmod 2
  • add instruction for wsl2

    add instruction for wsl2

    I suggest to add instruction for those who want to use this tool in wsl.

    1. run ./install.sh --ida /mnt/c/xxx/IDA/plugins to install
    2. listen to 0.0.0.0:3662 in IDA
    3. add an Inbound Rules for port 3662 in Windows Firewall, private or domain network
    4. run decompiler connect ida 192.168.xxx.xxx(LAN IP) 3662 in gdb to connect
    opened by RoderickChan 1
  • add checks when forcing text size

    add checks when forcing text size

    Previously, we force text_size to 0xFFFFFF with a nasty hack which only works on 64 bit binaries.

    This means that 32-bit binaries will throw a .strtab offset error or something along those times most of the time which is rather ugly.

    Hence we implement a bit check on the binary and enforce the hack if binary is 64-bit. We should definitely implement a nicer method that caters to 32-bit binaries when possible, but until then these will suffice to preserve sanity.

    addresses #14 !!

    opened by caprinux 1
  • Fix symbol size offsets/Designate appropriate symbol size to symbols

    Fix symbol size offsets/Designate appropriate symbol size to symbols

    Addresses #15, took a while to debug but on comparing queued_sym_sizes to sym_info_list, they seemed to be correct and match accordingly.

    I tested this PR against very basic 32 and 64 bit binaries, which both seems to give me the appropriate symbol sizes upon calling readelf.

    Do have a look! Unfortunately, does not fix #14 :(

    opened by caprinux 1
  • Requires sortedcontainers

    Requires sortedcontainers

    decomp2gef actually does have a single dependency which I did not realize was a dependency: sortedcontainers. It's needed to create a fast and memory-friendly mapping for non-native symbols in gef. We should decide if we want to make decomp2gef dependent of some python packages, or try to replace the functionality of SortedDict.

    discussion 
    opened by mahaloz 1
  • Feat Request: Programmable Ports & IPs

    Feat Request: Programmable Ports & IPs

    As brought up by @caprinux (in #3), we don't support the ability to specify ports or ips for connecting GEF over. Currently, it's hardcoded to 3662.

    To allow for this, we will need a fundamental change in architecture for the server-side, since we need a way to specify port and IP.

    enhancement 
    opened by mahaloz 1
  • Register Variable Support

    Register Variable Support

    • we can now supper every variable shown in a decompiler (including the ones assigned to a variable)
    • functions args are being deprecated in favor of setting them through either register vars or stack vars
    • refactored some janky type setting code

    Closes #38

    opened by mahaloz 0
  • Stack Vars from IDA assigned to incorrect locations

    Stack Vars from IDA assigned to incorrect locations

    Here is a simple example I tested on decomp2dbg v3.1.3 and ida7.5:

    #include <stdio.h>
    
    int main()
    {
        int a = 1;
        int b;
        scanf("%d",&b);
        int c = a + b;
        printf("%d\n",c);
        return 0;
    }
    

    the disassembled result of ida is:

    int __cdecl main(int argc, const char **argv, const char **envp)
    {
      int v4; // [rsp+Ch] [rbp-14h] BYREF
      int v5; // [rsp+10h] [rbp-10h]
      int v6; // [rsp+14h] [rbp-Ch]
      unsigned __int64 v7; // [rsp+18h] [rbp-8h]
    
      v7 = __readfsqword(0x28u);
      v5 = 1;
      __isoc99_scanf(&unk_2004, &v4, envp);
      v6 = v4 + v5;
      printf("%d\n", (unsigned int)(v4 + v5));
      return 0;
    }
    

    when I executed to __isoc99_scanf(&unk_2004, &v4, envp);, I tried to show the value of v5, but it is different from $rbp - 0x10

    image

    Is this a bug or did I do something wrong?

    bug 
    opened by LioTree 2
  • binary ninja plugin manager?

    binary ninja plugin manager?

    The binary ninja plugin manager supports plugins that exist in subfolders. All it would take would be to tag and cut a release (or using release_helper) and let me know on this issue and I can add it. Then subsequent releases automatically notify us and we update the plugin manager accordingly.

    Note that I haven't looked at the current import hierarchy but because the plugin manager doesn't necessarily install things to the global namespace it might require some tweaks to how imports are done.

    opened by psifertex 1
  • Tab completion broken in Archlinux gdb

    Tab completion broken in Archlinux gdb

    After sourcing decomp2dbg in my gdbinit I have the commands available but it's not possible anymore to use tab-completion to look for help or alike. This happens with a naked gdb with source ~/.decomp2dbg.py as only entry.

    opened by ysf 6
  • Add Support for shared librairies.

    Add Support for shared librairies.

    Hello. Tested on binary it work perfect, great tools! But when i test on shared libraries started with another binary, after connecting to the server, the tool won't work. (no decompilation, breakpoints have offset errors) . I know original behavior is to start the server on binary and use the connect while debugging the binary with gdb. But in my situation I can't debug the libraries without starting the linked main binary before. (Maybe adding muliple server syncing ? Decomp2dbg can't export ida decompiled code on gdb because shared librairies is extern but maybe adding another syncing server on shared lib IDA instance + syncing correctly when jumping on shared lib can work)

    opened by 0xMirasio 16
  • Add Support for Struct Imports

    Add Support for Struct Imports

    For now, we will only support IDA since we have a clear-cut way to both get every struct and also know when they have been updated. This may also be possible in Binja, but out of question for future Ghidra support... that one will have to wait.

    IDA Changes

    In IDA we need to utilize finding all ordinal numbers, which represents each custom struct in IDA. After that, we can use idc.print_decls("1", 0) for each number to get a nice C representation of the struct. Now that we have a string that has the C-definition of the struct we need to do things in the core.

    The changes all take place in the server. It's possible this may change the old API.

    Client Changes

    Assuming we now have a series of structs that are represented in C, we actually need to compile them into an object file and then add them with the classic add-symbol-file we use on the backend for other things. The trick though is adding this symbol file before we add the big one with all the global symbols here: https://github.com/mahaloz/decomp2dbg/blob/57983617d9a14f1f2ed7b54ee07bd15f14075c45/decomp2dbg/clients/gdb/symbol_mapper.py#L243

    Since both symbol files will be loaded into the same place, there will be an overlapping main function. We either need to bake structs directly into the first file we create, or we need to make a new way to add native-struct through the symbol mapper.

    enhancement 
    opened by mahaloz 0
Releases(v3.1.3)
  • v3.1.3(Nov 21, 2022)

  • v3.1.2(Nov 21, 2022)

  • v3.1.1(Nov 18, 2022)

  • v3.1.0(Nov 17, 2022)

    What's Changed

    • Add Ghidra Demo & Refactor readme by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/47
    • Reflect config changes in pwndbg by @szymex73 in https://github.com/mahaloz/decomp2dbg/pull/48
    • Typo in README.md by @Ice1187 in https://github.com/mahaloz/decomp2dbg/pull/49

    New Contributors

    • @szymex73 made their first contribution in https://github.com/mahaloz/decomp2dbg/pull/48
    • @Ice1187 made their first contribution in https://github.com/mahaloz/decomp2dbg/pull/49

    Full Changelog: https://github.com/mahaloz/decomp2dbg/compare/v3.0.0...v3.1.0

    Source code(tar.gz)
    Source code(zip)
    d2d-ghidra-plugin.zip(1.41 MB)
  • v3.0.0(Oct 25, 2022)

    • Added Ghidra support
    • Refactored how installing works to full Python-only
    • Added dependence on the BinSync project

    What's Changed

    • Python Installer by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/46

    Full Changelog: https://github.com/mahaloz/decomp2dbg/compare/v2.2.0...v3.0.0

    Source code(tar.gz)
    Source code(zip)
    d2d-ghidra-plugin.zip(1.41 MB)
  • v2.2.0(Oct 25, 2022)

    What's Changed

    • Support native symbol mapping by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/1
    • added REAL native support by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/2
    • fail gracefully on bad argv & fix bad elfs reads by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/6
    • Add angrmanagement support for decomp2gef by @Cl4sm in https://github.com/mahaloz/decomp2dbg/pull/8
    • DRAFT: Local Var Support by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/7
    • Api refactor by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/9
    • Major Refactor: Global Vars, Programmable Ports, Packaging by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/10
    • make sure IDA Plugin has all consts defined by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/12
    • Support remote debugging by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/17
    • Fix rebasing bugs in angr-decompiler plugin by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/18
    • Update GEF API use to latest version by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/22
    • Fix symbol size offsets/Designate appropriate symbol size to symbols by @caprinux in https://github.com/mahaloz/decomp2dbg/pull/20
    • add checks when forcing text size by @caprinux in https://github.com/mahaloz/decomp2dbg/pull/21
    • minor fixes + replace all usage of GEF Elf object with pyelftools by @caprinux in https://github.com/mahaloz/decomp2dbg/pull/25
    • [WIP] Binja Support by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/19
    • fixed bad sizing on binaries that generate a larger blank symbol section by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/26
    • Fix symbol duplication by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/27
    • fix another duplication bug by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/28
    • [WIP] Support Vanilla GDB by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/30
    • Fix a typo that caused the plugin not to work for Python <3.8. by @adamdoupe in https://github.com/mahaloz/decomp2dbg/pull/32
    • Fix Manual Install Instructions by @adamdoupe in https://github.com/mahaloz/decomp2dbg/pull/31
    • Fix README typos by @mborgerson in https://github.com/mahaloz/decomp2dbg/pull/33
    • Handle stack frame offset for stack variables on x86 architectures by @zolutal in https://github.com/mahaloz/decomp2dbg/pull/35
    • always refresh baseaddr on connect by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/37
    • Register Variable Support by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/39
    • Support loading symbols at configurable base addresses by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/42
    • Ghidra Support by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/45

    New Contributors

    • @mahaloz made their first contribution in https://github.com/mahaloz/decomp2dbg/pull/1
    • @Cl4sm made their first contribution in https://github.com/mahaloz/decomp2dbg/pull/8
    • @caprinux made their first contribution in https://github.com/mahaloz/decomp2dbg/pull/20
    • @adamdoupe made their first contribution in https://github.com/mahaloz/decomp2dbg/pull/32
    • @mborgerson made their first contribution in https://github.com/mahaloz/decomp2dbg/pull/33
    • @zolutal made their first contribution in https://github.com/mahaloz/decomp2dbg/pull/35

    Full Changelog: https://github.com/mahaloz/decomp2dbg/commits/v2.2.0

    Source code(tar.gz)
    Source code(zip)
    d2d-ghidra-plugin.zip(1.41 MB)
Owner
Zion
Native Hawaiian | Phd Student @sefcom | Co-captain @shellphish | President of @asu-hacking-club
Zion
A Power BI/Google Studio Dashboard to analyze previous OTC CatchUps

OTC CatchUp Dashboard A Power BI/Google Studio dashboard analyzing OTC CatchUps. File Contents * ├───data ├───old summaries ─── *.md ├

11 Oct 30, 2022
Generate YARA rules for OOXML documents using ZIP local header metadata.

apooxml Generate YARA rules for OOXML documents using ZIP local header metadata. To learn more about this tool and the methodology behind it, check ou

MANDIANT 34 Jan 26, 2022
Code and pre-trained models for "ReasonBert: Pre-trained to Reason with Distant Supervision", EMNLP'2021

ReasonBERT Code and pre-trained models for ReasonBert: Pre-trained to Reason with Distant Supervision, EMNLP'2021 Pretrained Models The pretrained mod

SunLab-OSU 29 Dec 19, 2022
Modified fork of CPython's ast module that parses `# type:` comments

Typed AST typed_ast is a Python 3 package that provides a Python 2.7 and Python 3 parser similar to the standard ast library. Unlike ast up to Python

Python 217 Dec 06, 2022
Canonical source repository for PyYAML

PyYAML - The next generation YAML parser and emitter for Python. To install, type 'python setup.py install'. By default, the setup.py script checks

The YAML Project 2k Jan 01, 2023
Automatically open a pull request for repositories that have no CONTRIBUTING.md file

automatic-contrib-prs Automatically open a pull request for repositories that have no CONTRIBUTING.md file for a targeted set of repositories. What th

GitHub 8 Oct 20, 2022
Always know what to expect from your data.

Great Expectations Always know what to expect from your data. Introduction Great Expectations helps data teams eliminate pipeline debt, through data t

Great Expectations 7.8k Jan 05, 2023
Leetcode Practice

LeetCode Practice Description This is my LeetCode Practice. Visit LeetCode Website for detailed question description. The code in this repository has

Leo Hsieh 75 Dec 27, 2022
Repository for learning Python (Python Tutorial)

Repository for learning Python (Python Tutorial) Languages and Tools 🧰 Overview 📑 Repository for learning Python (Python Tutorial) Languages and Too

Swiftman 2 Aug 22, 2022
A Python Package To Generate Strong Passwords For You in Your Projects.

shPassGenerator Version 1.0.6 Ready To Use Developed by Shervin Badanara (shervinbdndev) on Github Language and technologies used in This Project Work

Shervin 11 Dec 19, 2022
Que es S4K Builder?, Fácil un constructor de tokens grabbers con muchas opciones, como BTC Miner, Clipper, shutdown PC, Y más! Disfrute el proyecto. <3

S4K Builder Este script Python 3 de código abierto es un constructor del muy popular registrador de tokens que está en [mi GitHub] (https://github.com

SadicX 1 Oct 22, 2021
LotteryBuyPredictionWebApp - Lottery Purchase Prediction Model

Lottery Purchase Prediction Model Objective and Goal Predict the lottery type th

Wanxuan Zhang 2 Feb 14, 2022
300+ Python Interview Questions

300+ Python Interview Questions

Pradeep Kumar 1.1k Jan 02, 2023
Easy OpenAPI specs and Swagger UI for your Flask API

Flasgger Easy Swagger UI for your Flask API Flasgger is a Flask extension to extract OpenAPI-Specification from all Flask views registered in your API

Flasgger 3.1k Jan 05, 2023
A Python validator for SHACL

pySHACL A Python validator for SHACL. This is a pure Python module which allows for the validation of RDF graphs against Shapes Constraint Language (S

RDFLib 187 Dec 29, 2022
Coursera learning course Python the basics. Programming exercises and tasks

HSE_Python_the_basics Welcome to BAsics programming Python! You’re joining thousands of learners currently enrolled in the course. I'm excited to have

PavelRyzhkov 0 Jan 05, 2022
Hasköy is an open-source variable sans-serif typeface family

Hasköy Hasköy is an open-source variable sans-serif typeface family. Designed with powerful opentype features and each weight includes latin-extended

67 Jan 04, 2023
Toolchain for project structure and documents optimisation

ritocco Toolchain for project structure and documents optimisation

Harvey Wu 1 Jan 12, 2022
API Documentation for Python Projects

API Documentation for Python Projects. Example pdoc -o ./html pdoc generates this website: pdoc.dev/docs. Installation pip install pdoc pdoc is compat

mitmproxy 1.4k Jan 07, 2023
JMESPath is a query language for JSON.

JMESPath JMESPath (pronounced "james path") allows you to declaratively specify how to extract elements from a JSON document. For example, given this

1.7k Dec 31, 2022