A plugin to introduce a generic API for Decompiler support in GEF

Overview

decomp2gef

A plugin to introduce a generic API for Decompiler support in GEF. Like GEF, the plugin is battery-included and requires no external dependencies other than Python.

decomp2gef Demo viewable here.

Quick Start

First, install the decomp2gef plugin into gef:

cp decomp2gef.py ~/.decomp2gef.py && echo "source ~/.decomp2gef.py" >> ~/.gdbinit

Alternatively, you can load it for one-time-use inside gdb with:

source /path/to/decomp2gef.py

Now import the relevant script for you decompiler:

IDA

  • open IDA on your binary and press Alt-F7
  • popup "Run Script" will appear, load the decomp2gef_ida.py script from this repo

Now use the decompiler connect command in GDB. Note: you must be in a current session of debugging something.

Usage

In gdb, run:

decompiler connect ida

If all is well, you should see:

[+] Connected to decompiler!

Now just use GEF like normal and enjoy decompilation and decompiler symbol mapping! When you change a symbol in ida, like a function name, if will be automatically reflected in gdb after just 2 steps!

Features

  • Auto-updating decompilation context view
  • Auto-syncing function names
  • Breakable/Inspectable symbols
  • Auto-syncing stack variable names
  • Auto-syncing structs

Abstract

The reverse engineering process often involves a decompiler making it fundamental to support in a debugger since context switching knowledge between the two is hard. Decompilers have a lot in common. During the reversing process there are reverse engineering artifacts (REA). These REAs are common across all decompilers:

  • stack variables
  • global variables
  • structs
  • enums
  • function headers (name and prototype)
  • comments

Knowledge of REAs can be used to do lots of things, like sync REAs across decompilers or create a common interface for a debugger to display decompilation information. GEF is currently one of the best gdb upgrades making it a perfect place to first implement this idea. In the future, it should be easily transferable to any debugger supporting python3.

Adding your decompiler

To add your decompiler, simply make a Python XMLRPC server that implements the 4 server functions found in the decomp2gef Decompiler class. Follow the code for how to return correct types.

Comments
  • Missing or invalid attribute 'comment' on Windows IDA 7.6

    Missing or invalid attribute 'comment' on Windows IDA 7.6

    IDA version 7.6 Python 3.10

    decomp2gef ida script seems to fail at Z:\...\IDA\IDA Pro 7.6\plugins\decomp2gef_ida.py: Missing or invalid attribute 'comment' and I seem to be unable to see the decomp2gef plugin being loaded successfully. any ideas?

      bytes   pages size description
    --------- ----- ---- --------------------------------------------
       532480    65 8192 allocating memory for b-tree...
       507904    62 8192 allocating memory for virtual array...
       262144    32 8192 allocating memory for name pointers...
    -----------------------------------------------------------------
      1302528            total memory allocated
    
    Loading processor module Z:\...\IDA\IDA Pro 7.6\procs\pc.dll for metapc...Initializing processor module metapc...OK
    Loading type libraries...
    Autoanalysis subsystem has been initialized.
    Z:\...\IDA\IDA Pro 7.6\plugins\decomp2gef_ida.py: Missing or invalid attribute 'comment'
    Database for file 'redacted' has been loaded.
    Hex-Rays Decompiler plugin has been loaded (v7.6.0.210427)
      License: 57-631C-7A2B-72 IDA PRO 7.6 SP1 (99 users)
      The hotkeys are F5: decompile, Ctrl-F5: decompile all.
    
      Please check the Edit/Plugins menu for more informaton.
    Z:\...\IDA\IDA Pro 7.6\plugins\decomp2gef_ida.py: Missing or invalid attribute 'comment'
    805EF50: restored microcode from idb
    805EF50: restored pseudocode from idb
    -----------------------------------------------------------------------------------------
    Python 3.10.1 (tags/v3.10.1:2cd268a, Dec  6 2021, 19:10:37) [MSC v.1929 64 bit (AMD64)] 
    IDAPython v7.4.0 final (serial 0) (c) The IDAPython Team <[email protected]>
    -----------------------------------------------------------------------------------------
    
    bug 
    opened by caprinux 10
  • ELF failes to build with objcopy on some binaries

    ELF failes to build with objcopy on some binaries

    Found out about this decomp2gef script and was eager to try it out :)

    When trying it out, I encountered this error: image

    I'm currently using GDB 11.1, IDA 7.6 together with the latest gef.py script freshly pulled from the gef github. Any clue how I could fix this?

    bug 
    opened by caprinux 8
  • Improve code logic and fix errors

    Improve code logic and fix errors

    This PR aims to address a few issues in decomp2gef

    First issue addressed

    As of now, if you try to connect decompiler to debug a binary over a remote server via gdbserver, chances are you will encounter the error min() arg is an empty sequence which arises due to the following lines

    base_address = min([x.page_start for x in vmmap if x.path == get_filepath()])
    ...
    text_base = min([x.page_start for x in vmmap if x.path == get_filepath()])
    

    This arises due to inconsistency in the exact path of the binary between the remote server and the local binary.

    For example, if I try to debug a binary on remote with file path /bin/program with the local binary in /tmp/program, x.path will return /bin/program and get_filepath() will return /tmp/program which does not match and causes the array to be empty.\

    Hence a slight modification to compare the file name rather than the absolute path will solve this issue.

    base_address = min([x.page_start for x in vmmap if x.path.split('/')[-1] == get_filename()])
    ...
    text_base = min([x.page_start for x in vmmap if x.path.split('/')[-1] == get_filename()])
    

    Second issue addressed

    def update_function_data(self, addr):
        ...
        for idx, arg in args.item():
            idx = int(idx, 0)
            expr = f"""(({arg['type']}) {current_arch.function_parameters[idx]}"""
        ...
    

    Within the update_function_data(), we use current_arch.function_parameters to get the registers in which our function arguments are stored and use IDX to match the argument to the respective register/place in memory where the argument is stored.

    In x86_64, current_arch.function_parameters = ['$rdi', '$rsi', '$rdx', '$rcx', '$r8', '$r9'] and this works for functions with 6 arguments as current_arch.function_parameters[0] will match argument 1 to $rdi and so on.

    The flaw comes when we fail to consider functions with 7 or more arguments where arguments will then be found in the stack.

    However we can set this aside, for now, we don't usually encounter more than 7 arguments, right?

    The more urgent flaw is when we bring in the X86 architecture.

    In x86, current_arch.function_parameters = ['$esp']. This means that beyond the first argument, decomp2gef will break with an IndexError as it tries to access current_arch.function_parameters[1] for the 2nd argument and so on.

    I'm not sure if there's a nice way to do this but I essentially redefined current_arch.function_parameters for X86 architectures.

    current_arch.function_parameters = [f'$esp+{x}' for x in range(0, 28, 4)]
    

    This allows decomp2gef to work, but I haven't considered implications yet as it may get pretty complicated(?)

    I welcome any ideas!!

    opened by caprinux 7
  • invalid string offset for section `.strtab'

    invalid string offset for section `.strtab'

    When connecting GEF to the decompiler, gdb fails to add-symbol-file and throws an error.

    BFD: /tmp/tmp3pmay68f.c.debug: invalid string offset 16777215 >= 282 for section '.strtab'

    Although this does not break decomp2gef, it causes the debugger to be without a symbol file and hence renders some features unusable.

    Sample binary with this behavior: sample_program.zip

    bug 
    opened by caprinux 6
  • Add proper support for attaching

    Add proper support for attaching

    Currently support for using the attach command is shaky with PIE binaries, and requires the strict process of launching a new gdb instances, attaching to the target process id, and then connecting to decomp2dbg. Attempting to attach again after this will cause the attached-to binary to have a new base address, and I assume this prevents decomp2dbg from functioning properly as I lose symbols after that. On my local system, disconnecting and reconnecting doesn't fix this (if done before or after the binary is attached to for the second time).

    enhancement 
    opened by frqmod 2
  • add instruction for wsl2

    add instruction for wsl2

    I suggest to add instruction for those who want to use this tool in wsl.

    1. run ./install.sh --ida /mnt/c/xxx/IDA/plugins to install
    2. listen to 0.0.0.0:3662 in IDA
    3. add an Inbound Rules for port 3662 in Windows Firewall, private or domain network
    4. run decompiler connect ida 192.168.xxx.xxx(LAN IP) 3662 in gdb to connect
    opened by RoderickChan 1
  • add checks when forcing text size

    add checks when forcing text size

    Previously, we force text_size to 0xFFFFFF with a nasty hack which only works on 64 bit binaries.

    This means that 32-bit binaries will throw a .strtab offset error or something along those times most of the time which is rather ugly.

    Hence we implement a bit check on the binary and enforce the hack if binary is 64-bit. We should definitely implement a nicer method that caters to 32-bit binaries when possible, but until then these will suffice to preserve sanity.

    addresses #14 !!

    opened by caprinux 1
  • Fix symbol size offsets/Designate appropriate symbol size to symbols

    Fix symbol size offsets/Designate appropriate symbol size to symbols

    Addresses #15, took a while to debug but on comparing queued_sym_sizes to sym_info_list, they seemed to be correct and match accordingly.

    I tested this PR against very basic 32 and 64 bit binaries, which both seems to give me the appropriate symbol sizes upon calling readelf.

    Do have a look! Unfortunately, does not fix #14 :(

    opened by caprinux 1
  • Requires sortedcontainers

    Requires sortedcontainers

    decomp2gef actually does have a single dependency which I did not realize was a dependency: sortedcontainers. It's needed to create a fast and memory-friendly mapping for non-native symbols in gef. We should decide if we want to make decomp2gef dependent of some python packages, or try to replace the functionality of SortedDict.

    discussion 
    opened by mahaloz 1
  • Feat Request: Programmable Ports & IPs

    Feat Request: Programmable Ports & IPs

    As brought up by @caprinux (in #3), we don't support the ability to specify ports or ips for connecting GEF over. Currently, it's hardcoded to 3662.

    To allow for this, we will need a fundamental change in architecture for the server-side, since we need a way to specify port and IP.

    enhancement 
    opened by mahaloz 1
  • Register Variable Support

    Register Variable Support

    • we can now supper every variable shown in a decompiler (including the ones assigned to a variable)
    • functions args are being deprecated in favor of setting them through either register vars or stack vars
    • refactored some janky type setting code

    Closes #38

    opened by mahaloz 0
  • Stack Vars from IDA assigned to incorrect locations

    Stack Vars from IDA assigned to incorrect locations

    Here is a simple example I tested on decomp2dbg v3.1.3 and ida7.5:

    #include <stdio.h>
    
    int main()
    {
        int a = 1;
        int b;
        scanf("%d",&b);
        int c = a + b;
        printf("%d\n",c);
        return 0;
    }
    

    the disassembled result of ida is:

    int __cdecl main(int argc, const char **argv, const char **envp)
    {
      int v4; // [rsp+Ch] [rbp-14h] BYREF
      int v5; // [rsp+10h] [rbp-10h]
      int v6; // [rsp+14h] [rbp-Ch]
      unsigned __int64 v7; // [rsp+18h] [rbp-8h]
    
      v7 = __readfsqword(0x28u);
      v5 = 1;
      __isoc99_scanf(&unk_2004, &v4, envp);
      v6 = v4 + v5;
      printf("%d\n", (unsigned int)(v4 + v5));
      return 0;
    }
    

    when I executed to __isoc99_scanf(&unk_2004, &v4, envp);, I tried to show the value of v5, but it is different from $rbp - 0x10

    image

    Is this a bug or did I do something wrong?

    bug 
    opened by LioTree 2
  • binary ninja plugin manager?

    binary ninja plugin manager?

    The binary ninja plugin manager supports plugins that exist in subfolders. All it would take would be to tag and cut a release (or using release_helper) and let me know on this issue and I can add it. Then subsequent releases automatically notify us and we update the plugin manager accordingly.

    Note that I haven't looked at the current import hierarchy but because the plugin manager doesn't necessarily install things to the global namespace it might require some tweaks to how imports are done.

    opened by psifertex 1
  • Tab completion broken in Archlinux gdb

    Tab completion broken in Archlinux gdb

    After sourcing decomp2dbg in my gdbinit I have the commands available but it's not possible anymore to use tab-completion to look for help or alike. This happens with a naked gdb with source ~/.decomp2dbg.py as only entry.

    opened by ysf 6
  • Add Support for shared librairies.

    Add Support for shared librairies.

    Hello. Tested on binary it work perfect, great tools! But when i test on shared libraries started with another binary, after connecting to the server, the tool won't work. (no decompilation, breakpoints have offset errors) . I know original behavior is to start the server on binary and use the connect while debugging the binary with gdb. But in my situation I can't debug the libraries without starting the linked main binary before. (Maybe adding muliple server syncing ? Decomp2dbg can't export ida decompiled code on gdb because shared librairies is extern but maybe adding another syncing server on shared lib IDA instance + syncing correctly when jumping on shared lib can work)

    opened by 0xMirasio 16
  • Add Support for Struct Imports

    Add Support for Struct Imports

    For now, we will only support IDA since we have a clear-cut way to both get every struct and also know when they have been updated. This may also be possible in Binja, but out of question for future Ghidra support... that one will have to wait.

    IDA Changes

    In IDA we need to utilize finding all ordinal numbers, which represents each custom struct in IDA. After that, we can use idc.print_decls("1", 0) for each number to get a nice C representation of the struct. Now that we have a string that has the C-definition of the struct we need to do things in the core.

    The changes all take place in the server. It's possible this may change the old API.

    Client Changes

    Assuming we now have a series of structs that are represented in C, we actually need to compile them into an object file and then add them with the classic add-symbol-file we use on the backend for other things. The trick though is adding this symbol file before we add the big one with all the global symbols here: https://github.com/mahaloz/decomp2dbg/blob/57983617d9a14f1f2ed7b54ee07bd15f14075c45/decomp2dbg/clients/gdb/symbol_mapper.py#L243

    Since both symbol files will be loaded into the same place, there will be an overlapping main function. We either need to bake structs directly into the first file we create, or we need to make a new way to add native-struct through the symbol mapper.

    enhancement 
    opened by mahaloz 0
Releases(v3.1.3)
  • v3.1.3(Nov 21, 2022)

  • v3.1.2(Nov 21, 2022)

  • v3.1.1(Nov 18, 2022)

  • v3.1.0(Nov 17, 2022)

    What's Changed

    • Add Ghidra Demo & Refactor readme by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/47
    • Reflect config changes in pwndbg by @szymex73 in https://github.com/mahaloz/decomp2dbg/pull/48
    • Typo in README.md by @Ice1187 in https://github.com/mahaloz/decomp2dbg/pull/49

    New Contributors

    • @szymex73 made their first contribution in https://github.com/mahaloz/decomp2dbg/pull/48
    • @Ice1187 made their first contribution in https://github.com/mahaloz/decomp2dbg/pull/49

    Full Changelog: https://github.com/mahaloz/decomp2dbg/compare/v3.0.0...v3.1.0

    Source code(tar.gz)
    Source code(zip)
    d2d-ghidra-plugin.zip(1.41 MB)
  • v3.0.0(Oct 25, 2022)

    • Added Ghidra support
    • Refactored how installing works to full Python-only
    • Added dependence on the BinSync project

    What's Changed

    • Python Installer by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/46

    Full Changelog: https://github.com/mahaloz/decomp2dbg/compare/v2.2.0...v3.0.0

    Source code(tar.gz)
    Source code(zip)
    d2d-ghidra-plugin.zip(1.41 MB)
  • v2.2.0(Oct 25, 2022)

    What's Changed

    • Support native symbol mapping by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/1
    • added REAL native support by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/2
    • fail gracefully on bad argv & fix bad elfs reads by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/6
    • Add angrmanagement support for decomp2gef by @Cl4sm in https://github.com/mahaloz/decomp2dbg/pull/8
    • DRAFT: Local Var Support by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/7
    • Api refactor by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/9
    • Major Refactor: Global Vars, Programmable Ports, Packaging by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/10
    • make sure IDA Plugin has all consts defined by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/12
    • Support remote debugging by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/17
    • Fix rebasing bugs in angr-decompiler plugin by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/18
    • Update GEF API use to latest version by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/22
    • Fix symbol size offsets/Designate appropriate symbol size to symbols by @caprinux in https://github.com/mahaloz/decomp2dbg/pull/20
    • add checks when forcing text size by @caprinux in https://github.com/mahaloz/decomp2dbg/pull/21
    • minor fixes + replace all usage of GEF Elf object with pyelftools by @caprinux in https://github.com/mahaloz/decomp2dbg/pull/25
    • [WIP] Binja Support by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/19
    • fixed bad sizing on binaries that generate a larger blank symbol section by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/26
    • Fix symbol duplication by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/27
    • fix another duplication bug by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/28
    • [WIP] Support Vanilla GDB by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/30
    • Fix a typo that caused the plugin not to work for Python <3.8. by @adamdoupe in https://github.com/mahaloz/decomp2dbg/pull/32
    • Fix Manual Install Instructions by @adamdoupe in https://github.com/mahaloz/decomp2dbg/pull/31
    • Fix README typos by @mborgerson in https://github.com/mahaloz/decomp2dbg/pull/33
    • Handle stack frame offset for stack variables on x86 architectures by @zolutal in https://github.com/mahaloz/decomp2dbg/pull/35
    • always refresh baseaddr on connect by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/37
    • Register Variable Support by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/39
    • Support loading symbols at configurable base addresses by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/42
    • Ghidra Support by @mahaloz in https://github.com/mahaloz/decomp2dbg/pull/45

    New Contributors

    • @mahaloz made their first contribution in https://github.com/mahaloz/decomp2dbg/pull/1
    • @Cl4sm made their first contribution in https://github.com/mahaloz/decomp2dbg/pull/8
    • @caprinux made their first contribution in https://github.com/mahaloz/decomp2dbg/pull/20
    • @adamdoupe made their first contribution in https://github.com/mahaloz/decomp2dbg/pull/32
    • @mborgerson made their first contribution in https://github.com/mahaloz/decomp2dbg/pull/33
    • @zolutal made their first contribution in https://github.com/mahaloz/decomp2dbg/pull/35

    Full Changelog: https://github.com/mahaloz/decomp2dbg/commits/v2.2.0

    Source code(tar.gz)
    Source code(zip)
    d2d-ghidra-plugin.zip(1.41 MB)
Owner
Zion
Native Hawaiian | Phd Student @sefcom | Co-captain @shellphish | President of @asu-hacking-club
Zion
Projeto em Python colaborativo para o Bootcamp de Dados do Itaú em parceria com a Lets Code

🧾 lets-code-todo-list por Henrique V. Domingues e Josué Montalvão Projeto em Python colaborativo para o Bootcamp de Dados do Itaú em parceria com a L

Henrique V. Domingues 1 Jan 11, 2022
Beautiful static documentation generator for OpenAPI/Swagger 2.0

Spectacle The gentleman at REST Spectacle generates beautiful static HTML5 documentation from OpenAPI/Swagger 2.0 API specifications. The goal of Spec

Sourcey 1.3k Dec 13, 2022
A repository of links with advice related to grad school applications, research, phd etc

A repository of links with advice related to grad school applications, research, phd etc

Shaily Bhatt 946 Dec 30, 2022
Near Zero-Overhead Python Code Coverage

Slipcover: Near Zero-Overhead Python Code Coverage by Juan Altmayer Pizzorno and Emery Berger at UMass Amherst's PLASMA lab. About Slipcover Slipcover

PLASMA @ UMass 325 Dec 28, 2022
Spin-off Notice: the modules and functions used by our research notebooks have been refactored into another repository

Fecon235 - Notebooks for financial economics. Keywords: Jupyter notebook pandas Federal Reserve FRED Ferbus GDP CPI PCE inflation unemployment wage income debt Case-Shiller housing asset portfolio eq

Adriano 825 Dec 27, 2022
Swagger Documentation Generator for Django REST Framework: deprecated

Django REST Swagger: deprecated (2019-06-04) This project is no longer being maintained. Please consider drf-yasg as an alternative/successor. I haven

Marc Gibbons 2.6k Jan 03, 2023
🐱‍🏍 A curated list of awesome things related to Hugo themes.

awesome-hugo-themes Automated deployment @ 2021-10-12 06:24:07 Asia/Shanghai &sorted=updated Theme Author License GitHub Stars Updated Blonde wamo MIT

13 Dec 12, 2022
Quickly download, clean up, and install public datasets into a database management system

Finding data is one thing. Getting it ready for analysis is another. Acquiring, cleaning, standardizing and importing publicly available data is time

Weecology 274 Jan 04, 2023
An introduction course for Python provided by VetsInTech

Introduction to Python This is an introduction course for Python provided by VetsInTech. For every "boot camp", there usually is a pre-req, but becaus

Vets In Tech 2 Dec 02, 2021
Version bêta d'un système pour suivre les prix des livres chez Books to Scrape,

Version bêta d'un système pour suivre les prix des livres chez Books to Scrape, un revendeur de livres en ligne. En pratique, dans cette version bêta, le programme n'effectuera pas une véritable surv

Mouhamed Dia 1 Jan 06, 2022
🧙 A simple, typed and monad-based Result type for Python.

meiga 🧙 A simple, typed and monad-based Result type for Python. Table of Contents Installation 💻 Getting Started 📈 Example Features Result Function

Alice Biometrics 31 Jan 08, 2023
PowerApps-docstring is a console based, pipeline ready application that automatically generates user and technical documentation for Power Apps.

powerapps-docstring PowerApps-docstring is a console based, pipeline ready application that automatically generates user and technical documentation f

Sebastian Muthwill 30 Nov 23, 2022
Course Materials for Math 340

UBC Math 340 Materials This repository aims to be the one repository for which you can find everything you about Math 340. Lecture Notes Lecture Notes

2 Nov 25, 2021
Python solutions to solve practical business problems.

Python Business Analytics Also instead of "watching" you can join the link-letter, it's already being sent out to about 90 people and you are free to

Derek Snow 357 Dec 26, 2022
The source code that powers readthedocs.org

Welcome to Read the Docs Purpose Read the Docs hosts documentation for the open source community. It supports Sphinx docs written with reStructuredTex

Read the Docs 7.4k Dec 25, 2022
OpenTelemetry Python API and SDK

Getting Started • API Documentation • Getting In Touch (GitHub Discussions) Contributing • Examples OpenTelemetry Python This page describes the Pytho

OpenTelemetry - CNCF 1.1k Jan 08, 2023
Types that make coding in Python quick and safe.

Type[T] Types that make coding in Python quick and safe. Type[T] works best with Python 3.6 or later. Prior to 3.6, object types must use comment type

Contains 17 Aug 01, 2022
A Power BI/Google Studio Dashboard to analyze previous OTC CatchUps

OTC CatchUp Dashboard A Power BI/Google Studio dashboard analyzing OTC CatchUps. File Contents * ├───data ├───old summaries ─── *.md ├

11 Oct 30, 2022
advance python series: Data Classes, OOPs, python

Working With Pydantic - Built-in Data Process ========================== Normal way to process data (reading json file): the normal princiople, it's f

Phung Hưng Binh 1 Nov 08, 2021
Sphinx-performance - CLI tool to measure the build time of different, free configurable Sphinx-Projects

CLI tool to measure the build time of different, free configurable Sphinx-Projec

useblocks 11 Nov 25, 2022