Disassembly & Decompilation

Overview

Disassembly converts machine code into assembly language instructions. Decompilation goes further, producing pseudocode that approximates the original high-level source code. Both are essential for understanding malware logic: control flow, algorithms, encryption routines, C2 protocols, and anti-analysis tricks.

Ghidra

Ghidra is an open-source reverse engineering framework developed by the NSA. It provides both disassembly and decompilation for many architectures.

GUI Workflow

1. Launch Ghidra: ghidra (packaged installs) or ./ghidraRun from the install directory
2. Create or open a project
3. Import binary: File → Import File → select sample
4. Analyze: Yes to auto-analysis when prompted
5. Navigate:
   - Symbol Tree (left) — functions, imports, exports
   - Listing (center) — disassembly view
   - Decompiler (right) — pseudocode view
   - Function Graph — visual control flow

Key Ghidra Operations

Navigation:
  G                — Go to address
  Double-click      — Follow reference

Editing (Listing view):
  L                — Rename label / function (context-dependent)
  T                — Define data type at cursor
  ;                — Add comment

Search:
  S                — Search Memory (Search → Memory)
  Ctrl+Shift+E     — Search Program Text (Search → Program Text)
  Search → For Strings — no default keyboard shortcut; use the menu

Decompiler:
  Right-click var   — Rename, retype, or split variable
  Ctrl+E            — Open/focus Decompiler window (Window → Decompile)
  Ctrl+L            — Retype variable (in Decompiler context)

Headless Analysis (Scripting)

# Ghidra
# https://github.com/NationalSecurityAgency/ghidra

# Run headless analysis with a post-analysis script
# (the script must be on Ghidra's script path; add -scriptPath <dir> if it is not)
/usr/share/ghidra/support/analyzeHeadless /tmp/ghidra_project project_name \
  -import sample.exe \
  -postScript ListFunctions.java \
  -deleteProject

# Import and analyze without scripts
/usr/share/ghidra/support/analyzeHeadless /tmp/ghidra_project project_name \
  -import sample.exe

# Run a script on an existing project
/usr/share/ghidra/support/analyzeHeadless /tmp/ghidra_project project_name \
  -process sample.exe \
  -postScript ExportDecompiled.java
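
When many samples need the same treatment, the invocations above can be assembled programmatically. The following is a minimal sketch, assuming the install path used in the examples; `build_headless_cmd` and `run_headless` are hypothetical helper names, and only flags already shown above are used.

```python
import subprocess

# Path taken from the examples above; adjust for your installation.
ANALYZE_HEADLESS = "/usr/share/ghidra/support/analyzeHeadless"


def build_headless_cmd(project_dir, project_name, binary,
                       post_script=None, import_new=True, delete_project=False):
    """Return the analyzeHeadless argument list (hypothetical helper)."""
    cmd = [ANALYZE_HEADLESS, project_dir, project_name]
    # -import adds a new file; -process reuses one already in the project.
    cmd += ["-import" if import_new else "-process", binary]
    if post_script:
        cmd += ["-postScript", post_script]
    if delete_project:
        cmd.append("-deleteProject")
    return cmd


def run_headless(cmd, timeout=1800):
    """Run the command and return (exit code, combined stdout/stderr)."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                          text=True, timeout=timeout)
    return proc.returncode, proc.stdout
```

Passing the arguments as a list avoids shell quoting problems with unusual sample file names.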

Ghidra Python Scripting (Ghidra Script Manager)

# Ghidra script (run from Ghidra's Script Manager)
# List all functions and their addresses.
# str.format is used instead of f-strings because Ghidra's built-in
# Python is Jython 2.7, which does not support f-strings.

fm = currentProgram.getFunctionManager()
for func in fm.getFunctions(True):  # True = iterate forward by address
    print("{} : {}".format(func.getEntryPoint(), func.getName()))

# Ghidra script: find cross-references to a function

target_name = "CreateRemoteThread"
# Search every namespace: imports usually live under their DLL's namespace,
# so a lookup restricted to the global namespace can miss them.
for sym in currentProgram.getSymbolTable().getSymbols(target_name):
    # For imports this may list the IAT slot or a thunk rather than the
    # call sites; follow those references one level further if so.
    for ref in getReferencesTo(sym.getAddress()):
        print("  Referenced from: {}".format(ref.getFromAddress()))

radare2

radare2 is a command-line reverse engineering framework.

# radare2
# https://github.com/radareorg/radare2

# Open binary with auto-analysis
r2 -A sample.exe

# Common analysis commands:
#   aa       — analyze all (basic)
#   aaa      — analyze all (aggressive)
#   aaaa     — analyze all (experimental, most thorough)

# Navigation and information
#   afl      — list all functions
#   afn name addr — rename function at addr
#   s addr   — seek to address
#   s main   — seek to main function

# Disassembly
#   pdf      — print disassembly of current function
#   pdf @main — disassembly of main
#   pd 20    — disassemble 20 instructions
#   pD 100   — disassemble 100 bytes

# Cross-references
#   axt addr — show xrefs TO an address
#   axf addr — show xrefs FROM an address

# Strings
#   iz       — strings in data sections
#   izz      — all strings
#   iz~http  — filter strings containing "http"

# Imports and exports
#   ii       — imports
#   iE       — exports
#   iS       — sections

# Visual modes
#   V        — visual mode
#   VV       — visual graph mode (function control flow)
#   V!       — visual panels mode

# Decompilation (with r2ghidra plugin)
#   pdg      — decompile current function (Ghidra decompiler)
#   pdg @main — decompile main

radare2 One-Liners

# List all functions (non-interactive)
r2 -A -q -c 'afl' sample.exe

# List all imports (non-interactive)
r2 -q -c 'ii' sample.exe

# Disassemble main (non-interactive)
r2 -A -q -c 'pdf @main' sample.exe

# Find all strings containing "http" (non-interactive)
r2 -q -c 'izz~http' sample.exe

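# Show cross-references to an imported function
# (PE import flags usually include the DLL name; run 'f~CreateRemoteThread' to confirm the exact flag)
r2 -A -q -c 'axt @sym.imp.KERNEL32.dll_CreateRemoteThread' sample.exe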

objdump

objdump is a simpler disassembler from GNU Binutils, useful for quick one-off disassembly.

# GNU Binutils (objdump)
# https://www.gnu.org/software/binutils/

# Disassemble all executable sections
objdump -d sample

# Disassemble all sections, including data (non-code bytes decode as meaningless instructions)
objdump -D sample

# Disassemble with Intel syntax (default is AT&T)
objdump -d -M intel sample

# Show interleaved source (if debug info present)
objdump -dS sample

# Disassemble a specific section
objdump -d -j .text sample

# Show relocations alongside disassembly
objdump -dr sample

Key Analysis Tasks

Finding the Entry Point

# Show entry point
r2 -q -c 'ie' sample.exe

# In Ghidra: Navigation → Go To... (G), then type "entry"; or expand Exports in the Symbol Tree
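
For PE samples, the value reported by the tools above can be cross-checked by reading the header directly. This is a minimal sketch using only the standard library; `pe_entry_point` is a hypothetical helper name, and it assumes a well-formed PE32 or PE32+ header.

```python
import struct


def pe_entry_point(data):
    """Return (entry point RVA, image base) parsed from raw PE bytes."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew at offset 0x3C holds the file offset of the PE signature.
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_off:pe_off + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    opt = pe_off + 4 + 20  # skip the signature and the 20-byte COFF header
    (magic,) = struct.unpack_from("<H", data, opt)
    # AddressOfEntryPoint sits at offset 16 of the optional header.
    (entry_rva,) = struct.unpack_from("<I", data, opt + 16)
    if magic == 0x10B:      # PE32: 4-byte ImageBase at offset 28
        (image_base,) = struct.unpack_from("<I", data, opt + 28)
    elif magic == 0x20B:    # PE32+: 8-byte ImageBase at offset 24
        (image_base,) = struct.unpack_from("<Q", data, opt + 24)
    else:
        raise ValueError("unknown optional header magic")
    return entry_rva, image_base
```

The virtual address is `image_base + entry_rva` when the file is loaded at its preferred base. Packed or tampered samples may carry misleading header values, so treat the result as a starting point.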

Identifying Key Functions

Common functions to locate during malware analysis:

Function Type       What to Look For
main / WinMain      Program entry point logic
Networking          Calls to socket/connect/send/recv or WinINet/WinHTTP
Encryption          XOR loops, AES/RC4 implementations, CryptoAPI calls
Persistence         Registry writes, service creation, file copy
Process injection   VirtualAllocEx + WriteProcessMemory + CreateRemoteThread
Anti-analysis       IsDebuggerPresent, timing checks, VM detection
String decryption   Loops that build strings character-by-character
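
A quick first pass over a sample's import list (for example, names collected from radare2's `ii` output) can point at which of these behaviors to look for. The sketch below is illustrative only: `CATEGORIES` is a small, incomplete mapping chosen for this example, and `triage_imports` is a hypothetical helper name.

```python
# Illustrative, incomplete mapping from Windows API names to the
# behavior categories listed in the table above.
CATEGORIES = {
    "Networking": {"socket", "connect", "send", "recv",
                   "InternetOpenA", "InternetOpenUrlA",
                   "WinHttpOpen", "WinHttpConnect"},
    "Encryption": {"CryptEncrypt", "CryptDecrypt",
                   "BCryptEncrypt", "BCryptDecrypt"},
    "Persistence": {"RegSetValueExA", "RegSetValueExW",
                    "CreateServiceA", "CreateServiceW",
                    "CopyFileA", "CopyFileW"},
    "Process injection": {"VirtualAllocEx", "WriteProcessMemory",
                          "CreateRemoteThread"},
    "Anti-analysis": {"IsDebuggerPresent", "CheckRemoteDebuggerPresent",
                      "GetTickCount", "QueryPerformanceCounter"},
}


def triage_imports(imports):
    """Group imported API names by the behavior category they suggest."""
    hits = {}
    for name in imports:
        for category, apis in CATEGORIES.items():
            if name in apis:
                hits.setdefault(category, []).append(name)
    return hits
```

An import match is only a hint. Samples that resolve APIs at runtime (GetProcAddress, hashed names) show few or no suspicious imports, so an empty result does not rule a behavior out.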

Identifying Encryption Routines

Common patterns in disassembly:

XOR encryption:
  - Loop with XOR instruction against a key byte/word
  - Key may be hardcoded or derived at runtime
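
Once a XOR loop and its key are identified, reproducing it outside the sample is usually a few lines. A minimal sketch follows; the function names are hypothetical, and the brute-force helper assumes a single-byte key and a known plaintext marker such as `http`.

```python
from itertools import cycle


def xor_decode(data, key):
    """XOR data with a repeating key (a single-byte key is the 1-byte case)."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def brute_single_byte(data, marker):
    """Try every non-zero single-byte key; keep results containing marker."""
    results = []
    for key in range(1, 256):
        plain = xor_decode(data, bytes([key]))
        if marker in plain:
            results.append((key, plain))
    return results
```

XOR is its own inverse, so the same function both encodes and decodes.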

RC4:
  - Two phases: KSA (key-scheduling algorithm) and PRGA (pseudo-random generation algorithm)
  - KSA fills a 256-byte S-box with 0..255, then shuffles it using the key
  - Look for 256-iteration loops: one storing the index, one swapping entries
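
For comparison against the disassembly, a reference implementation of textbook RC4 is short. This sketch implements the standard algorithm; samples may use modified variants (for example, discarding initial keystream bytes), so a mismatch does not by itself rule out RC4.

```python
def rc4_ksa(key):
    """Key scheduling: build and shuffle the 256-entry S-box."""
    s = list(range(256))          # first loop: S[i] = i
    j = 0
    for i in range(256):          # second loop: key-dependent swaps
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    return s


def rc4_crypt(key, data):
    """PRGA: generate keystream bytes and XOR them with data."""
    s = rc4_ksa(key)
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)
```

Encryption and decryption are the same operation, so `rc4_crypt(key, ciphertext)` recovers the plaintext.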

AES:
  - SubBytes, ShiftRows, MixColumns, AddRoundKey operations
  - Often uses lookup tables (S-box as a 256-byte array)
  - May call CryptoAPI (CryptEncrypt/BCryptEncrypt)
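
Table-based AES implementations can often be located by searching for the S-box constants. The sketch below scans a byte blob for the first 16 bytes of the standard AES S-box; `find_aes_sbox` is a hypothetical helper name. Implementations that use AES-NI instructions, precomputed T-tables, or an obfuscated table will not match.

```python
# First row of the standard AES S-box.
AES_SBOX_PREFIX = bytes.fromhex("637c777bf26b6fc53001672bfed7ab76")


def find_aes_sbox(data):
    """Return every offset at which the first S-box row appears."""
    offsets = []
    pos = data.find(AES_SBOX_PREFIX)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(AES_SBOX_PREFIX, pos + 1)
    return offsets
```

A hit gives a file offset; cross-references to the corresponding address in the disassembler lead to the routines that use the table.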

Base64:
  - Lookup table with A-Z, a-z, 0-9, +, / characters
  - Processing in 3-byte input / 4-byte output blocks
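
Samples sometimes reorder the 64-character table so that standard decoders fail. Assuming the custom table has been recovered from the binary, one way to decode is to translate back to the standard alphabet first. This is a sketch; `decode_custom_base64` is a hypothetical helper name.

```python
import base64

STANDARD = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def decode_custom_base64(encoded, alphabet=STANDARD):
    """Decode Base64 data that was encoded with a reordered alphabet."""
    if len(alphabet) != 64 or len(set(alphabet)) != 64:
        raise ValueError("alphabet must contain 64 unique bytes")
    # Map each custom character to the standard character at the same index.
    table = bytes.maketrans(alphabet, STANDARD)
    translated = encoded.translate(table)
    # Restore '=' padding in case the sample strips it.
    translated += b"=" * (-len(translated) % 4)
    return base64.b64decode(translated)
```

With the default alphabet this behaves like an ordinary Base64 decoder that tolerates missing padding.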

Renaming and Annotating

Good practice during analysis:

  1. Rename functions as you understand them (sub_401000 → decrypt_config)
  2. Rename variables in the decompiler (param_1 → socket_fd)
  3. Add comments at key locations explaining behavior
  4. Set data types for function parameters and local variables
  5. Create bookmarks at important addresses
