Disassembly & Decompilation
Overview
Disassembly converts machine code (binary) into assembly language instructions. Decompilation goes further, producing pseudocode that approximates the original high-level source code. Both are essential for understanding malware logic — control flow, algorithms, encryption routines, C2 protocols, and anti-analysis tricks.
Ghidra
Ghidra is an open-source reverse engineering framework developed by the NSA. It provides both disassembly and decompilation for many architectures.
GUI Workflow
1. Launch Ghidra: ghidra (or ./ghidraRun from an upstream release archive)
2. Create or open a project
3. Import binary: File → Import File → select sample
4. Analyze: click Yes when prompted to run auto-analysis, then accept or adjust the analyzer options
5. Navigate:
- Symbol Tree (left) — functions, imports, exports
- Listing (center) — disassembly view
- Decompiler (right) — pseudocode view
- Function Graph — visual control flow
Key Ghidra Operations
Navigation:
G — Go to address
Double-click — Follow reference
Editing (Listing view):
L — Rename label / function (context-dependent)
T — Define data type at cursor
; — Add comment
Search:
S — Search Memory (Search → Memory)
Ctrl+Shift+E — Search Program Text (Search → Program Text)
Search → For Strings — no default keyboard shortcut; use the menu
Decompiler:
Right-click var — Rename, retype, or split variable
Ctrl+E — Open/focus Decompiler window (Window → Decompile)
Ctrl+L — Retype variable (in Decompiler context)
Headless Analysis (Scripting)
# Ghidra
# https://github.com/NationalSecurityAgency/ghidra
# Run headless analysis with a script
/usr/share/ghidra/support/analyzeHeadless /tmp/ghidra_project project_name \
-import sample.exe \
-postScript ListFunctions.java \
-deleteProject
# Import and analyze without scripts
/usr/share/ghidra/support/analyzeHeadless /tmp/ghidra_project project_name \
-import sample.exe
# Run a script on an existing project
/usr/share/ghidra/support/analyzeHeadless /tmp/ghidra_project project_name \
-process sample.exe \
-postScript ExportDecompiled.java
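When many samples need the same headless run, the commands above can be wrapped in a small driver. The following is a minimal sketch, not an official Ghidra interface: it only assembles the flags already shown (-import, -process, -postScript, -deleteProject). The install path is taken from the commands above, and the helper names build_headless_cmd and analyze_all and the project name "batch" are hypothetical.

```python
import subprocess
from pathlib import Path

# Assumed install location, copied from the commands above; adjust as needed.
ANALYZE_HEADLESS = "/usr/share/ghidra/support/analyzeHeadless"


def build_headless_cmd(project_dir, project_name, sample,
                       post_script=None, existing=False, delete_project=False):
    """Return the analyzeHeadless argument list for one sample."""
    cmd = [ANALYZE_HEADLESS, str(project_dir), project_name]
    # -process targets a file already in the project; -import adds a new one.
    cmd += ["-process" if existing else "-import", str(sample)]
    if post_script:
        cmd += ["-postScript", post_script]
    if delete_project:
        cmd.append("-deleteProject")
    return cmd


def analyze_all(sample_dir, project_dir="/tmp/ghidra_project", post_script=None):
    """Import and analyze every regular file in sample_dir (hypothetical helper)."""
    for sample in sorted(Path(sample_dir).iterdir()):
        if sample.is_file():
            cmd = build_headless_cmd(project_dir, "batch", sample, post_script)
            subprocess.run(cmd, check=True)  # raises if Ghidra exits non-zero
```

Building the argument list separately from running it keeps the command easy to log or inspect before anything touches a sample.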
Ghidra Python Scripting (Ghidra Script Manager)
# Ghidra Script (run from Ghidra's Script Manager)
# Ghidra
# https://github.com/NationalSecurityAgency/ghidra
# List all functions and their addresses
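# str.format() is used instead of f-strings so the script also runs under
# Ghidra's default Jython 2.7 interpreter
fm = currentProgram.getFunctionManager()
for func in fm.getFunctions(True):  # True = iterate in ascending address order
    print("{} : {}".format(func.getEntryPoint(), func.getName()))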
# Ghidra Script — find cross-references to a function
# Ghidra
# https://github.com/NationalSecurityAgency/ghidra
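# Search every namespace: imported functions usually live under a library
# namespace (for example KERNEL32.DLL), not the global one
target_name = "CreateRemoteThread"
for sym in currentProgram.getSymbolTable().getSymbols(target_name):
    for ref in getReferencesTo(sym.getAddress()):
        # A reference may be a call site, a thunk, or an import table pointer
        print("  Referenced from: {}".format(ref.getFromAddress()))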
radare2
radare2 is a command-line reverse engineering framework.
# radare2
# https://github.com/radareorg/radare2
# Open binary with auto-analysis
r2 -A sample.exe
# Common analysis commands:
# aa — analyze all (basic)
# aaa — analyze all (aggressive)
# aaaa — analyze all (experimental, most thorough)
# Navigation and information
# afl — list all functions
# afn name addr — rename function at addr
# s addr — seek to address
# s main — seek to main function
# Disassembly
# pdf — print disassembly of current function
# pdf @main — disassembly of main
# pd 20 — print 20 disassembly lines
# pD 100 — disassemble 100 bytes
# Cross-references
# axt addr — show xrefs TO an address
# axf addr — show xrefs FROM an address
# Strings
# iz — strings in data sections
# izz — all strings
# iz~http — filter strings containing "http"
# Imports and exports
# ii — imports
# iE — exports
# iS — sections
# Visual modes
# V — visual mode
# VV — visual graph mode (function control flow)
# V! — visual panels mode
# Decompilation (with r2ghidra plugin)
# pdg — decompile current function (Ghidra decompiler)
# pdg @main — decompile main
radare2 One-Liners
# radare2
# https://github.com/radareorg/radare2
# List all functions (non-interactive)
r2 -A -q -c 'afl' sample.exe
# List all imports (non-interactive)
r2 -q -c 'ii' sample.exe
# Disassemble main (non-interactive)
r2 -A -q -c 'pdf @main' sample.exe
# Find all strings containing "http" (non-interactive)
r2 -q -c 'izz~http' sample.exe
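# Show cross-references to an imported function
# (for PE files the flag name usually includes the DLL, e.g.
#  sym.imp.KERNEL32.dll_CreateRemoteThread; confirm the exact name with 'ii')
r2 -A -q -c 'axt @sym.imp.CreateRemoteThread' sample.exe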
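The one-liners above are easy to drive from a script when several samples need the same queries. This is a rough sketch that shells out to r2 with the -q, -A, and -c flags shown above and joins multiple commands with ";". The function names r2_argv and r2_run are hypothetical, and the sketch assumes r2 is on PATH and that no individual command contains a semicolon.

```python
import subprocess


def r2_argv(sample, commands, analyze=False):
    """Build an r2 command line that runs the given commands and quits."""
    argv = ["r2", "-q"]
    if analyze:
        argv.append("-A")  # run auto-analysis first (needed for afl, pdf, axt)
    # r2 accepts several commands in one -c argument, separated by ';'.
    argv += ["-c", ";".join(commands), str(sample)]
    return argv


def r2_run(sample, commands, analyze=False):
    """Run r2 non-interactively and return its standard output as text."""
    result = subprocess.run(r2_argv(sample, commands, analyze),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, r2_run("sample.exe", ["ii", "iE"]) would return the import and export listings in a single string.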
objdump
objdump is a simpler disassembler from GNU Binutils, useful for quick one-off disassembly.
# GNU Binutils (objdump)
# https://www.gnu.org/software/binutils/
# Disassemble all executable sections
objdump -d sample
# Disassemble all sections
objdump -D sample
# Disassemble with Intel syntax (default is AT&T)
objdump -d -M intel sample
# Show interleaved source (if debug info present)
objdump -dS sample
# Disassemble a specific section
objdump -d -j .text sample
# Show relocations alongside disassembly
objdump -dr sample
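objdump output is plain text, so a few lines of Python can summarize it. The sketch below counts mnemonics in objdump -d output, which can help spot XOR-heavy code. It assumes the usual tab-separated layout (address, raw bytes, instruction). The listing in SAMPLE is made up for illustration and does not come from a real binary.

```python
import re
from collections import Counter

# Matches an instruction line such as "  401137:\t48 89 e5 \tmov    %rsp,%rbp".
INSN_RE = re.compile(r"^\s*([0-9a-f]+):\t([0-9a-f ]+)\t(\S+)\s*(.*)$")


def mnemonic_histogram(listing):
    """Count instruction mnemonics in the text produced by objdump -d."""
    counts = Counter()
    for line in listing.splitlines():
        match = INSN_RE.match(line)
        if match:  # symbol headers and blank lines do not match
            counts[match.group(3)] += 1
    return counts


# Illustrative listing (invented for this example).
SAMPLE = (
    "0000000000401136 <decode>:\n"
    "  401136:\t55                   \tpush   %rbp\n"
    "  401137:\t48 89 e5             \tmov    %rsp,%rbp\n"
    "  40113a:\t30 07                \txor    %al,(%rdi)\n"
    "  40113c:\t48 ff c7             \tinc    %rdi\n"
    "  40113f:\t75 f9                \tjne    40113a <decode+0x4>\n"
)

if __name__ == "__main__":
    for mnemonic, count in mnemonic_histogram(SAMPLE).most_common():
        print(mnemonic, count)
```

On a real sample, the listing could be captured with subprocess.run(["objdump", "-d", path], capture_output=True, text=True).stdout.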
Key Analysis Tasks
Finding the Entry Point
# radare2
# https://github.com/radareorg/radare2
# Show entry point
r2 -q -c 'ie' sample.exe
# In Ghidra: press G (Navigation → Go To...) and type "entry", or open Exports in the Symbol Tree
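For PE samples, the entry point can also be read straight from the file headers without any disassembler. The sketch below is one way to do it with the standard library only: it follows e_lfanew to the PE signature and reads AddressOfEntryPoint and ImageBase from the optional header. The function name pe_entry_point is hypothetical, and the code performs only minimal validation.

```python
import struct


def pe_entry_point(data):
    """Return (entry_rva, image_base) parsed from raw PE file bytes."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew at offset 0x3C holds the file offset of the "PE\0\0" signature.
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_off:pe_off + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    opt = pe_off + 4 + 20  # skip the signature and the 20-byte COFF header
    (magic,) = struct.unpack_from("<H", data, opt)
    (entry_rva,) = struct.unpack_from("<I", data, opt + 16)
    if magic == 0x10B:    # PE32: 4-byte ImageBase at optional header offset 28
        (image_base,) = struct.unpack_from("<I", data, opt + 28)
    elif magic == 0x20B:  # PE32+: 8-byte ImageBase at optional header offset 24
        (image_base,) = struct.unpack_from("<Q", data, opt + 24)
    else:
        raise ValueError("unknown optional header magic")
    return entry_rva, image_base
```

Adding the two returned values gives the virtual address at the preferred load base, which is typically the address a disassembler reports as the entry point.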
Identifying Key Functions
Common functions to locate during malware analysis:
| Function Type | What to Look For |
|---|---|
| main / WinMain | Program entry point logic |
| Networking | Calls to socket/connect/send/recv or WinINet/WinHTTP |
| Encryption | XOR loops, AES/RC4 implementations, CryptoAPI calls |
| Persistence | Registry writes, service creation, file copy |
| Process injection | VirtualAllocEx + WriteProcessMemory + CreateRemoteThread |
| Anti-analysis | IsDebuggerPresent, timing checks, VM detection |
| String decryption | Loops that build strings character-by-character |
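The table above can be turned into a quick triage pass over a sample's import list (for example, names collected from the Ghidra Symbol Tree or from r2's ii output). This is a minimal sketch: the CATEGORIES mapping is a small, hypothetical selection of Windows API names chosen to match the table rows, not a complete or authoritative list.

```python
# Hypothetical, deliberately small mapping from API names to the categories
# in the table above; extend it for real triage work.
CATEGORIES = {
    "Networking": {"socket", "connect", "send", "recv",
                   "InternetOpenA", "InternetOpenUrlA", "WinHttpOpen"},
    "Encryption": {"CryptEncrypt", "CryptDecrypt", "BCryptEncrypt"},
    "Persistence": {"RegSetValueExA", "RegSetValueExW",
                    "CreateServiceA", "CopyFileA"},
    "Process injection": {"VirtualAllocEx", "WriteProcessMemory",
                          "CreateRemoteThread"},
    "Anti-analysis": {"IsDebuggerPresent", "CheckRemoteDebuggerPresent",
                      "GetTickCount"},
}


def triage_imports(imports):
    """Group import names by the behavior category they hint at."""
    hits = {}
    for name in imports:
        for category, apis in CATEGORIES.items():
            if name in apis:
                hits.setdefault(category, []).append(name)
    return hits
```

A hit only indicates where to start reading: many benign programs import the same functions, so each candidate still needs to be confirmed in the disassembly.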
Identifying Encryption Routines
Common patterns in disassembly:
XOR encryption:
- Loop with XOR instruction against a key byte/word
- Key may be hardcoded or derived at runtime
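Once an XOR loop has been spotted, the decoding is easy to reproduce outside the sample. The sketch below shows a repeating-key XOR and a single-byte key brute force that keeps candidates containing a known marker. The function names and the idea of searching for a marker such as b"http" are illustrative assumptions, not a description of any particular malware family.

```python
from itertools import cycle


def xor_bytes(data, key):
    """XOR data with a repeating key; applying it twice restores the input."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def brute_single_byte(data, marker):
    """Try every non-zero one-byte key and keep plaintexts containing marker."""
    hits = []
    for key in range(1, 256):  # key 0 would leave the data unchanged
        plain = xor_bytes(data, bytes([key]))
        if marker in plain:
            hits.append((key, plain))
    return hits
```

For a multi-byte key recovered from the disassembly, the bytes are passed directly, for example xor_bytes(blob, b"\x13\x37").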
RC4:
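- Two phases: KSA (key scheduling) and PRGA (pseudo-random generation)
- KSA fills a 256-byte S-box with the values 0 to 255, then permutes it using the key
- Look for a 256-iteration loop that swaps S-box entries, followed by a per-byte loop that swaps again and XORs a keystream byte into the data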
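A reference implementation makes the RC4 pattern easier to recognize in a decompiler and lets you decrypt data once the key is recovered. The following is a compact sketch of textbook RC4; real samples sometimes modify the algorithm (for example by changing the S-box size or dropping initial keystream bytes), so treat it as a baseline for comparison.

```python
def rc4(key, data):
    """Encrypt or decrypt data with RC4 (the operation is symmetric)."""
    # KSA: start from the identity permutation, then mix in the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]

    # PRGA: one swap per byte, then XOR the keystream byte into the data.
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)
```

The masks with 0xFF correspond to the byte-sized index arithmetic that typically shows up in the disassembly as movzx instructions or and-with-0xff operations.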
AES:
- SubBytes, ShiftRows, MixColumns, AddRoundKey operations
- Often uses lookup tables (S-box as a 256-byte array)
- May call CryptoAPI (CryptEncrypt/BCryptEncrypt)
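Table-based AES implementations can often be located by searching the binary for the first row of the S-box. The sketch below scans a byte blob for the first 16 bytes of the standard forward and inverse S-boxes. The function names are hypothetical, and the approach will miss implementations that use AES-NI instructions, precomputed T-tables, or an obfuscated table.

```python
# First 16 bytes of the standard AES S-box and inverse S-box.
AES_SBOX_HEAD = bytes.fromhex("637c777bf26b6fc53001672bfed7ab76")
AES_INV_SBOX_HEAD = bytes.fromhex("52096ad53036a538bf40a39e81f3d7fb")


def find_all(blob, needle):
    """Return every offset at which needle occurs in blob."""
    offsets = []
    pos = blob.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = blob.find(needle, pos + 1)
    return offsets


def scan_aes_tables(blob):
    """Report file offsets that look like the start of an AES S-box."""
    return {
        "sbox": find_all(blob, AES_SBOX_HEAD),
        "inv_sbox": find_all(blob, AES_INV_SBOX_HEAD),
    }
```

The reported values are file offsets. Map them to virtual addresses with the section table, then follow cross-references to the table to find the routines that use it.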
Base64:
- Lookup table with A-Z, a-z, 0-9, +, / characters
- Processing in 3-byte input / 4-byte output blocks
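Malware frequently keeps the Base64 algorithm but shuffles the lookup table, so a standard decoder produces garbage. A minimal sketch for handling that case: translate the custom alphabet recovered from the binary back to the standard one, then decode normally. The function name decode_custom_b64 is hypothetical, and the sketch assumes "=" is still used for padding.

```python
import base64

STD_ALPHABET = (b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                b"abcdefghijklmnopqrstuvwxyz"
                b"0123456789+/")


def decode_custom_b64(text, alphabet):
    """Decode Base64 bytes that were encoded with a non-standard 64-char table."""
    if len(alphabet) != 64 or len(set(alphabet)) != 64:
        raise ValueError("alphabet must contain 64 distinct characters")
    # Map each custom character to the standard character at the same index.
    table = bytes.maketrans(alphabet, STD_ALPHABET)
    return base64.b64decode(text.translate(table))
```

Both arguments are bytes objects. The alphabet is the 64-byte table as it appears in the sample's data section.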
Renaming and Annotating
Good practice during analysis:
- Rename functions as you understand them (sub_401000 → decrypt_config)
- Rename variables in the decompiler (param_1 → socket_fd)
- Add comments at key locations explaining behavior
- Set data types for function parameters and local variables
- Create bookmarks at important addresses