File Recovery & Carving
Overview
File recovery retrieves deleted files using filesystem metadata (inode/MFT entries that still reference data blocks). File carving recovers files from raw data — including unallocated space — by searching for known file headers and footers, independent of any filesystem structure. Both techniques are essential for recovering evidence that has been intentionally or accidentally deleted.
Filesystem-Based Recovery
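When a file is deleted, most filesystems only mark its directory entry and metadata record as unused and flag the data blocks as free; the data itself stays on disk until those blocks are reused. How much metadata survives depends on the filesystem: FAT and NTFS usually keep enough to locate the content, while ext3/ext4 clear the inode's block pointers on deletion, so recovery by inode often fails there and carving is the fallback.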
# The Sleuth Kit (fls, icat, tsk_recover)
# https://www.sleuthkit.org/
# List deleted files (entries marked with *)
fls -r -d -o 2048 disk.raw
# Example output:
# r/r * 256: deleted_document.docx
# r/r * 312: passwords.txt
# d/d * 400: temp_folder
# Recover a specific deleted file by inode
icat -r -o 2048 disk.raw 256 > recovered_document.docx
# Bulk recovery of all deleted files
tsk_recover -o 2048 disk.raw /output/recovered/
# Extract only allocated files (e.g. to copy the live file tree out of the image)
tsk_recover -a -o 2048 disk.raw /output/allocated/
# Recover all files (allocated + deleted)
tsk_recover -e -o 2048 disk.raw /output/all_files/
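To recover a handful of entries rather than everything, the metadata addresses can be pulled out of the fls listing and passed to icat one at a time. The sketch below is a hypothetical helper (deleted_inodes is not part of The Sleuth Kit). It assumes one entry per line with a `*` flag somewhere before the address, and it accepts both plain inode numbers and NTFS-style addresses such as 64-128-2.

```shell
# deleted_inodes: read fls output on stdin and print the metadata address
# of every entry flagged as deleted with "*".
# Hypothetical helper, not part of The Sleuth Kit.
deleted_inodes() {
  awk '{
    star = 0
    for (i = 1; i <= NF; i++) {
      if ($i == "*") star = 1
      # Address field examples: 256:  64-128-2:  256(realloc):
      if ($i ~ /^[0-9]+(-[0-9]+-[0-9]+)?(\(realloc\))?:$/) {
        if (star) {
          addr = $i
          sub(/[(:].*$/, "", addr)   # drop "(realloc)" and the trailing colon
          print addr
        }
        break                        # everything after the address is the name
      }
    }
  }'
}
```

For example, `fls -r -d -o 2048 disk.raw | deleted_inodes` prints one address per line, which can drive a loop that calls `icat -r` for each entry.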
File Carving with Scalpel
Scalpel carves files from disk images based on file headers and footers defined in a configuration file. It expects the output directory to be empty or not to exist yet.
# Scalpel
# https://github.com/sleuthkit/scalpel
# Basic carving (uses default scalpel.conf)
scalpel -o /output/carved/ disk.raw
# Use a custom configuration file
scalpel -c /path/to/custom.conf -o /output/carved/ disk.raw
# Carve from unallocated space only (extract with blkls first)
blkls -o 2048 disk.raw > unallocated.bin
scalpel -o /output/carved/ unallocated.bin
# Verbose output
scalpel -v -o /output/carved/ disk.raw
Scalpel Configuration (scalpel.conf):
The configuration file defines file types by their headers and footers. In the stock scalpel.conf every rule is commented out, so nothing is carved until you uncomment the file types you want.
# Format: extension case_sensitive max_size header [footer]
# (max_size is in bytes; the footer is optional)
jpg y 200000000 \xff\xd8\xff\xe0\x00\x10 \xff\xd9
png y 200000000 \x89\x50\x4e\x47 \x49\x45\x4e\x44
pdf y 200000000 \x25\x50\x44\x46 \x25\x25\x45\x4f\x46
doc y 200000000 \xd0\xcf\x11\xe0\xa1\xb1
zip y 200000000 \x50\x4b\x03\x04 \x50\x4b\x05\x06
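What a header/footer rule does can be sketched with standard tools. The following is an illustration only, not how Scalpel is implemented: find_sig and carve_first are hypothetical helpers, signatures are given as lowercase hex (ffd8ff rather than \xff\xd8\xff), and the whole image is converted to one hex string in memory, so it only suits small samples.

```shell
# find_sig FILE HEX [START]: print the byte offset of the first occurrence of
# the lowercase hex signature HEX at or after byte START (default 0).
# Prints nothing when the signature is absent.
find_sig() {
  od -An -v -tx1 "$1" | tr -d ' \n' |
    awk -v sig="$2" -v start="${3:-0}" '{
      pos = start * 2 + 1                 # 1-based index into the hex string
      while ((hit = index(substr($0, pos), sig)) > 0) {
        at = pos + hit - 1
        if (at % 2 == 1) {                # match starts on a byte boundary
          off = (at - 1) / 2
          print off
          exit
        }
        pos = at + 1                      # misaligned hit, keep searching
      }
    }'
}

# carve_first IMAGE HEADER_HEX FOOTER_HEX OUTFILE: copy the bytes from the
# first header through the first footer that follows it.
carve_first() {
  start=$(find_sig "$1" "$2")
  [ -n "$start" ] || return 1
  foot=$(find_sig "$1" "$3" $(( start + ${#2} / 2 )))
  [ -n "$foot" ] || return 1
  len=$(( foot + ${#3} / 2 - start ))
  dd if="$1" of="$4" bs=1 skip="$start" count="$len" 2>/dev/null
}
```

A call such as `carve_first unallocated.bin ffd8ff ffd9 first.jpg` mirrors the jpg rule above. Real carvers also enforce the maximum size and handle repeated or nested signatures, which this sketch does not.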
File Carving with Foremost
Foremost carves files using header and footer signatures and, for its built-in file types, knowledge of the format's internal data structures.
# Foremost
# https://foremost.sourceforge.net/
# Basic carving
foremost -i disk.raw -o /output/carved/
# Carve specific file types only
foremost -t jpg,png,pdf -i disk.raw -o /output/carved/
# Verbose output
foremost -v -i disk.raw -o /output/carved/
# Quick mode (search on 512-byte boundaries only)
foremost -q -i disk.raw -o /output/carved/
# Audit only (report what would be carved without extracting)
foremost -w -i disk.raw -o /output/carved/
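Foremost writes carved files into one subdirectory per type, next to its audit.txt report. A quick tally by extension helps decide where to look first. The helper below is a hypothetical sketch (count_by_ext is not a Foremost feature) and works on any directory of carved output.

```shell
# count_by_ext DIR: tally files under DIR by lowercase extension,
# most frequent first. Files without an extension are skipped.
count_by_ext() {
  find "$1" -type f -name '*.*' |
    sed 's/.*\.//' |
    tr '[:upper:]' '[:lower:]' |
    sort | uniq -c | sort -k1,1nr -k2 |
    awk '{ print $2, $1 }'
}
```

Running `count_by_ext /output/carved/` prints lines such as `jpg 2`. Note that audit.txt itself is counted under txt.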
File Carving with PhotoRec
PhotoRec is a signature-based carver that recovers files from disk images, partitions, and damaged media while ignoring the filesystem. It recognizes more than 480 file extensions (about 300 file families).
# PhotoRec (TestDisk suite)
# https://www.cgsecurity.org/wiki/PhotoRec
# Interactive mode (recommended for first-time use)
photorec disk.raw
# Specify output directory
photorec /d /output/recovered/ disk.raw
# PhotoRec interactive steps:
# 1. Select the disk/image
# 2. Choose partition table type (Intel/GPT)
# 3. Select partition or whole disk
# 4. Choose filesystem type (ext2/3/4 or Other)
# 5. Choose search scope (Free space / Whole partition)
# 6. Select output directory
Bulk Extractor
bulk_extractor scans disk images for specific data patterns (email addresses, URLs, credit card numbers, phone numbers, etc.) without parsing the filesystem.
# bulk_extractor
# https://github.com/simsong/bulk_extractor
# Basic scan
bulk_extractor -o /output/bulk/ disk.raw
# Run specific scanners only: -e alone adds to the default set, so disable
# everything with -x all first (note: "url" is not a scanner name;
# url.txt output is produced by the "email" scanner)
bulk_extractor -x all -e net -e email -o /output/bulk/ disk.raw
# Scan with all scanners, including those disabled by default
bulk_extractor -e all -o /output/bulk/ disk.raw
# Output files include:
# email.txt — extracted email addresses (email scanner)
# url.txt — extracted URLs (email scanner)
# domain.txt — extracted domain names (email scanner)
# ip.txt — extracted IP addresses (net scanner)
# ether.txt — extracted MAC addresses (net scanner)
# tcp.txt — extracted TCP session data (net scanner)
# telephone.txt — extracted phone numbers (accts scanner)
# ccn.txt — credit card numbers (accts scanner)
# exif.txt — EXIF metadata from images (exif scanner)
# zip.txt — ZIP file components (zip scanner)
# json.txt — JSON data fragments (json scanner)
# Set page size in bytes (default 16 MiB, 1 GiB here); larger pages need more memory per thread
bulk_extractor -o /output/bulk/ -G 1073741824 disk.raw
# Specify number of threads
bulk_extractor -o /output/bulk/ -j 4 disk.raw
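Feature files such as email.txt are tab-separated text: offset, feature, then surrounding context, with header lines that start with `#`. Under that assumption, a frequency count of the feature column can be sketched as follows. top_features is a hypothetical helper, not a bulk_extractor command.

```shell
# top_features FILE [N]: print the N most frequent features (default 10)
# from a bulk_extractor feature file as "count<TAB>feature".
top_features() {
  # cut -s drops lines without a tab, which also covers header lines
  # that grep misses (for example a first line preceded by a byte order mark).
  grep -v '^#' "$1" | cut -s -f2 |
    sort | uniq -c | sort -k1,1nr -k2 |
    head -n "${2:-10}" |
    awk '{ count = $1; sub(/^ *[0-9]+ /, ""); print count "\t" $0 }'
}
```

For instance, `top_features /output/bulk/email.txt 20` lists the twenty addresses that occur most often in the image.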
Recovering Data from Slack Space
Slack space is the area between the end of a file's data and the end of its last allocated cluster/block. It may contain data from previously deleted files.
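As a worked example of that definition, the amount of slack behind a file follows from its size and the cluster size. The helper name slack_bytes is hypothetical and the 4096-byte cluster is only an example value; the real cluster or block size of a filesystem is reported by fsstat.

```shell
# slack_bytes FILE_SIZE CLUSTER_SIZE: bytes left unused in the file's
# last allocated cluster (0 when the file ends exactly on a boundary).
slack_bytes() {
  rem=$(( $1 % $2 ))
  if [ "$rem" -eq 0 ]; then
    echo 0
  else
    echo $(( $2 - rem ))
  fi
}

slack_bytes 10000 4096   # 3 clusters = 12288 bytes allocated, prints 2288
```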
# The Sleuth Kit (blkls, icat)
# https://www.sleuthkit.org/
# Extract all slack space from a partition (FAT and NTFS only)
blkls -s -o 2048 disk.raw > slack.bin
# Extract a specific file's content including its slack space
icat -s -o 2048 disk.raw 128 > file_slack.bin
# Search slack space for strings
strings slack.bin | grep -iE 'password|secret|key|http'
Verifying Recovered Files
# Check file type of recovered files
file recovered_file.dat
# Compute hash for evidence tracking
sha256sum recovered_file.dat
# Check if a file is corrupt (attempt to open with appropriate tool)
# For images:
identify recovered_image.jpg 2>&1 # ImageMagick
# For PDFs:
pdfinfo recovered_document.pdf 2>&1
# For ZIP/Office documents:
unzip -t recovered_archive.zip 2>&1
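For evidence tracking across a whole output directory, the per-file hash can be extended into a manifest that is checked again later. This is a minimal sketch assuming GNU coreutils; write_manifest and verify_manifest are hypothetical names, and the manifest should be stored outside the directory it describes.

```shell
# write_manifest DIR MANIFEST: record a SHA-256 hash for every file under DIR,
# with paths relative to DIR and sorted by path.
write_manifest() {
  ( cd "$1" && find . -type f -exec sha256sum {} + | sort -k2 ) > "$2"
}

# verify_manifest DIR MANIFEST: succeed only if every listed file still
# matches its recorded hash; mismatches are printed.
verify_manifest() {
  ( cd "$1" && sha256sum --quiet -c - ) < "$2"
}
```

A typical sequence is `write_manifest /output/recovered/ recovered.sha256` right after recovery, then `verify_manifest /output/recovered/ recovered.sha256` before analysis or handover.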
Recovery Limitations
| Scenario | Recovery Likelihood |
|---|---|
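| Deleted, not overwritten | High: metadata and data intact |
| Deleted, partially overwritten | Partial: only the blocks not yet reused are recoverable |
| SSD with TRIM enabled | Very low: trimmed blocks are released to the controller, which typically returns zeros for them and erases them during garbage collection |
| Full disk encryption (unmounted) | None without the key: data at rest is ciphertext |
| Secure wipe / zero fill | None: data destroyed |
| Formatted (quick format) | High for carving: only filesystem metadata is rewritten, though file names and paths may be lost |
| Formatted (full format) | Low: data may be zeroed |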