What is the Heap?
Last updated
Last updated
Memory comes in different types:
ELF .text: where the code lives
ELF .plt: where library function stubs live
ELF .got: where pointers to imported symbols live
ELF .bss: used for uninitialized global writable data (such as global arrays without initial values)
ELF .data: used for pre-initialized global writable data (such as global arrays with initial values)
ELF .rodata: used for global read-only data (such as string constants)
Stack: local variables, temporary storage, call stack metadata
But what if you needed a place to store long-lived dynamic memory, for example, a variable-length list of NPCs in a game? What if you needed dynamic memory allocation?
mmap()
What if we mmap()
ed memory as we need it? For example:
Pros:
Allows dynamic allocation/deallocation according to changing program needs.
Allocated memory survives across functions.
Cons:
Inflexible allocation size (must be multiples of 4096 bytes).
Crazy slow (requires kernel involvement).
What if we wrote a library that mmap()
ed a bunch of memory and handed out small chunks of it on demand!
The library could be used like:
Pictorially:
This idea is called a dynamic allocator in the real world.
We're not the first to have this idea:
General Purpose:
Doug Lea (pictured) releases dlmalloc into public domain in 1987.
Linux:
ptmalloc (Posix Thread aware fork of dlmalloc)
FreeBSD:
jemalloc (also used in Firefox, Android)
Windows:
Segment Heap
NT Heap
Kernel allocators:
kmalloc (Linux kernel memory allocator)
kalloc (iOS kernel memory allocator)
In CTF, we are interested in ptmalloc since we are dealing with Linux binaries most of the times.
The memory space managed by a dynamic allocator is colloquially known as "The Heap". It has nothing to do with the heap data structure.
The heap, as implemented by ptmalloc/glibc (and analogues), provides:
malloc()
: allocate some memory (chunk)
free()
: free a prior allocated chunk
And some auxiliary functions:
realloc()
: change the size of an allocation
calloc()
: allocate and zero-out memory
These functions are used, extensively, by practically every single non-trivial piece of software.
ptmalloc actually does not use mmap()
!
The Data Segment:
Historic oddity from segmented memory spaces of yore with ASLR, placed randomly into memory near-ish the PIE base
Starts out with a size of 0
Managed by the brk and sbrk system calls:
sbrk(NULL)
returns the end of the data segment
sbrk(delta)
expands the end of the data segment by delta bytes
brk(addr)
expands the end of the data segment to addr
Under the hood, this is managed just like mmap()
. ptmalloc slices off bits of the data segment for small allocations, and uses mmap()
for large allocations.
What can go wrong?
The heap is:
Used by imperfect human programmers
Humans forget to free memory
Humans forget all the spots where they store pointers to data
Humans forget what they've freed
A library that strives for performance
Allocation and deallocation needs to be fast, or programs will slow down
Optimizations often leave security as an afterthought
Bugs caused by #1 become security issues due to #2 if not caught!
How to detect issues?
Valgrind can detect heap misuse (if your testcases trigger it)
glibc itself has some hardening techniques:
export MALLOC_CHECK_=1
export MALLOC_PERTURB_=1
export MALLOC_MMAP_THRESHOLD_=1
There are various "more secure" allocators being developed (but not really deployed)
Like many other issues, no general techniques exist for detecting dynamic allocation errors.