What is the Heap?
Lecture
Recap: Types of Memory
Memory comes in different types:
ELF .text: where the code lives
ELF .plt: where library function stubs live
ELF .got: where pointers to imported symbols live
ELF .bss: used for uninitialized global writable data (such as global arrays without initial values)
ELF .data: used for pre-initialized global writable data (such as global arrays with initial values)
ELF .rodata: used for global read-only data (such as string constants)
Stack: local variables, temporary storage, call stack metadata
But what if you needed a place to store long-lived dynamic memory, for example, a variable-length list of NPCs in a game? What if you needed dynamic memory allocation?
One Idea: mmap()
mmap()
What if we mmap()
ed memory as we need it? For example:
Pros:
Allows dynamic allocation/deallocation according to changing program needs.
Allocated memory survives across functions.
Cons:
Inflexible allocation size (must be multiples of 4096 bytes).
Crazy slow (requires kernel involvement).
Smarter Solution
What if we wrote a library that mmap()
ed a bunch of memory and handed out small chunks of it on demand!
The library could be used like:
Pictorially:
This idea is called a dynamic allocator in the real world.
Dynamic Allocators Exist!
We're not the first to have this idea:
General Purpose:
Doug Lea (pictured) releases dlmalloc into public domain in 1987.
Linux:
ptmalloc (Posix Thread aware fork of dlmalloc)
FreeBSD:
jemalloc (also used in Firefox, Android)
Windows:
Segment Heap
NT Heap
Kernel allocators:
kmalloc (Linux kernel memory allocator)
kalloc (iOS kernel memory allocator)
In CTF, we are interested in ptmalloc since we are dealing with Linux binaries most of the times.
The memory space managed by a dynamic allocator is colloquially known as "The Heap". It has nothing to do with the heap data structure.
What Does the Heap Do?
The heap, as implemented by ptmalloc/glibc (and analogues), provides:
malloc()
: allocate some memory (chunk)free()
: free a prior allocated chunk
And some auxiliary functions:
realloc()
: change the size of an allocationcalloc()
: allocate and zero-out memory
These functions are used, extensively, by practically every single non-trivial piece of software.
How Does the Heap Work?
ptmalloc actually does not use mmap()
!
The Data Segment:
Historic oddity from segmented memory spaces of yore with ASLR, placed randomly into memory near-ish the PIE base
Starts out with a size of 0
Managed by the brk and sbrk system calls:
sbrk(NULL)
returns the end of the data segmentsbrk(delta)
expands the end of the data segment by delta bytesbrk(addr)
expands the end of the data segment to addr
Under the hood, this is managed just like mmap()
. ptmalloc slices off bits of the data segment for small allocations, and uses mmap()
for large allocations.
Dangers of the Heap
What can go wrong?
The heap is:
Used by imperfect human programmers
Humans forget to free memory
Humans forget all the spots where they store pointers to data
Humans forget what they've freed
A library that strives for performance
Allocation and deallocation needs to be fast, or programs will slow down
Optimizations often leave security as an afterthought
Bugs caused by #1 become security issues due to #2 if not caught!
Here Lies Danger
How to detect issues?
Valgrind can detect heap misuse (if your testcases trigger it)
glibc itself has some hardening techniques:
export MALLOC_CHECK_=1
export MALLOC_PERTURB_=1
export MALLOC_MMAP_THRESHOLD_=1
There are various "more secure" allocators being developed (but not really deployed)
Like many other issues, no general techniques exist for detecting dynamic allocation errors.
Last updated