# What is the Heap?

## Lecture

{% embed url="<https://youtu.be/coAJ4KyrWmY>" %}
What is the Heap?
{% endembed %}

## Recap: Types of Memory

Memory comes in different types:

* **ELF .text:** where the code lives
* **ELF .plt:** where library function stubs live
* **ELF .got:** where pointers to imported symbols live
* **ELF .bss:** used for uninitialized global writable data (such as global arrays without initial values)
* **ELF .data:** used for pre-initialized global writable data (such as global arrays with initial values)
* **ELF .rodata:** used for global read-only data (such as string constants)
* **Stack:** local variables, temporary storage, call stack metadata

But what if you needed a place to store long-lived dynamic memory, for example, a variable-length list of NPCs in a game? What if you needed dynamic memory allocation?

## One Idea: `mmap()`

What if we `mmap()`ed memory as we need it? For example:

```c
mmap(0, num_pages*0x100, ...)
```

**Pros:**

* Allows dynamic allocation/deallocation according to changing program needs.
* Allocated memory survives across functions.

**Cons:**

* Inflexible allocation size (must be multiples of 4096 bytes).
* Crazy slow (requires kernel involvement).

## Smarter Solution

What if we wrote a library that `mmap()`ed a bunch of memory and handed out small chunks of it on demand!

The library could be used like:

```c
char *firstname = allocate_memory(128);
char *lastname = allocate_memory(256);
scanf("%s %s", firstname, lastname);
printf("Hello %s %s!", firstname, lastname);
free_memory(firstname);
free_memory(lastname);
```

Pictorially:

```
--------------------------------------------------
| firstname | lastname | mmap()ed but unassigned |
--------------------------------------------------
```

This idea is called a **dynamic allocator** in the real world.

## Dynamic Allocators Exist!

We're not the first to have this idea:

* **General Purpose:**
  * Doug Lea (pictured) releases dlmalloc into public domain in 1987.
* **Linux:**
  * ptmalloc (Posix Thread aware fork of dlmalloc)
* **FreeBSD:**
  * jemalloc (also used in Firefox, Android)
* **Windows:**
  * Segment Heap
  * NT Heap
* **Kernel allocators:**
  * kmalloc (Linux kernel memory allocator)
  * kalloc (iOS kernel memory allocator)

In CTF, we are interested in **ptmalloc** since we are dealing with Linux binaries most of the times.

{% hint style="info" %}
The memory space managed by a dynamic allocator is colloquially known as **"The Heap"**. It has nothing to do with the heap data structure.
{% endhint %}

## What Does the Heap Do?

The heap, as implemented by ptmalloc/glibc (and analogues), provides:

* `malloc()`: allocate some memory (chunk)
* `free()`: free a prior allocated chunk

And some auxiliary functions:

* `realloc()`: change the size of an allocation
* `calloc()`: allocate and zero-out memory

These functions are used, extensively, by practically every single non-trivial piece of software.

## How Does the Heap Work?

ptmalloc actually does not use `mmap()`!

**The Data Segment:**

* Historic oddity from segmented memory spaces of yore with ASLR, placed randomly into memory near-ish the PIE base
* Starts out with a size of 0
* Managed by the brk and sbrk system calls:
  * `sbrk(NULL)` returns the end of the data segment
  * `sbrk(delta)` expands the end of the data segment by delta bytes
  * `brk(addr)` expands the end of the data segment to addr

Under the hood, this is managed just like `mmap()`. ptmalloc slices off bits of the data segment for small allocations, and uses `mmap()` for large allocations.

## Dangers of the Heap

What can go wrong?

The heap is:

1. Used by imperfect human programmers
   * Humans forget to free memory
   * Humans forget all the spots where they store pointers to data
   * Humans forget what they've freed
2. A library that strives for performance
   * Allocation and deallocation needs to be fast, or programs will slow down
   * Optimizations often leave security as an afterthought

Bugs caused by #1 become security issues due to #2 if not caught!

## Here Lies Danger

How to detect issues?

* **Valgrind** can detect heap misuse (if your testcases trigger it)
* **glibc itself** has some hardening techniques:
  * `export MALLOC_CHECK_=1`
  * `export MALLOC_PERTURB_=1`
  * `export MALLOC_MMAP_THRESHOLD_=1`
* There are various **"more secure" allocators** being developed (but not really deployed)

Like many other issues, no general techniques exist for detecting dynamic allocation errors.
