# Memory Allocation

## Program Break

A process can allocate memory by increasing the size of the heap, a variable-size segment of contiguous virtual memory that begins just after the uninitialized data segment of a process and grows and shrinks as memory is allocated and freed. **The current limit of the heap is referred to as the program break.**

Resizing the heap (i.e., allocating or deallocating memory) is actually as simple as telling the kernel to adjust its idea of where the process's program break is. **Initially, the program break lies just past the end of the** `.bss` **segment, as in the following diagram:**

![Typical memory layout of a process on Linux /x86-32](/files/gdxCehmAQfHmMdFQ4grC)

## `brk()` and `sbrk()`

Traditionally, the UNIX system has provided two syscalls for manipulating the program break, and these are both available on Linux: `brk()` and `sbrk()`. Although these syscalls are seldom used directly in programs, understanding them helps clarify how memory allocation works.

![brk() and sbrk()](/files/z3vLl2I3KC7mmLhDVhCy)

![Description and return value](/files/PYudXP4Q2JJj4Q4VMDk9)

**Key Ideas:**

* The `brk()` syscall sets the program break to the location specified by `addr`. Since virtual memory is allocated in units of pages, `addr` is effectively rounded to the next page boundary.
* A call to `sbrk()` adjusts the program break by adding increament to it. On Linux, `sbrk()` is a library function implemented on top of `brk()`. On success, sbrk() returns the previous address of the program break. In other word, if we have increased the program break, the return value points to the start of the newly allocated block of memory.
* A call to `sbrk(0)` returns the current setting of the program break without changing it.

## `malloc()` and `free()`

In general, C programs use the malloc family of functions to allocate and deallocate memory on the heap. These functions offer several advantages over `brk()` and `sbrk()`. In particular:

* They are standardized as part of the C language
* They are easier to use in threaded programs
* They provide a simple interface that allows memory to be allocated in small units
* They allow us to arbitrarily deallocate blocks of memory, which are maintained on a free list and recycled in future calls to allocate memory.

![malloc() family](/files/AipElsr7mkmVbwAzGkR3)

![Description](/files/CWGPg0qKY8s1SdmU0eQx)

![Return value](/files/1INylD4hipbCBekPSFa9)

**Key Ideas:**

* The `malloc(size)` function allocate `size` bytes from the heap and returns a pointer to the start of the newly allocated block of memory. The allocated memory is not initialized.
* Because `malloc()` returns `void *`, we can assign it to any type of C pointers. The block of memory returned by malloc() is always aligned on a byte boundary suitable for efficient access to any type of C data structure. In practice, this means that it is allocated on an 8-byte or 16-byte boundary on most architectures.
* `malloc(0)` returns a pointer to a small piece of memory that can (and should) be freed with `free()`.
* The `free()` function deallocates the block of memory pointed to by its `ptr` argument, which should be an address previously returned by `malloc()`.
* **In general,** `free()` **doesn't lower the program break, but instead adds the block of memory to a list of free blocks that are recycled by future calls to** `malloc()`**. This is done for several reasons:**
  * The block of memory being freed is typically somewhere in the middle of the heap, rather than at the end, so that lowering the program break is not possible.
  * It minimizes the number of `sbrk()` calls that the program must perform. Syscalls have a small but significant overhead.
  * In many cases, lowering the break would not help programs that allocate large amounts of memory, since they typically tend to hold on to allocated memory or repeatedly release and reallocate memory, rather than release it all and then continue to run for an extended period of time.

## Implementation of `malloc()` and `free()`

Although `malloc()` and `free()` provide an interface for allocating memory that is much easier to use than `brk()` and `sbrk()`, it is still possible to make various programming errors when using them. Understanding how `malloc()` and `free()` are implemented provides us with insights into the causes of these errors and how we can avoid them.

The implementation of `malloc()` is straightforward. **It first scans the list of memory blocks previously released by** `free()` **in order to find one whose size is larger than or equal to its requirements.** Different strategies may be employed for this scan, depending on the implementation; for example, **first-fit** or **best-fit**. If the block is exactly the right size, then it is returned to the caller. If it is larger, then it is split, so that a block of the correct size is returned to the caller and a smaller free block is left on the free list.

**If no block on the free list is large enough, then** `malloc()` **calls** `sbrk()` **to allocate more memory. To reduce the number of calls to** `sbrk()`**, rather than allocating exactly the number of bytes required,** `malloc()` **increases the program break in larger units (some multiple of the virtual memory page size), putting the excess memory onto the free list.**

Looking at the implementation of `free()`, things start to become more interesting. When free() places a block of memory onto the free list, how does it know what size that block is? This is done via a trick. **When `malloc()` allocates the block, it allocates extra bytes to hold an integer containing the size of the block. This integer is located at the beginning of the block; the address actually returned to the caller points to the location just past this length value, as shown in the following diagram:**

![Memory block returned by malloc()](/files/pqqiGaEchAvKjrN6XgNf)

When a block is placed on the (doubly linked) free list, `free()` uses the bytes of the block itself in order to add the block to the list, as shown in the following diagram:

![A block on the free list](/files/NUkN6FBJQURshCwvEE4a)

As blocks are deallocated and reallocated over time, the blocks of the free list will become intermingled with blocks of allocated, in-use memory, as shown in the following diagram:

![Heap containing allocated blocks and a free list](/files/OouxFxnvxaCYNbMQQ4E3)

Now consider the fact that C allows us to create pointers to any location in the heap, and modify the locations they point to, including the **length**, **previous free block**, and **next free block** pointers maintained by `free()` and `malloc()`. Add this to the preceding description, and we have a fairly combustible mix when it comes to creating obscure programming bugs. For example, if, via a misdirected pointer, we accidentally increase one of the length values preceding an allocated block of memory (**metadata corruption**), and subsequently deallocate that block, then `free()` will record the wrong size block of memory on the free list. Subsequently, `malloc()` may reallocate this block, leading to a scenario where the program has pointers to two blocks of allocated memory that it understands to be distinct, but which actually overlap. Numerous other pictures of what could go wrong can be drawn.

* After we allocate a block of memory, we should be careful not to touch any bytes outside the range of that block. This could occur, for example, as a result of faulty pointer arithmetic or off-by-one errors in loops updating the contents of a block.
* It is an error to free the same piece of allocated memory more than once. With glibc on Linux, we often get a segmentation violation (`SIGSEGV` signal). This is good, because it alerts us that we’ve made a programming error. However, more generally, freeing the same memory twice leads to unpredictable behavior (**double free**).
* We should never call `free()` with a pointer value that wasn’t obtained by a call to one of the functions in the `malloc` package.
* If we are writing a long-running program (e.g., a shell or a network daemon process) that repeatedly allocates memory for various purposes, then we should ensure that we deallocate any memory after we have finished using it. Failure to do so means that the heap will steadily grow until we reach the limits of available virtual memory, at which point further attempts to allocate memory fail. Such a condition is known as a **memory leak**.

## Reference

{% embed url="<https://man7.org/tlpi>" %}
The Linux Programming Interface
{% endembed %}

{% embed url="<https://man7.org/linux/man-pages/man2/sbrk.2.html>" %}
brk(2)
{% endembed %}

{% embed url="<https://man7.org/linux/man-pages/man3/free.3.html>" %}
malloc(3)
{% endembed %}


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://ret2basic.gitbook.io/ctfnote/computer-science/the-linux-programming-interface/processes/memory-allocation.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
