# The Process API

## Overview

The Process API consists of the following four calls:

* The `fork()` syscall allows one process, the parent, to create a new process, the child. This is done by making the new child process an (almost) exact duplicate of the parent: the child obtains copies of the parent’s stack, data, heap, and text segments. The term **fork** derives from the fact that we can envisage the parent process as dividing to yield two copies of itself.
* The `exit(status)` library function terminates a process, making all resources (memory, open file descriptors, and so on) used by the process available for subsequent reallocation by the kernel. The `status` argument is an integer that determines the termination status for the process. Using the `wait()` syscall, the parent can retrieve this status.
* The `wait(&status)` syscall has two purposes. First, if a child of this process has not yet terminated by calling `exit()`, then `wait()` suspends execution of the process until one of its children has terminated. Second, the termination status of the child is returned in the status argument of `wait()`.
* The `execve(pathname, argv, envp)` syscall loads a new program (`pathname`, with argument list `argv`, and environment list `envp`) into a process’s memory. The existing program text is discarded, and the stack, data, and heap segments are freshly created for the new program. This operation is often referred to as execing a new program. Later, we’ll see that several library functions are layered on top of `execve()`, each of which provides a useful variation in the programming interface. Where we don’t care about these interface variations, we follow the common convention of referring to these calls generically as `exec()`, but be aware that there is no system call or library function with this name.

Pictorially:

![Overview of the use of fork(), exit(), wait(), and execve()](/files/8V7Qm2rQ3NQMVKxvKoTd)

## `fork()`

The `fork()` syscall is used to create a new process. Consider the following program:

```c
// p1.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    printf("hello world (pid:%d)\n", (int) getpid());
    int rc = fork();

    if (rc < 0)
    {
        fprintf(stderr, "Fork failed.\n");
        exit(1);
    }
    // Child process
    else if (rc == 0)
    {
        printf("hello, I am child (pid:%d)\n", (int) getpid());
    }
    // Parent process
    else
    {
        printf("hello, I am parent of %d (pid:%d)\n", rc, (int) getpid());
    }

    return 0;
}
```

**Key Ideas:**

* The process calls the `fork()` syscall, which the OS provides as a way to create a new process. The newly created process is an (almost) exact copy of the calling process. The process that called `fork()` is the **parent**; the newly created process is the **child**.
* The newly created process doesn't start running at `main()`; rather, it comes into existence as if it had just returned from its own call to `fork()`.
* The parent receives the PID of the newly created child, while the child receives a return value of 0. (If `fork()` fails, it returns -1 in the parent and no child is created.) This difference makes it simple to write code that handles the two cases, as above.
* Note that this program is non-deterministic: either the parent or the child may call `printf()` first, depending on the CPU scheduler.

## `wait()`

The `wait()` syscall lets a parent wait for one of its child processes to finish executing. Consider the following program:

```c
// p2.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(int argc, char *argv[])
{
    printf("hello world (pid:%d)\n", (int) getpid());
    int rc = fork();

    if (rc < 0)
    {
        fprintf(stderr, "Fork failed.\n");
        exit(1);
    }
    // Child process
    else if (rc == 0)
    {
        printf("hello, I am child (pid:%d)\n", (int) getpid());
    }
    // Parent process
    else
    {
        // Delay the parent process execution
        // until the child finishes executing.
        int rc_wait = wait(NULL);
        printf("hello, I am parent of %d (rc_wait:%d) (pid:%d)\n", rc, rc_wait, (int) getpid());
    }

    return 0;
}
```

**Key Ideas:**

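* This time the output order is deterministic: the child always prints before the parent, because the parent does not print until `wait()` returns.
* Whichever process the scheduler runs first, the parent's `wait()` does not return until the child has terminated. `wait()` returns the PID of that child, so `rc_wait` equals `rc`.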

## `exec()`

The `exec()` family of calls is useful when you want a process to run a program that is different from the one it is currently running. The following program uses `execvp()`, which searches `PATH` for the executable:

```c
// p3.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/wait.h>

int main(int argc, char *argv[])
{
    printf("hello world (pid:%d)\n", (int) getpid());
    int rc = fork();

    if (rc < 0)
    {
        fprintf(stderr, "Fork failed.\n");
        exit(1);
    }
    // Child process
    else if (rc == 0)
    {
        printf("hello, I am child (pid:%d)\n", (int) getpid());
        char *myargs[3];
        // Program: "wc" (word count)
        myargs[0] = strdup("wc");
        // Argument: file to count
        myargs[1] = strdup("p3.c");
        // Marks end of array
        myargs[2] = NULL;
        // Runs word count
        execvp(myargs[0], myargs);
        printf("This shouldn't print out.\n");
    }
    // Parent process
    else
    {
        // Delay the parent process execution
        // until the child finishes executing.
        int rc_wait = wait(NULL);
        printf("hello, I am parent of %d (rc_wait:%d) (pid:%d)\n", rc, rc_wait, (int) getpid());
    }

    return 0;
}
```

**Key Ideas:**

* Given the name of an executable (e.g., `wc`) and some arguments (e.g., `p3.c`), `exec()` loads code (and static data) from that executable and overwrites the process's current code segment (and current static data) with it; the heap and stack are re-initialized for the new program.
* `exec()` does not create a new process; rather, it transforms the currently running program (formerly `p3`) into a different running program (`wc`).
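* After the `exec()` in the child, it is almost as if `p3.c` never ran: a successful call to `exec()` never returns. `exec()` returns (with -1) only on failure, which is the only way the `printf()` after `execvp()` can run.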

## Motivating the API

The separation of `fork()` and `exec()` is essential in building a UNIX shell, because it lets the shell run code after the call to `fork()` but before the call to `exec()`.

Imagine we are interacting with a UNIX shell. When you type a command, the shell does the following:

1. Figures out where in the file system the executable resides, by searching the directories listed in the `PATH` environment variable.
2. Calls `fork()` to create a new child process to run the command.
3. Calls some variant of `exec()` to run the command.
4. Waits for the command to complete by calling `wait()`.
5. When the child completes, the shell returns from `wait()` and prints out a prompt again, ready for your next command.

The separation of `fork()` and `exec()` allows the shell to do a whole bunch of useful things rather easily. For example:

```shell
wc p3.c > newfile.txt
```

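When the child is created, but before it calls `exec()`, the shell closes standard output and opens the file `newfile.txt`. Because `open()` returns the lowest-numbered unused file descriptor, the file takes over descriptor 1 (standard output), so any output from the soon-to-be-running program `wc` is sent to the file instead of the screen. If `fork()` and `exec()` were merged into a single syscall, there would be no point at which the shell could run this setup code in the child.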

We can actually implement this redirection feature using `fork()` and `exec()`:

```c
// p4.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <fcntl.h>
#include <sys/wait.h>

int main(int argc, char *argv[])
{
    int rc = fork();

    if (rc < 0)
    {
        fprintf(stderr, "Fork failed.\n");
        exit(1);
    }
    // Child process: redirect standard output to a file
    else if (rc == 0)
    {
        // close() frees descriptor 1; open() returns the lowest free
        // descriptor, so the file becomes the child's standard output.
        close(STDOUT_FILENO);
        if (open("./p4.output", O_CREAT|O_WRONLY|O_TRUNC, S_IRWXU) < 0)
        {
            perror("open");
            exit(1);
        }

        // Now exec "wc"...
        char *myargs[3];
        // Program: "wc" (word count)
        myargs[0] = strdup("wc");
        // Argument: file to count
        myargs[1] = strdup("p4.c");
        // Marks end of array
        myargs[2] = NULL;
        // Runs word count
        execvp(myargs[0], myargs);
        // Only reached if execvp() fails
        perror("execvp");
        exit(1);
    }
    // Parent process
    else
    {
        // Delay the parent process execution
        // until the child finishes executing.
        wait(NULL);
    }

    return 0;
}
```

UNIX **pipes** are implemented in a similar way, but with the `pipe()` syscall. The output of one process is connected to an in-kernel pipe (i.e., a queue), and the input of another process is connected to that same pipe; thus, the output of one process is seamlessly used as input to the next, and long and useful chains of commands can be strung together. For example, the following counts the occurrences of `foo` in `file`:

```bash
grep -o foo file | wc -l
```

## Reference

{% embed url="<https://man7.org/tlpi>" %}
The Linux Programming Interface
{% endembed %}

{% embed url="<https://pages.cs.wisc.edu/~remzi/OSTEP>" %}
OSTEP
{% endembed %}


