The architecture of most modern processors, with the exception of some embedded systems, involves a security model. For example, the rings model specifies multiple privilege levels under which software may be executed: a program is usually limited to its own address space so that it cannot access or modify other running programs or the operating system itself, and is usually prevented from directly manipulating hardware devices (e.g. the frame buffer or network devices).
Ring 3 => User Mode
Ring 0 => Kernel Mode.
The smaller the number, the higher the privilege.
However, many applications need access to these components, so system calls are made available by the operating system to provide well-defined, safe implementations for such operations. The operating system executes at the highest level of privilege, and allows applications to request services via system calls, which are often initiated via interrupts. An interrupt automatically puts the CPU into some elevated privilege level and then passes control to the kernel, which determines whether the calling program should be granted the requested service. If the service is granted, the kernel executes a specific set of instructions over which the calling program has no direct control, returns the privilege level to that of the calling program, and then returns control to the calling program. Pictorially:
User Mode ---[ interrupt (syscall) ]---> Kernel Mode
Syscalls
Almost all programs have to interact with the outside world! This is primarily done via system calls (see man syscalls). Each system call is documented in section 2 of the man pages (e.g., man 2 open).
On amd64 Linux, a program triggers a system call in three steps:
Set rax to the system call number.
Store the arguments in rdi, rsi, rdx, r10, r8 and r9, in that order (more on this later).
Execute the syscall instruction.
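The three steps above can be sketched in C with GCC-style inline assembly. This is only an illustration, assuming Linux on amd64 and a GCC or Clang compiler; raw_syscall3 and raw_write are hypothetical helper names, not library functions, and on other platforms the sketch falls back to the libc syscall() wrapper.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE        /* exposes syscall() in <unistd.h> on glibc */
#endif
#include <stddef.h>
#include <sys/syscall.h>   /* SYS_write, SYS_getpid */
#include <unistd.h>        /* syscall() for the fallback path */

/* Hypothetical helper: issue a system call with three arguments by hand. */
long raw_syscall3(long number, long a1, long a2, long a3)
{
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    __asm__ volatile (
        "syscall"                          /* step 3: trap into the kernel */
        : "=a"(ret)                        /* the result comes back in rax */
        : "a"(number),                     /* step 1: rax = syscall number */
          "D"(a1), "S"(a2), "d"(a3)        /* step 2: rdi, rsi, rdx = args */
        : "rcx", "r11", "memory"           /* syscall clobbers rcx and r11 */
    );
    return ret;    /* raw kernel result: a negative errno value on failure */
#else
    /* Assumption: on other platforms, let libc do the register setup.
     * This path returns -1 and sets errno on failure. */
    return syscall(number, a1, a2, a3);
#endif
}

/* write(2) expressed through the helper above. */
long raw_write(int fd, const void *buf, size_t len)
{
    return raw_syscall3(SYS_write, (long)fd, (long)buf, (long)len);
}
```

Calling raw_write(1, "hi\n", 3) from main should write three bytes to standard output. In ordinary programs the libc wrappers (write(), getpid(), and so on) are the better choice, since they also translate the raw result into the usual -1 plus errno convention.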
Below are some important syscalls.
fork
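The fork() syscall creates a near-identical copy of the calling process. The child receives its own copy of the parent's address space, registers and program counter, so both processes continue from the point where fork() returns; they differ mainly in their PID and in the value that fork() returns. The original process is called the parent and the newly created process is called the child.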
If fork() fails, it returns -1 to the caller and no child is created. On success, fork() returns the PID of the child in the parent and 0 in the child. Therefore, we can distinguish parent and child with a simple if statement on the return value.
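A minimal sketch of that check in C follows. The function name fork_and_report is made up for illustration, and the parent reaps the child with waitpid() only so that the example does not leave a zombie process behind.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork once and print which branch is running.
 * Returns the child's PID in the parent, or -1 if fork() failed. */
pid_t fork_and_report(void)
{
    fflush(stdout);              /* avoid duplicating buffered output in the child */
    pid_t pid = fork();

    if (pid < 0) {               /* failure: no child was created */
        perror("fork");
        return -1;
    } else if (pid == 0) {       /* child: fork() returned 0 */
        printf("child:  my pid is %d\n", (int)getpid());
        fflush(stdout);
        _exit(0);                /* leave without returning into the caller */
    } else {                     /* parent: fork() returned the child's PID */
        printf("parent: my child's pid is %d\n", (int)pid);
        waitpid(pid, NULL, 0);   /* reap the child */
        return pid;
    }
}
```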
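After a parent process calls fork(), it can call wait() to block until a child finishes executing. wait() returns the PID of the child that terminated and stores its exit status in the integer that stat_addr points to. Declaration:
pid_t wait(int *stat_addr);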
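As a rough illustration of how wait() is typically used, the sketch below forks a child that exits with a given code and has the parent recover that code from the status word. run_child_and_wait is a hypothetical name, and the sketch assumes the caller has no other children, since wait() returns for whichever child terminates first.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code` (0..255) and
 * return the exit status the parent observes, or -1 on any failure. */
int run_child_and_wait(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;               /* fork failed */
    if (pid == 0)
        _exit(code);             /* child: terminate immediately */

    int status;
    if (wait(&status) < 0)       /* parent: block until a child terminates */
        return -1;

    /* The status word encodes more than the exit code, so decode it
     * with the macros from <sys/wait.h>. */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_child_and_wait(7) is expected to return 7 in the parent.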
exec
The exec() family of calls (execve() is the underlying syscall) replaces the program running in the current process with another program, keeping the same PID. You can think of fork() as creating a box containing some stuff and of exec() as replacing the stuff inside. Usually we call fork() and then exec(), but these two syscalls shouldn't be merged into one; the reason is explained in the "Process" section.
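The usual fork-then-exec pattern might look like the sketch below. It uses execvp(), one member of the exec() family, and the helper name spawn_and_wait is an assumption made for this example. Exit code 127 for a failed exec follows the shell's convention and is a choice of this sketch, not something the kernel mandates.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the program named by argv[0] in a child process
 * and return its exit status, or -1 if it could not be started or waited for.
 * argv must be terminated by a NULL pointer. */
int spawn_and_wait(char *const argv[])
{
    pid_t pid = fork();              /* create the "box" */
    if (pid < 0)
        return -1;

    if (pid == 0) {
        execvp(argv[0], argv);       /* replace the contents of the box */
        _exit(127);                  /* reached only if exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With argv set to {"/bin/sh", "-c", "exit 3", NULL}, the call should return 3, because the child keeps its PID across exec and the parent waits on that same PID.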