Syscall
Motivation 1: OS-level API
System calls (syscalls) are the OS-level "API" on which programs are built. For example, we can implement ls and a shell using syscalls:
ls
Implement ls:
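// ls.c
// Simplified sketch: on Linux, read() on a directory fails with EISDIR,
// and a real ls lists entries with getdents64() instead.
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[512];
    ssize_t n;
    int fd = open(".", O_RDONLY);
    if (fd < 0)
        return 1;
    while ((n = read(fd, buf, sizeof(buf))) > 0)
    {
        // buf is not NUL-terminated, so print exactly n bytes
        printf("%.*s", (int)n, buf);
    }
    return 0;
}
The syscalls used are:
open()
read()
write(): printf() actually calls write(1, buf, len).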
shell
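A shell can be implemented as a loop that reads a command, forks a child, execs the command in the child, and waits for the child to finish. The syscalls used are:
read(): scanf() is a libc function that actually calls read(0, buf, len).
fork()
exec()
wait()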
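The shell loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a real shell: the helper names split_args, run_command and shell_loop are hypothetical, commands are split on whitespace only, fgets() stands in for scanf(), and execvp() (a libc wrapper that searches PATH and then calls execve) stands in for exec().

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_ARGS 16

/* Hypothetical helper: split a line on whitespace into argv (modifies line). */
int split_args(char *line, char *argv[MAX_ARGS])
{
    int argc = 0;
    for (char *tok = strtok(line, " \t\n"); tok && argc < MAX_ARGS - 1;
         tok = strtok(NULL, " \t\n"))
        argv[argc++] = tok;
    argv[argc] = NULL;              /* exec expects a NULL-terminated array */
    return argc;
}

/* Hypothetical helper: fork a child, exec the command, wait for it.
 * Returns the child's exit status, or -1 on failure. */
int run_command(char *line)
{
    char *argv[MAX_ARGS];
    if (split_args(line, argv) == 0)
        return 0;                   /* empty line: nothing to run */

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);      /* returns only if exec failed */
        _exit(127);
    }
    int status;
    if (wait(&status) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Read a line from stdin, run it, repeat until end of input. */
void shell_loop(void)
{
    char line[256];
    for (;;) {
        printf("$ ");
        fflush(stdout);             /* flush before fork so output is not duplicated */
        if (!fgets(line, sizeof line, stdin))
            break;
        run_command(line);
    }
}
```

Calling shell_loop() from main() gives an interactive prompt. Keeping fork() and exec() as separate steps leaves room for the child to adjust its own state (for example, redirecting file descriptors) before the new program starts.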
Motivation 2: User Mode => Kernel Mode
The architecture of most modern processors, with the exception of some embedded systems, involves a security model. For example, the rings model specifies multiple privilege levels under which software may be executed: a program is usually limited to its own address space so that it cannot access or modify other running programs or the operating system itself, and is usually prevented from directly manipulating hardware devices (e.g. the frame buffer or network devices).
However, many applications need access to these components, so system calls are made available by the operating system to provide well-defined, safe implementations for such operations. The operating system executes at the highest level of privilege, and allows applications to request services via system calls, which are often initiated via interrupts. An interrupt automatically puts the CPU into some elevated privilege level and then passes control to the kernel, which determines whether the calling program should be granted the requested service. If the service is granted, the kernel executes a specific set of instructions over which the calling program has no direct control, returns the privilege level to that of the calling program, and then returns control to the calling program.
Syscalls
Almost all programs have to interact with the outside world! This is primarily done via system calls (man syscalls). Each system call is documented in section 2 of the man pages (e.g., man 2 open).
To trigger a system call on amd64:
Set rax to the system call number.
Store the arguments in rdi, rsi, rdx, etc. (more on this later).
Execute the syscall instruction. The result is returned in rax.
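The three steps above can be sketched with GCC/Clang inline assembly. This is an illustration under stated assumptions: it targets Linux on x86-64, the helper name raw_write is hypothetical, and SYS_write (syscall number 1 on x86-64 Linux) comes from <sys/syscall.h>. On any other platform the sketch falls back to libc's syscall() wrapper, which loads the registers for us.

```c
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: invoke write(fd, buf, len) without libc's write().
 * On the raw path a failure is returned as a negative errno value;
 * the fallback path returns -1 and sets errno instead. */
long raw_write(int fd, const void *buf, unsigned long len)
{
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    __asm__ volatile(
        "syscall"
        : "=a"(ret)                  /* result comes back in rax */
        : "a"((long)SYS_write),      /* rax = system call number */
          "D"((long)fd),             /* rdi = first argument */
          "S"(buf),                  /* rsi = second argument */
          "d"(len)                   /* rdx = third argument */
        : "rcx", "r11", "memory");   /* the syscall instruction clobbers rcx and r11 */
    return ret;
#else
    /* Other platforms: let libc place the arguments and trap into the kernel. */
    return syscall(SYS_write, fd, buf, len);
#endif
}
```

For example, raw_write(1, "hi\n", 3) writes three bytes to standard output and returns 3.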
Below are some important syscalls.
fork
The fork() syscall creates a nearly identical copy of the calling process: the child gets its own copy of the parent's address space, registers and open file descriptors, and differs mainly in its PID and in the value that fork() returns. The original process is called the parent and the newly created process is called the child.
If fork() fails, it returns -1 to the caller and no child is created. On success, fork() returns the PID of the child to the parent and returns 0 to the child. Therefore, we can distinguish parent and child with a simple if statement on the return value.
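A minimal sketch of that check is shown here. The helper name fork_demo is hypothetical, and the messages are only for illustration.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork once and show which branch each process takes.
 * Returns the child's PID in the parent, or -1 if fork() failed. */
pid_t fork_demo(void)
{
    fflush(stdout);                 /* avoid duplicating buffered output in the child */
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");             /* fork failed: no child exists */
        return -1;
    }
    if (pid == 0) {
        /* Child: fork() returned 0. */
        printf("child: my pid is %d\n", (int)getpid());
        fflush(stdout);
        _exit(0);
    }
    /* Parent: fork() returned the child's PID. */
    printf("parent: child pid is %d\n", (int)pid);
    wait(NULL);                     /* reap the child so it does not linger as a zombie */
    return pid;
}
```

Both processes continue from the same point after fork(); only the return value tells them apart.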
wait
After a parent process calls fork(), it can call wait() to wait for the child to finish its execution. Definition:
pid_t wait(int *wstatus);
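As a rough sketch of how a parent collects a child's exit status with wait(), consider the following. The helper name wait_for_child is hypothetical; WIFEXITED and WEXITSTATUS are the standard macros from <sys/wait.h> for decoding the status word.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code`, then wait for it.
 * Returns the exit code seen by the parent, or -1 on failure. */
int wait_for_child(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);                /* child terminates immediately */

    int status;
    pid_t done = wait(&status);     /* blocks until a child terminates */
    if (done != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);     /* low 8 bits of the child's exit code */
}
```

For example, wait_for_child(42) returns 42 in the parent: the status passed to exit in the child is what wait() reports.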
exec
The exec() syscall executes another program in the current process, keeping the same PID. You can think of fork() as creating a box containing some stuff and of exec() as replacing the stuff inside. Usually we call fork() and then exec(), but these two syscalls shouldn't be merged into one; the reason is explained in the "Process" section.
exec() is a family of functions. The most common one in pwn is execve():
int execve(const char *pathname, char *const argv[], char *const envp[]);
Often we need to call execve("/bin/sh", 0, 0) to spawn a shell.
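The fork-then-execve pattern can be sketched as below. The helper name run_sh is hypothetical. Unlike the bare execve("/bin/sh", 0, 0) form, which Linux accepts but which is not portable, this sketch passes a conventional argv whose first element is the program name, plus an empty environment.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `/bin/sh -c script` in a child process and
 * return its exit status, or -1 on failure. */
int run_sh(const char *script)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        char *const argv[] = { "/bin/sh", "-c", (char *)script, NULL };
        char *const envp[] = { NULL };      /* empty environment */
        execve("/bin/sh", argv, envp);      /* replaces the child's program image */
        _exit(127);                         /* reached only if execve failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For example, run_sh("exit 7") returns 7. The PID of the child is the same before and after execve(); only the program running inside it changes.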
exit
The exit() syscall ends a process. Definition:
void exit(int status);
By convention, exit(0) means success and a nonzero status such as exit(1) means error.
open
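The open() syscall opens a file. Definition:
int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);
The return value is a newly created file descriptor, or -1 on error.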
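A small sketch of both forms of open() follows. The helper names open_for_read and create_for_write are hypothetical, and the permission bits 0644 are an arbitrary choice for illustration.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open an existing file read-only.
 * Returns a file descriptor, or -1 with errno set. */
int open_for_read(const char *path)
{
    return open(path, O_RDONLY);    /* two-argument form: no mode needed */
}

/* Hypothetical helper: create (or truncate) a file for writing.
 * The third argument gives the permissions used if the file is created. */
int create_for_write(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
}
```

A descriptor returned by either helper should be released with close() once it is no longer needed.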
read
The read() syscall reads content from a file descriptor into a buffer. Definition:
ssize_t read(int fd, void *buf, size_t count);
The return value is the number of bytes actually read (0 at end of file, -1 on error).
write
The write() syscall writes content from a buffer to a file descriptor. Definition:
ssize_t write(int fd, const void *buf, size_t count);
The return value is the number of bytes actually written, or -1 on error.
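Because read() and write() may transfer fewer bytes than requested, callers usually loop on the return value. The following is a minimal sketch of that idea; the helper names write_all and copy_fd are hypothetical, and retrying after EINTR is left out for brevity.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all `len` bytes, retrying after short writes.
 * Returns 0 on success, -1 on error. */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0)
            return -1;
        buf += n;                   /* skip the bytes already written */
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: copy everything from `in` to `out`.
 * Returns the number of bytes copied, or -1 on error. */
long copy_fd(int in, int out)
{
    char buf[512];
    long total = 0;
    ssize_t n;
    while ((n = read(in, buf, sizeof buf)) > 0) {   /* 0 means end of file */
        if (write_all(out, buf, (size_t)n) < 0)
            return -1;
        total += n;
    }
    return n < 0 ? -1 : total;
}
```

For example, copy_fd(0, 1) copies standard input to standard output, which is the core of a cat-like program.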
strace
We can trace a process's syscalls using strace. For example, we can run strace whoami and observe the output. The trace shows that the whoami command executes /usr/bin/whoami, then opens libc, reads from /etc/passwd, and prints out my name.
