Escaping seccomp

Lecture

Breaking out

Recall the "trust boundary" from the "Into the Jail" section: when a child process needs to perform a privileged action, it must ask the parent process for permission. In other words, to do anything useful, a sandboxed process needs to communicate with the privileged process, so the sandbox has to leave at least some syscalls available. This relaxation opens up several attack vectors:

  1. Permissive policies

  2. Syscall confusion

  3. Kernel vulnerabilities (in the syscall handlers)

  4. Side channel attacks

Attack Vector 1: Permissive Policies

System calls are complex, and there are a lot of them. Developers might avoid breaking functionality by erring on the side of permissiveness.

A well-known example is ptrace(). Depending on system configuration, allowing the ptrace() system call could let a sandboxed process "puppet" a non-sandboxed process.

Some less well-known effects include:

  • sendmsg() can transfer file descriptors between processes.

  • prctl() multiplexes many unrelated process-control operations behind one syscall, some with surprising effects.

  • process_vm_writev() allows direct writes to another process's memory.
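
The sendmsg() item above can be made concrete with a minimal sketch. It assumes a Unix system and uses Python's socket.sendmsg()/recvmsg() wrappers with SCM_RIGHTS ancillary data. The helper names (send_fd, recv_fd, demo) and the pipe holding "secret data" are hypothetical stand-ins; both ends run in one process here for brevity, whereas in a real escape the sender would be a non-sandboxed process.

```python
import array
import os
import socket


def send_fd(sock, fd):
    """Pass one open file descriptor over a Unix socket via sendmsg()."""
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", [fd]))]
    # At least one byte of ordinary data must accompany the ancillary data.
    sock.sendmsg([b"F"], ancillary)


def recv_fd(sock):
    """Receive one file descriptor sent with SCM_RIGHTS."""
    fds = array.array("i")
    _msg, ancdata, _flags, _addr = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Drop any trailing partial integer before decoding.
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
    return fds[0]


def demo():
    """Hypothetical scenario: a helper hands the sandbox an fd it could not open itself."""
    helper_end, sandbox_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()
    os.write(write_end, b"secret data")  # stand-in for a file such as /flag
    os.close(write_end)

    send_fd(helper_end, read_end)
    received = recv_fd(sandbox_end)  # a new fd referring to the same pipe
    data = os.read(received, 64)

    for fd in (read_end, received):
        os.close(fd)
    helper_end.close()
    sandbox_end.close()
    return data


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is that the receiver never calls open(): a policy that blocks open() but allows recvmsg() on an inherited socket can still end up with new file descriptors.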

Attack Vector 2: Syscall Confusion

Policies that allow both 32-bit and 64-bit syscalls can fail to properly sandbox one or the other mode.

Many 64-bit architectures are backwards compatible with their 32-bit ancestors. For example:

  • amd64 / x86_64 => x86

  • aarch64 => arm

  • mips64 => mips

  • powerpc64 => ppc

  • sparc64 => sparc

On some systems (including amd64), you can switch between 32-bit mode and 64-bit mode in the same process, so the kernel must be ready for either. However, syscall numbers differ between architectures, including between the 32-bit and 64-bit variants of the same architecture. For example, the syscall number for execve is 0xb on x86 and 0x3b on x86-64; the syscall number for exit is 0x1 on x86 and 0x3c on x86-64. A filter that matches on the syscall number alone, without also checking which architecture the call was made under, can therefore block a number in one mode while the same operation stays reachable under a different number in the other mode. This is the "syscall confusion" that becomes possible when both 32-bit and 64-bit syscalls are allowed.
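
The mismatch can be illustrated with a small simulation. This is not a real seccomp filter (those are BPF programs that inspect the arch and nr fields of struct seccomp_data); it is a sketch of the decision logic only. The function names and the "ALLOW"/"KILL" strings are hypothetical; the syscall numbers are the ones quoted above, and the AUDIT_ARCH_* values are the Linux audit architecture constants.

```python
# Linux audit architecture identifiers, as seen in seccomp_data.arch.
AUDIT_ARCH_I386 = 0x40000003
AUDIT_ARCH_X86_64 = 0xC000003E

# Only the syscalls mentioned in the text; real tables are much larger.
SYSCALL_NAMES = {
    AUDIT_ARCH_I386: {0x1: "exit", 0xB: "execve"},
    AUDIT_ARCH_X86_64: {0x3B: "execve", 0x3C: "exit"},
}

# Hypothetical policy: the author meant to forbid execve, but wrote
# down only its x86-64 number.
BLOCKED_NUMBERS = {0x3B}


def describe(arch, nr):
    """Name the operation the kernel would actually perform."""
    return SYSCALL_NAMES[arch].get(nr, "unknown")


def naive_filter(arch, nr):
    """Looks at the syscall number only and ignores the architecture."""
    return "KILL" if nr in BLOCKED_NUMBERS else "ALLOW"


def arch_aware_filter(arch, nr):
    """Rejects any call not made under the expected architecture first."""
    if arch != AUDIT_ARCH_X86_64:
        return "KILL"
    return naive_filter(arch, nr)


if __name__ == "__main__":
    # 64-bit execve is stopped by both filters.
    print(describe(AUDIT_ARCH_X86_64, 0x3B), naive_filter(AUDIT_ARCH_X86_64, 0x3B))
    # 32-bit execve uses number 0xb, which the naive filter lets through.
    print(describe(AUDIT_ARCH_I386, 0xB), naive_filter(AUDIT_ARCH_I386, 0xB))
    print(describe(AUDIT_ARCH_I386, 0xB), arch_aware_filter(AUDIT_ARCH_I386, 0xB))
```

Checking the architecture before the number is the usual defensive pattern; the sketch assumes a policy that only needs to support 64-bit callers.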

Attack Vector 3: Kernel Vulnerabilities

Even if the seccomp sandbox is correctly configured, attackers can still interact with whitelisted syscalls.

As long as attackers can use some of the syscalls, they may be able to trigger vulnerabilities in the kernel code that handles them. For real-world examples, look at published Chrome sandbox escape exploits.

Attack Vector 4: Side Channel Attacks

Think: what is your goal as an attacker? Is it always code execution?

Not really. Often, your goal is data exfiltration (like /flag!). Even if you can't directly communicate with the outside world, often you can send "smoke signals":

  • Runtime of a process can convey a lot of data (see sleep(x), which is a library call typically backed by the nanosleep() or clock_nanosleep() system call).

  • Clean termination or a crash? This can convey one bit.

  • Return value of a program (exit(x)) can convey one byte.

For a real-world example, attackers use DNS queries to bypass network egress filters. As long as you can communicate 1 bit, you can repeat the attack to get more and more bits!
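
As a minimal sketch of the exit-code channel, the following script repeats the "one byte per run" leak until the whole secret is recovered. Everything in it is a stand-in: CHILD_SOURCE plays the role of code running inside the sandbox, the embedded secret replaces /flag, and leak_byte/leak_secret are hypothetical helper names. It assumes the secret contains no zero bytes, since a zero exit status is used as the end marker.

```python
import subprocess
import sys

# Stand-in for the sandboxed payload: it cannot print or open sockets in
# this scenario, so its exit status is the only output it has.
CHILD_SOURCE = r'''
import sys
secret = b"flag{demo}"          # stand-in for the contents of /flag
index = int(sys.argv[1])
# One byte per run; exit status 0 signals "past the end of the secret".
sys.exit(secret[index] if index < len(secret) else 0)
'''


def leak_byte(index):
    """Run the payload once and read one byte back from its exit status."""
    result = subprocess.run([sys.executable, "-c", CHILD_SOURCE, str(index)])
    return result.returncode


def leak_secret(max_len=64):
    """Repeat the one-byte leak until the payload reports the end."""
    recovered = bytearray()
    for index in range(max_len):
        value = leak_byte(index)
        if value == 0:
            break
        recovered.append(value)
    return bytes(recovered)


if __name__ == "__main__":
    print(leak_secret())
```

The same loop structure applies to the weaker channels: with crash versus clean exit, each run yields one bit instead of one byte, so the attacker needs eight times as many runs.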
