Threads

Motivation: Concurrency within a Process

In the "Process" section, we saw that switching among many processes keeps the CPU busy and therefore improves CPU utilization. The same idea applies within a single process. For example, when we visit a web page, the text appears first, then small images, and then large images. This design makes sense because images take longer to load. Early in rendering, the user should see at least something, even though the page is still incomplete.

In the actual implementation, the algorithm will be:

  • Call GetData() to download the text data.

  • Call ShowText() to display the text data.

  • Call GetData() to download the image data.

  • Call ProcessImage() to decompress the image data.

  • Call ShowImage() to display the image data.

Each function can run in its own thread because the functions are largely independent of each other: we can process the image while handling the text, and the two do not conflict. This idea is called multithreading. In pseudocode:

void WebExplorer()
{
    char URL[] = "http://www.wikipedia.org";
    char buf[1024];
    thread_create(GetData, URL, buf);
    thread_create(ShowText, buf);
    ...
}

void GetData(char *URL, char *buf)
{
    ...
}

void ShowText(char *buf)
{
    ...
}

You may ask: why do we use 4 threads instead of 4 processes? The process model has two major drawbacks:

  1. In our example, GetData() writes data to a buffer and ShowText() reads data from this buffer. In other words, they need shared resources. If we assign a separate process to each function, they must communicate through IPC, which is overkill for sharing a single buffer.

  2. The cost of creating and maintaining processes is high: resource/PCB allocation/deallocation, context switching, etc.

Threads solve both problems: threads in the same process share its memory, and they are cheaper to create and switch between.

Thread vs. Process

A process may contain multiple threads.

  • A thread is the basic unit of scheduling.

  • A process is the basic unit of resource allocation.

Detailed comparison:

The following data are shared by all threads (owned by the process):

  • Global variables

  • Heap

  • Static variables

  • Code

  • Open files

The following data are private to each thread:

  • Local variables

  • Stack

  • Registers

  • Function arguments

  • Thread Local Storage (TLS) data

Thread Implementation

There are three ways to implement threads:

  • User Thread

    • Create and manage threads in user mode.

  • Kernel Thread

    • Create and manage threads in kernel mode.

  • LightWeight Process

User Thread

Kernel Thread

LightWeight Process

The Thread API

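The POSIX thread (pthread) API provides the following library functions. They are not system calls themselves, although an implementation may use system calls internally: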

  • pthread_create()

  • pthread_join()

The pthread_create() Function

Definition:

#include <pthread.h>

int pthread_create(pthread_t *restrict thread,
                   const pthread_attr_t *restrict attr,
                   void *(*start_routine)(void *),
                   void *restrict arg);

Arguments:

  • thread: A pointer to an object of type pthread_t. pthread_create() stores the new thread's ID there, and we use it later to interact with the thread (for example, in pthread_join()).

  • attr: Specifies any attributes this thread might have, such as the stack size or the scheduling priority of the thread. Pass NULL to use the default attributes.

  • start_routine: The function in which the new thread starts running. It takes a single void * argument and returns void *.

  • arg: The argument to be passed to the function where the thread begins execution.

Thread Safety

Reference
