Linux, tasks, threads, and processes.

I was recently in a discussion with a developer who was adamant that Linux has only one abstraction for running things, and that is "a task". It was difficult in the spur of the moment to explain why this is not strictly true, at least not without bopping that otherwise friendly developer on the nose. In light of my background as a BSD kernel hacker, I thought I'd post a short writeup that I hope can clarify some things.

An operating system (OS) kernel is made up of code which the CPU runs in a privileged mode, called "Ring 0" on x86. In this mode, the kernel is allowed to manage the hardware interrupts used to switch between the things that are running on the CPUs.

To decide when and where things should run, the kernel implements a scheduler. The modern Linux scheduler operates on a "task" structure (task_struct) which contains CPU masks and other fields that are important for deciding priorities. However, because every computer architecture handles running code slightly differently, Linux maps that task onto a schedulable entity called a "thread".

A thread provides everything needed for code execution, including CPU flags, timers, counters, a stack, and more. So Linux schedules a task, and its associated thread is what actually runs. All threads operate within the confines of a "process".
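
To make the CPU masks mentioned above concrete, here is a minimal sketch (Linux-specific; error handling mostly trimmed) that shrinks the mask of the calling task to a single CPU with sched_setaffinity(), the user-space view of the affinity data the scheduler keeps in the task structure:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void) {
        cpu_set_t mask;
        CPU_ZERO(&mask);
        CPU_SET(0, &mask);  /* this task may now only run on CPU 0 */
        if (sched_setaffinity(0, sizeof mask, &mask) != 0)
            perror("sched_setaffinity");
        else
            printf("pinned to CPU 0\n");
        return 0;
    }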

A process gives its threads a shared heap in a given virtual address space, along with lots of useful abstractions that a programmer might want, such as security credentials, file handles, the working directory, and more.

Everything up until now has been about kernel code. However, on top of* the kernel-thread you have an associated user-thread, which is basically some state plus your program code. The features provided by user-threads are ultimately defined by the programming language, but most modern Linux distributions ship with a POSIX-compliant threading library called NPTL, which is part of glibc.
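
As a minimal sketch of the 1:1 picture so far, assuming glibc/NPTL: two user-threads in one process, each backed by its own kernel-thread and task, both bumping the same heap-allocated counter (the counter and loop are illustrative; error handling is trimmed):

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *bump(void *arg) {
        long *counter = arg;  /* same heap, same address space */
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);
            (*counter)++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        long *counter = calloc(1, sizeof *counter);
        pthread_t a, b;
        pthread_create(&a, NULL, bump, counter);
        pthread_create(&b, NULL, bump, counter);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("counter = %ld\n", *counter);  /* 200000: the heap is shared */
        free(counter);
        return 0;
    }

Each pthread_create() here results in a new task and kernel-thread, but no new process, which brings us to the next point.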

A source of confusion stems from how Linux uses a clever copy-on-write system which, in the common case, creates a task, thread, and process together, 1:1:1, leading some developers to erroneously believe that there's no point in distinguishing them. However, the clone() system call provides flags that control which data is copied and which address space is used, meaning not all such creations are equivalent. This is what differentiates an ordinary call to fork() from a call to pthread_create(), both in terms of abstraction and in terms of performance.
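
As a sketch of that difference (the exact flag sets glibc passes are more elaborate, so treat this as illustrative): raw clone() called both ways, first fork()-style with a private copy-on-write address space, then thread-style with CLONE_VM and friends so the address space is shared:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static int shared = 0;

    static int child_fn(void *arg) {
        shared = 42;  /* only visible to the parent if CLONE_VM was used */
        return 0;
    }

    int main(void) {
        const size_t stack_size = 1024 * 1024;
        char *stack = malloc(stack_size);
        /* the stack grows down on most architectures, hence stack + stack_size */

        /* fork()-style: new (copy-on-write) address space */
        int child = clone(child_fn, stack + stack_size, SIGCHLD, NULL);
        waitpid(child, NULL, 0);
        printf("fork-style clone:   shared = %d\n", shared);  /* still 0 */

        /* pthread_create()-style: shared address space and fd table.
           CLONE_THREAD is omitted here so plain waitpid() still works. */
        child = clone(child_fn, stack + stack_size,
                      CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD,
                      NULL);
        waitpid(child, NULL, 0);
        printf("thread-style clone: shared = %d\n", shared);  /* now 42 */

        free(stack);
        return 0;
    }

pthread_create() additionally passes CLONE_THREAD, CLONE_SETTLS, and a few friends; the key point is that the thread-style call sets up no new address space, which is the main reason it is cheaper than fork().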

Back in the day, some Linux distributions shipped with a library called GNU Pth which let you create multiple user-threads mapped onto a single kernel-thread. This was referred to as N:1 threading, and it required a user scheduler in addition to the kernel scheduler. There are many downsides to this approach. For example, you can get a form of priority inversion, where the user-scheduler picks an important user-thread to run, and the kernel-scheduler then preempts the one kernel-thread underneath it in order to run something less important. You can also experience latency problems, where one user-thread gets stuck doing IO and all the other user-threads are blocked behind it. Lastly, since all the user-threads are bound to a single kernel-thread, such a program can only ever utilize one CPU.
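
For flavor, here is the N:1 idea reduced to its essence: a toy user scheduler multiplexing two user-threads onto the single kernel-thread running main(), using the old ucontext API (a sketch; a real library like GNU Pth adds timers, IO wrappers, and much more):

    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t scheduler_ctx, worker_ctx[2];

    static void worker(int id) {
        for (int i = 0; i < 3; i++) {
            printf("user-thread %d, step %d\n", id, i);
            /* cooperatively yield back to the user scheduler */
            swapcontext(&worker_ctx[id], &scheduler_ctx);
        }
    }

    int main(void) {
        static char stacks[2][64 * 1024];
        for (int id = 0; id < 2; id++) {
            getcontext(&worker_ctx[id]);
            worker_ctx[id].uc_stack.ss_sp = stacks[id];
            worker_ctx[id].uc_stack.ss_size = sizeof stacks[id];
            worker_ctx[id].uc_link = &scheduler_ctx;  /* return here on exit */
            makecontext(&worker_ctx[id], (void (*)(void))worker, 1, id);
        }
        /* round-robin user scheduler: the steps interleave on one CPU */
        for (int round = 0; round < 3; round++)
            for (int id = 0; id < 2; id++)
                swapcontext(&scheduler_ctx, &worker_ctx[id]);
        return 0;
    }

If either worker called a blocking read() instead of yielding, every user-thread in the program would stall, which is exactly the latency problem described above.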

Some kernel programmers tried to solve this by building an advanced ABI that would allow the kernel-scheduler and the user-scheduler to communicate (the approach known in the literature as scheduler activations). In theory this would allow an arbitrary number of user-threads to be mapped onto an arbitrary number of kernel-threads. After many years of experimentation by the BSDs, Microsoft, IBM, HP, Sun, and Linux, they all concluded that it was easiest to have a single scheduler in the kernel and to map kernel-threads to user-threads 1:1. It's not that M:N systems couldn't win certain benchmarks; it's mostly that it was extremely hard to make them reliable under all circumstances, which a general-purpose OS must be.

Programming language developers (and even application developers) have been tantalized by the promise of managing execution contexts without the kernel's help, both for the potential performance and for the nicer abstractions. A lot of lessons from OS thread development tend to be forgotten and later rediscovered (e.g. the M:N experiments in early Rust), but it's not always an all-or-nothing decision. For instance, some developers use thread-pools to solve the blocking-IO problem, albeit at the cost of another form of complexity and overhead. Linux provides io_uring for much the same reason, but inside the kernel, of course. Some language developers target a use-case narrow enough in scope that general performance is not important. For instance, Erlang implements extremely lightweight user-threads that are well suited to data-parallel workloads with minimal shared state, at the cost of reduced performance on task-parallel workloads with complex load patterns.
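
As a sketch of the thread-pool idea reduced to a single worker (the path and buffer size are illustrative; error handling is trimmed): the blocking read() happens on its own kernel-thread, so only that thread sleeps in the kernel while the main thread stays runnable:

    #include <fcntl.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    static char buf[4096];
    static ssize_t nread = -1;

    static void *blocking_read(void *arg) {
        int fd = open("/etc/hostname", O_RDONLY);  /* illustrative input */
        if (fd >= 0) {
            nread = read(fd, buf, sizeof buf);  /* may block, but only this
                                                   kernel-thread sleeps */
            close(fd);
        }
        return NULL;
    }

    int main(void) {
        pthread_t worker;
        pthread_create(&worker, NULL, blocking_read, NULL);
        /* ... the main thread keeps doing useful work here ... */
        pthread_join(worker, NULL);
        printf("read %zd bytes\n", nread);
        return 0;
    }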

Modern operating system kernels sometimes refer to these things by slightly different names, but in practice they are all surprisingly similar. With all of that said, it's not important to argue about words; however, it does pay to design your algorithms with a good understanding of the stack of technology they will be running on. Happy hacking!

* I suppose it depends if you think of memory as starting from 0x0000 or 0xFFFF :)
