In order to virtualize the CPU, the operating system needs to somehow share the physical CPU
among many jobs running seemingly at the same time. The basic idea is simple: run one process
for a little while, then run another one, and so forth. By time sharing the CPU in this manner,
virtualization is achieved. There are a few challenges, however, in building such virtualization
machinery. The first is performance: how can we implement virtualization without adding excessive
overhead to the system? The second is control: how can we run processes efficiently while retaining
control over the CPU? Control is particularly important to the OS, as it is in charge of resources;
without control, a process could simply run forever and take over the machine, or access information
that it should not be allowed to access. Attaining performance while maintaining control is thus one
of the central challenges in building an operating system.
Basic Technique: Limited Direct Execution
To make a program run as fast as one might expect, not surprisingly, OS developers came up with
a technique, which we call Limited Direct Execution. The "direct execution" part of the idea is simple:
just run the program directly on the CPU. Thus, when the OS wishes to start a program, it creates a process
entry for it in a process list, allocates some memory for it, loads the program code into memory (from
disk), locates its entry point (i.e., the main() routine or something similar), jumps to it, and starts running
the user's code. Sounds simple, no? But this approach gives rise to a few problems in our quest to
virtualize the CPU. The first is simple: if we just run a program, how can the OS make sure the program
does not do anything that we do not want it to do, while still running it efficiently? The second: when
we are running a process, how does the operating system stop it from running and switch to another
process, thus implementing the time sharing we require to virtualize the CPU? In answering these questions
below, we will get a much better sense of what is needed to virtualize the CPU. In developing these
techniques, we will also see where the "limited" part of the name arises from; without limits on running
programs, the OS would not be in control of anything and thus would be "just a library", a very sad state
of affairs for an aspiring operating system.
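The following is a minimal C sketch of the unlimited version of the steps listed above. It uses a toy model. The process list is a singly linked list of a hypothetical `struct proc`. "Loading" the program is modeled by a function pointer that stands in for code read from disk. "Jumping to the entry point" is an ordinary function call. Note that nothing stops `user_main()` from doing whatever it likes, and the OS gets control back only when it returns. These are exactly the two problems raised above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical process-list entry; a real kernel keeps far more state. */
struct proc {
    int pid;
    void *memory;          /* memory allocated for the program */
    size_t mem_size;
    int (*entry)(void);    /* entry point, e.g. the program's main() */
    struct proc *next;
};

static struct proc *proc_list = NULL;  /* head of the process list */
static int next_pid = 1;

/* Stand-in for a user program; a real one would be loaded from disk. */
static int user_main(void) {
    return 42;
}

/* Direct execution with no limits: create a process-list entry, allocate
   memory, record the entry point, and jump to it. */
static int direct_execute(int (*program_entry)(void), size_t mem_size) {
    struct proc *p = calloc(1, sizeof *p);
    if (p == NULL)
        return -1;
    p->memory = calloc(1, mem_size);
    if (p->memory == NULL) {
        free(p);
        return -1;
    }
    p->pid = next_pid++;
    p->mem_size = mem_size;
    p->entry = program_entry;   /* "loading" is modeled by a function pointer */
    p->next = proc_list;        /* push onto the process list */
    proc_list = p;

    /* Jump to the entry point. The program now runs unchecked; the OS
       gets the CPU back only when main() returns. */
    int status = p->entry();

    /* Clean up. Nothing else ran, so p is still at the head of the list. */
    proc_list = p->next;
    free(p->memory);
    free(p);
    return status;
}
```

Calling `direct_execute(user_main, 4096)` returns 42 and leaves the process list empty again.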
Restricted Operations
Direct execution has the obvious advantage of being fast: the program runs natively on the hardware CPU
and thus executes as quickly as one would expect. But running on the CPU introduces a problem: what if
the process wishes to perform some kind of restricted operation, such as issuing an I/O request to a disk,
or gaining access to more system resources such as CPU or memory?
Tip: Use Protected Control Transfer
The hardware assists the OS by providing different modes of execution. In user mode, applications do not
have full access to hardware resources. In kernel mode, the OS has access to the full resources of the
machine. Special instructions to trap into the kernel and return-from-trap back to user-mode programs
are also provided, as well as instructions that allow the OS to tell the hardware where the trap table
resides in memory.
One approach would simply be to let any process do whatever it wants in terms of I/O and other related
operations. However, doing so would prevent the construction of many kinds of systems that are desired.
For example, if we wish to build a file system that checks permissions before granting access to a file, we
cannot simply let any user process issue I/O to the disk; if we did, a process could simply read or write the
entire disk and thus all protections would be lost.
Thus, the approach we take is to introduce a new processor mode, known as user mode; code that runs in
user mode is restricted in what it can do. For example, when running in user mode, a process cannot issue
I/O requests; doing so would result in the processor raising an exception, and the OS would then likely kill
the process.
In contrast to user mode is kernel mode, which the operating system (or kernel) runs in. In this mode,
code that runs can do what it likes, including privileged operations such as issuing I/O requests and
executing all types of restricted instructions.
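The user/kernel split can be modeled as a single mode bit that the hardware consults before it executes a privileged operation. The sketch below is a toy model, not real hardware. `current_mode`, `issue_io_request()` and `OP_EXCEPTION` are hypothetical names. A real processor would raise an exception that transfers control to the OS, not return a value.

```c
#include <assert.h>

/* Toy model of the processor's mode bit (hypothetical names). */
enum cpu_mode { USER_MODE, KERNEL_MODE };
enum op_result { OP_OK, OP_EXCEPTION };

static enum cpu_mode current_mode = USER_MODE;

/* Models the check the hardware makes before a privileged operation
   such as an I/O request: in user mode the operation is refused. */
static enum op_result issue_io_request(int disk_block) {
    (void)disk_block;               /* the target does not affect the check */
    if (current_mode != KERNEL_MODE)
        return OP_EXCEPTION;        /* real hardware would trap into the OS */
    /* In kernel mode the request would be sent to the disk here. */
    return OP_OK;
}
```

Starting in user mode, the request is refused. Only after the mode is switched to kernel mode does the same call succeed.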
We are still left with a challenge, however: what should a user process do when it wishes to perform some
kind of privileged operation, such as reading from disk? To enable this, virtually all modern hardware
provides the ability for user programs to perform a system call. Pioneered on ancient machines such as the
Atlas, system calls allow the kernel to carefully expose certain key pieces of functionality to user programs,
such as accessing the file system, creating and destroying processes, communicating with other processes,
and allocating more memory. Most operating systems provide a few hundred calls; early Unix systems
exposed a more concise subset of around twenty calls.
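From user code, a system call usually looks like an ordinary C function call, because the C library wraps the trap instruction. This example assumes Linux with glibc; other systems differ. On that platform, `getpid()` is the library wrapper, and the generic `syscall()` function issues a call by its number (`SYS_getpid`). This makes the system call itself visible. Both paths ask the kernel for the same value.

```c
#define _GNU_SOURCE          /* exposes syscall() in <unistd.h> on glibc */
#include <assert.h>
#include <sys/syscall.h>     /* SYS_getpid */
#include <unistd.h>          /* getpid(), syscall() */

/* Ask the kernel for our process ID by issuing the system call directly,
   bypassing the C library's getpid() wrapper. Linux-specific. */
static long raw_getpid(void) {
    return syscall(SYS_getpid);
}
```

`raw_getpid()` and `getpid()` return the same process ID.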
To execute a system call, a program must execute a special trap instruction. This instruction simultaneously
jumps into the kernel and raises the privilege level to kernel mode; once in the kernel, the system can now
perform whatever privileged operations are needed (if allowed), and thus do the required work for the calling
process. When finished, the OS executes a special return-from-trap instruction, which, as you might expect, returns
into the calling user program while simultaneously reducing the privilege level back to user mode.
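The trap/return-from-trap pair can be sketched as two operations on a toy CPU that has only a mode bit and a program counter. This is an illustrative model built on assumptions. `struct cpu`, `KERNEL_TRAP_ENTRY` and the one-unit instruction length are all hypothetical. The point is that each instruction changes the privilege level and the program counter in a single step, so user code is never left running with kernel privilege.

```c
#include <assert.h>

enum cpu_mode { USER_MODE, KERNEL_MODE };

/* Hypothetical CPU state: just the mode bit and a program counter. */
struct cpu {
    enum cpu_mode mode;
    unsigned long pc;
};

/* Hypothetical fixed kernel address where every trap lands in this model. */
#define KERNEL_TRAP_ENTRY 0x1000UL

/* trap: jump into the kernel and raise privilege together.
   Returns the user PC to resume at (instructions are one unit long here). */
static unsigned long trap(struct cpu *c) {
    unsigned long resume_pc = c->pc + 1;
    c->mode = KERNEL_MODE;
    c->pc = KERNEL_TRAP_ENTRY;
    return resume_pc;
}

/* return-from-trap: go back to the caller and drop to user mode together. */
static void return_from_trap(struct cpu *c, unsigned long resume_pc) {
    c->pc = resume_pc;
    c->mode = USER_MODE;
}
```

A trap issued from user code at address 100 lands at the kernel entry in kernel mode. Returning resumes at address 101 in user mode.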
The hardware needs to be a bit careful when executing a trap, in that it must make sure to save enough of the
caller's registers in order to be able to return correctly when the OS issues the return-from-trap instruction.
On x86, for example, the processor will push the program counter, flags, and a few other registers onto a
per-process kernel stack; the return-from-trap will pop these values off the stack and resume execution of
the user-mode program. Other hardware systems use different conventions, but the basic concepts are similar
across platforms.
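Saving the caller's state can be sketched as two steps. On entry to the kernel, a trap frame is pushed onto a per-process kernel stack. On return, the frame is popped off. The frame layout below is a hypothetical subset chosen for illustration: program counter, flags, and user stack pointer. Real x86 hardware defines its own frame format, and the register values used in the example are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of the registers saved on a trap. */
struct trap_frame {
    unsigned long pc;     /* program counter of the interrupted code */
    unsigned long flags;  /* processor flags */
    unsigned long sp;     /* user stack pointer */
};

#define KSTACK_FRAMES 8

/* Hypothetical per-process kernel stack holding saved frames. */
struct kstack {
    struct trap_frame frames[KSTACK_FRAMES];
    size_t top;           /* number of frames currently pushed */
};

/* On a trap, the caller's registers are pushed onto the kernel stack. */
static int push_frame(struct kstack *ks, const struct trap_frame *regs) {
    if (ks->top == KSTACK_FRAMES)
        return -1;                       /* kernel stack full */
    ks->frames[ks->top++] = *regs;
    return 0;
}

/* On return-from-trap, the registers are popped and user code resumes. */
static int pop_frame(struct kstack *ks, struct trap_frame *regs) {
    if (ks->top == 0)
        return -1;                       /* nothing to return to */
    *regs = ks->frames[--ks->top];
    return 0;
}
```

Popping the frame restores exactly the values that were pushed. A second pop fails because there is nothing left to return to.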
There is one important detail left out of this discussion: how does the trap know which code to run inside the
OS? Clearly, the calling process can't specify an address to jump to (as you would when making a procedure
call); doing so would allow programs to jump anywhere into the kernel, which is clearly a bad idea (imagine
jumping into code to access a file, but just after a permission check; in fact, such an ability would likely
enable a wily programmer to get the kernel to run arbitrary code sequences). Thus the kernel must carefully
control what code executes upon a trap.
The kernel does so by setting up a trap table at boot time. When the machine boots up, it does so in privileged
(kernel) mode, and thus is free to configure machine hardware as need be. One of the first things the OS thus
does is to tell the hardware what code to run when certain exceptional events occur. For example, what code
should run when a hard-disk interrupt takes place, when a keyboard interrupt occurs, or when a program makes
a system call? The OS informs the hardware of the locations of these trap handlers, usually with some kind of
special instruction. Once the hardware is informed, it remembers the location of these handlers until the machine
is next rebooted, and thus the hardware knows what to do (what code to jump to) when system calls and other
exceptional events take place.
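The key idea is that a system call names a trap number, not an address. The hardware then looks up the handler in a table that the kernel installed at boot. The sketch below models this in plain C and rests on several assumptions. The handlers `sys_ping` and `sys_double`, the four-entry table, `set_trap_table()` and `do_trap()` are all hypothetical. Installing the table is guarded by the same kind of mode check that protects other privileged operations.

```c
#include <assert.h>
#include <stddef.h>

enum cpu_mode { USER_MODE, KERNEL_MODE };

/* A trap handler takes one argument and returns a result (toy ABI). */
typedef long (*trap_handler)(long arg);

#define NUM_TRAPS 4

/* Hypothetical system-call handlers inside the kernel. */
static long sys_ping(long arg) { (void)arg; return 1; }
static long sys_double(long arg) { return 2 * arg; }

/* The kernel's table: trap number -> handler. Unused slots are NULL. */
static const trap_handler kernel_trap_table[NUM_TRAPS] = {
    sys_ping, sys_double, NULL, NULL
};

/* Models the hardware remembering where the trap table lives. */
static const trap_handler *hw_trap_table = NULL;
static enum cpu_mode current_mode = KERNEL_MODE;  /* boot starts in kernel mode */

/* Privileged: only kernel-mode code may tell the hardware where the table is. */
static int set_trap_table(const trap_handler *table) {
    if (current_mode != KERNEL_MODE)
        return -1;              /* real hardware would raise an exception */
    hw_trap_table = table;
    return 0;
}

/* A trap names a number, never an address; the handler is looked up in the
   installed table, and anything not registered there is refused. */
static long do_trap(int num, long arg) {
    if (hw_trap_table == NULL || num < 0 || num >= NUM_TRAPS ||
        hw_trap_table[num] == NULL)
        return -1;
    return hw_trap_table[num](arg);
}
```

Once boot installs the table, trap 1 with argument 21 runs `sys_double` and returns 42. An unregistered or out-of-range number is refused instead of causing a jump to an arbitrary address. After the machine drops to user mode, another call to `set_trap_table()` fails.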
One last aside: being able to execute the instruction to tell the hardware where the trap tables are is a very
powerful capability. Thus, as you might have guessed, it is also a privileged operation. If you try to execute this
instruction in user mode, the hardware won't let you, and you can probably guess what will happen.
There are two phases in the LDE protocol. In the first (at boot time), the kernel initializes the trap table, and
the CPU remembers its location for subsequent use. The kernel does so via a privileged instruction. In the second
(when running a process), the kernel sets up a few things (e.g., allocating a node on the process list, allocating
memory) before using a return-from-trap instruction to start the execution of the process; this switches the CPU
to user mode and begins running the process. When the process wishes to issue a system call, it traps back into
the OS, which handles it and once again returns control via a return-from-trap to the process. The process then
completes its work and returns from main(); this usually returns into stub code that properly exits the
program (say, by calling the exit() system call, which traps into the OS). At this point, the OS cleans up and we
are done.
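The two phases can be replayed as a timeline. This sketch is a toy model. The names `struct timeline`, `lde_boot()` and `lde_run_process()` are hypothetical, and the event strings paraphrase the protocol just described. It records each step in order and tracks the mode bit. This lets you check that the process starts only via return-from-trap and re-enters the OS only via traps.

```c
#include <assert.h>
#include <string.h>

enum cpu_mode { USER_MODE, KERNEL_MODE };

#define MAX_EVENTS 32

/* Hypothetical record of the protocol: events in order, plus machine state. */
struct timeline {
    const char *events[MAX_EVENTS];
    int count;
    enum cpu_mode mode;
    int trap_table_set;
    int traps;
};

static void log_event(struct timeline *t, const char *e) {
    if (t->count < MAX_EVENTS)
        t->events[t->count++] = e;
}

/* Hardware: enter the OS and raise privilege. */
static void trap_into_os(struct timeline *t, const char *why) {
    t->traps++;
    t->mode = KERNEL_MODE;
    log_event(t, why);
}

/* Hardware: leave the OS and drop privilege. */
static void return_from_trap(struct timeline *t) {
    t->mode = USER_MODE;
    log_event(t, "hw: return-from-trap, switch to user mode");
}

/* Phase 1 (boot, kernel mode): install the trap table. */
static void lde_boot(struct timeline *t) {
    t->mode = KERNEL_MODE;
    log_event(t, "OS: initialize trap table");
    t->trap_table_set = 1;   /* hardware remembers its location */
}

/* Phase 2: run one process that makes one system call and then exits. */
static int lde_run_process(struct timeline *t) {
    if (!t->trap_table_set)
        return -1;           /* without trap handlers, user code cannot run safely */
    log_event(t, "OS: add to process list, allocate memory, load program");
    return_from_trap(t);     /* the process starts in user mode */
    log_event(t, "process: run main()");
    trap_into_os(t, "process: system call traps into OS");
    log_event(t, "OS: handle system call");
    return_from_trap(t);
    log_event(t, "process: return from main(), stub calls exit()");
    trap_into_os(t, "process: exit() traps into OS");
    log_event(t, "OS: free memory, remove from process list");
    return 0;
}

/* Helper for inspecting the timeline. */
static int event_is(const struct timeline *t, int i, const char *s) {
    return i >= 0 && i < t->count && strcmp(t->events[i], s) == 0;
}
```

After boot, a run logs ten events and two traps (the system call and `exit()`). It ends in kernel mode with the OS cleaning up. Trying to run a process before the trap table is set is refused.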