An Overview of Signals
A significant aspect of how an operating system manages many asynchronous processes simultaneously is the notion of signals. Signals are the means by which processes (including those associated with the operating system) indicate to one another that some event of interest has occurred or requires attention. Signal delivery and handling are completely asynchronous: a signal may be delivered to a process at an arbitrary time, interrupting whatever the process was doing, and must be handled immediately. It is also possible to ignore most signals, in which case the arrival of the signal has no effect. Figure 1 gives a conceptual picture in which the operating system sends signals to various processes and processes send signals to one another (strictly speaking, all signals are generated by the operating system; however, processes can initiate signals via system calls, and it can be more convenient to think of the process as sending the signal).
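Two of the points above, that most (but not all) signals can be ignored and that a process initiates a signal through a system call, can be sketched in a few lines of C. This is only an illustration under assumptions: the helper names ignore_signal and send_signal_to_self are hypothetical, and the sketch assumes a POSIX system providing sigaction(2) and kill(2).

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: ask the OS to discard `signo` for this process.
   Returns 0 on success, -1 on failure (for example, SIGKILL and SIGSTOP
   can never be ignored, so sigaction() rejects them). */
int ignore_signal(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_IGN;        /* special value meaning "ignore" */
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Hypothetical helper: a process "sends" a signal by asking the OS to
   generate it via the kill(2) system call. Here the target is ourselves. */
int send_signal_to_self(int signo)
{
    return kill(getpid(), signo);
}
```

Calling ignore_signal(SIGUSR1) followed by send_signal_to_self(SIGUSR1) leaves the process running, even though the default action for SIGUSR1 is to terminate it.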
Most people are already familiar with at least one aspect of signals, whether or not they realize it is a signal handling issue: interrupts. There is a signal called SIGINT for which every process has a default handler, provided by the system, that causes the process to exit. If you have a process running in the foreground and press <control>+C, the OS sends a SIGINT to the process, and it terminates. It is possible to write your own handler, attach it to the receipt of this signal, and take any action desired, for example saving the state of the application before exiting (we will return to this particular example shortly).
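As a minimal sketch of attaching a handler to SIGINT, assuming a POSIX system with sigaction(2): the names got_sigint, on_sigint and install_sigint_handler are hypothetical and chosen only for illustration. The handler does nothing except set a flag, which is generally the safest thing to do inside a signal handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Flag set by the handler. volatile sig_atomic_t is the object type the
   C standard guarantees can be written safely from a signal handler. */
static volatile sig_atomic_t got_sigint = 0;

/* The handler itself: record that SIGINT arrived and return. */
static void on_sigint(int signo)
{
    (void)signo;
    got_sigint = 1;
}

/* Replace the default SIGINT action with on_sigint.
   Returns 0 on success, -1 on failure. */
int install_sigint_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);   /* block no extra signals while handling */
    sa.sa_flags = 0;
    return sigaction(SIGINT, &sa, NULL);
}
```

After install_sigint_handler() succeeds, pressing <control>+C no longer terminates the process. The program would instead test got_sigint at convenient points and decide for itself what to do, such as saving its state and then exiting.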
The most pressing issue when a programmer begins to attach their own handlers to signals is ensuring that the process can go back to what it was doing after being interrupted at an arbitrary moment to deal with the received signal. For example, a signal can interrupt a process that is blocked in a system call, and the OS will not necessarily restart that call automatically (e.g. accept(2), read(2), write(2), ...). In such cases the code that invokes the system call needs additional logic to reissue the call on which it was blocked once the signal has been processed.
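The additional logic usually amounts to a retry loop: an interrupted call fails with errno set to EINTR, and the caller simply issues it again. The following is a sketch for read(2) only; the wrapper name read_retry is hypothetical.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical wrapper: behaves like read(2), except that if the call is
   interrupted by a signal before any data is transferred (errno == EINTR)
   it is reissued rather than reported as an error. */
ssize_t read_retry(int fd, void *buf, size_t count)
{
    ssize_t n;
    do {
        n = read(fd, buf, count);
    } while (n == -1 && errno == EINTR);
    return n;   /* bytes read, 0 at end of file, or -1 on a real error */
}
```

The same pattern applies to accept(2) and write(2). On systems that support it, setting SA_RESTART in sa_flags when installing a handler asks the kernel to restart some interrupted calls automatically, but which calls are covered varies, so an explicit loop is the more portable habit.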
The focus of this tutorial is to provide a basic working understanding of signals and signal handling so that the programmer can incorporate signal handling into their code. More specifically, we want to show how to set up the appropriate signal handlers so that a program can "checkpoint" itself, saving whatever information is required to allow it to be restarted if the system needs to kill it (for example, if the system is going down for maintenance, or if your run-time exceeds the estimate you provided at the time of submission).
Our schedulers are fairly well behaved with respect to how they terminate a process: specific signals are sent to your process before it is forcibly killed. If you take advantage of the fact that you will receive these signals prior to termination, you can take steps to ensure you do not lose your progress. This assumes the program is amenable to checkpointing and restarting, which is something most users should strive to incorporate into their designs so as to maximize the benefit of available run-time, even where the process has to be terminated before completing its computation.
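The overall shape of a checkpointing program can be sketched as follows. This is an illustration under stated assumptions, not the scheduler's documented behaviour: it assumes the warning signal is SIGTERM (check which signals your scheduler actually sends), it treats the entire program state as a single step counter, and every name in it (stop_requested, install_checkpoint_handler, save_checkpoint, load_checkpoint, run_steps) is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>

static volatile sig_atomic_t stop_requested = 0;

/* Handler: only set a flag; the checkpoint is written by the main loop. */
static void on_terminate(int signo)
{
    (void)signo;
    stop_requested = 1;
}

/* ASSUMPTION: the scheduler warns the job with SIGTERM before killing it. */
int install_checkpoint_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_terminate;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}

/* Write the state to a temporary file, then rename it over the old
   checkpoint, so a kill in mid-write cannot corrupt the previous one. */
int save_checkpoint(const char *path, long step)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp) return -1;
    FILE *f = fopen(tmp, "w");
    if (!f) return -1;
    int ok = fprintf(f, "%ld\n", step) > 0;
    if (fclose(f) != 0) ok = 0;
    if (!ok) { remove(tmp); return -1; }
    return rename(tmp, path);
}

/* Return the saved step, or 0 if there is no usable checkpoint. */
long load_checkpoint(const char *path)
{
    long step = 0;
    FILE *f = fopen(path, "r");
    if (!f) return 0;
    if (fscanf(f, "%ld", &step) != 1) step = 0;
    fclose(f);
    return step;
}

/* Resume from the checkpoint and work until done or asked to stop.
   Returns the step reached (total_steps if the work completed). */
long run_steps(const char *path, long total_steps)
{
    long step = load_checkpoint(path);
    while (step < total_steps) {
        if (stop_requested) {           /* signal arrived: save and stop */
            save_checkpoint(path, step);
            break;
        }
        /* one unit of real work would go here */
        step++;
    }
    return step;
}
```

A job would call install_checkpoint_handler() once at start-up and then run_steps() with its checkpoint path. Checking the flag between units of work keeps the handler trivial and means the checkpoint is always written from a consistent state, at the cost of a delay of at most one unit of work before the program responds to the signal.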
The next section focuses on the basics of signal handling from a programming perspective. We then turn to setting up our own handlers to process signals, and conclude with a specific example showing how to set up a signal handler that catches the signals sent by our scheduler before a process is terminated, for the purpose of checkpointing.