An Overview of Signals

A significant aspect of how an operating system manages a multitude of asynchronous processes simultaneously is the notion of signals. Signals are the means by which processes (including those associated with the operating system) indicate to one another that some event of interest has occurred or requires attention. Signal delivery and signal handling are completely asynchronous, which is to say that a signal may be delivered to a process at an arbitrary time, interrupting whatever the process was doing, and must be handled immediately. It is also possible to ignore most signals, in which case the arrival of a signal has no effect. Figure 1 provides a conceptual picture in which the operating system is sending signals to various processes, in addition to processes sending signals to one another (strictly speaking, all signals are generated by the operating system; however, processes can initiate signals via system calls, and it can be more convenient to think of the process as sending the signal).

Figure 1: Conceptual picture of signals

Most people are actually already familiar with at least one aspect of signals, whether they realize it is a signal handling issue or not: interrupts. There is a signal called SIGINT, for which all processes have a default handler---provided by the system library---which causes the process to exit. If you have a process running in the foreground and press <control>+C, the OS sends a SIGINT to the process, and it terminates. It is possible to write and attach your own handler to the receipt of this signal and take any action desired, for example saving the state of the application before exiting (we will return to this particular example shortly).

The most pressing issue when a programmer begins to consider attaching their own handlers to process signals is ensuring the process can go back to what it was doing after being interrupted at an arbitrary moment to deal with the received signal. It is possible, for example, for a signal to interrupt a process that is blocked on a system call, and it is not possible for the OS to restart many of them (e.g. accept(2), read(2), write(2), ...). In such cases the code invoking the system call needs additional logic to restart the call on which it was blocked once the signal has been processed.

The focus of this tutorial is to provide a crude understanding of the function of signals and signal handling in order for the programmer to incorporate signal handling into their code. More specifically, we want to show how to set up the appropriate signal handlers to allow for a program to "checkpoint" itself and save whatever information is required to allow it to be restarted if the system needs to kill it (for example, if the system is going down for maintenance, or if your run-time exceeds the estimate you provided at the time of submission).

It is a relatively simple matter to incorporate an alarm timer into your program which, by using signals, can asynchronously interrupt execution and execute whatever code you like. If you take advantage of the fact that you can periodically interrupt your program, you can take steps to ensure you do not lose your progress (assuming the program is amenable to checkpointing and restarting; however, this is something most users should strive to incorporate into their designs so as to maximize the benefit of available run-time, even where the process has to be terminated prior to completing its computation).

The next section will focus on the basics of signal handling from a programming perspective, before we turn to considerations of setting up our own handlers in order to process signals, concluding with a specific example illustrating how to set up a signal handler in order to make use of the alarm system call for the purpose of checkpointing.

Signal Basics

We will begin by considering the kinds of signals available. The POSIX.1 standard defines a set of signals which are found on the vast majority of systems. There are additional classes of signals; however, they are typically more esoteric and may not be present on all systems (depending on just how POSIX compliant they are). We will not speak of non-POSIX.1 signals any further, although all signals are typically handled in the same way as is outlined in this document: should you find a need to deal with those signals, the basics in this tutorial will still apply.

All signals have a default handler which is automatically associated with the specific signal (even if that handler's action is to "ignore" the signal and do nothing). When your process receives one of these signals, the default handler is invoked unless you have specifically attached your own handler. A full list of signals available on a Linux platform can be obtained from section 7 of the manual (i.e. at a command prompt: man 7 signal). The following table presents a list of the POSIX.1 signals:

Signal    Value  Default Handler        Note/cause
SIGHUP    1      terminate              controlling terminal/process exits
SIGINT    2      terminate              keyboard interrupt (ctl+C)
SIGQUIT   3      core dump + terminate  keyboard quit (ctl+\)
SIGILL    4      core dump + terminate  illegal instruction executed
SIGABRT   6      core dump + terminate  abort signal sent by abort(3) call
SIGFPE    8      core dump + terminate  floating point exception
SIGKILL   9      terminate              kill signal
SIGSEGV   11     core dump + terminate  invalid memory reference
SIGPIPE   13     terminate              broken pipe
SIGALRM   14     terminate              timer signal sent by alarm(2) call
SIGTERM   15     terminate              termination signal
SIGUSR1   10     terminate              user-defined signal 1
SIGUSR2   12     terminate              user-defined signal 2
SIGCHLD   17     ignore                 child process stopped or exited/terminated
SIGCONT   18     N/A                    continue, if stopped
SIGSTOP   19     stop                   stop (suspend execution of) process
SIGTSTP   20     stop                   stop process - typed at terminal
SIGTTIN   21     stop                   tty input for background process
SIGTTOU   22     stop                   tty output for background process

Note that signals have both a symbolic name and a number. The designations are largely interchangeable in use, although some of the signal numbers differ between architectures, so it is preferable to use the symbolic name where possible.

The primary motivation for this tutorial is to permit the programmer to catch appropriate signals for the purpose of checkpointing their code. Before we delve into the catching and processing of signals, we should briefly consider how signals are sent. POSIX defines a standard function for sending signals in C:

C API
 #include <sys/types.h>
 #include <signal.h>
 
 int kill(pid_t pid, int sig);

This function sends the signal designated by sig to the process with id pid. Note that a process is only able to successfully send signals to processes running with the same user ID. The shell supports a command, kill, that you can run from the command line to invoke this function as well.

kill [-s sigspec | -n signum | -sigspec] pid1 [pid2 ...]

sigspec is either a case-insensitive signal name (e.g. SIGKILL; it is possible to omit the SIG part of the name in most cases), or the number of the signal. signum can only be the signal number. If a signal isn't specified, the command sends a SIGTERM. If you have ever had to terminate a running process, you probably used this command to do it much like the example below:

Terminal kill.png

If a program won't terminate when kill is used to send the default SIGTERM, we sometimes need to step up to a full SIGKILL, which cannot be ignored and will forcibly shut down the process. Both of the following examples show the kill command being invoked to send a SIGKILL; the two forms depicted are equivalent (recall that SIGKILL is signal number 9):

Terminal kill9.png

Note that on some SHARCNET systems (e.g. requin), you can use the --signal option to sqkill to send arbitrary signals to jobs running under control of the scheduler; sqkill simply has the same default as the built-in kill command when you do not provide this argument: SIGTERM.

Any signal can be sent to a process in this manner, however given the default handlers associated with these signals it isn't very interesting to look at. Let us turn now to consider the issue of setting up our own signal handlers so that we can start to consider more sophisticated behaviour with respect to signal processing. Suggested references are presented at the end of this tutorial and the reader is encouraged to consult them for additional information.

Signal Handlers

ANSI signal handling

Let's take a look at the function we need to use to set up our own handlers for specific signals:

C API
#include <signal.h>
 
typedef void (*sighandler_t)(int);
 
sighandler_t signal(int signum, sighandler_t handler);

sighandler_t is a pointer to a function that returns nothing and takes a single integer argument. The programmer must define a function matching this definition, and register it with the operating system as the handler for a specific signal. Once a function is registered as a handler for a given signal, program execution will be interrupted upon receipt of that signal and the handler will be invoked with the number of the triggering signal as its integer argument (which is helpful if you attach the same function as the handler for multiple signals).

The signal function is used to perform the registration of the signal handler. signum is the signal for which we are attaching a handler, and handler is the pointer to the function that is to be invoked when that signal is received (the name of a function is a pointer to that function).

Referring to the table of signals in the previous section, you can see that there are default handlers attached to all signals when the program is started. When we invoke signal to attach our own handler, we are over-riding the default behaviour of the program in response to that signal. Specifically, if we attach a handler to SIGINT, the program will no longer terminate when you press <ctl>+C (or send the program a SIGINT by any other means); rather, the function specified as the handler will be invoked instead, and it defines the behaviour of the program in response to that signal. If you wish to restore the default behaviour of a program in response to a given signal, there is a pre-defined handler value SIG_DFL which, if provided as the handler argument, will reset the handler to the appropriate default. Should you wish to ignore a signal completely, another pre-defined value SIG_IGN can be specified as the handler. In the absence of redefining or resetting a signal handler, the handler you specify will remain in effect for the duration of program execution (this is the behaviour on Linux with glibc; some older System V-derived systems instead reset the handler to the default each time the signal is delivered).

Setting up a signal handler

The concepts in the previous section are most clearly illustrated by an example. This code attaches a handler function to the SIGINT signal, which outputs a basic message to demonstrate the asynchronous transfer of control that occurs when a handled signal is received:

C CODE: catchsigdemo.c
 #include <stdio.h>
 #include <unistd.h>
 #include <signal.h>
 
 
 void handler_function(int);
 
 int main()
 {
     /*
      * attach handler for SIGINT
      */
     signal(SIGINT,handler_function);
 
     while(1)
     {
         printf("Program executing (1s delay between updates)...\n");
         sleep(1);
     }
 
     return(0);
 }
 
 
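 void handler_function(int signum)
 {
     /*
      * printf is used here for brevity only; it is not async-signal-safe,
      * so production handlers should restrict themselves to calls such as
      * write(2) or to setting a flag
      */
     printf(">>> Caught signal: %d; executing handler <<<\n", signum);
 }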

Compile and run the above program, and demonstrate for yourself how the signal handler is being executed anytime the process receives a SIGINT signal. In the example shown, the first SIGINT is sent via the kill built-in on the command-line, and <ctl>+C is then pressed a number of times while the shell in which the program is running has keyboard focus. Note that the result is the same in either case: execution of the program is interrupted, the handler function is invoked to process the signal, and normal flow of control resumes at the point where it was interrupted by the signal---recall the exception for interrupted system calls which are not restarted (see also below). The main routine in this example isn't doing much; however, the operation of the signal handler as described should be apparent.

Note also that we are no longer able to kill this program by pressing <ctl>+C as we have explicitly over-ridden the default behaviour in response to a received SIGINT. We can still terminate this process using SIGKILL, as illustrated; similarly, the handler could still explicitly terminate the program if that is the desired behaviour in response to the signal. Feel free to play with this example to empirically expand your understanding of signal handling; e.g. try attaching the same handler to more than one signal, define more handlers and attach them to a variety of signals, etc.---note that there are some signals that cannot have their default handler over-ridden (see odds and ends section below).

Catchsigdemo1.png
Catchsigdemo2.png

There is a very subtle issue illustrated in this example, although it may not be immediately apparent. The sleep call in this example is a system call. The program (thread) invoking sleep will be suspended for the number of seconds provided as an argument. While blocked on this call, the program will still be correctly interrupted when a signal is received; however, system calls will not typically be restarted after being interrupted by a signal---the system call will appear to simply fail, returning an appropriate error condition (sleep itself reports this by returning the number of seconds left unslept). You can see this behaviour clearly using the above example by increasing the number of seconds the program sleeps for between updates. If the program receives a signal while asleep, you will see the "Program executing..." message immediately after the signal is handled, regardless of how much time would otherwise be remaining in the sleep call.

The programmer is cautioned to pay attention to this issue if they are relying on the correct execution of a system call that could be interrupted by a signal. An incomplete system call that was interrupted by a signal will appear to return failure in the code in which it appears. In these situations it is important to check the value of the library-defined errno value. If the value of errno is EINTR, then the reason the system call failed was because it was interrupted. Typically we enclose such calls in a loop so that the system call is restarted if interrupted by a signal, and the error handled normally otherwise. This caveat only applies to interrupted system calls; statements in the user code will otherwise be restarted correctly if interrupted by a signal.

This issue is illustrated in the following code fragment demonstrating a server listening for incoming connections from network clients (the accept call used to process remote connections is a blocking system call that behaves exactly as described above).

C CODE
 ...
 while (1)
 {
     if (accept(ssock, &saddr, &saddrlen) == -1)
     {
         /*
          * this code is executed when accept(2) returns an error; errno is
          * made available by #include <errno.h> at the top of the file
          * (it must not be declared by hand with "extern int errno")
          */
 
         if (errno == EINTR)
         {
             /* error code is EINTR; system call interrupted: just restart it */
             continue;
         }
         else
         {
             /* actual error in system call: process error as appropriate */
             process_actual_error();
         }
     }
     else
     {
         /*
          * accept(2) returned successfully...carry on
          */
         ...
     }
 }
 
 ...

Odds and Ends

Signals are not only generated explicitly by the user
Most of the signals defined above are automatically utilized and generated by the system, so care should be taken if you are using them for your own purposes. In many cases you do wish to over-ride the default behaviour of a program in response to automatically delivered signals; in other cases you are looking to leverage signal handling for your own purpose. The programmer must pay attention to which signals they are over-riding in the context of when these signals are generated automatically (automatically generated signals will still arrive normally; you are only over-riding what happens when the signal is received).

For example, a SIGCHLD signal is automatically sent to a process when there is a change of status in a child process, the default behaviour for which is to do nothing---the signal is ignored. While you can easily attach a handler for this signal, be aware that handler will be invoked any time a SIGCHLD is received, regardless of whether it was sent deliberately by design, or generated automatically by the OS in response to a child process exiting. Note that there are two user-defined signals provided, SIGUSR1 and SIGUSR2 which are guaranteed to be unused by the system, and thus are safe for arbitrary use by the programmer.

Uncatchable signals
There are two signals that cannot be caught by the user. These are SIGKILL and SIGSTOP. It is not possible to redefine the handlers for these signals.

Signals in parallel code
Care must be taken in parallel code to ensure that only the process(es) of interest catch and handle signals, lest you end up with mangled results from multiple processes attempting to do an action that should only be performed by one. Fortunately, due to the asynchronous nature of signals, this can be easily addressed by only attaching a handler in the process that is to process a given signal. It may be necessary to set the handler to SIG_IGN in the other processes if the signal you are using would cause an unwanted program termination.

Signal Handling for Checkpointing

Having put the basics of signal handling behind us, we can now consider how to leverage signals to help with checkpointing long-running jobs on SHARCNET systems. As was mentioned in the overview section, this is particularly useful in the face of regular maintenance, system problems that require a cluster or node to be taken offline, or runtime exceeding the provided estimate that results in the job being killed by the system.

Note that this tutorial does not address the act of checkpointing itself, as that is obviously specific to the application being run. Some software may already provide functionality that can be used to save state for later reloading---these are the easiest to modify as the handler can invoke a function as provided by the API of the program in question. Otherwise, this is an issue that must be considered in program design as the program will both need to save off all relevant state information allowing it to pick up where it left off, as well as be able to initialize itself from this saved data rather than starting from scratch.

This is considered a high priority issue in the interest of maximizing the efficiency and availability of SHARCNET clusters as there is a non-trivial cost associated with the wasted run-time of long-running jobs that do not checkpoint. If a job consumes several weeks worth of resources with no results (as it has to be restarted), that represents cycles denied to the user community in general which only serves to lengthen wait time in queues, and makes it hard to justify that SHARCNET users are making good use of the resources.

All users are strongly encouraged to develop checkpointing strategies for their code; this will result in faster results for the user who can restart a terminated job, and less wait time for users overall with fewer wasted cycles due to lack of the ability to checkpoint-and-restart long-running jobs.

System/Scheduler Generated Signals

An important caveat: the discussion in this section mostly applies to our clusters using the LSF scheduler (requin). The newer clusters (orca, saw, etc.) use a different scheduler (Maui/Moab) which doesn't handle user-initiated signals properly; a workaround for newer clusters is discussed at the end of this section.

The process by which you can expect a program to be terminated is as follows: the system sends a SIGTERM to the process; if the process in question does not exit within some window of time, it is sent a SIGKILL, which cannot be ignored or over-ridden. This allows us to incorporate a more robust approach to checkpointing our code: we catch the SIGTERM signal and set up a handler that saves the state needed for a later restart before exiting (note that if your handler does not explicitly exit, the program will still be terminated by the subsequent SIGKILL).

The above termination sequence can be expected to be followed for a wide range of job-killing events, including but not necessarily limited to:

  • system shutdown
  • exceeding the run-time limit of the submission queue
  • exceeding the time estimate provided at job submission
  • use of sqkill to kill a running job

This sequence cannot be expected to be followed when the system fails randomly. If a system hangs unexpectedly, or suffers a sudden power failure, a normal shutdown process will not necessarily occur and a job can still be lost.

If we assume that we encapsulate checkpointing behaviour in a function, we can broaden the checkpointing functionality of our code by attaching a handler that calls this function and exits to not just SIGTERM, but also SIGINT (or any other catchable signal that would normally terminate the program), so that a checkpoint is generated for the widest possible range of process-killing events. For our own convenience, we could attach another handler to SIGHUP that only generates the checkpoint (i.e. does not terminate program execution), so that we can generate a checkpoint on demand by simply sending the job a SIGHUP (it is safe to use this signal in this manner). The LSF scheduler installed on some of our systems (requin) even supports automatically triggering a checkpoint action in jobs running under its control, which can be leveraged to generate periodic checkpoints during the run of the job (assuming you have a handler set up to checkpoint on a SIGHUP, the checkpoint action specified to the scheduler would be the kill built-in command: kill -SIGHUP).

The following code illustrates an example of the above which could be used as a template for introducing signal-based checkpointing into a program. This code specifically does the following:

  • defines two handler functions, one which calls the checkpointing routine and terminates, and one that only calls for checkpointing
  • attaches the checkpoint-and-terminate handler to SIGTERM, SIGINT, SIGABRT and SIGALRM
  • attaches the checkpoint-only handler to SIGHUP

The code doesn't do anything productive, however it will allow you to play with the above concepts on SHARCNET systems in the abstract---real code would need to implement a meaningful save of state, rather than just the output of a string as the checkpoint routine does in this example. Try modifying this code to output the time as part of the "Executing..." message to see how the sleep time is cut short when interrupted by a signal (recall that an interrupted system call will not be restarted automatically).

C CODE: checkpointdemo.c
 #include <stdlib.h>
 #include <stdio.h>
 #include <unistd.h>
 #include <time.h>
 #include <signal.h>
 
 #define CP_NAME    "checkpoint"
 #define MAXSTR     128
 #define ITERATIONS 100
 #define INTERVAL   5
 
 /*
  * function prototypes
  */
 void do_checkpoint(void);
 void checkpoint_term(int);
 void checkpoint_only(int);
 
 int main()
 {
     int i;        /* loop index */
 
     /*
      * attach signal handlers as described in the tutorial (note we are
      * attaching the same handler to multiple signals)
      */
     signal(SIGTERM,checkpoint_term);
     signal(SIGINT,checkpoint_term);
     signal(SIGABRT,checkpoint_term);
     signal(SIGALRM,checkpoint_term);
     signal(SIGHUP,checkpoint_only);
 
     /*
      * implementation of program would appear here...
      *
      * in the interest of simplicity, this simply outputs a message every
      * few seconds for a set period of time (for demonstration purposes)
      */
     for(i = 0; i < ITERATIONS; i++)
     {
         printf("Executing (%ds interval between reports)...\n", INTERVAL);
         sleep(INTERVAL);
     }
 
     return(0);
 }
 
 
 void do_checkpoint(void)
 {
     static int cp_num = 1;    /* sequential numbering of checkpoints */
     char filename[MAXSTR];    /* buffer for construction of filenames */
     time_t timeval;           /* current time */
     FILE *fp;                 /* reference to file for checkpoint output */
 
     /*
      * construct a unique file name (numbered sequentially in this example)
      * for this checkpoint output; this allows for multiple checkpoints from
      * a single run that do not interfere with one another
      */
     snprintf(filename, MAXSTR, "%s_%03d", CP_NAME, cp_num);
     if ((fp = fopen(filename,"w")) == NULL)
     {
         fprintf(stderr,"[ERROR]: unable to open file (%s) for "
                 "checkpoint output\n", filename);
         return;
     }
     else
     {
         /*
          * output relevant checkpointing data here (this example
          * records the date/time of the checkpoint, and outputs
          * the sequential checkpoint number)
          */
         timeval = time(NULL);
         fprintf(fp,"- CHECKPOINT -\n");
         fprintf(fp,"time: %s",ctime(&timeval));
         fprintf(fp,"checkpoint number = %d\n",cp_num);
         fclose(fp);
         cp_num++;
     }
 }
 
 
 void checkpoint_term(int signum)
 {
     do_checkpoint();
     exit(1);
 }
 
 
 void checkpoint_only(int signum)
 {
     do_checkpoint();
 }

LSF (requin) only: The above program can be submitted to the test queue as a serial job (for immediate execution) in order to explore these concepts empirically on a live system. Recall that you can use sqkill to send arbitrary signals to a job under control of a SHARCNET scheduler. An example of the results of a sample run appears below illustrating the execution of the program and the end result of the signal-based checkpointing.

Checkpointdemo1.gif
Checkpointdemo2.gif

Non-LSF schedulers (orca, saw, etc.): Unfortunately, the scheduler used on most of our systems (Maui/Moab) doesn't handle signals properly. Even though a special command to send a signal to a job does exist (qsig -s SIGNAL jobid), an attempt to use it on a job will terminate the job, regardless of the signal sent, and without any user-level signal processing. There is a workaround, however: if one can obtain the node and PID of the process, then one can send any signal directly to the process (bypassing the scheduler) using the ssh and kill commands. One can use the following bash script for this purpose:

Bash: sqsignal
#!/bin/bash
# Sending a signal to a user code directly (bypassing the scheduler)
 
if test $# -ne 2
then
    echo "Syntax: sqsignal  jobid  SIGNAL"
    exit 1
fi
 
jobid=$1
SIGNAL=$2
 
# Given $jobid, find out the node:
node=`sqjobs -l $jobid |grep nodes|awk '{print $2}'`
 
# PID for a root node process (not the user code):
pid0=`qstat -f $jobid |grep session_id|awk '{print $3}'`
 
# PID of the user code:
pid=`ssh $node pstree -p $pid0 |cut -d\( -f4|cut -d\) -f1`
 
# Sending the SIGNAL to the user code:
ssh $node kill -$SIGNAL $pid

The above procedure will work for serial and threaded jobs; MPI jobs might require some tinkering.

Signal Handling in Fortran

Designed solely for numerical computation, the Fortran standard, until the late 1990s, did not have intrinsic procedures or APIs defined for signal handling. Instead, exceptions, primarily of interest to floating point arithmetic such as division by zero, illegal arguments to mathematical functions, etc. are handled by runtime libraries. Upon the occurrence of such exceptions, errors are reported and the process is terminated, unless default actions are predefined, mostly at compile time.

While handling exceptions in the runtime libraries greatly reduces the effort required of Fortran programmers, there are situations in which it is valuable for users to be able to deal with exceptions before the programme quits abnormally. For instance, when a programme is being terminated by the system because it has exceeded its specified runtime, one would like to shut it down gracefully by saving the state of variables and intermediate results from the computation before the programme is killed. In order to do so, the programme must first be able to catch the signal and then take the necessary actions.

Signal Handling as An Extension in Some Fortran Implementations

Some vendor-supplied Fortran implementations, including for example those from Digital, IBM, Sun and Intel, provide an extension that allows the user to do signal handling in Fortran as in C. The interface for installing a signal handler appears the same:

call signal(signum, handler)

where signum is the value of signal defined for the targeted architecture, as shown in the table below, and handler is a user defined procedure in the form of Fortran subroutine.

 1) SIGHUP    2) SIGINT     3) SIGQUIT   4) SIGILL
 5) SIGTRAP   6) SIGABRT    7) SIGBUS    8) SIGFPE
 9) SIGKILL  10) SIGUSR1   11) SIGSEGV  12) SIGUSR2
13) SIGPIPE  14) SIGALRM   15) SIGTERM  16) SIGSTKFLT
17) SIGCHLD  18) SIGCONT   19) SIGSTOP  20) SIGTSTP
21) SIGTTIN  22) SIGTTOU   23) SIGURG   24) SIGXCPU
25) SIGXFSZ  26) SIGVTALRM 27) SIGPROF  28) SIGWINCH
29) SIGIO    30) SIGPWR    31) SIGSYS   34) SIGRTMIN

Table 1: Symbolic names and values of common signals as returned from command kill -l for Linux i686.

The signal numbers are architecture dependent. The following shows a different set of values defined on Linux Alpha.

 1) SIGHUP    2) SIGINT     3) SIGQUIT   4) SIGILL
 5) SIGTRAP   6) SIGABRT    7) SIGEMT    8) SIGFPE
 9) SIGKILL  10) SIGBUS    11) SIGSEGV  12) SIGSYS
13) SIGPIPE  14) SIGALRM   15) SIGTERM  16) SIGURG
17) SIGSTOP  18) SIGTSTP   19) SIGCONT  20) SIGCHLD
21) SIGTTIN  22) SIGTTOU   23) SIGIO    24) SIGXCPU
25) SIGXFSZ  26) SIGVTALRM 27) SIGPROF  28) SIGWINCH
29) SIGINFO  30) SIGUSR1   31) SIGUSR2  32) SIGRTMIN

Table 2: Symbolic names and values of common signals as returned from command kill -l on Linux Alpha.

In order for the code to be portable, one should use the symbolic names instead of values.

The interface signal() requires that any parameters used by the handler be visible to it as global data, except for signum, which is the only argument that can be passed to the handler as a dummy argument.

GNU Fortran has defined intrinsic signal() as an extension

call signal(signum, handler[, status])

where the third argument, which is optional (optional arguments are a feature introduced in the Fortran 90 standard), stores the return value of the call to the system function signal(2).

The behaviour of the Fortran extension to signal() is implementation dependent. In the following example,

program fsignal_test
   ... ...
   external warning_sigint ! Must declare as external
   call signal(SIGINT, warning_sigint)
   call sleep(30)
end program

subroutine warning_sigint
   print *, 'Process interrupted (SIGINT), exiting...'
   return
end subroutine warning_sigint

the programme is compiled using Intel Fortran compiler 10.0. When the programme is interrupted from the command line by Ctrl+C key stroke, the handler warning_sigint is not invoked. Instead, one will see the following

forrtl: error (69): process interrupted (SIGINT)
Image            PC   Routine      Line    Source
a.out      0808EBAF   Unknown   Unknown   Unknown
a.out      0808E1CF   Unknown   Unknown   Unknown
a.out      0806B66A   Unknown   Unknown   Unknown
a.out      0805DAB8   Unknown   Unknown   Unknown
a.out      0804A3FD   Unknown   Unknown   Unknown
.          002FD420   Unknown   Unknown   Unknown
a.out      08049D2C   Unknown   Unknown   Unknown
libc.so.6  00469F70   Unknown   Unknown   Unknown
a.out      08049AC1   Unknown   Unknown   Unknown

The signal SIGINT is intercepted by the Fortran runtime library. In order to catch the signal SIGINT and take actions defined in the handler warning_sigint, one needs to add the following C code

#include <signal.h>

/* Reset the action for *signum to the system default */
void sigclear_(int *signum)
{
   signal(*signum, SIG_DFL);
}

and call it in the Fortran code before installing the signal handler (the Fortran compiler appends the trailing underscore to the name)

program fsignal_test
   ... ...
   external warning_sigint ! Must declare as external
   call sigclear(SIGINT)
   call signal(SIGINT, warning_sigint)
   call sleep(30)
end program

A set of exception handling intrinsic functions has been introduced in Fortran 95 and later standards. References and discussions on Fortran exception and signal handling can be found, for instance, in John Reid’s historical notes and his book, as well as in the documentation of the latest Fortran standards concerning exception handling for IEEE floating point arithmetic.

A User Level Approach

Support for signal handling in Fortran can also be achieved at user level with minimal effort. Assume we want the same interface as the one supported as an extension in some Fortran flavours, i.e.

call signal(SIGTERM, action_sigterm)

We need to write a C routine that makes a call to the system function signal(). The C code, stored in a separate file csigfun.c, is as simple as the following:

/* in "csigfun.c" */
#include <signal.h>

typedef void (*sighandler_t)(int);
void signal_( int* signum, sighandler_t handler)
{
    signal(*signum, handler);
}

The following Fortran code shows a simple example of calling the C function signal() to install signal handlers:

! in "fsignal_test.f90"
program fsignal_test
   ... ...
   external warning_sigterm ! Must declare as external
   external warning_sigint ! Must declare as external

   ! Install signal handlers, return immediately
   call signal(SIGTERM, warning_sigterm)
   call signal(SIGINT, warning_sigint)

   ! Do something that will take some time
   call sleep(30)
end program

subroutine warning_sigterm
   print *, 'Process interrupted (SIGTERM), exiting...'
   return
end subroutine warning_sigterm

subroutine warning_sigint
   print *, 'Process interrupted (SIGINT), exiting...'
   return
end subroutine warning_sigint

Each call to the C routine signal() installs a signal handler, defined as a subroutine in the same file, for the specified signum (SIGTERM and SIGINT). The installed subroutine is called once the specified signal is received. Note that the call to signal() is non-blocking: it returns immediately, and the programme continues to execute the subsequent instructions.

We now show how to compile and run the test programme, which mixes Fortran and C. Without loss of generality, we assume that the C compiler is named cc and the Fortran compiler is named fort. We first compile the C code to obtain an object file using the command

cc -c csigfun.c

This will create an object file named csigfun.o. We then compile the Fortran code,

fort -c fsignal_test.f90

and finally link with the C object

fort fsignal_test.o csigfun.o -o fsignal_test

to generate the executable fsignal_test.

Start the executable from the command line:

./fsignal_test

Note that the call to sleep(30) puts the programme to sleep for about 30 seconds, which gives us enough time to open another terminal, find the process ID and issue a signal from the command line.

The following shows a screen capture of the execution:

[bge@mobile-hpc]$ ps -ef | grep fsignal_test
bge 4314 4282 0 20:47 pts/1 00:00:00 fsignal_test
bge 4316 3094 0 20:47 pts/0 00:00:00 grep fsignal_test
[bge@mobile-hpc]$
[bge@mobile-hpc]$ kill -s INT 4314

As soon as the interrupt signal INT is issued from the command line, the signal handler warning_sigint() is invoked in the terminal from which we started the programme. The subroutine prints its message and the programme then exits.

[bge@mobile-hpc]$
Process interrupted (SIGINT), exiting...

Two things here need special attention. First, there is a naming convention widely used in C/Fortran mixed-language programming: a C routine called from within a Fortran programme needs a trailing underscore in its name, as shown in the above example. If the routine’s name already contains one or more underscores, whether one or two trailing underscores are required is compiler dependent. GNU’s g77 by default assumes two trailing underscores, while some other compilers, such as Intel’s, assume a single underscore. Nevertheless, most compilers have an option that lets users specify whether a single or a double underscore is used. Second, in C function calls arguments are passed by value, while in Fortran arguments are passed by reference. To call a C function from within Fortran code, the arguments of the C function should therefore be declared as pointers.
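Both points can be seen from the C side alone. The sketch below mimics, in plain C, what a Fortran statement such as call signal(SIGUSR1, handler) amounts to under the single trailing underscore convention: the routine is reached under the name signal_, and the integer argument arrives as an address. The names handler_t, count_signal and simulate_fortran_call are hypothetical, and count_signal merely stands in for a Fortran subroutine.

```c
#include <signal.h>

typedef void (*handler_t)(int);

/* Number of times the handler has run. */
static volatile sig_atomic_t handler_calls = 0;

/* Stand-in for a Fortran subroutine used as a signal handler. */
static void count_signal(int signum)
{
    (void)signum;
    handler_calls++;
}

/* The wrapper as seen by the Fortran compiler: trailing underscore
   in the name, and signum received by reference. */
void signal_(int *signum, handler_t handler)
{
    signal(*signum, handler);
}

/* Hypothetical driver: performs the call the way compiled Fortran
   code would, then delivers the signal to this process. Returns the
   number of times the handler was invoked. */
int simulate_fortran_call(void)
{
    int signum = SIGUSR1;
    handler_calls = 0;
    signal_(&signum, count_signal);   /* pass the address, not the value */
    raise(SIGUSR1);
    signal(SIGUSR1, SIG_DFL);         /* restore the default action */
    return (int)handler_calls;
}
```

If the wrapper took signum by value instead, the Fortran caller would hand it an address that the C code would misread as a signal number.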

Comments and Further Reading

This tutorial has presented an overview of signals and the signal handling facilities specified by the ANSI standard. While the basics of signal handling are presented here, a number of issues have been touched upon only briefly, but may become relevant as you find more sophisticated needs for signal handling (real-time signals, POSIX sigaction-style signal handling, system-specific signals, programs sending signals to other programs, etc.).

The following are recommended books and online references for those interested in more detail on the concepts we've discussed in this tutorial, and to continue learning about the more advanced features available to you through signal handling facilities.

  • W. Richard Stevens and Stephen A. Rago. Advanced Programming in the UNIX Environment (2e). Addison Wesley, 2005.
    • Comprehensive UNIX system programming reference with in-depth coverage and examples for a wealth of concepts, including signal handling.
  • Dave Curry. Using C on the UNIX System. O'Reilly, 1989.
    • Excellent, compact coverage of intermediate UNIX system programming, including signals (out of print, and some code examples in the client/server section have misleading errors, but it's still great coverage of the relevant material if you can find it).
  • Michael Metcalf and John K. Reid. Fortran 90/95 Explained. Oxford University Press, 2nd edition, 1999.
  • John Reid. Exception handling in Fortran. ACM Fortran Forum, 14, 9-15, 1995.
  • GNU library signal handling documentation
  • Wikipedia entry for signals, including additional references