Text File | 1990-12-28 | 6.4 KB | 139 lines

pmckpt 0.95, 10/10/90.
Placed into the public domain by Daniel J. Bernstein.

Comments to him at brnstnd@nyu.edu. Please let him know if pmckpt works
on your machine, and what applications you might have found for it.

This is a beta release of pmckpt, the poor man's checkpointer.

The idea of a checkpointer is to save the state of a running process in
a file, so that you can restore the process later where it left off.
pmckpt is a cooperative checkpointer: you have to add checkpointing to
your program explicitly. You can't just checkpoint a random executable
without the source code.

Checkpointing has many uses. One is to have processes survive a crash.
You just checkpoint the process periodically, and restart it if the
system crashes. Another is to avoid long, complex initializations on
startup; if you checkpoint after initialization, you won't have to waste
the time again. Another is to transfer running programs between
computers with the same architecture.

pmckpt is (should be, at least) much more portable than other available
checkpointing systems, including undump and Condor. It just saves the
data, stack, and heap (allocated memory) to a file. It doesn't try to
read the infinite variety of core file formats. Most importantly, it
doesn't use setjmp() or longjmp(), so your compiler can put variables
and intermediate values into registers without any risk of destroying
the values of those variables (as longjmp() usually does). It handles
stacks in either direction. It also restores file positions and signal
handlers.

In any routine where you want to allow checkpointing:

  Put a CKPT VARS flush left on a line by itself at the top, right after
  the {.

  Put a CKPT TOPS flush left on a line by itself after all the
  variables. (Actually, you can have statements above CKPT TOPS; see
  below.)

  Put a CKPT POINT x y flush left on a line by itself anywhere that you
  want to allow a checkpoint.
  x is declared as a variable, and y is declared as a label, so be
  careful to use unique names.

  Put a CKPT BOTS flush left on a line by itself right before the
  final }.

You must #include "pmckpt.h" at the top of the file.

This may seem like a lot of work for a checkpointer, but remember that
the most advanced control structure used by pmckpt is goto. A
setjmp()-based system may be easier to use, but it'll also lose variable
values when you least expect it.

IMPORTANT RULE: You *MUST* have a CKPT POINT immediately before calling
any subroutine that's checkpointed. (See the main() call of sub1(), line
39 of test.c.) There must be absolutely no side effects (and maybe no
computation at all, depending on your compiler) between the CKPT POINT
line and the call. You may have to rewrite the function call to achieve
this.

You schedule a checkpoint by calling ckpt_schedule(). At the next CKPT
POINT, your program will save some crucial information, followed by its
text, data, and stack, to the CHECKPOINT file. To set this filename to
the value of the CKPTFN environment variable instead, call ckpt_init().
(This also sets a temporary file name, default CHECKPOINT.TEMP, to the
value of CKPTFNTEMP. The temporary file is used to ensure atomic
checkpointing.)

test.c shows how you can schedule a checkpoint on any interrupt. In
practice you probably want to checkpoint at regular intervals, with
whatever your system uses for a timer.

To run your program starting from the checkpoint, run

  % checkpoint prog CHECKPOINT

where prog is the program name and CHECKPOINT is the checkpoint file
name. If you're lucky, everything will work.

You can have statements after the variables and before CKPT TOPS. These
statements form the ``preamble.'' They're executed every time the
function is entered, whether in normal execution or as part of a
restore. You should be very careful with statements in a preamble, as
you will lose any variable values set in a preamble during a restore.
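
To picture how a scheduled checkpoint waits for the next CKPT POINT,
here is a toy sketch in plain C. It is not pmckpt's code: toy_schedule(),
toy_point(), toy_demo(), and the TOY.CKPT file names are all made up for
illustration. Writing to a temporary file and then rename()ing it into
place is just one common way to get an atomic save; it is assumed here,
not taken from the pmckpt source. The toy saves a single counter instead
of the process's memory.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>

/* Hypothetical names throughout; none of this is pmckpt's real code. */
static volatile sig_atomic_t toy_pending = 0;

/* Signal handler: only records the request. The save happens later. */
static void toy_schedule(int sig)
{
    (void)sig;
    toy_pending = 1;
}

/* Call this wherever the program's state is consistent.
 * It writes to a temporary file first and then renames that file over
 * the real one, so a crash mid-write never leaves a half-written
 * checkpoint under the final name.
 * Returns 1 if a checkpoint was written, 0 if none was pending,
 * -1 on error. */
static int toy_point(const char *fn, const char *fntemp, long counter)
{
    FILE *f;

    if (!toy_pending)
        return 0;
    toy_pending = 0;

    f = fopen(fntemp, "w");
    if (f == NULL)
        return -1;
    if (fprintf(f, "%ld\n", counter) < 0) {
        fclose(f);
        return -1;
    }
    if (fclose(f) != 0)
        return -1;
    if (rename(fntemp, fn) != 0)
        return -1;
    return 1;
}

/* Counts to 5 and requests a save at step 3 by raising SIGINT at
 * itself (standing in for a ^C). Returns the value read back from
 * the toy checkpoint file, or -1 on failure. */
static long toy_demo(void)
{
    long i, saved = -1;
    FILE *f;

    signal(SIGINT, toy_schedule);
    for (i = 1; i <= 5; i++) {
        if (i == 3)
            raise(SIGINT);
        if (toy_point("TOY.CKPT", "TOY.CKPT.TEMP", i) < 0)
            return -1;
    }
    signal(SIGINT, SIG_DFL);
    assert(toy_pending == 0);   /* the one request was consumed */

    f = fopen("TOY.CKPT", "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &saved) != 1)
        saved = -1;
    fclose(f);
    remove("TOY.CKPT");
    return saved;
}
```

The handler does nothing but set a flag, so the save never happens in
the middle of a computation; it happens at the first toy_point() reached
after the request, which in toy_demo() is step 3.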

One safe use of the preamble is at the top of main(), to open files
(perhaps passed as arguments to the program) in a fixed order. In fact,
if you don't do this, files opened within the program won't be reopened
on a restore.

To compile a pmckpt program, such as test.c, run the following:

  % ckptcvt < test.c > tmp.c
  % cc -c tmp.c
  % cc -o tmp pmckpt.o tmp.o pmceot.o

tmp can be any name. Make sure that pmckpt.o comes before all other .o's
loaded (except maybe crt0---though this probably leads to bugs), and
pmceot.o comes after all .o's loaded. You shouldn't have to worry about
dynamic loading on Suns, or other weird schemes; pmckpt is pretty
portable, for a checkpointer.

Finally, to complete our outside-in tour of pmckpt, you have to compile
the pmckpt library itself before using it in programs as above. To do
this, edit the options in Makefile and type ``make''.

To test pmckpt, compile test.c into tmp by the above instructions. Run
tmp. You should see 13 lines of output. Run it again; a CKPT POINT comes
after each output, so if you type ^C, the program will save state in
CHECKPOINT after the next output. Try typing ^C at any moment. If you
run

  % checkpoint tmp CHECKPOINT

the program should restart from where it left off. ^C works after a
restore too, so a single program can checkpoint and restart any number
of times.

For a more sophisticated test, try redirecting the output of tmp to a
file. Run tail -f file & to see what's happening. Type ^C after a few
seconds; after a few seconds more, to simulate a system crash, kill the
process with ^\ (or whatever your quit key is). tmp will dump core. Kill
the tail, and restore tmp from CHECKPOINT, redirecting output to the
same file with >>. When it finishes, look at the file. tmp should have
moved back to the location of the first checkpoint, so that output
between the checkpoint and the ``crash'' won't have been written twice.
In other words, you shouldn't be able to tell from looking at the file
that tmp had crashed at all.

Try running the last test again, but don't kill the tail. What should
happen is that what shows up on your tty is the correct output in order,
and what ends up in the file is the correct output in order---even
though some output is written between the last checkpoint and the crash!
The reason that tail doesn't write anything twice to your tty is that it
doesn't go backwards in the file when pmckpt does.

Internally, pmckpt wends its way down through the saved process stack to
get to where it was before. Run a post-ckptcvt program through cpp if
you want to see what's going on.
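
If you just want a feel for how a saved variable plus a label can get a
routine back to where it was using nothing fancier than goto, here is a
toy sketch. It is not what ckptcvt generates: struct toy_state,
toy_saved, toy_run(), and toy_resume_demo() are hypothetical names, and
the "checkpoint" is a struct copy in memory instead of a file holding
the process's data, stack, and heap.

```c
#include <assert.h>

/* Hypothetical toy state; pmckpt saves real process memory instead. */
struct toy_state {
    int resume_at;   /* 0 = fresh start, 1 = re-enter at point1 */
    int i;           /* loop counter that must survive the restore */
    int sum;
};

static struct toy_state toy_saved;   /* stands in for the CHECKPOINT file */

/* Adds 1..5 into s->sum, checkpointing once after step 2.
 * Returns 0 if it "crashes" at step crash_at, 1 if it finishes. */
static int toy_run(struct toy_state *s, int crash_at)
{
    if (s->resume_at == 1)
        goto point1;              /* restore: skip straight to the point */

    s->sum = 0;
    for (s->i = 1; s->i <= 5; s->i++) {
        s->sum += s->i;
        if (s->i == 2) {
            s->resume_at = 1;     /* remember where we are ... */
            toy_saved = *s;       /* ... and save everything */
point1:
            s->resume_at = 0;     /* reached in normal and restored runs */
        }
        if (s->i == crash_at)
            return 0;             /* simulated crash */
    }
    return 1;
}

/* Runs once with a crash at step 4, then restores from the saved
 * copy and runs to completion. Returns the final sum. */
static int toy_resume_demo(void)
{
    struct toy_state s = {0, 0, 0};
    int finished;

    finished = toy_run(&s, 4);    /* dies after the checkpoint */
    assert(!finished);
    (void)finished;

    s = toy_saved;                /* "restore": reload the saved state */
    finished = toy_run(&s, 0);    /* no crash this time */
    assert(finished);
    (void)finished;
    return s.sum;
}
```

The work done between the checkpoint and the simulated crash (steps 3
and 4) is thrown away and redone after the restore, so the final sum is
the same as for a run that never crashed.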