- pmckpt 0.95, 10/10/90.
- Placed into the public domain by Daniel J. Bernstein.
- Comments to him at brnstnd@nyu.edu. Please let him know if pmckpt works
- on your machine, and what applications you might have found for it.
-
-
- This is a beta release of pmckpt, the poor man's checkpointer.
-
- The idea of a checkpointer is to save the state of a running process in
- a file, so that you can restore the process later where it left off.
- pmckpt is a cooperative checkpointer: you have to add checkpointing to
- your program explicitly. You can't just checkpoint a random executable
- without the source code.
-
- Checkpointing has many uses. One is to have processes survive a crash.
- You just checkpoint the process periodically, and restart it if the
- system crashes. Another is to avoid long, complex initializations on
- startup; if you checkpoint after initialization, you won't have to waste
- the time again. Another is to transfer running programs between
- computers with the same architecture.
-
- pmckpt is (should be, at least) much more portable than other available
- checkpointing systems, including undump and Condor. It just saves the
- data, stack, and heap (allocated memory) to a file. It doesn't try to
- read the infinite variety of core file formats. Most importantly, it
- doesn't use setjmp() or longjmp(), so your compiler can keep variables
- and intermediate values in registers without any risk of losing their
- values (after a longjmp(), non-volatile local variables changed since
- the matching setjmp() may hold stale values). It handles
- stacks in either direction. It also restores file positions and signal
- handlers.
-
-
- In any routine where you want to allow checkpointing:
-
- Put a CKPT VARS flush left on a line by itself at the top, right after
- the {.
-
- Put a CKPT TOPS flush left on a line by itself after all the variables.
- (Actually, you can have statements above CKPT TOPS; see below.)
-
- Put a CKPT POINT x y flush left on a line by itself anywhere that you
- want to allow a checkpoint. x is declared as a variable, and y is
- declared as a label, so be careful to use unique names.
-
- Put a CKPT BOTS flush left on a line by itself right before the final }.
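- To make the roles of the four markers concrete, here is a hand-written
- C analogue of one checkpointable routine. It is only a sketch: it is not
- what ckptcvt generates, and struct frame, work(), the point number and
- the restoring flag are hypothetical stand-ins for pmckpt's bookkeeping.
- It shows the idea the markers rely on: locals are saved at a point, and
- a restore uses a plain goto to get back to that point's label.

```c
/* Hand-written analogue of one checkpointable routine. NOT ckptcvt's
   real output: struct frame, the point number and the restoring flag
   are hypothetical stand-ins for pmckpt's bookkeeping. */
struct frame {
    int point;   /* which point the routine stopped at */
    int i;       /* saved copies of the routine's locals */
    long sum;
};

/* Sums 0..9. If stop_at >= 0, saves its state in *f right after adding
   stop_at and returns -1, as if the process died there. If restoring is
   nonzero, resumes from *f instead of starting over. */
static long work(struct frame *f, int restoring, int stop_at)
{
    int i;                      /* ~ CKPT VARS: the routine's locals */
    long sum;

    if (restoring) {            /* ~ CKPT TOPS: reload locals and jump */
        i = f->i;
        sum = f->sum;
        switch (f->point) {
        case 1:
            goto p1;
        }
    }

    sum = 0;
    for (i = 0; i < 10; i++) {
        sum += i;
        if (i == stop_at) {     /* ~ CKPT POINT, with p1 as its label */
            f->point = 1;
            f->i = i;
            f->sum = sum;
            return -1;
        }
    p1: ;
    }
    return sum;                 /* ~ CKPT BOTS */
}
```

- Here work(&f, 0, 4) returns -1 with f holding i = 4 and sum = 10; a
- later work(&f, 1, -1) jumps to p1, finishes the loop, and returns 45,
- the same as the uninterrupted work(&f, 0, -1).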
-
- You must #include "pmckpt.h" at the top of the file. This may seem like
- a lot of work for a checkpointer, but remember that the most advanced
- control structure used by pmckpt is goto. A setjmp()-based system may be
- easier to use, but it'll also lose variable values when you least expect
- it.
-
-
- IMPORTANT RULE:
-
- You *MUST* have a CKPT POINT immediately before calling any subroutine
- that's checkpointed. (See the main() call of sub1(), line 39 of test.c.)
- There must be absolutely no side effects (and maybe no computation at
- all, depending on your compiler) between the CKPT POINT line and the
- call. You may have to rewrite the function call to achieve this.
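- The reason for the rule can be sketched in C. This is an illustration
- under assumptions, not pmckpt's real expansion, and every name in it is
- hypothetical: on a restore the caller jumps to the point saved just
- before its call and calls again, and the callee reloads its own state.
- Memory such as counter is taken to come back as it was at checkpoint
- time, so a counter++ placed between the point and the call runs twice.

```c
/* Hypothetical sketch, not pmckpt's real expansion. */
static int saved_i = -1;   /* callee's saved loop index; -1 = nothing saved */
static int counter;        /* global state touched by the caller */

static int callee(int crash_at)
{
    int i = 0;

    if (saved_i >= 0) {            /* restoring: reload state and resume */
        i = saved_i;
        saved_i = -1;
        goto resume;
    }
    for (; i < 5; i++) {
        if (i == crash_at) {       /* checkpoint here, then "crash" */
            saved_i = i;
            return -1;
        }
    resume: ;
    }
    return i;
}

static int caller(int restoring, int crash_at)
{
    if (restoring)
        goto point;                /* restore: skip straight to the point */
    counter = 0;                   /* ordinary start-up work */
point:                             /* ~ CKPT POINT */
    counter++;                     /* BAD: side effect after the point */
    return callee(crash_at);
}
```

- After caller(0, 2) checkpoints and returns -1, caller(1, -1) finishes
- and returns 5, but counter ends at 2 where a crash-free run leaves 1.
- Moving counter++ above the point removes the discrepancy.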
-
-
- You schedule a checkpoint by calling ckpt_schedule(). At the next CKPT
- POINT, your program will save some crucial information, followed by its
- text, data, and stack, to the CHECKPOINT file. To set this filename to
- the value of the CKPTFN environment variable instead, call ckpt_init().
- (This also sets a temporary file name, default CHECKPOINT.TEMP, to the
- value of CKPTFNTEMP. The temporary file is used to ensure atomic
- checkpointing.) test.c shows how you can schedule a checkpoint on any
- interrupt. In practice you probably want to checkpoint at regular
- intervals, with whatever your system uses for a timer.
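- The temporary file makes a checkpoint all-or-nothing: write the whole
- image under the temporary name, then rename it over the real one. A
- minimal C sketch of that pattern follows. The default names and the
- CKPTFN and CKPTFNTEMP variables come from the text above; save_image()
- and its byte-buffer argument are hypothetical, and unlike pmckpt the
- sketch always consults the environment, as if ckpt_init() had been
- called.

```c
#include <stdio.h>
#include <stdlib.h>

/* Value of environment variable var, or dflt if it is unset or empty. */
static const char *env_or(const char *var, const char *dflt)
{
    const char *v = getenv(var);
    return (v != NULL && *v != '\0') ? v : dflt;
}

/* Hypothetical helper: returns 0 on success, -1 on failure. A process
   that dies before the rename() leaves the previous CHECKPOINT intact. */
static int save_image(const void *buf, size_t len)
{
    const char *fn = env_or("CKPTFN", "CHECKPOINT");
    const char *tmp = env_or("CKPTFNTEMP", "CHECKPOINT.TEMP");
    FILE *f = fopen(tmp, "wb");

    if (f == NULL)
        return -1;
    if (fwrite(buf, 1, len, f) != len) {
        fclose(f);
        remove(tmp);
        return -1;
    }
    if (fclose(f) != 0) {
        remove(tmp);
        return -1;
    }
    /* POSIX rename() atomically replaces an existing fn; ISO C alone
       leaves that case implementation-defined. */
    return rename(tmp, fn) == 0 ? 0 : -1;
}
```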
-
-
- To run your program starting from the checkpoint, run
-
- % checkpoint prog CHECKPOINT
-
- where prog is the program name and CHECKPOINT is the checkpoint file
- name. If you're lucky, everything will work.
-
- You can have statements after the variables and before CKPT TOPS. These
- statements form the ``preamble.'' They're executed every time the
- function is entered, whether in normal execution or as part of a
- restore. You should be very careful with statements in a preamble, as
- you will lose any variable values set in a preamble during a restore.
- One safe use of the preamble is at the top of main(), to open files
- (perhaps passed as arguments to the program) in a fixed order. In fact,
- if you don't do this, files opened within the program won't be reopened
- on a restore.
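- One plausible reason the order matters (an assumption, not something
- this document spells out): Unix hands out the lowest free descriptor,
- so opening the same files in the same order on every run recreates the
- same descriptors, and saved file positions can then be matched back to
- them. The sketch below illustrates the idea with stdio streams; the
- arrays and all three functions are hypothetical, not part of pmckpt.

```c
#include <stdio.h>

#define NFILES 2
static FILE *files[NFILES];     /* stream k is always opened k-th */
static long saved_pos[NFILES];  /* positions recorded at a checkpoint */

/* Preamble work: runs on every entry, normal start or restore. */
static int open_files(const char *const names[NFILES])
{
    int k;
    for (k = 0; k < NFILES; k++)
        if ((files[k] = fopen(names[k], "r")) == NULL)
            return -1;
    return 0;
}

/* At a checkpoint: remember where each stream is. */
static void record_positions(void)
{
    int k;
    for (k = 0; k < NFILES; k++)
        saved_pos[k] = ftell(files[k]);
}

/* After a restore has rerun the preamble: put each stream back. */
static int restore_positions(void)
{
    int k;
    for (k = 0; k < NFILES; k++)
        if (fseek(files[k], saved_pos[k], SEEK_SET) != 0)
            return -1;
    return 0;
}
```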
-
-
- To compile a pmckpt program, such as test.c, run the following:
-
- % ckptcvt < test.c > tmp.c
- % cc -c tmp.c
- % cc -o tmp pmckpt.o tmp.o pmceot.o
-
- tmp can be any name. Make sure that pmckpt.o comes before all other .o's
- loaded (except maybe crt0---though this probably leads to bugs), and
- pmceot.o comes after all .o's loaded. You shouldn't have to worry about
- dynamic loading on Suns, or other weird schemes; pmckpt is pretty
- portable, for a checkpointer.
-
- Finally, to complete our outside-in tour of pmckpt, you have to compile
- the pmckpt library itself before using it in programs as above. To do
- this, edit the options in Makefile and type ``make''.
-
-
- To test pmckpt, compile test.c into tmp by the above instructions. Run
- tmp. You should see 13 lines of output. Run it again; a CKPT POINT comes
- after each line of output, so if you type ^C, the program will save its
- state in CHECKPOINT after the next line. Try typing ^C at any moment. If
- you then run checkpoint tmp CHECKPOINT, the program should restart from
- where it left off. ^C works after a restore too, so a single program can
- checkpoint and restart any number of times.
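- The ^C behavior follows a common pattern: the signal handler only sets
- a flag, and the real work happens later at a safe place, here the next
- CKPT POINT. A minimal C sketch, assuming nothing about pmckpt's
- internals; ckpt_pending, on_int() and at_point() are hypothetical
- names, and setting the flag plays the role of calling ckpt_schedule().

```c
#include <signal.h>

static volatile sig_atomic_t ckpt_pending = 0;

/* SIGINT handler: only records that a checkpoint was requested. */
static void on_int(int sig)
{
    (void)sig;
    ckpt_pending = 1;
    signal(SIGINT, on_int);   /* re-arm, for systems that reset handlers */
}

/* Called at every point; returns 1 if a checkpoint should be taken now. */
static int at_point(void)
{
    if (!ckpt_pending)
        return 0;
    ckpt_pending = 0;
    return 1;
}
```

- For checkpoints at regular intervals, a POSIX system could install a
- similar handler for SIGALRM and re-arm it with alarm().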
-
- For a more sophisticated test, try redirecting the output of tmp to a
- file, and run tail -f file & to watch what's happening. Type ^C after a
- few seconds; after a few seconds more, to simulate a system crash, kill
- the process with ^\ (or whatever your quit key is). tmp will dump core.
- Kill the tail, and restore tmp from CHECKPOINT, redirecting output to the
- same file with >>. When it finishes, look at the file. tmp should have
- moved back to its position at the checkpoint, so output written between
- the checkpoint and the ``crash'' won't appear twice. In other words, you
- shouldn't be able to tell from looking at the file that tmp had crashed
- at all.
-
- Try running the last test again, but don't kill the tail. What should
- happen is that your tty shows the correct output in order, and the file
- ends up with the correct output in order, even though some output was
- written between the last checkpoint and the crash! tail doesn't write
- anything twice to your tty because it doesn't go backwards in the file
- when pmckpt does.
-
-
- Internally, pmckpt wends its way down through the saved process stack to
- get back to where it was before. Run ckptcvt's output through cpp if you
- want to see what's going on.
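- Here is a sketch of what wending down a saved stack can look like. It
- is not ckptcvt's output, and every name in it is hypothetical. rsum()
- adds n + (n-1) + ... + 1 recursively, with a point directly before its
- recursive call. A checkpoint unwinds outward, each level saving its own
- n; a restore enters at the outermost level, and each level reloads n,
- jumps to its point, and re-issues the call that leads one level down.

```c
#define MAXDEPTH 64            /* no bounds checking: keep depth below this */
#define CKPT_TAKEN (-1L)

static int saved_n[MAXDEPTH];  /* saved_n[d] = n of the frame at depth d */
static int ndepth;             /* how many frames the checkpoint saved */
static int restoring;          /* nonzero while walking down the stack */
static int stop_depth = -1;    /* depth at which to checkpoint; -1 = never */

static long rsum(int n, int depth)
{
    long r;

    if (restoring) {               /* arguments are ignored on restore */
        n = saved_n[depth];
        if (depth == ndepth - 1)
            restoring = 0;         /* deepest saved frame: back to normal */
        goto point;
    }
    if (n == 0)
        return 0;
    if (depth == stop_depth) {     /* checkpoint taken at this point */
        saved_n[depth] = n;
        ndepth = depth + 1;
        return CKPT_TAKEN;
    }
point:
    r = rsum(n - 1, depth + 1);    /* the call directly follows the point */
    if (r == CKPT_TAKEN) {         /* unwinding: save this level too */
        saved_n[depth] = n;
        return CKPT_TAKEN;
    }
    return n + r;
}
```

- With stop_depth = 2, rsum(5, 0) returns CKPT_TAKEN after saving n = 5,
- 4 and 3 for depths 0 to 2. Setting restoring = 1 and stop_depth = -1 and
- calling rsum again then returns 15, as an uninterrupted run would.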
-