NetNews Usenet Archive 1992 #27

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Usenet Archive 1992 #27 / NN_1992_27.iso / spool / comp / lang / lisp / 2924 < prev next >

Wrap

Text File | 1992-11-20 | 11.9 KB | 233 lines

Xref: sparky comp.lang.lisp:2924 comp.ai:4354 Newsgroups: comp.lang.lisp,comp.ai Path: sparky!uunet!cs.utexas.edu!zaphod.mps.ohio-state.edu!saimiri.primate.wisc.edu!aplcen.apl.jhu.edu!aplcenmp!hall From: hall@aplcenmp.apl.jhu.edu (Marty Hall) Subject: Automatic Memoization Facility Available Message-ID: <By0tIx.Dst@aplcenmp.apl.jhu.edu> Organization: AAI Corp AI Lab, JHU P/T CS Faculty Date: Fri, 20 Nov 1992 15:25:44 GMT Lines: 222 This message is to announce the availability of a facility for automatic memoization in Common LISP. Automatic memoization is a technique by which an existing function can be transformed into one that "remembers" previous arguments and their associated results, yielding large performance gains for certain types of applications. This code, in beta version, was inspired by the section on automatic memoization in Peter Norvig's _Paradigms of AI Programming_ (which in turn was inspired by Abelson and Sussman's _Structure and Interpretation of Computer Programs_). However, the idea of this facility is to extend those ideas into what is needed for a practical facility for use in a large program. As such, it adds many facilities for bookkeeping, timing, evaluating the timing advantages memoization provides, saving hash tables to disk for automatic reuse in a later session, etc. This utility has been used over the last year on the DARPA Signature Management System, resulting in performance improvements of 10-20 fold in several key modules. These utilities, along with an overview of memoization and its applications, are described below. The code itself is available via anonymous FTP from archive.cs.umbc.edu, in /pub/Memoization. If you are interested but do not have FTP access, I can send the code via email. I am quite interested in suggestions, comments, corrections, and experiences of anyone who tries out the package. Marty Hall Artificial Intelligence Lab AAI Corporation PO Box 126 Hunt Valley, MD 21030 hall@aplcen.apl.jhu.edu (410) 683-6455 =================== Lengthy Description Follows ============================= WHAT IS AUTOMATIC MEMOIZATION? ============================== The idea of memoization is that it allows a function to "remember" previous invocations, returning the previously calculated values (rather than recalculating) if it is called with exactly the same arguments as in a previous execution. *Automatic* memoization means that an existing function can be transformed into a memoized one without requiring any changes in the code for the function itself. This can result in tremendous speedups if calculations are repeated at various points in a program's execution, yet while remaining somewhat transparent to the users of the code. APPLICATIONS ============ (A) There are four basic applications of memoization. The first, which is illustrated below, is when a single routine calls some subroutine (or itself recursively) more than is needed, resulting in extra calculations. By memoizing, these results are returned immediately for subsequent calls, with the effect of dynamic programming. In fact, this first case can be thought of as a tool for automatic dynamic programming, but without the need to build the subpieces in the correct order. This can frequently reduce the time of exponential algorithms to polynomial or even linear time. Given enough thought, this can be solved without an automatic memoization facility by either building up the subpieces in the proper order or maintaining a special purpose local data structure to retain the results (ie "manual" memoization). The advantage of doing it automatically is that less debugging and testing is required if the simple algorithm has been already tested, the versions can be changed back and forth interactively at run time, it is more transparent, and most importantly it is simple and easy to use. To illustrate this first case, consider a naive implementation of a function to calculate the Nth Fibonacci number: (defun Fib (N) (if (<= N 1) 1 (+ (Fib (- N 1)) (Fib (- N 2)))) ) Once (Fib (- N 1)) is calculated, there should be no need to repeat the calculation of (Fib (- N 2)), since it has already been performed as part of the calculation of (Fib (- N 1)). These wasted calculations result in exponential time to calculate (Fib N), growing as the golden ratio to the Nth power. On almost any machine, N of 40 or 50 is the maximum tractable calculation. Calling (Memoize 'Fib) then calling Fib results in linear time, so that N in the hundreds still only requires fractions of a second on most machines. Later calculations require only constant time if they calculate (Fib M) for an M less than or equal to a previously calculated value. Of course, one doesn't need memoization to get an efficient calculation of Fibonacci numbers. Simple tail-recursive or iterative implementations will give the same performance, and there is even a closed form. But there are many real-life problems where the more efficient algorithm is not so obvious, and it is far simpler to memoize the obvious algorithm than to determine a better algorithm. For instance, in the Memoization-Examples file included in the distribution, a slightly less obvious illustration is given of an algorithm to calculated divided differences. As further illustrations, Peter Norvig showed that a memoized version of a simple recursive descent parser yields the same performance as chart parsing or Earley's algorithm for parsing context free languages. Paul McNamee has shown that memoizing the simple brute-force approaches to the 0/1 knapsack problem or the matrix chain problem gives the same performance as the complex dynamic programming implementations. Other similar examples abound. (B) The second case is for invocations of a function that are repeated over time, but from scattered places in the program, or even when revoked repeatedly by a user in an interactive program. This generally yields a speedup by a constant factor, but that factor may be large. Without an automatic memoization facility, the only alternative is maintaining a special purpose global data structure, requiring testing and debugging, and much extra effort for something that at best is equally efficient as memoization. (C) The third case is when a function is so expensive that you want to perform the calculations off-line and save the results for a later session. The automatic memoization facility provides a simple and transparent method to save the results and have them associated with the function automatically in a later session. See Save-Memo-Table in the following section for an explanation of how to do that. (D) The final case is when using memoization as a tool in conventional performance profiling and optimization. Many implementations provide some sort of a metering system, and this should be used for major test cases. However, there is often tremendous overhead involved, with 20-30x slower performance when a routine is fully metered. For quick test cases, it is often useful to know if speeding up a particular routine will have much effect on the top-level timing. By using Memoized-Time or With-Memoization, a user can memoize the routines in question then run the same test case multiple times. If the identical test case runs only, say 5% faster even during the second memoized run, then this suggests that no amount of memoization in the routines in question will make more than a 5% difference in the performance of the test case, and that this is likely not the place to begin the optimization efforts. MAIN USER ROUTINES ================== Define-Memo-Function: a macro that can be used just like "defun", but which has the result of defining a memoized function. Also updates the doc string and the results of calling "Arglist" (if available in current LISP implementation) on that function name. Any of the keywords acceptable to Memoize can optionally be passed on, resulting in specialized versions of memoization that seed their initial hash tables from the disk, use particular hash table tests, etc. With-Memoization: a macro that takes a list of function names and any number of LISP forms and executes them in a context where the functions are temporarily memoized. (With-Memoization (Foo Bar Baz) (Form-1) (Form-2)) results in executing the two forms while functions named Foo, Bar, and Baz are memoized. Useful for getting a quick feel for the potential speed benefits of memoization. Without-Memoization: a macro that executes LISP forms in a context where all memoization is temporarily turned off. (Without-Memoization (Form-1) (Form-2)) executes the two forms with no functions memoized. Memoize: Takes a function name and changes its function definition to be a memoized function. (defun Foo (Args) <Body of Foo>) followed by (Memoize 'Foo) has the same effect as doing (Define-Memo-Function Foo (Args) <Body of Foo>), but calling "Memoize" directly is sometimes more convenient when testing things out, as it requires no changes in the preexisting code. Save-Memo-Table: Saves to disk a definition of the hash table associated with a given memoized function name. By defining a memoized function via (Define-Memo-Function Foo (<Args>) <Body>) running the time-consuming cases off-line, calling (Save-Memo-Table '<Function-Name>) then using (Define-Memo-Function Foo (<Args>) (:Hash-Table-Source :Disk) <Body>) or by calling Memoize with the :Hash-Table-Source set to :Disk, you can have a function "remember" the values it calculated, not only in the current run but also in the previous off-line run. Clear-Memo-Table: Takes a function name and clears out the memo table associated with the function. Useful when some internal change makes the previously stored values incorrect. Unmemoize: Takes a function name and returns it to the unmemoized form. Useful for timing and for debugging, especially tracing recursive routines. In combination with "Memoize", this lets you switch back and forth between memoized and normal versions without changing or reloading the code. Similarly, Unmemoize-Functions takes a list instead of a single one, and Unmemoize-All-Functions unmemoizes everything. Rememoize: Takes the name of a function that is currently unmemoized, but which had previously been memoized. Memoizes it again, but uses the previous hash table instead of creating a new one. Similarly, Rememoize-Functions applies to a list. Memoized-Function-Call-Count: Given the name of a memoized function, tells how many times that function has been called, and which of those were simple table lookups that had been stored from a previous invocation, vs how many were newly calculated using the original function. For a normal memoized function, lets the user see if memoization is paying off after a long period of time. For a function whose memo table was stored on disk, lets the user see if the stored values covered all or most of the cases. Memoized-Time: Takes a list of functions and a single form and evaluates and times the form 3 times, once without memoization, once with memoization and an empty cache, and once with memoization but the full cache from the previous run. *Memoized-Function-Names*: a list of the currently memoized functions. "Memoize", "Memo", and "Clear-Memo-Table" were based on code in Peter Norvig's outstanding book _Paradigms of AI Programming: Case Studies in Common LISP_, Morgan Kaufmann, 1992, which in turn was inspired by code in ex. 3.27 of Abelson, Sussman, and Sussman's _Structure and Interpretation of Computer Programs_, MIT Press, 1985. Comments and suggestions on the code were given by Jim Mayfield (University of Maryland Baltimore County), J. Paul McNamee (AAI Corporation), Peter Norvig (Sun Microsystems), and