home *** CD-ROM | disk | FTP | other *** search
- Xref: sparky comp.lang.lisp:2924 comp.ai:4354
- Newsgroups: comp.lang.lisp,comp.ai
- Path: sparky!uunet!cs.utexas.edu!zaphod.mps.ohio-state.edu!saimiri.primate.wisc.edu!aplcen.apl.jhu.edu!aplcenmp!hall
- From: hall@aplcenmp.apl.jhu.edu (Marty Hall)
- Subject: Automatic Memoization Facility Available
- Message-ID: <By0tIx.Dst@aplcenmp.apl.jhu.edu>
- Organization: AAI Corp AI Lab, JHU P/T CS Faculty
- Date: Fri, 20 Nov 1992 15:25:44 GMT
- Lines: 222
-
-
- This message is to announce the availability of a facility for
- automatic memoization in Common LISP. Automatic memoization is a
- technique by which an existing function can be transformed into one
- that "remembers" previous arguments and their associated results,
- yielding large performance gains for certain types of applications.
- This code, in beta version, was inspired by the section on automatic
- memoization in Peter Norvig's _Paradigms of AI Programming_ (which in
- turn was inspired by Abelson and Sussman's _Structure and
- Interpretation of Computer Programs_). However, the idea of this
- facility is to extend those ideas into what is needed for a practical
- facility for use in a large program. As such, it adds many facilities
- for bookkeeping, timing, evaluating the timing advantages memoization
- provides, saving hash tables to disk for automatic reuse in a later
- session, etc. This utility has been used over the last year on the
- DARPA Signature Management System, resulting in performance
- improvements of 10-20 fold in several key modules.
-
- These utilities, along with an overview of memoization and its
- applications, are described below. The code itself is available via
- anonymous FTP from archive.cs.umbc.edu, in /pub/Memoization. If you
- are interested but do not have FTP access, I can send the code via email.
-
- I am quite interested in suggestions, comments, corrections, and
- experiences of anyone who tries out the package.
-
- Marty Hall
- Artificial Intelligence Lab
- AAI Corporation
- PO Box 126
- Hunt Valley, MD 21030
- hall@aplcen.apl.jhu.edu
- (410) 683-6455
-
- =================== Lengthy Description Follows =============================
-
- WHAT IS AUTOMATIC MEMOIZATION?
- ==============================
-
- The idea of memoization is that it allows a function to "remember"
- previous invocations, returning the previously calculated values
- (rather than recalculating) if it is called with exactly the same
- arguments as in a previous execution. *Automatic* memoization means
- that an existing function can be transformed into a memoized one
- without requiring any changes in the code for the function itself.
- This can result in tremendous speedups if calculations are repeated at
- various points in a program's execution, yet while remaining somewhat
- transparent to the users of the code.
-
- APPLICATIONS
- ============
-
- (A) There are four basic applications of memoization. The first, which is
- illustrated below, is when a single routine calls some subroutine (or
- itself recursively) more than is needed, resulting in extra calculations.
- By memoizing, these results are returned immediately for subsequent calls,
- with the effect of dynamic programming. In fact, this first case can be
- thought of as a tool for automatic dynamic programming, but without the
- need to build the subpieces in the correct order. This can frequently
- reduce the time of exponential algorithms to polynomial or even linear
- time. Given enough thought, this can be solved without an automatic
- memoization facility by either building up the subpieces in the proper
- order or maintaining a special purpose local data structure to retain the
- results (ie "manual" memoization). The advantage of doing it automatically
- is that less debugging and testing is required if the simple algorithm has
- been already tested, the versions can be changed back and forth
- interactively at run time, it is more transparent, and most importantly it
- is simple and easy to use.
-
- To illustrate this first case, consider a naive implementation of a
- function to calculate the Nth Fibonacci number:
-
- (defun Fib (N)
- (if (<= N 1)
- 1
- (+ (Fib (- N 1)) (Fib (- N 2)))) )
-
- Once (Fib (- N 1)) is calculated, there should be no need to repeat the
- calculation of (Fib (- N 2)), since it has already been performed as part of
- the calculation of (Fib (- N 1)). These wasted calculations result in
- exponential time to calculate (Fib N), growing as the golden ratio to
- the Nth power. On almost any machine, N of 40 or 50 is the maximum tractable
- calculation. Calling (Memoize 'Fib) then calling Fib results in linear time,
- so that N in the hundreds still only requires fractions of a second on most
- machines. Later calculations require only constant time if they calculate
- (Fib M) for an M less than or equal to a previously calculated value.
-
- Of course, one doesn't need memoization to get an efficient calculation of
- Fibonacci numbers. Simple tail-recursive or iterative implementations will
- give the same performance, and there is even a closed form. But there are
- many real-life problems where the more efficient algorithm is not so obvious,
- and it is far simpler to memoize the obvious algorithm than to determine
- a better algorithm. For instance, in the Memoization-Examples file included
- in the distribution, a slightly less obvious illustration is given of an
- algorithm to calculated divided differences. As further illustrations, Peter
- Norvig showed that a memoized version of a simple recursive descent parser
- yields the same performance as chart parsing or Earley's algorithm for parsing
- context free languages. Paul McNamee has shown that memoizing the simple
- brute-force approaches to the 0/1 knapsack problem or the matrix chain
- problem gives the same performance as the complex dynamic programming
- implementations. Other similar examples abound.
-
- (B) The second case is for invocations of a function that are repeated over
- time, but from scattered places in the program, or even when revoked
- repeatedly by a user in an interactive program. This generally yields a
- speedup by a constant factor, but that factor may be large. Without an
- automatic memoization facility, the only alternative is maintaining a
- special purpose global data structure, requiring testing and debugging,
- and much extra effort for something that at best is equally efficient as
- memoization.
-
- (C) The third case is when a function is so expensive that you want to
- perform the calculations off-line and save the results for a later
- session. The automatic memoization facility provides a simple and
- transparent method to save the results and have them associated with the
- function automatically in a later session. See Save-Memo-Table in the
- following section for an explanation of how to do that.
-
- (D) The final case is when using memoization as a tool in conventional
- performance profiling and optimization. Many implementations provide
- some sort of a metering system, and this should be used for major test
- cases. However, there is often tremendous overhead involved, with
- 20-30x slower performance when a routine is fully metered. For quick
- test cases, it is often useful to know if speeding up a particular
- routine will have much effect on the top-level timing. By using
- Memoized-Time or With-Memoization, a user can memoize the routines in
- question then run the same test case multiple times. If the identical
- test case runs only, say 5% faster even during the second memoized
- run, then this suggests that no amount of memoization in the routines
- in question will make more than a 5% difference in the performance of
- the test case, and that this is likely not the place to begin the
- optimization efforts.
-
- MAIN USER ROUTINES
- ==================
-
- Define-Memo-Function: a macro that can be used just like "defun", but
- which has the result of defining a memoized function. Also updates
- the doc string and the results of calling "Arglist" (if available in
- current LISP implementation) on that function name. Any of the keywords
- acceptable to Memoize can optionally be passed on, resulting in
- specialized versions of memoization that seed their initial hash
- tables from the disk, use particular hash table tests, etc.
-
- With-Memoization: a macro that takes a list of function names and any
- number of LISP forms and executes them in a context where the
- functions are temporarily memoized.
- (With-Memoization (Foo Bar Baz)
- (Form-1)
- (Form-2)) results in executing the two forms while functions
- named Foo, Bar, and Baz are memoized. Useful for getting a quick feel
- for the potential speed benefits of memoization.
-
- Without-Memoization: a macro that executes LISP forms in a context
- where all memoization is temporarily turned off.
- (Without-Memoization
- (Form-1)
- (Form-2)) executes the two forms with no functions memoized.
-
- Memoize: Takes a function name and changes its function definition to
- be a memoized function.
- (defun Foo (Args) <Body of Foo>) followed by
- (Memoize 'Foo) has the same effect as doing
- (Define-Memo-Function Foo (Args) <Body of Foo>), but calling
- "Memoize" directly is sometimes more convenient when testing things
- out, as it requires no changes in the preexisting code.
-
- Save-Memo-Table: Saves to disk a definition of the hash table
- associated with a given memoized function name. By defining a
- memoized function via
- (Define-Memo-Function Foo (<Args>)
- <Body>)
- running the time-consuming cases off-line, calling
- (Save-Memo-Table '<Function-Name>)
- then using
- (Define-Memo-Function Foo (<Args>)
- (:Hash-Table-Source :Disk)
- <Body>)
- or by calling Memoize with the :Hash-Table-Source set to :Disk,
- you can have a function "remember" the values it calculated, not only
- in the current run but also in the previous off-line run.
-
- Clear-Memo-Table: Takes a function name and clears out the memo table
- associated with the function. Useful when some internal change makes
- the previously stored values incorrect.
-
- Unmemoize: Takes a function name and returns it to the unmemoized form.
- Useful for timing and for debugging, especially tracing recursive
- routines. In combination with "Memoize", this lets you switch back
- and forth between memoized and normal versions without changing or
- reloading the code. Similarly, Unmemoize-Functions takes a list
- instead of a single one, and Unmemoize-All-Functions unmemoizes
- everything.
-
- Rememoize: Takes the name of a function that is currently unmemoized,
- but which had previously been memoized. Memoizes it again, but uses
- the previous hash table instead of creating a new one. Similarly,
- Rememoize-Functions applies to a list.
-
- Memoized-Function-Call-Count: Given the name of a memoized function,
- tells how many times that function has been called, and which of
- those were simple table lookups that had been stored from a previous
- invocation, vs how many were newly calculated using the original
- function. For a normal memoized function, lets the user see if
- memoization is paying off after a long period of time. For a function
- whose memo table was stored on disk, lets the user see if the stored
- values covered all or most of the cases.
-
- Memoized-Time: Takes a list of functions and a single form and evaluates
- and times the form 3 times, once without memoization, once with
- memoization and an empty cache, and once with memoization but the
- full cache from the previous run.
-
- *Memoized-Function-Names*: a list of the currently memoized functions.
-
- "Memoize", "Memo", and "Clear-Memo-Table" were based on code in Peter
- Norvig's outstanding book _Paradigms of AI Programming: Case Studies in
- Common LISP_, Morgan Kaufmann, 1992, which in turn was inspired by code in
- ex. 3.27 of Abelson, Sussman, and Sussman's _Structure and Interpretation
- of Computer Programs_, MIT Press, 1985. Comments and suggestions on the
- code were given by Jim Mayfield (University of Maryland Baltimore County),
- J. Paul McNamee (AAI Corporation), Peter Norvig (Sun Microsystems), and
-