SGI Freeware 2001 May

GNU Info File | 1998-10-28 | 48.7 KB | 1,180 lines

This is Info file emacs-lisp-intro.info, produced by Makeinfo version 1.67 from the input file emacs-lisp-intro.texi. This is an introduction to `Programming in Emacs Lisp', for people who are not programmers. Edition 1.05, 21 October 1997 Copyright (C) 1990, '91, '92, '93, '94, '95, '97 Free Software Foundation, Inc. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided also that the sections entitled "Copying" and "GNU General Public License" are included exactly as in the original, and provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the Free Software Foundation. File: emacs-lisp-intro.info, Node: Whitespace Bug, Prev: count-words-region, Up: count-words-region The Whitespace Bug in `count-words-region' ------------------------------------------ The `count-words-region' command described in the preceding section has two bugs, or rather, one bug with two manifestations. First, if you mark a region containing only whitespace in the middle of some text, the `count-words-region' command tells you that the region contains one word! Second, if you mark a region containing only whitespace at the end of the buffer or the accessible portion of a narrowed buffer, the command displays an error message that looks like this: Search failed: "\\w+\\W*" If you are reading this in Info in GNU Emacs, you can test for these bugs yourself. First, evaluate the function in the usual manner to install it. Here is a copy of the definition. 
Place your cursor after the closing parenthesis and type `C-x C-e' to install it. ;; First version; has bugs! (defun count-words-region (beginning end) "Print number of words in the region. Words are defined as at least one word-constituent character followed by at least one character that is not a word-constituent. The buffer's syntax table determines which characters these are." (interactive "r") (message "Counting words in region ... ") ;;; 1. Set up appropriate conditions. (save-excursion (goto-char beginning) (let ((count 0)) ;;; 2. Run the while loop. (while (< (point) end) (re-search-forward "\\w+\\W*") (setq count (1+ count))) ;;; 3. Send a message to the user. (cond ((zerop count) (message "The region does NOT have any words.")) ((= 1 count) (message "The region has 1 word.")) (t (message "The region has %d words." count)))))) If you wish, you can also install this keybinding by evaluating it, too: (global-set-key "\C-c=" 'count-words-region) To conduct the first test, set mark and point to the beginning and end of the following line and then type `C-c =' (or `M-x count-words-region' if you have not bound `C-c ='): one two three Emacs will tell you, correctly, that the region has three words. Repeat the test, but place mark at the beginning of the line and place point just *before* the word `one'. Again type the command `C-c =' (or `M-x count-words-region'). Emacs should tell you that the region has no words, since it is composed only of the whitespace at the beginning of the line. But instead Emacs tells you that the region has one word! For the third test, copy the sample line to the end of the `*scratch*' buffer and then type several spaces at the end of the line. Place mark right after the word `three' and point at the end of line. (The end of the line will be the end of the buffer.) Type `C-c =' (or `M-x count-words-region') as you did before. 
Again, Emacs should tell you that the region has no words, since it is composed only of the whitespace at the end of the line. Instead, Emacs displays an error message saying `Search failed'. The two bugs stem from the same problem. Consider the first manifestation of the bug, in which the command tells you that the whitespace at the beginning of the line contains one word. What happens is this: The `M-x count-words-region' command moves point to the beginning of the region. The `while' tests whether the value of point is smaller than the value of `end', which it is. Consequently, the regular expression search looks for and finds the first word. It leaves point after the word. `count' is set to one. The `while' loop repeats; but this time the value of point is larger than the value of `end', the loop is exited; and the function displays a message saying the number of words in the region is one. In brief, the regular expression search looks for and finds the word even though it is outside the marked region. In the second manifestation of the bug, the region is whitespace at the end of the buffer. Emacs says `Search failed'. What happens is that the true-or-false-test in the `while' loop tests true, so the search expression is executed. But since there are no more words in the buffer, the search fails. In both manifestations of the bug, the search extends or attempts to extend outside of the region. The solution is to limit the search to the region--this is a fairly simple action, but as you may have come to expect, it is not quite as simple as you might think. As we have seen, the `re-search-forward' function takes a search pattern as its first argument. But in addition to this first, mandatory argument, it accepts three optional arguments. The optional second argument bounds the search. The optional third argument, if `t', causes the function to return `nil' rather than signal an error if the search fails. The optional fourth argument is a repeat count. 
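To see what the second and third arguments do in practice, here is a minimal sketch that runs the same search three ways in a temporary buffer. The function name `my-search-demo' and the sample text are assumptions made for illustration; `with-temp-buffer' and `condition-case' are standard Emacs Lisp forms that this chapter has not introduced.

```elisp
(defun my-search-demo ()
  "Show how the optional arguments change `re-search-forward'.
Hypothetical helper for illustration only.  The sample text is
three spaces followed by two words; the `region' is just the
three leading spaces, so it ends at position 4."
  (with-temp-buffer
    (insert "   one two")
    (let ((region-end 4)
          results)
      ;; 1. No bound: the search runs past the region and finds `one'.
      (goto-char (point-min))
      (re-search-forward "\\w+\\W*")
      (push (point) results)
      ;; 2. Bound only: the search fails and signals an error,
      ;;    which is caught here and recorded as a symbol.
      (goto-char (point-min))
      (push (condition-case nil
                (re-search-forward "\\w+\\W*" region-end)
              (search-failed 'search-failed))
            results)
      ;; 3. Bound plus a third argument of t: the search fails
      ;;    quietly, returns nil, and leaves point where it was.
      (goto-char (point-min))
      (push (re-search-forward "\\w+\\W*" region-end t) results)
      (push (point) results)
      (nreverse results))))
```

Evaluating `(my-search-demo)' should return `(8 search-failed nil 1)': the unbounded search leaves point at 8, well outside the region; the bounded search signals `search-failed'; and the bounded, non-signalling search returns `nil' with point still at 1.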
(In Emacs, you can get a function's documentation by typing `C-h f', the name of the function, and then <RET>.) In the `count-words-region' definition, the value of the end of the region is held by the variable `end' which is passed as an argument to the function. Thus, we can add `end' as an argument to the regular expression search expression: (re-search-forward "\\w+\\W*" end) However, if you make only this change to the `count-words-region' definition and then test the new version of the definition on a stretch of whitespace, you will receive an error message saying `Search failed'. What happens is this: the search is limited to the region, and fails as you expect because there are no word-constituent characters in the region. Since it fails, we receive an error message. But we do not want to receive an error message in this case; we want to receive the message that "The region does NOT have any words." The solution to this problem is to provide `re-search-forward' with a third argument of `t', which causes the function to return `nil' rather than signal an error if the search fails. However, if you make this change and try it, you will see the message "Counting words in region ... " and ... you will keep on seeing that message ..., until you type `C-g' (`keyboard-quit'). Here is what happens: the search is limited to the region, as before, and it fails because there are no word-constituent characters in the region, as expected. Consequently, the `re-search-forward' expression returns `nil'. It does nothing else. In particular, it does not move point, which it does as a side effect if it finds the search target. After the `re-search-forward' expression returns `nil', the next expression in the `while' loop is evaluated. This expression increments the count. Then the loop repeats. The true-or-false-test tests true because the value of point is still less than the value of end, since the `re-search-forward' expression did not move point. ... 
and the cycle repeats ...

The `count-words-region' definition requires yet another modification, to cause the true-or-false-test of the `while' loop to test false if the search fails. Put another way, there are two conditions that must be satisfied in the true-or-false-test before the word count variable is incremented: point must still be within the region and the search expression must have found a word to count.

Since both the first condition and the second condition must be true together, the two expressions, the region test and the search expression, can be joined with an `and' function and embedded in the `while' loop as the true-or-false-test, like this:

     (and (< (point) end) (re-search-forward "\\w+\\W*" end t))

(*Note forward-paragraph::, for information about `and'.)

The `re-search-forward' expression returns `t' if the search succeeds and as a side effect moves point. Consequently, as words are found, point is moved through the region. When the search expression fails to find another word, or when point reaches the end of the region, the true-or-false-test tests false, the `while' loop exits, and the `count-words-region' function displays one or other of its messages.

After incorporating these final changes, the `count-words-region' works without bugs (or at least, without bugs that I have found!). Here is what it looks like:

     ;;; Final version: `while'
     (defun count-words-region (beginning end)
       "Print number of words in the region."
       (interactive "r")
       (message "Counting words in region ... ")

     ;;; 1. Set up appropriate conditions.
       (save-excursion
         (let ((count 0))
           (goto-char beginning)

     ;;; 2. Run the while loop.
           (while (and (< (point) end)
                       (re-search-forward "\\w+\\W*" end t))
             (setq count (1+ count)))

     ;;; 3. Send a message to the user.
           (cond ((zerop count)
                  (message "The region does NOT have any words."))
                 ((= 1 count)
                  (message "The region has 1 word."))
                 (t (message "The region has %d words."
count)))))) File: emacs-lisp-intro.info, Node: recursive-count-words, Next: Counting Exercise, Prev: count-words-region, Up: Counting Words Count Words Recursively ======================= You can write the function for counting words recursively as well as with a `while' loop. Let's see how this is done. First, we need to recognize that the `count-words-region' function has three jobs: it sets up the appropriate conditions for counting to occur; it counts the words in the region; and it sends a message to the user telling how many words there are. If we write a single recursive function to do everything, we will receive a message for every recursive call. If the region contains 13 words, we will receive thirteen messages, one right after the other. We don't want this! Instead, we must write two functions to do the job, one of which (the recursive function) will be used inside of the other. One function will set up the conditions and display the message; the other will return the word count. Let us start with the function that causes the message to be displayed. We can continue to call this `count-words-region'. This is the function that the user will call. It will be interactive. Indeed, it will be similar to our previous versions of this function, except that it will call `recursive-count-words' to determine how many words are in the region. We can readily construct a template for this function, based on our previous versions: ;; Recursive version; uses regular expression search (defun count-words-region (beginning end) "DOCUMENTATION..." (INTERACTIVE-EXPRESSION...) ;;; 1. Set up appropriate conditions. (EXPLANATORY MESSAGE) (SET-UP FUNCTIONS... ;;; 2. Count the words. RECURSIVE CALL ;;; 3. Send a message to the user. MESSAGE PROVIDING WORD COUNT)) The definition looks straightforward, except that somehow, the count returned by the recursive call must be passed to the message displaying the word count. 
A little thought suggests that this can be done by making use of a `let' expression: we can bind a variable in the varlist of a `let' expression to the number of words in the region, as returned by the recursive call; and then the `cond' expression, using binding, can display the value to the user. Often, one thinks of the binding within a `let' expression as somehow secondary to the `primary' work of a function. But in this case, what you might consider the `primary' job of the function, counting words, is done within the `let' expression. Using `let', the function definition looks like this: (defun count-words-region (beginning end) "Print number of words in the region." (interactive "r") ;;; 1. Set up appropriate conditions. (message "Counting words in region ... ") (save-excursion (goto-char beginning) ;;; 2. Count the words. (let ((count (recursive-count-words end))) ;;; 3. Send a message to the user. (cond ((zerop count) (message "The region does NOT have any words.")) ((= 1 count) (message "The region has 1 word.")) (t (message "The region has %d words." count)))))) Next, we need to write the recursive counting function. A recursive function has at least three parts: the `do-again-test', the `next-step-expression', and the recursive call. The do-again-test determines whether the function will or will not be called again. Since we are counting words in a region and can use a function that moves point forward for every word, the do-again-test can check whether point is still within the region. The do-again-test should find the value of point and determine whether point is before, at, or after the value of the end of the region. We can use the `point' function to locate point. Clearly, we must pass the value of the end of the region to the recursive counting function as an argument. In addition, the do-again-test should also test whether the search finds a word. If it does not, the function should not call itself again. 
The next-step-expression changes a value so that when the recursive function is supposed to stop calling itself, it stops. More precisely, the next-step-expression changes a value so that at the right time, the do-again-test stops the recursive function from calling itself again. In this case, the next-step-expression can be the expression that moves point forward word by word.

The third part of a recursive function is the recursive call.

Somewhere, we also need a part that does the `work' of the function, a part that does the counting. A vital part!

But already, we have an outline of the recursive counting function:

     (defun recursive-count-words (region-end)
       "DOCUMENTATION..."
        DO-AGAIN-TEST
        NEXT-STEP-EXPRESSION
        RECURSIVE CALL)

Now we need to fill in the slots. Let's start with the simplest cases first: if point is at or beyond the end of the region, there cannot be any words in the region, so the function should return zero. Likewise, if the search fails, there are no words to count, so the function should return zero.

On the other hand, if point is within the region and the search succeeds, the function should call itself again.

Thus, the do-again-test should look like this:

     (and (< (point) region-end)
          (re-search-forward "\\w+\\W*" region-end t))

Note that the search expression is part of the do-again-test--the function returns `t' if its search succeeds and `nil' if it fails. (*Note The Whitespace Bug in `count-words-region': Whitespace Bug, for an explanation of how `re-search-forward' works.)

The do-again-test is the true-or-false test of an `if' clause. Clearly, if the do-again-test succeeds, the then-part of the `if' clause should call the function again; but if it fails, the else-part should return zero since either point is outside the region or the search failed because there were no words to find.

But before considering the recursive call, we need to consider the next-step-expression. What is it?
Interestingly, it is the search part of the do-again-test. In addition to returning `t' or `nil' for the do-again-test, `re-search-forward' moves point forward as a side effect of a successful search. This is the action that changes the value of point so that the recursive function stops calling itself when point completes its movement through the region. Consequently, the `re-search-forward' expression is the next-step-expression. In outline, then, the body of the `recursive-count-words' function looks like this: (if DO-AGAIN-TEST-AND-NEXT-STEP-COMBINED ;; then RECURSIVE-CALL-RETURNING-COUNT ;; else RETURN-ZERO) How to incorporate the mechanism that counts? If you are not used to writing recursive functions, a question like this can be troublesome. But it can and should be approached systematically. We know that the counting mechanism should be associated in some way with the recursive call. Indeed, since the next-step-expression moves point forward by one word, and since a recursive call is made for each word, the counting mechanism must be an expression that adds one to the value returned by a call to `recursive-count-words'. Consider several cases: * If there are two words in the region, the function should return a value resulting from adding one to the value returned when it counts the first word, plus the number returned when it counts the remaining words in the region, which in this case is one. * If there is one word in the region, the function should return a value resulting from adding one to the value returned when it counts that word, plus the number returned when it counts the remaining words in the region, which in this case is zero. * If there are no words in the region, the function should return zero. From the sketch we can see that the else-part of the `if' returns zero for the case of no words. This means that the then-part of the `if' must return a value resulting from adding one to the value returned from a count of the remaining words. 
The expression will look like this, where `1+' is a function that adds one to its argument. (1+ (recursive-count-words region-end)) The whole `recursive-count-words' function will then look like this: (defun recursive-count-words (region-end) "DOCUMENTATION..." ;;; 1. do-again-test (if (and (< (point) region-end) (re-search-forward "\\w+\\W*" region-end t)) ;;; 2. then-part: the recursive call (1+ (recursive-count-words region-end)) ;;; 3. else-part 0)) Let's examine how this works: If there are no words in the region, the else part of the `if' expression is evaluated and consequently the function returns zero. If there is one word in the region, the value of point is less than the value of `region-end' and the search succeeds. In this case, the true-or-false-test of the `if' expression tests true, and the then-part of the `if' expression is evaluated. The counting expression is evaluated. This expression returns a value (which will be the value returned by the whole function) that is the sum of one added to the value returned by a recursive call. Meanwhile, the next-step-expression has caused point to jump over the first (and in this case only) word in the region. This means that when `(recursive-count-words region-end)' is evaluated a second time, as a result of the recursive call, the value of point will be equal to or greater than the value of region end. So this time, `recursive-count-words' will return zero. The zero will be added to one, and the original evaluation of `recursive-count-words' will return one plus zero, which is one, which is the correct amount. Clearly, if there are two words in the region, the first call to `recursive-count-words' returns one added to the value returned by calling `recursive-count-words' on a region containing the remaining word--that is, it adds one to one, producing two, which is the correct amount. 
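The cases traced so far can be checked directly. The following is a minimal sketch: `recursive-count-words' is copied from the text, while `my-count-words-in-string' is a hypothetical helper (not part of the original text) that puts a string in a temporary buffer, moves point to the start, and counts up to the end of the buffer.

```elisp
(defun recursive-count-words (region-end)
  "Number of words between point and REGION-END."
  (if (and (< (point) region-end)
           (re-search-forward "\\w+\\W*" region-end t))
      ;; A word was found and point moved past it:
      ;; add one to the count of whatever remains.
      (1+ (recursive-count-words region-end))
    ;; Point is at the end, or no further word exists.
    0))

(defun my-count-words-in-string (string)
  "Return the number of words in STRING.
Hypothetical helper for illustration only."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (recursive-count-words (point-max))))
```

With these definitions, `(my-count-words-in-string "   ")' should return 0, `(my-count-words-in-string "one")' should return 1, and `(my-count-words-in-string "one two")' should return 2.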
Similarly, if there are three words in the region, the first call to `recursive-count-words' returns one added to the value returned by calling `recursive-count-words' on a region containing the remaining two words--and so on and so on.

With full documentation the two functions look like this:

The recursive function:

     (defun recursive-count-words (region-end)
       "Number of words between point and REGION-END."

     ;;; 1. do-again-test
       (if (and (< (point) region-end)
                (re-search-forward "\\w+\\W*" region-end t))

     ;;; 2. then-part: the recursive call
           (1+ (recursive-count-words region-end))

     ;;; 3. else-part
         0))

The wrapper:

     ;;; Recursive version
     (defun count-words-region (beginning end)
       "Print number of words in the region.

     Words are defined as at least one word-constituent
     character followed by at least one character that is
     not a word-constituent.  The buffer's syntax table
     determines which characters these are."
       (interactive "r")
       (message "Counting words in region ... ")
       (save-excursion
         (goto-char beginning)
         (let ((count (recursive-count-words end)))
           (cond ((zerop count)
                  (message "The region does NOT have any words."))
                 ((= 1 count)
                  (message "The region has 1 word."))
                 (t
                  (message "The region has %d words." count))))))

File: emacs-lisp-intro.info,  Node: Counting Exercise,  Prev: recursive-count-words,  Up: Counting Words

Exercise: Counting Punctuation
==============================

Using a `while' loop, write a function to count the number of punctuation marks in a region--period, comma, semicolon, colon, exclamation mark, question mark. Do the same using recursion.

File: emacs-lisp-intro.info,  Node: Words in a defun,  Next: Readying a Graph,  Prev: Counting Words,  Up: Top

Counting Words in a `defun'
***************************

Our next project is to count the number of words in a function definition. Clearly, this can be done using some variant of `count-words-region'. *Note Counting Words: Repetition and Regexps: Counting Words.
If we are just going to count the words in one definition, it is easy enough to mark the definition with the `C-M-h' (`mark-defun') command, and then call `count-words-region'.

However, I am more ambitious: I want to count the words and symbols in every definition in the Emacs sources and then print a graph that shows how many functions there are of each length: how many contain 40 to 49 words or symbols, how many contain 50 to 59 words or symbols, and so on. I have often been curious how long a typical function is, and this will tell.

* Menu:

* Divide and Conquer::          Split a daunting project into parts.
* Words and Symbols::           What to count?
* Syntax::                      What constitutes a word or symbol?
* count-words-in-defun::        Very like `count-words'.
* Several defuns::              Counting several defuns in a file.
* Find a File::                 Do you want to look at a file?
* lengths-list-file::           A list of the lengths of many definitions.
* Several files::               Counting in definitions in different files.
* Several files recursively::   Recursively counting in different files.
* Prepare the data::            Prepare the data for display in a graph.

File: emacs-lisp-intro.info,  Node: Divide and Conquer,  Next: Words and Symbols,  Prev: Words in a defun,  Up: Words in a defun

Divide and Conquer
==================

Described in one phrase, the histogram project is daunting; but divided into numerous small steps, each of which we can take one at a time, the project becomes less fearsome. Let us consider what the steps must be:

   * First, write a function to count the words in one definition. This includes the problem of handling symbols as well as words.

   * Second, write a function to list the numbers of words in each function in a file. This function can use the `count-words-in-defun' function.

   * Third, write a function to list the numbers of words in each function in each of several files. This entails automatically finding the various files, switching to them, and counting the words in the definitions within them.
* Fourth, write a function to convert the list of numbers that we created in step three to a form that will be suitable for printing as a graph. * Fifth, write a function to print the results as a graph. This is quite a project! But if we take each step slowly, it will not be difficult. File: emacs-lisp-intro.info, Node: Words and Symbols, Next: Syntax, Prev: Divide and Conquer, Up: Words in a defun What to Count? ============== When we first start thinking about how to count the words in a function definition, the first question is (or ought to be) what are we going to count? When we speak of `words' with respect to a Lisp function definition, we are actually speaking, in large part, of `symbols'. For example, the following `multiply-by-seven' function contains the five symbols `defun', `multiply-by-seven', `number', `*', and `7'. In addition, in the documentation string, it contains the four words `Multiply', `NUMBER', `by', and `seven'. The symbol `number' is repeated, so the definition contains a total of ten words and symbols. (defun multiply-by-seven (number) "Multiply NUMBER by seven." (* 7 number)) However, if we mark the `multiply-by-seven' definition with `C-M-h' (`mark-defun'), and then call `count-words-region' on it, we will find that `count-words-region' claims the definition has eleven words, not ten! Something is wrong! The problem is twofold: `count-words-region' does not count the `*' as a word, and it counts the single symbol, `multiply-by-seven', as containing three words. The hyphens are treated as if they were interword spaces rather than intraword connectors: `multiply-by-seven' is counted as if it were written `multiply by seven'. The cause of this confusion is the regular expression search within the `count-words-region' definition that moves point forward word by word. 
In the canonical version of `count-words-region', the regexp is: "\\w+\\W*" This regular expression is a pattern defining one or more word constituent characters possibly followed by one or more characters that are not word constituents. What is meant by `word constituent characters' brings us to the issue of syntax, which is worth a section of its own. File: emacs-lisp-intro.info, Node: Syntax, Next: count-words-in-defun, Prev: Words and Symbols, Up: Words in a defun What Constitutes a Word or Symbol? ================================== Emacs treats different characters as belonging to different "syntax categories". For example, the regular expression, `\\w+', is a pattern specifying one or more *word constituent* characters. Word constituent characters are members of one syntax category. Other syntax categories include the class of punctuation characters, such as the period and the comma, and the class of whitespace characters, such as the blank space and the tab character. (For more information, see *Note Syntax: (emacs)Syntax, and, *Note Syntax Tables: (elisp)Syntax Tables.) Syntax tables specify which characters belong to which categories. Usually, a hyphen is not specified as a `word constituent character'. Instead, it is specified as being in the `class of characters that are part of symbol names but not words.' This means that the `count-words-region' function treats it in the same way it treats an interword white space, which is why `count-words-region' counts `multiply-by-seven' as three words. There are two ways to cause Emacs to count `multiply-by-seven' as one symbol: modify the syntax table or modify the regular expression. We could redefine a hyphen as a word constituent character by modifying the syntax table that Emacs keeps for each mode. This action would serve our purpose, except that a hyphen is merely the most common character within symbols that is not typically a word constituent character; there are others, too. 
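The first option, changing the syntax table, can be tried out safely on a copy. The sketch below is illustrative only: `my-count-word-matches' is a hypothetical helper, and it works in a temporary buffer with a copy of the standard syntax table so that no real mode's table is altered. `modify-syntax-entry' with the descriptor "w" marks a character as a word constituent.

```elisp
(defun my-count-word-matches (text &optional hyphen-as-word)
  "Count matches of \"\\\\w+\\\\W*\" in TEXT.
If HYPHEN-AS-WORD is non-nil, treat the hyphen as a word
constituent.  Hypothetical helper for illustration only."
  (with-temp-buffer
    ;; Work on a copy so the standard syntax table is untouched.
    (let ((table (copy-syntax-table (standard-syntax-table))))
      (when hyphen-as-word
        (modify-syntax-entry ?- "w" table))
      (set-syntax-table table)
      (insert text)
      (goto-char (point-min))
      (let ((count 0))
        ;; Same search as in `count-words-region', unbounded here
        ;; because the whole temporary buffer is the region.
        (while (re-search-forward "\\w+\\W*" nil t)
          (setq count (1+ count)))
        count))))
```

With the hyphen left alone, `(my-count-word-matches "multiply-by-seven")' should return 3; with the hyphen redefined, `(my-count-word-matches "multiply-by-seven" t)' should return 1.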
Alternatively, we can redefine the regular expression used in the `count-words-region' definition so as to include symbols. This procedure has the merit of clarity, but the task is a little tricky.

The first part is simple enough: the pattern must match "at least one character that is a word or symbol constituent". Thus:

     \\(\\w\\|\\s_\\)+

The `\\(' is the first part of the grouping construct that includes the `\\w' and the `\\s_' as alternatives, separated by the `\\|'. The `\\w' matches any word-constituent character and the `\\s_' matches any character that is part of a symbol name but not a word-constituent character. The `+' following the group indicates that the word or symbol constituent characters must be matched at least once.

However, the second part of the regexp is more difficult to design. What we want is to follow the first part with "optionally one or more characters that are not constituents of a word or symbol". At first, I thought I could define this with the following:

     \\(\\W\\|\\S_\\)*

The upper case `W' and `S' match characters that are *not* word or symbol constituents. Unfortunately, this expression matches any character that is either not a word constituent or not a symbol constituent. This matches any character!

I then noticed that every word or symbol in my test region was followed by white space (blank space, tab, or newline). So I tried placing a pattern to match one or more blank spaces after the pattern for one or more word or symbol constituents. This failed, too. Words and symbols are often separated by whitespace, but in actual code parentheses may follow symbols and punctuation may follow words. So finally, I designed a pattern in which the word or symbol constituents are followed optionally by characters that are not white space and then followed optionally by white space.
Here is the full regular expression:

     "\\(\\w\\|\\s_\\)+[^ \t\n]*[ \t\n]*"

File: emacs-lisp-intro.info,  Node: count-words-in-defun,  Next: Several defuns,  Prev: Syntax,  Up: Words in a defun

The `count-words-in-defun' Function
===================================

We have seen that there are several ways to write a `count-words-region' function. To write a `count-words-in-defun', we need merely adapt one of these versions.

The version that uses a `while' loop is easy to understand, so I am going to adapt that. Because `count-words-in-defun' will be part of a more complex program, it need not be interactive and it need not display a message but just return the count. These considerations simplify the definition a little.

On the other hand, `count-words-in-defun' will be used within a buffer that contains function definitions. Consequently, it is reasonable to ask that the function determine whether it is called when point is within a function definition, and if it is, to return the count for that definition. This adds complexity to the definition, but saves us from needing to pass arguments to the function.

These considerations lead us to prepare the following template:

     (defun count-words-in-defun ()
       "DOCUMENTATION..."
       (SET UP...
          (WHILE LOOP...)
        RETURN COUNT)

As usual, our job is to fill in the slots.

First, the set up.

We are presuming that this function will be called within a buffer containing function definitions. Point will either be within a function definition or not. For `count-words-in-defun' to work, point must move to the beginning of the definition, a counter must start at zero, and the counting loop must stop when point reaches the end of the definition.

The `beginning-of-defun' function searches backwards for an opening delimiter such as a `(' at the beginning of a line, and moves point to that position, or else to the limit of the search.
In practice, this means that `beginning-of-defun' moves point to the beginning of an enclosing or preceding function definition, or else to the beginning of the buffer. We can use `beginning-of-defun' to place point where we wish to start. The `while' loop requires a counter to keep track of the words or symbols being counted. A `let' expression can be used to create a local variable for this purpose, and bind it to an initial value of zero. The `end-of-defun' function works like `beginning-of-defun' except that it moves point to the end of the definition. `end-of-defun' can be used as part of an expression that determines the position of the end of the definition. The set up for `count-words-in-defun' takes shape rapidly: first we move point to the beginning of the definition, then we create a local variable to hold the count, and, finally, we record the position of the end of the definition so the `while' loop will know when to stop looping. The code looks like this: (beginning-of-defun) (let ((count 0) (end (save-excursion (end-of-defun) (point)))) The code is simple. The only slight complication is likely to concern `end': it is bound to the position of the end of the definition by a `save-excursion' expression that returns the value of point after `end-of-defun' temporarily moves it to the end of the definition. The second part of the `count-words-in-defun', after the set up, is the `while' loop. The loop must contain an expression that jumps point forward word by word and symbol by symbol, and another expression that counts the jumps. The true-or-false-test for the `while' loop should test true so long as point should jump forward, and false when point is at the end of the definition. 
We have already redefined the regular expression for this
(*note Syntax::.), so the loop is straightforward:

     (while (and (< (point) end)
                 (re-search-forward
                  "\\(\\w\\|\\s_\\)+[^ \t\n]*[ \t\n]*" end t))
       (setq count (1+ count)))

The third part of the function definition returns the count of words
and symbols. This part is the last expression within the body of the
`let' expression, and can be, very simply, the local variable `count',
which when evaluated returns the count.

Put together, the `count-words-in-defun' definition looks like this:

     (defun count-words-in-defun ()
       "Return the number of words and symbols in a defun."
       (beginning-of-defun)
       (let ((count 0)
             (end (save-excursion (end-of-defun) (point))))
         (while
             (and (< (point) end)
                  (re-search-forward
                   "\\(\\w\\|\\s_\\)+[^ \t\n]*[ \t\n]*"
                   end t))
           (setq count (1+ count)))
         count))

How to test this? The function is not interactive, but it is easy to
put a wrapper around the function to make it interactive; we can use
almost the same code as for the recursive version of
`count-words-region':

     ;;; Interactive version.
     (defun count-words-defun ()
       "Number of words and symbols in a function definition."
       (interactive)
       (message
        "Counting words and symbols in function definition ... ")
       (let ((count (count-words-in-defun)))
         (cond
          ((zerop count)
           (message
            "The definition does NOT have any words or symbols."))
          ((= 1 count)
           (message
            "The definition has 1 word or symbol."))
          (t
           (message
            "The definition has %d words or symbols." count)))))

Let's re-use `C-c =' as a convenient keybinding:

     (global-set-key "\C-c=" 'count-words-defun)

Now we can try out `count-words-defun': install both
`count-words-in-defun' and `count-words-defun', and set the
keybinding, and then place the cursor within the following definition:

     (defun multiply-by-seven (number)
       "Multiply NUMBER by seven."
       (* 7 number))
          => 10

Success! The definition has 10 words and symbols.

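The same check can also be run without a keybinding. The following is
a minimal sketch, not part of the original text: it assumes a
temporary buffer in `emacs-lisp-mode' (so that characters such as `-'
and `*' have symbol syntax), and the names `multiply-by-seven-source'
and `count-words-in-defun-string' are hypothetical. The definition of
`count-words-in-defun' is repeated so the block can be evaluated on
its own.

```elisp
;; Copy of `count-words-in-defun' from the text, repeated here so the
;; sketch is self-contained.
(defun count-words-in-defun ()
  "Return the number of words and symbols in a defun."
  (beginning-of-defun)
  (let ((count 0)
        (end (save-excursion (end-of-defun) (point))))
    (while (and (< (point) end)
                (re-search-forward
                 "\\(\\w\\|\\s_\\)+[^ \t\n]*[ \t\n]*" end t))
      (setq count (1+ count)))
    count))

;; Hypothetical variable: the sample definition from the text, as a string.
(defvar multiply-by-seven-source
  (concat "(defun multiply-by-seven (number)\n"
          "  \"Multiply NUMBER by seven.\"\n"
          "  (* 7 number))\n"))

;; Hypothetical helper: insert DEFINITION into a temporary Emacs Lisp
;; buffer, leave point inside it, and count its words and symbols.
(defun count-words-in-defun-string (definition)
  "Return the word and symbol count for DEFINITION, a string."
  (with-temp-buffer
    (emacs-lisp-mode)
    (insert definition)
    ;; Put point roughly in the middle of the definition.
    (goto-char (/ (+ (point-min) (point-max)) 2))
    (count-words-in-defun)))

;; The text above reports 10 for this definition.
(count-words-in-defun-string multiply-by-seven-source)
```

Because the helper works in a temporary buffer, it leaves the current
buffer and the position of point untouched.
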
The next problem is to count the numbers of words and symbols in
several definitions within a single file.

File: emacs-lisp-intro.info, Node: Several defuns, Next: Find a File, Prev: count-words-in-defun, Up: Words in a defun

Count Several `defuns' Within a File
====================================

A file such as `simple.el' may have 80 or more function definitions
within it. Our long term goal is to collect statistics on many files,
but as a first step, our immediate goal is to collect statistics on
one file.

The information will be a series of numbers, each number being the
length of a function definition. We can store the numbers in a list.

We know that we will want to incorporate the information regarding one
file with information about many other files; this means that the
function for counting definition lengths within one file need only
return the lengths' list. It need not and should not display any
messages.

The word count commands contain one expression to jump point forward
word by word and another expression to count the jumps. The
definitions' lengths' function can be designed to work the same way,
with one expression to jump point forward definition by definition and
another expression to construct the lengths' list.

This statement of the problem makes it elementary to write the
function definition. Clearly, we will start the count at the
beginning of the file, so the first command will be
`(goto-char (point-min))'. Next, we start the `while' loop; and the
true-or-false test of the loop can be a regular expression search for
the next function definition--so long as the search succeeds, point is
moved forward and then the body of the loop is evaluated. The body
needs an expression that constructs the lengths' list. `cons', the
list construction command, can be used to create the list. That is
almost all there is to it.
Here is what this fragment of code looks like:

     (goto-char (point-min))
     (while (re-search-forward "^(defun" nil t)
       (setq lengths-list
             (cons (count-words-in-defun) lengths-list)))

What we have left out is the mechanism for finding the file that
contains the function definitions.

In previous examples, we either used this, the Info file, or we
switched back and forth to some other buffer, such as the `*scratch*'
buffer.

Finding a file is a new process that we have not yet discussed.

File: emacs-lisp-intro.info, Node: Find a File, Next: lengths-list-file, Prev: Several defuns, Up: Words in a defun

Find a File
===========

To find a file in Emacs, you use the `C-x C-f' (`find-file') command.
This command is almost, but not quite, right for the lengths problem.

Let's look at the source for `find-file' (you can use the `find-tag'
command to find the source of a function):

     (defun find-file (filename)
       "Edit file FILENAME.
     Switch to a buffer visiting file FILENAME,
     creating one if none already exists."
       (interactive "FFind file: ")
       (switch-to-buffer (find-file-noselect filename)))

The definition possesses short but complete documentation and an
interactive specification that prompts you for a file name when you
use the command interactively. The body of the definition contains
two functions, `find-file-noselect' and `switch-to-buffer'.

According to its documentation as shown by `C-h f' (the
`describe-function' command), the `find-file-noselect' function reads
the named file into a buffer and returns the buffer. However, the
buffer is not selected. Emacs does not switch its attention (or yours
if you are using `find-file-noselect') to the named buffer. That is
what `switch-to-buffer' does: it switches the buffer to which Emacs's
attention is directed; and it switches the buffer displayed in the
window to the new buffer. We have discussed buffer switching
elsewhere. (*Note Switching Buffers::.)

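The claim that `find-file-noselect' returns a buffer without selecting
it can be checked directly. The following is a minimal sketch, not
part of the original text; the function name
`visit-without-selecting-demo' and the temporary file it creates are
assumptions made for illustration.

```elisp
;; Hypothetical demonstration; the names here are not from the text.
(defun visit-without-selecting-demo ()
  "Visit a temporary file with `find-file-noselect' and report the result.
Return a list (SELECTED-UNCHANGED VISITING).  SELECTED-UNCHANGED is
non-nil if the current buffer stayed the same; VISITING is non-nil if
the returned buffer is visiting a file."
  (let* ((file (make-temp-file "noselect-demo"))
         (before (current-buffer))
         (buffer (find-file-noselect file)))
    (unwind-protect
        (list (eq before (current-buffer))
              (and (buffer-file-name buffer) t))
      ;; Clean up the buffer and the temporary file.
      (kill-buffer buffer)
      (delete-file file))))

(visit-without-selecting-demo)
```

Both elements of the returned list should be `t': the file was read
into a buffer, yet the buffer that was current beforehand is still
current.
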
In this histogram project, we do not need to display each file on the
screen as the program determines the length of each definition within
it. Instead of employing `switch-to-buffer', we can work with
`set-buffer', which redirects the attention of the computer program to
a different buffer but does not redisplay it on the screen. So
instead of calling on `find-file' to do the job, we must write our own
expression.

The task is easy: use `find-file-noselect' and `set-buffer'.

File: emacs-lisp-intro.info, Node: lengths-list-file, Next: Several files, Prev: Find a File, Up: Words in a defun

`lengths-list-file' in Detail
=============================

The core of the `lengths-list-file' function is a `while' loop
containing a function to move point forward `defun by defun' and a
function to count the number of words and symbols in each defun. This
core must be surrounded by functions that do various other tasks,
including finding the file, and ensuring that point starts out at the
beginning of the file. The function definition looks like this:

     (defun lengths-list-file (filename)
       "Return list of definitions' lengths within FILENAME.
     The returned list is a list of numbers.
     Each number is the number of words or
     symbols in one function definition."
       (message "Working on `%s' ... " filename)
       (save-excursion
         (let ((buffer (find-file-noselect filename))
               (lengths-list))
           (set-buffer buffer)
           (setq buffer-read-only t)
           (widen)
           (goto-char (point-min))
           (while (re-search-forward "^(defun" nil t)
             (setq lengths-list
                   (cons (count-words-in-defun) lengths-list)))
           (kill-buffer buffer)
           lengths-list)))

The function is passed one argument, the name of the file on which it
will work. It has four lines of documentation, but no interactive
specification. Since people worry that a computer is broken if they
don't see anything going on, the first line of the body is a message.

The next line contains a `save-excursion' that returns Emacs's
attention to the current buffer when the function completes.
This is useful in case you embed this function in another function
that presumes point is restored to the original buffer.

In the varlist of the `let' expression, Emacs finds the file and binds
the local variable `buffer' to the buffer containing the file. At the
same time, Emacs creates `lengths-list' as a local variable.

Next, Emacs switches its attention to the buffer.

In the following line, Emacs makes the buffer read-only. Ideally,
this line is not necessary. None of the functions for counting words
and symbols in a function definition should change the buffer.
Besides, the buffer is not going to be saved, even if it were changed.
This line is entirely the consequence of great, perhaps excessive,
caution. The reason for the caution is that this function and those
it calls work on the sources for Emacs and it is very inconvenient if
they are inadvertently modified. It goes without saying that I did
not realize a need for this line until an experiment went awry and
started to modify my Emacs source files ...

Next comes a call to widen the buffer if it is narrowed. This
function is usually not needed--Emacs creates a fresh buffer if none
already exists; but if a buffer visiting the file already exists Emacs
returns that one. In this case, the buffer may be narrowed and must
be widened. If we wanted to be fully `user-friendly', we would
arrange to save the restriction and the location of point, but we
won't.

The `(goto-char (point-min))' expression moves point to the beginning
of the buffer.

Then comes a `while' loop in which the `work' of the function is
carried out. In the loop, Emacs determines the length of each
definition and constructs a lengths' list containing the information.

Emacs kills the buffer after working through it. This is to save
space inside of Emacs. My version of Emacs 19 contains over 300
source files of interest. Another function will apply
`lengths-list-file' to each of them.
If Emacs visits all of them and deletes none, my computer may run out
of virtual memory.

Finally, the last expression within the `let' expression is the
`lengths-list' variable; its value is returned as the value of the
whole function.

You can try this function by installing it in the usual fashion. Then
place your cursor after the following expression and type `C-x C-e'
(`eval-last-sexp').

     (lengths-list-file "../lisp/debug.el")

(You may need to change the pathname of the file; the one here works
if this Info file and the Emacs sources are in neighboring places,
such as `/usr/local/emacs/info' and `/usr/local/emacs/lisp'. To
change the expression, copy it to the `*scratch*' buffer and edit it.
Then evaluate it.)

On my version of Emacs, the lengths' list for `debug.el' takes seven
seconds to produce and looks like this:

     (75 41 80 62 20 45 44 68 45 12 34 235)

Note that the length of the last definition in the file is first in
the list.

File: emacs-lisp-intro.info, Node: Several files, Next: Several files recursively, Prev: lengths-list-file, Up: Words in a defun

Count Words in `defuns' in Different Files
==========================================

In the previous section, we created a function that returns a list of
the lengths of each definition in a file. Now, we want to define a
function to return a master list of the lengths of the definitions in
a list of files.

Working on each of a list of files is a repetitious act, so we can use
either a `while' loop or recursion.

The design using a `while' loop is routine. The argument passed to
the function is a list of files. As we saw earlier
(*note Loop Example::.), you can write a `while' loop so that the body
of the loop is evaluated if such a list contains elements, but the
loop exits if the list is empty. For this design to work, the body of
the loop must contain an expression that shortens the list each time
the body is evaluated, so that eventually the list is empty.
The usual technique is to set the value of the list to the value of
the CDR of the list each time the body is evaluated.

The template looks like this:

     (while TEST-WHETHER-LIST-IS-EMPTY
       BODY...
       SET-LIST-TO-CDR-OF-LIST)

Also, we remember that a `while' loop returns `nil' (the result of
evaluating the true-or-false-test), not the result of any evaluation
within its body. (The evaluations within the body of the loop are
done for their side effects.) However, the expression that sets the
lengths' list is part of the body--and that is the value that we want
returned by the function as a whole. To do this, we enclose the
`while' loop within a `let' expression, and arrange that the last
element of the `let' expression contains the value of the lengths'
list. (*Note Loop Example with an Incrementing Counter: Incrementing Example.)

These considerations lead us directly to the function itself:

     ;;; Use `while' loop.
     (defun lengths-list-many-files (list-of-files)
       "Return list of lengths of defuns in LIST-OF-FILES."
       (let (lengths-list)

     ;;; true-or-false-test
         (while list-of-files
           (setq lengths-list
                 (append
                  lengths-list

     ;;; Generate a lengths' list.
                  (lengths-list-file
                   (expand-file-name (car list-of-files)))))

     ;;; Make files' list shorter.
           (setq list-of-files (cdr list-of-files)))

     ;;; Return final value of lengths' list.
         lengths-list))

`expand-file-name' is a built-in function that converts a file name to
its absolute, long, path name form. Thus,

     debug.el

becomes

     /usr/local/emacs/lisp/debug.el

The only other new element of this function definition is the as yet
unstudied function `append', which merits a short section for itself.

* Menu:

* append::                      Attaching one list to another.

File: emacs-lisp-intro.info, Node: append, Prev: Several files, Up: Several files

The `append' Function
---------------------

The `append' function attaches one list to another.
Thus,

     (append '(1 2 3 4) '(5 6 7 8))

produces the list

     (1 2 3 4 5 6 7 8)

This is exactly how we want to attach two lengths' lists produced by
`lengths-list-file' to each other. The results contrast with `cons',

     (cons '(1 2 3 4) '(5 6 7 8))

which constructs a new list in which the first argument to `cons'
becomes the first element of the new list:

     ((1 2 3 4) 5 6 7 8)
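
As a further check on the difference, here is a minimal sketch that
combines several lengths' lists with the same `while' and `append'
pattern used in `lengths-list-many-files'. It is not part of the
original text: the name `combine-lengths-lists' and the sample numbers
are hypothetical, and no files are read.

```elisp
;; Hypothetical helper: LISTS stands in for the values that
;; `lengths-list-file' would return for several files.
(defun combine-lengths-lists (lists)
  "Return one flat list made by appending each list in LISTS in order."
  (let (lengths-list)
    (while lists
      ;; Attach the next list to the end of what we have so far.
      (setq lengths-list (append lengths-list (car lists)))
      ;; Make the list of lists shorter.
      (setq lists (cdr lists)))
    lengths-list))

(combine-lengths-lists '((75 41 80) (62 20) (45)))
;; => (75 41 80 62 20 45)

;; With `cons', the first list becomes a single element instead:
(cons '(62 20) '(75 41 80))
;; => ((62 20) 75 41 80)
```

A flat list of numbers is what a later step needs if it is to treat
every definition's length uniformly, whichever file it came from.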