GNU Info File | 1998-10-28 | 48.7 KB | 1,180 lines
This is Info file emacs-lisp-intro.info, produced by Makeinfo version
1.67 from the input file emacs-lisp-intro.texi.
This is an introduction to `Programming in Emacs Lisp', for people
who are not programmers.
Edition 1.05, 21 October 1997
Copyright (C) 1990, '91, '92, '93, '94, '95, '97 Free Software
Foundation, Inc.
Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.
Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided also
that the sections entitled "Copying" and "GNU General Public License"
are included exactly as in the original, and provided that the entire
resulting derived work is distributed under the terms of a permission
notice identical to this one.
Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the Free Software Foundation.
File: emacs-lisp-intro.info, Node: Whitespace Bug, Prev: count-words-region, Up: count-words-region
The Whitespace Bug in `count-words-region'
------------------------------------------
The `count-words-region' command described in the preceding section
has two bugs, or rather, one bug with two manifestations. First, if
you mark a region containing only whitespace in the middle of some
text, the `count-words-region' command tells you that the region
contains one word! Second, if you mark a region containing only
whitespace at the end of the buffer or the accessible portion of a
narrowed buffer, the command displays an error message that looks like
this:
Search failed: "\\w+\\W*"
If you are reading this in Info in GNU Emacs, you can test for these
bugs yourself.
First, evaluate the function in the usual manner to install it.
Here is a copy of the definition. Place your cursor after the closing
parenthesis and type `C-x C-e' to install it.
     ;; First version; has bugs!
     (defun count-words-region (beginning end)
       "Print number of words in the region.
     Words are defined as at least one word-constituent character followed
     by zero or more characters that are not word-constituents. The buffer's
     syntax table determines which characters these are."
       (interactive "r")
       (message "Counting words in region ... ")
     ;;; 1. Set up appropriate conditions.
       (save-excursion
         (goto-char beginning)
         (let ((count 0))
     ;;; 2. Run the while loop.
           (while (< (point) end)
             (re-search-forward "\\w+\\W*")
             (setq count (1+ count)))
     ;;; 3. Send a message to the user.
           (cond ((zerop count)
                  (message "The region does NOT have any words."))
                 ((= 1 count) (message "The region has 1 word."))
                 (t (message "The region has %d words." count))))))
If you wish, you can also install this keybinding by evaluating it:
(global-set-key "\C-c=" 'count-words-region)
To conduct the first test, set mark and point to the beginning and
end of the following line and then type `C-c =' (or `M-x
count-words-region' if you have not bound `C-c ='):
one two three
Emacs will tell you, correctly, that the region has three words.
Repeat the test, but place mark at the beginning of the line and
place point just *before* the word `one'. Again type the command `C-c
=' (or `M-x count-words-region'). Emacs should tell you that the
region has no words, since it is composed only of the whitespace at the
beginning of the line. But instead Emacs tells you that the region has
one word!
For the third test, copy the sample line to the end of the
`*scratch*' buffer and then type several spaces at the end of the line.
Place mark right after the word `three' and point at the end of line.
(The end of the line will be the end of the buffer.) Type `C-c =' (or
`M-x count-words-region') as you did before. Again, Emacs should tell
you that the region has no words, since it is composed only of the
whitespace at the end of the line. Instead, Emacs displays an error
message saying `Search failed'.
The two bugs stem from the same problem.
Consider the first manifestation of the bug, in which the command
tells you that the whitespace at the beginning of the line contains one
word. What happens is this: The `M-x count-words-region' command moves
point to the beginning of the region. The `while' tests whether the
value of point is smaller than the value of `end', which it is.
Consequently, the regular expression search looks for and finds the
first word. It leaves point after the word. `count' is set to one.
The `while' loop repeats; but this time the value of point is larger
than the value of `end', so the loop is exited, and the function displays
a message saying the number of words in the region is one. In brief,
the regular expression search looks for and finds the word even though
it is outside the marked region.
In the second manifestation of the bug, the region is whitespace at
the end of the buffer. Emacs says `Search failed'. What happens is
that the true-or-false-test in the `while' loop tests true, so the
search expression is executed. But since there are no more words in
the buffer, the search fails.
In both manifestations of the bug, the search extends or attempts to
extend outside of the region.
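
Both manifestations can also be reproduced without marking a region by hand. What follows is only a sketch: the helper name `my-buggy-count-words' and the sample strings are assumptions made for illustration. It runs the unbounded loop from the first version inside a temporary buffer, so nothing in your own buffers is touched.

```elisp
;; Hypothetical helper: runs the loop from the first (buggy) version
;; on TEXT, between buffer positions BEGINNING and END.
(defun my-buggy-count-words (text beginning end)
  "Count words in TEXT from BEGINNING to END with the unbounded search."
  (with-temp-buffer
    (insert text)
    (goto-char beginning)
    (let ((count 0))
      (while (< (point) end)
        ;; No bound and no third argument, as in the first version.
        (re-search-forward "\\w+\\W*")
        (setq count (1+ count)))
      count)))

;; First manifestation: positions 1 to 4 cover only the three leading
;; spaces, yet the search runs on and finds `one' outside the region.
(my-buggy-count-words "   one two three" 1 4)

;; Second manifestation: positions 14 to 17 cover only the trailing
;; spaces at the end of the buffer, so the search signals an error,
;; which is caught here and reported as a symbol.
(condition-case err
    (my-buggy-count-words "one two three   " 14 17)
  (search-failed (car err)))
```

Evaluating the first call should return 1, and the second should return the symbol `search-failed'.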
The solution is to limit the search to the region--this is a fairly
simple action, but as you may have come to expect, it is not quite as
simple as you might think.
As we have seen, the `re-search-forward' function takes a search
pattern as its first argument. But in addition to this first,
mandatory argument, it accepts three optional arguments. The optional
second argument bounds the search. The optional third argument, if
`t', causes the function to return `nil' rather than signal an error if
the search fails. The optional fourth argument is a repeat count. (In
Emacs, you can get a function's documentation by typing `C-h f', the
name of the function, and then <RET>.)
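
The effect of the optional arguments can be seen side by side. This is a minimal sketch under stated assumptions: the function name `my-search-argument-demo' and the sample text are made up for the purpose, and the bound of 4 is simply the position where the whitespace ends in that text.

```elisp
;; Hypothetical demonstration.  The sample text is three spaces
;; followed by a word; position 4 is where the whitespace ends, so a
;; search bounded at 4 cannot reach `one'.
(defun my-search-argument-demo ()
  "Return the outcome of three ways of calling `re-search-forward'."
  (with-temp-buffer
    (insert "   one")
    (list
     ;; 1. Pattern only: the search finds `one'.
     (progn (goto-char (point-min))
            (and (re-search-forward "\\w+\\W*") t))
     ;; 2. Pattern and bound: nothing matches before position 4, so
     ;;    the function signals a `search-failed' error.
     (progn (goto-char (point-min))
            (condition-case nil
                (re-search-forward "\\w+\\W*" 4)
              (search-failed 'error-signaled)))
     ;; 3. Pattern, bound, and a third argument of t: the function
     ;;    returns nil instead of signaling an error.
     (progn (goto-char (point-min))
            (re-search-forward "\\w+\\W*" 4 t)))))

(my-search-argument-demo)
```

The call should return the list `(t error-signaled nil)', one element for each of the three cases.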
In the `count-words-region' definition, the value of the end of the
region is held by the variable `end' which is passed as an argument to
the function. Thus, we can add `end' as an argument to the regular
expression search expression:
(re-search-forward "\\w+\\W*" end)
However, if you make only this change to the `count-words-region'
definition and then test the new version of the definition on a stretch
of whitespace, you will receive an error message saying `Search failed'.
What happens is this: the search is limited to the region, and fails
as you expect because there are no word-constituent characters in the
region. Since it fails, we receive an error message. But we do not
want to receive an error message in this case; we want to receive the
message that "The region does NOT have any words."
The solution to this problem is to provide `re-search-forward' with
a third argument of `t', which causes the function to return `nil'
rather than signal an error if the search fails.
However, if you make this change and try it, you will see the message
"Counting words in region ... " and ... you will keep on seeing that
message ..., until you type `C-g' (`keyboard-quit').
Here is what happens: the search is limited to the region, as before,
and it fails because there are no word-constituent characters in the
region, as expected. Consequently, the `re-search-forward' expression
returns `nil'. It does nothing else. In particular, it does not move
point, which it does as a side effect if it finds the search target.
After the `re-search-forward' expression returns `nil', the next
expression in the `while' loop is evaluated. This expression
increments the count. Then the loop repeats. The true-or-false-test
tests true because the value of point is still less than the value of
end, since the `re-search-forward' expression did not move point. ...
and the cycle repeats ...
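
The reason the loop never ends can be checked directly. The sketch below uses a hypothetical function name, `my-failed-search-demo', and a made-up sample text; it records point before and after a failed search that has been told not to signal an error.

```elisp
;; Hypothetical demonstration: the region is the three leading spaces,
;; positions 1 to 4, of the sample text.
(defun my-failed-search-demo ()
  "Return point before a failed search, the search result, and point after."
  (with-temp-buffer
    (insert "   one")
    (goto-char (point-min))
    (let ((before (point))
          (result (re-search-forward "\\w+\\W*" 4 t)))
      ;; RESULT is nil and point has not moved, so `(< (point) end)'
      ;; would still test true on the next pass through the loop.
      (list before result (point)))))

(my-failed-search-demo)
```

The call should return `(1 nil 1)': the search returned `nil' and point stayed where it was.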
The `count-words-region' definition requires yet another
modification, to cause the true-or-false-test of the `while' loop to
test false if the search fails. Put another way, there are two
conditions that must be satisfied in the true-or-false-test before the
word count variable is incremented: point must still be within the
region and the search expression must have found a word to count.
Since both the first condition and the second condition must be true
together, the two expressions, the region test and the search
expression, can be joined with an `and' function and embedded in the
`while' loop as the true-or-false-test, like this:
(and (< (point) end) (re-search-forward "\\w+\\W*" end t))
(*Note forward-paragraph::, for information about `and'.)
The `re-search-forward' expression returns a true (non-`nil') value if
the search succeeds and, as a side effect, moves point. Consequently, as
words are
found, point is moved through the region. When the search expression
fails to find another word, or when point reaches the end of the
region, the true-or-false-test tests false, the `while' loop exits,
and the `count-words-region' function displays one or other of its
messages.
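
The combined test can be tried on short strings before it goes into the command. This is a sketch, not part of the command itself: the helper name `my-count-words-in-string' is hypothetical, and it treats the whole of a temporary buffer as the region.

```elisp
;; Hypothetical helper: applies the corrected loop to TEXT, treating
;; the whole temporary buffer as the region.
(defun my-count-words-in-string (text)
  "Return the number of words in TEXT, using the bounded search."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((count 0)
          (end (point-max)))
      ;; Both conditions must hold: point is inside the region and
      ;; the bounded search found another word.
      (while (and (< (point) end)
                  (re-search-forward "\\w+\\W*" end t))
        (setq count (1+ count)))
      count)))

(my-count-words-in-string "one two three")   ; three words
(my-count-words-in-string "     ")           ; whitespace only
(my-count-words-in-string "")                ; empty text
```

The three calls should return 3, 0, and 0. The whitespace-only case neither reports a phantom word nor signals an error.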
After incorporating these final changes, the `count-words-region'
works without bugs (or at least, without bugs that I have found!).
Here is what it looks like:
     ;;; Final version: `while'
     (defun count-words-region (beginning end)
       "Print number of words in the region."
       (interactive "r")
       (message "Counting words in region ... ")
     ;;; 1. Set up appropriate conditions.
       (save-excursion
         (let ((count 0))
           (goto-char beginning)
     ;;; 2. Run the while loop.
           (while (and (< (point) end)
                       (re-search-forward "\\w+\\W*" end t))
             (setq count (1+ count)))
     ;;; 3. Send a message to the user.
           (cond ((zerop count)
                  (message
                   "The region does NOT have any words."))
                 ((= 1 count)
                  (message
                   "The region has 1 word."))
                 (t
                  (message
                   "The region has %d words." count))))))
File: emacs-lisp-intro.info, Node: recursive-count-words, Next: Counting Exercise, Prev: count-words-region, Up: Counting Words
Count Words Recursively
=======================
You can write the function for counting words recursively as well as
with a `while' loop. Let's see how this is done.
First, we need to recognize that the `count-words-region' function
has three jobs: it sets up the appropriate conditions for counting to
occur; it counts the words in the region; and it sends a message to the
user telling how many words there are.
If we write a single recursive function to do everything, we will
receive a message for every recursive call. If the region contains 13
words, we will receive thirteen messages, one right after the other.
We don't want this! Instead, we must write two functions to do the
job, one of which (the recursive function) will be used inside of the
other. One function will set up the conditions and display the
message; the other will return the word count.
Let us start with the function that causes the message to be
displayed. We can continue to call this `count-words-region'.
This is the function that the user will call. It will be
interactive. Indeed, it will be similar to our previous versions of
this function, except that it will call `recursive-count-words' to
determine how many words are in the region.
We can readily construct a template for this function, based on our
previous versions:
     ;; Recursive version; uses regular expression search
     (defun count-words-region (beginning end)
       "DOCUMENTATION..."
       (INTERACTIVE-EXPRESSION...)
     ;;; 1. Set up appropriate conditions.
       (EXPLANATORY MESSAGE)
       (SET-UP FUNCTIONS...
     ;;; 2. Count the words.
         RECURSIVE CALL
     ;;; 3. Send a message to the user.
         MESSAGE PROVIDING WORD COUNT))
The definition looks straightforward, except that somehow, the count
returned by the recursive call must be passed to the message displaying
the word count. A little thought suggests that this can be done by
making use of a `let' expression: we can bind a variable in the varlist
of a `let' expression to the number of words in the region, as returned
by the recursive call; and then the `cond' expression, using that binding,
can display the value to the user.
Often, one thinks of the binding within a `let' expression as
somehow secondary to the `primary' work of a function. But in this
case, what you might consider the `primary' job of the function,
counting words, is done within the `let' expression.
Using `let', the function definition looks like this:
     (defun count-words-region (beginning end)
       "Print number of words in the region."
       (interactive "r")
     ;;; 1. Set up appropriate conditions.
       (message "Counting words in region ... ")
       (save-excursion
         (goto-char beginning)
     ;;; 2. Count the words.
         (let ((count (recursive-count-words end)))
     ;;; 3. Send a message to the user.
           (cond ((zerop count)
                  (message
                   "The region does NOT have any words."))
                 ((= 1 count)
                  (message
                   "The region has 1 word."))
                 (t
                  (message
                   "The region has %d words." count))))))
Next, we need to write the recursive counting function.
A recursive function has at least three parts: the `do-again-test',
the `next-step-expression', and the recursive call.
The do-again-test determines whether the function will or will not be
called again. Since we are counting words in a region and can use a
function that moves point forward for every word, the do-again-test can
check whether point is still within the region. The do-again-test
should find the value of point and determine whether point is before,
at, or after the value of the end of the region. We can use the
`point' function to locate point. Clearly, we must pass the value of
the end of the region to the recursive counting function as an argument.
In addition, the do-again-test should also test whether the search
finds a word. If it does not, the function should not call itself
again.
The next-step-expression changes a value so that when the recursive
function is supposed to stop calling itself, it stops. More precisely,
the next-step-expression changes a value so that at the right time, the
do-again-test stops the recursive function from calling itself again.
In this case, the next-step-expression can be the expression that moves
point forward word by word.
The third part of a recursive function is the recursive call.
Somewhere, we also need a part that does the `work' of the
function, a part that does the counting. A vital part!
But already, we have an outline of the recursive counting function:
     (defun recursive-count-words (region-end)
       "DOCUMENTATION..."
       DO-AGAIN-TEST
       NEXT-STEP-EXPRESSION
       RECURSIVE CALL)
Now we need to fill in the slots. Let's start with the simplest
cases first: if point is at or beyond the end of the region, there
cannot be any words in the region, so the function should return zero.
Likewise, if the search fails, there are no words to count, so the
function should return zero.
On the other hand, if point is within the region and the search
succeeds, the function should call itself again.
Thus, the do-again-test should look like this:
     (and (< (point) region-end)
          (re-search-forward "\\w+\\W*" region-end t))
Note that the search expression is part of the do-again-test--the
function returns a non-`nil' value if its search succeeds and `nil' if
it fails.
(*Note The Whitespace Bug in `count-words-region': Whitespace Bug, for
an explanation of how `re-search-forward' works.)
The do-again-test is the true-or-false test of an `if' clause.
Clearly, if the do-again-test succeeds, the then-part of the `if'
clause should call the function again; but if it fails, the else-part
should return zero since either point is outside the region or the
search failed because there were no words to find.
But before considering the recursive call, we need to consider the
next-step-expression. What is it? Interestingly, it is the search
part of the do-again-test.
In addition to returning a true value or `nil' for the do-again-test,
`re-search-forward' moves point forward as a side effect of a
successful search. This is the action that changes the value of point
so that the recursive function stops calling itself when point
completes its movement through the region. Consequently, the
`re-search-forward' expression is the next-step-expression.
In outline, then, the body of the `recursive-count-words' function
looks like this:
     (if DO-AGAIN-TEST-AND-NEXT-STEP-COMBINED
         ;; then
         RECURSIVE-CALL-RETURNING-COUNT
       ;; else
       RETURN-ZERO)
How to incorporate the mechanism that counts?
If you are not used to writing recursive functions, a question like
this can be troublesome. But it can and should be approached
systematically.
We know that the counting mechanism should be associated in some way
with the recursive call. Indeed, since the next-step-expression moves
point forward by one word, and since a recursive call is made for each
word, the counting mechanism must be an expression that adds one to the
value returned by a call to `recursive-count-words'.
Consider several cases:
* If there are two words in the region, the function should return a
value resulting from adding one to the value returned when it
counts the first word, plus the number returned when it counts the
remaining words in the region, which in this case is one.
* If there is one word in the region, the function should return a
value resulting from adding one to the value returned when it
counts that word, plus the number returned when it counts the
remaining words in the region, which in this case is zero.
* If there are no words in the region, the function should return
zero.
From the sketch we can see that the else-part of the `if' returns
zero for the case of no words. This means that the then-part of the
`if' must return a value resulting from adding one to the value
returned from a count of the remaining words.
The expression will look like this, where `1+' is a function that
adds one to its argument.
(1+ (recursive-count-words region-end))
The whole `recursive-count-words' function will then look like this:
     (defun recursive-count-words (region-end)
       "DOCUMENTATION..."
     ;;; 1. do-again-test
       (if (and (< (point) region-end)
                (re-search-forward "\\w+\\W*" region-end t))
     ;;; 2. then-part: the recursive call
           (1+ (recursive-count-words region-end))
     ;;; 3. else-part
         0))
Let's examine how this works:
If there are no words in the region, the else part of the `if'
expression is evaluated and consequently the function returns zero.
If there is one word in the region, the value of point is less than
the value of `region-end' and the search succeeds. In this case, the
true-or-false-test of the `if' expression tests true, and the then-part
of the `if' expression is evaluated. The counting expression is
evaluated. This expression returns a value (which will be the value
returned by the whole function) that is the sum of one added to the
value returned by a recursive call.
Meanwhile, the next-step-expression has caused point to jump over the
first (and in this case only) word in the region. This means that when
`(recursive-count-words region-end)' is evaluated a second time, as a
result of the recursive call, the value of point will be equal to or
greater than the value of region end. So this time,
`recursive-count-words' will return zero. The zero will be added to
one, and the original evaluation of `recursive-count-words' will return
one plus zero, which is one, which is the correct amount.
Clearly, if there are two words in the region, the first call to
`recursive-count-words' returns one added to the value returned by
calling `recursive-count-words' on a region containing the remaining
word--that is, it adds one to one, producing two, which is the correct
amount.
Similarly, if there are three words in the region, the first call to
`recursive-count-words' returns one added to the value returned by
calling `recursive-count-words' on a region containing the remaining
two words--and so on and so on.
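
The recursion can be checked on short strings without involving the interactive command. The sketch below is an illustration only: both function names are hypothetical, the first being a copy of the recursive counter so that the example can be evaluated on its own, and the second a wrapper that treats a whole temporary buffer as the region.

```elisp
;; Copy of the recursive counter under a hypothetical name.
(defun my-recursive-count-words (region-end)
  "Number of words between point and REGION-END."
  (if (and (< (point) region-end)
           (re-search-forward "\\w+\\W*" region-end t))
      ;; A word was found and point moved past it: count it and
      ;; recurse on the rest of the region.
      (1+ (my-recursive-count-words region-end))
    ;; Point is at the end of the region, or no word was found.
    0))

;; Hypothetical wrapper: the whole temporary buffer is the region.
(defun my-recursive-count-in-string (text)
  "Return the number of words in TEXT, counted recursively."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (my-recursive-count-words (point-max))))

(my-recursive-count-in-string "one two three")   ; 1 + (1 + (1 + 0))
(my-recursive-count-in-string "   ")             ; else-part at once
```

The two calls should return 3 and 0. Since each word adds one more nested call, a very long region may run into the limit set by the variable `max-lisp-eval-depth'; the `while' version does not have this restriction.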
With full documentation the two functions look like this:
The recursive function:
     (defun recursive-count-words (region-end)
       "Number of words between point and REGION-END."
     ;;; 1. do-again-test
       (if (and (< (point) region-end)
                (re-search-forward "\\w+\\W*" region-end t))
     ;;; 2. then-part: the recursive call
           (1+ (recursive-count-words region-end))
     ;;; 3. else-part
         0))
The wrapper:
;;; Recursive version
(defun count-words-region (beginning end)
"Print number of words in the region.
Words are defined as at least one word-constituent
character followed by at least one character that is
not a word-constituent. The buffer's syntax table
determines which characters these are."
(interactive "r")
(message "Counting words in region ... ")
(save-excursion
(goto-char beginning)
(let ((count (recursive-count-words end)))
(cond ((zerop count)
(message
"The region does NOT have any words."))
((= 1 count)
(message "The region has 1 word."))
(t
(message
"The region has %d words." count))))))
File: emacs-lisp-intro.info, Node: Counting Exercise, Prev: recursive-count-words, Up: Counting Words
Exercise: Counting Punctuation
==============================
Using a `while' loop, write a function to count the number of
punctuation marks in a region--period, comma, semicolon, colon,
exclamation mark, question mark. Do the same using recursion.
File: emacs-lisp-intro.info, Node: Words in a defun, Next: Readying a Graph, Prev: Counting Words, Up: Top
Counting Words in a `defun'
***************************
Our next project is to count the number of words in a function
definition. Clearly, this can be done using some variant of
`count-words-region'. *Note Counting Words: Repetition and Regexps:
Counting Words. If we are just going to count the words in one
definition, it is easy enough to mark the definition with the `C-M-h'
(`mark-defun') command, and then call `count-words-region'.
However, I am more ambitious: I want to count the words and symbols
in every definition in the Emacs sources and then print a graph that
shows how many functions there are of each length: how many contain 40
to 49 words or symbols, how many contain 50 to 59 words or symbols, and
so on. I have often been curious how long a typical function is, and
this will tell.
* Menu:
* Divide and Conquer:: Split a daunting project into parts.
* Words and Symbols:: What to count?
* Syntax:: What constitutes a word or symbol?
* count-words-in-defun:: Very like `count-words'.
* Several defuns:: Counting several defuns in a file.
* Find a File:: Do you want to look at a file?
* lengths-list-file:: A list of the lengths of many definitions.
* Several files:: Counting in definitions in different files.
* Several files recursively:: Recursively counting in different files.
* Prepare the data:: Prepare the data for display in a graph.
File: emacs-lisp-intro.info, Node: Divide and Conquer, Next: Words and Symbols, Prev: Words in a defun, Up: Words in a defun
Divide and Conquer
==================
Described in one phrase, the histogram project is daunting; but
divided into numerous small steps, each of which we can take one at a
time, the project becomes less fearsome. Let us consider what the
steps must be:
* First, write a function to count the words in one definition. This
includes the problem of handling symbols as well as words.
* Second, write a function to list the numbers of words in each
function in a file. This function can use the
`count-words-in-defun' function.
* Third, write a function to list the numbers of words in each
function in each of several files. This entails automatically
finding the various files, switching to them, and counting the
words in the definitions within them.
* Fourth, write a function to convert the list of numbers that we
created in step three to a form that will be suitable for printing
as a graph.
* Fifth, write a function to print the results as a graph.
This is quite a project! But if we take each step slowly, it will
not be difficult.
File: emacs-lisp-intro.info, Node: Words and Symbols, Next: Syntax, Prev: Divide and Conquer, Up: Words in a defun
What to Count?
==============
When we first start thinking about how to count the words in a
function definition, the first question is (or ought to be) what are we
going to count? When we speak of `words' with respect to a Lisp
function definition, we are actually speaking, in large part, of
`symbols'. For example, the following `multiply-by-seven' function
contains the five symbols `defun', `multiply-by-seven', `number', `*',
and `7'. In addition, in the documentation string, it contains the
four words `Multiply', `NUMBER', `by', and `seven'. The symbol
`number' is repeated, so the definition contains a total of ten words
and symbols.
     (defun multiply-by-seven (number)
       "Multiply NUMBER by seven."
       (* 7 number))
However, if we mark the `multiply-by-seven' definition with `C-M-h'
(`mark-defun'), and then call `count-words-region' on it, we will find
that `count-words-region' claims the definition has eleven words, not
ten! Something is wrong!
The problem is twofold: `count-words-region' does not count the `*'
as a word, and it counts the single symbol, `multiply-by-seven', as
containing three words. The hyphens are treated as if they were
interword spaces rather than intraword connectors: `multiply-by-seven'
is counted as if it were written `multiply by seven'.
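
The miscount can be reproduced outside the interactive command. This is a sketch under stated assumptions: the helper name `my-count-matches-in-string' is hypothetical, and it assumes that the syntax table of Emacs Lisp mode, held in the variable `emacs-lisp-mode-syntax-table', is the one in effect when you count words in a Lisp source buffer.

```elisp
;; Hypothetical helper: counts successive matches of REGEXP in TEXT,
;; with the Emacs Lisp mode syntax table in effect.
(defun my-count-matches-in-string (regexp text)
  "Count successive matches of REGEXP in TEXT under Emacs Lisp syntax."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (with-syntax-table emacs-lisp-mode-syntax-table
      (let ((count 0)
            (end (point-max)))
        (while (and (< (point) end)
                    (re-search-forward regexp end t))
          (setq count (1+ count)))
        count))))

;; The word-based pattern splits `multiply-by-seven' at its hyphens
;; and skips over `*' entirely.
(my-count-matches-in-string
 "\\w+\\W*"
 "(defun multiply-by-seven (number)
  \"Multiply NUMBER by seven.\"
  (* 7 number))")
```

The call should return 11 rather than the 10 words and symbols counted by hand.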
The cause of this confusion is the regular expression search within
the `count-words-region' definition that moves point forward word by
word. In the canonical version of `count-words-region', the regexp is:
"\\w+\\W*"
This regular expression is a pattern defining one or more word
constituent characters possibly followed by one or more characters that
are not word constituents. What is meant by `word constituent
characters' brings us to the issue of syntax, which is worth a section
of its own.
File: emacs-lisp-intro.info, Node: Syntax, Next: count-words-in-defun, Prev: Words and Symbols, Up: Words in a defun
What Constitutes a Word or Symbol?
==================================
Emacs treats different characters as belonging to different "syntax
categories". For example, the regular expression, `\\w+', is a pattern
specifying one or more *word constituent* characters. Word constituent
characters are members of one syntax category. Other syntax categories
include the class of punctuation characters, such as the period and the
comma, and the class of whitespace characters, such as the blank space
and the tab character. (For more information, see *Note Syntax:
(emacs)Syntax, and, *Note Syntax Tables: (elisp)Syntax Tables.)
Syntax tables specify which characters belong to which categories.
Usually, a hyphen is not specified as a `word constituent character'.
Instead, it is specified as being in the `class of characters that are
part of symbol names but not words.' This means that the
`count-words-region' function treats it in the same way it treats an
interword white space, which is why `count-words-region' counts
`multiply-by-seven' as three words.
There are two ways to cause Emacs to count `multiply-by-seven' as
one symbol: modify the syntax table or modify the regular expression.
We could redefine a hyphen as a word constituent character by
modifying the syntax table that Emacs keeps for each mode. This action
would serve our purpose, except that a hyphen is merely the most common
character within symbols that is not typically a word constituent
character; there are others, too.
Alternatively, we can redefine the regular expression used in the
`count-words-region' definition so as to include symbols. This procedure has
the merit of clarity, but the task is a little tricky.
The first part is simple enough: the pattern must match "at least one
character that is a word or symbol constituent". Thus:
\\(\\w\\|\\s_\\)+
The `\\(' is the first part of the grouping construct that includes the
`\\w' and the `\\s_' as alternatives, separated by the `\\|'. The
`\\w' matches any word-constituent character and the `\\s_' matches any
character that is part of a symbol name but not a word-constituent
character. The `+' following the group indicates that the word or
symbol constituent characters must be matched at least once.
However, the second part of the regexp is more difficult to design.
What we want is to follow the first part with "optionally one or more
characters that are not constituents of a word or symbol". At first, I
thought I could define this with the following:
\\(\\W\\|\\S_\\)*
The upper case `W' and `S' match characters that are *not* word or
symbol constituents. Unfortunately, this expression matches any
character that is either not a word constituent or not a symbol
constituent. This matches any character!
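
The failure is easy to confirm. In the sketch below, the helper name `my-matches-whole-string-p' is hypothetical, and the Emacs Lisp mode syntax table is assumed to be the relevant one; the helper anchors a pattern at both ends of a string and reports whether it matches the whole of it.

```elisp
;; Hypothetical helper: anchors REGEXP at both ends of STRING and
;; matches with the Emacs Lisp mode syntax table in effect.
(defun my-matches-whole-string-p (regexp string)
  "Return t if REGEXP matches all of STRING under Emacs Lisp syntax."
  (with-syntax-table emacs-lisp-mode-syntax-table
    (and (string-match-p (concat "\\`" regexp "\\'") string) t)))

;; `a' is a word constituent, so `\\W' rejects it, but it is not in
;; the symbol class, so `\\S_' accepts it.  `-' is in the symbol
;; class, so `\\S_' rejects it, but `\\W' accepts it.  Nothing is
;; excluded, and the pattern swallows the whole string.
(my-matches-whole-string-p "\\(\\W\\|\\S_\\)*"
                           "multiply-by-seven (number)")

;; By contrast, the lower-case version stops at the space.
(my-matches-whole-string-p "\\(\\w\\|\\s_\\)*"
                           "multiply-by-seven (number)")
```

The first call should return `t' and the second `nil'.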
I then noticed that every word or symbol in my test region was
followed by white space (blank space, tab, or newline). So I tried
placing a pattern to match one or more blank spaces after the pattern
for one or more word or symbol constituents. This failed, too. Words
and symbols are often separated by whitespace, but in actual code
parentheses may follow symbols and punctuation may follow words. So
finally, I designed a pattern in which the word or symbol constituents
are followed optionally by characters that are not white space and then
followed optionally by white space.
Here is the full regular expression:
"\\(\\w\\|\\s_\\)+[^ \t\n]*[ \t\n]*"
File: emacs-lisp-intro.info, Node: count-words-in-defun, Next: Several defuns, Prev: Syntax, Up: Words in a defun
The `count-words-in-defun' Function
===================================
We have seen that there are several ways to write a
`count-words-region' function. To write a `count-words-in-defun', we
need merely adapt one of these versions.
The version that uses a `while' loop is easy to understand, so I am
going to adapt that. Because `count-words-in-defun' will be part of a
more complex program, it need not be interactive and it need not
display a message but just return the count. These considerations
simplify the definition a little.
On the other hand, `count-words-in-defun' will be used within a
buffer that contains function definitions. Consequently, it is
reasonable to ask that the function determine whether it is called when
point is within a function definition, and if it is, to return the
count for that definition. This adds complexity to the definition, but
saves us from needing to pass arguments to the function.
These considerations lead us to prepare the following template:
     (defun count-words-in-defun ()
       "DOCUMENTATION..."
       (SET UP...
         (WHILE LOOP...)
         RETURN COUNT)
As usual, our job is to fill in the slots.
First, the set up.
We are presuming that this function will be called within a buffer
containing function definitions. Point will either be within a
function definition or not. For `count-words-in-defun' to work, point
must move to the beginning of the definition, a counter must start at
zero, and the counting loop must stop when point reaches the end of the
definition.
The `beginning-of-defun' function searches backwards for an opening
delimiter such as a `(' at the beginning of a line, and moves point to
that position, or else to the limit of the search. In practice, this
means that `beginning-of-defun' moves point to the beginning of an
enclosing or preceding function definition, or else to the beginning of
the buffer. We can use `beginning-of-defun' to place point where we
wish to start.
The `while' loop requires a counter to keep track of the words or
symbols being counted. A `let' expression can be used to create a
local variable for this purpose, and bind it to an initial value of
zero.
The `end-of-defun' function works like `beginning-of-defun' except
that it moves point to the end of the definition. `end-of-defun' can
be used as part of an expression that determines the position of the
end of the definition.
The set up for `count-words-in-defun' takes shape rapidly: first we
move point to the beginning of the definition, then we create a local
variable to hold the count, and, finally, we record the position of the
end of the definition so the `while' loop will know when to stop
looping.
The code looks like this:
     (beginning-of-defun)
     (let ((count 0)
           (end (save-excursion (end-of-defun) (point))))
The code is simple. The only slight complication is likely to concern
`end': it is bound to the position of the end of the definition by a
`save-excursion' expression that returns the value of point after
`end-of-defun' temporarily moves it to the end of the definition.
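To watch the set up at work outside the finished function, the
following sketch tries it in a temporary buffer. The helper name
`my-defun-set-up-demo' and the sample text are assumptions made for
illustration; they are not part of the original code.

```elisp
(defun my-defun-set-up-demo ()
  "Run the set up of `count-words-in-defun' on a sample definition.
Hypothetical helper; returns a list (START BEFORE-END COUNT)."
  (with-temp-buffer
    (emacs-lisp-mode)
    (insert "(defun multiply-by-seven (number)\n"
            "  \"Multiply NUMBER by seven.\"\n"
            "  (* 7 number))\n")
    ;; Put point somewhere inside the definition.
    (goto-char (point-min))
    (search-forward "Multiply")
    ;; The set up proper starts here.
    (beginning-of-defun)
    (let ((count 0)
          (end (save-excursion (end-of-defun) (point))))
      ;; `save-excursion' has put point back at the start of the
      ;; definition, while `end' remembers where the definition stops.
      (list (point) (< (point) end) count))))
```

Evaluating `(my-defun-set-up-demo)' should return `(1 t 0)': point sits
at the first character of the buffer, it is before `end', and the
counter starts at zero.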
The second part of the `count-words-in-defun', after the set up, is
the `while' loop.
The loop must contain an expression that jumps point forward word by
word and symbol by symbol, and another expression that counts the
jumps. The true-or-false-test for the `while' loop should test true so
long as point should jump forward, and false when point is at the end
of the definition. We have already redefined the regular expression
for this (*note Syntax::.), so the loop is straightforward:
     (while (and (< (point) end)
                 (re-search-forward
                  "\\(\\w\\|\\s_\\)+[^ \t\n]*[ \t\n]*" end t))
       (setq count (1+ count)))
The third part of the function definition returns the count of words
and symbols. This part is the last expression within the body of the
`let' expression, and can be, very simply, the local variable `count',
which when evaluated returns the count.
Put together, the `count-words-in-defun' definition looks like this:
     (defun count-words-in-defun ()
       "Return the number of words and symbols in a defun."
       (beginning-of-defun)
       (let ((count 0)
             (end (save-excursion (end-of-defun) (point))))
         (while
             (and (< (point) end)
                  (re-search-forward
                   "\\(\\w\\|\\s_\\)+[^ \t\n]*[ \t\n]*"
                   end t))
           (setq count (1+ count)))
         count))
How to test this? The function is not interactive, but it is easy to
put a wrapper around the function to make it interactive; we can use
almost the same code as for the recursive version of
`count-words-region':
     ;;; Interactive version.
     (defun count-words-defun ()
       "Number of words and symbols in a function definition."
       (interactive)
       (message
        "Counting words and symbols in function definition ... ")
       (let ((count (count-words-in-defun)))
         (cond
          ((zerop count)
           (message
            "The definition does NOT have any words or symbols."))
          ((= 1 count)
           (message
            "The definition has 1 word or symbol."))
          (t
           (message
            "The definition has %d words or symbols." count)))))
Let's re-use `C-c =' as a convenient keybinding:
(global-set-key "\C-c=" 'count-words-defun)
Now we can try out `count-words-defun': install both
`count-words-in-defun' and `count-words-defun', set the keybinding,
and then place the cursor within the following definition and type
`C-c =':
     (defun multiply-by-seven (number)
       "Multiply NUMBER by seven."
       (* 7 number))
          => 10
Success! The definition has 10 words and symbols.
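The same check can be made without the keyboard. The sketch below
repeats the definition of `count-words-in-defun' from above so that it
can be evaluated on its own, and adds a helper,
`my-count-words-in-defun-string', whose name is an assumption for
illustration. It inserts a definition into a temporary buffer in Emacs
Lisp mode, so that characters such as `-' and `*' have the symbol syntax
that the regular expression expects.

```elisp
(defun count-words-in-defun ()
  "Return the number of words and symbols in a defun."
  (beginning-of-defun)
  (let ((count 0)
        (end (save-excursion (end-of-defun) (point))))
    (while
        (and (< (point) end)
             (re-search-forward
              "\\(\\w\\|\\s_\\)+[^ \t\n]*[ \t\n]*"
              end t))
      (setq count (1+ count)))
    count))

(defun my-count-words-in-defun-string (text)
  "Return the word and symbol count of the definition in TEXT.
Hypothetical helper, for testing only."
  (with-temp-buffer
    (emacs-lisp-mode)
    (insert text)
    ;; Point is now just after the definition; `beginning-of-defun'
    ;; moves back to the start of this preceding definition.
    (goto-char (point-max))
    (count-words-in-defun)))

(my-count-words-in-defun-string
 "(defun multiply-by-seven (number)
  \"Multiply NUMBER by seven.\"
  (* 7 number))\n")
;; => 10
```

The result agrees with the interactive command: `defun',
`multiply-by-seven', `number', `Multiply', `NUMBER', `by', `seven', `*',
`7' and `number' make ten words and symbols.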
The next problem is to count the numbers of words and symbols in
several definitions within a single file.
File: emacs-lisp-intro.info, Node: Several defuns, Next: Find a File, Prev: count-words-in-defun, Up: Words in a defun
Count Several `defuns' Within a File
====================================
A file such as `simple.el' may have 80 or more function definitions
within it. Our long term goal is to collect statistics on many files,
but as a first step, our immediate goal is to collect statistics on one
file.
The information will be a series of numbers, each number being the
length of a function definition. We can store the numbers in a list.
We know that we will want to incorporate the information regarding
one file with information about many other files; this means that the
function for counting definition lengths within one file need only
return the lengths' list. It need not and should not display any
messages.
The word count commands contain one expression to jump point forward
word by word and another expression to count the jumps. The
definitions' lengths' function can be designed to work the same way,
with one expression to jump point forward definition by definition and
another expression to construct the lengths' list.
This statement of the problem makes it elementary to write the
function definition. Clearly, we will start the count at the beginning
of the file, so the first command will be `(goto-char (point-min))'.
Next, we start the `while' loop; and the true-or-false test of the loop
can be a regular expression search for the next function definition--so
long as the search succeeds, point is moved forward and then the body
of the loop is evaluated. The body needs an expression that constructs
the lengths' list. `cons', the list construction command, can be used
to create the list. That is almost all there is to it.
Here is what this fragment of code looks like:
     (goto-char (point-min))
     (while (re-search-forward "^(defun" nil t)
       (setq lengths-list
             (cons (count-words-in-defun) lengths-list)))
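To see the search on its own, here is a small sketch that only counts
how many times the regular expression matches; the helper
`my-count-defun-starts' is hypothetical. Because of the `^', only a
`(defun' at the beginning of a line counts, so a definition nested
inside another form is skipped.

```elisp
(defun my-count-defun-starts (text)
  "Return how many lines of TEXT begin with an opening defun.
Hypothetical helper for illustration."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((n 0))
      ;; Each successful search leaves point just after the match,
      ;; so the next search continues from there.
      (while (re-search-forward "^(defun" nil t)
        (setq n (1+ n)))
      n)))

(my-count-defun-starts
 "(defun first () 1)\n(when t\n  (defun nested () 2))\n(defun second () 3)\n")
;; => 2
```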
What we have left out is the mechanism for finding the file that
contains the function definitions.
In previous examples, we either used this, the Info file, or we
switched back and forth to some other buffer, such as the `*scratch*'
buffer.
Finding a file is a new process that we have not yet discussed.
File: emacs-lisp-intro.info, Node: Find a File, Next: lengths-list-file, Prev: Several defuns, Up: Words in a defun
Find a File
===========
To find a file in Emacs, you use the `C-x C-f' (`find-file')
command. This command is almost, but not quite right for the lengths
problem.
Let's look at the source for `find-file' (you can use the `find-tag'
command to find the source of a function):
     (defun find-file (filename)
       "Edit file FILENAME.
     Switch to a buffer visiting file FILENAME,
     creating one if none already exists."
       (interactive "FFind file: ")
       (switch-to-buffer (find-file-noselect filename)))
The definition possesses short but complete documentation and an
interactive specification that prompts you for a file name when you use
the command interactively. The body of the definition contains two
functions, `find-file-noselect' and `switch-to-buffer'.
According to its documentation as shown by `C-h f' (the
`describe-function' command), the `find-file-noselect' function reads
the named file into a buffer and returns the buffer. However, the
buffer is not selected. Emacs does not switch its attention (or yours
if you are using `find-file-noselect') to the named buffer. That is
what `switch-to-buffer' does: it switches the buffer to which Emacs's
attention is directed; and it switches the buffer displayed in the
window to the new buffer. We have discussed buffer switching
elsewhere. (*Note Switching Buffers::.)
In this histogram project, we do not need to display each file on the
screen as the program determines the length of each definition within
it. Instead of employing `switch-to-buffer', we can work with
`set-buffer', which redirects the attention of the computer program to
a different buffer but does not redisplay it on the screen. So instead
of calling on `find-file' to do the job, we must write our own
expression.
The task is easy: use `find-file-noselect' and `set-buffer'.
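As a minimal sketch of that combination, the following function reads a
file into a buffer, works there, and hands back a value without ever
displaying the buffer. The name `my-first-line-of-file' is an
assumption for illustration, and returning the first line is just a
stand-in for real work.

```elisp
(defun my-first-line-of-file (filename)
  "Return the first line of FILENAME without displaying the file.
Hypothetical helper for illustration."
  (save-excursion
    (let ((buffer (find-file-noselect filename)))
      ;; Direct the program's attention to the file's buffer;
      ;; no window is changed.
      (set-buffer buffer)
      (goto-char (point-min))
      (buffer-substring-no-properties (point) (line-end-position)))))
```

Calling it with the name of an existing file returns that file's first
line as a string, and the window keeps showing whatever it showed
before.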
File: emacs-lisp-intro.info, Node: lengths-list-file, Next: Several files, Prev: Find a File, Up: Words in a defun
`lengths-list-file' in Detail
=============================
The core of the `lengths-list-file' function is a `while' loop
containing a function to move point forward `defun by defun' and a
function to count the number of words and symbols in each defun. This
core must be surrounded by functions that do various other tasks,
including finding the file, and ensuring that point starts out at the
beginning of the file. The function definition looks like this:
     (defun lengths-list-file (filename)
       "Return list of definitions' lengths within FILENAME.
     The returned list is a list of numbers.
     Each number is the number of words or
     symbols in one function definition."
       (message "Working on `%s' ... " filename)
       (save-excursion
         (let ((buffer (find-file-noselect filename))
               (lengths-list))
           (set-buffer buffer)
           (setq buffer-read-only t)
           (widen)
           (goto-char (point-min))
           (while (re-search-forward "^(defun" nil t)
             (setq lengths-list
                   (cons (count-words-in-defun) lengths-list)))
           (kill-buffer buffer)
           lengths-list)))
The function is passed one argument, the name of the file on which it
will work. It has four lines of documentation, but no interactive
specification. Since people worry that a computer is broken if they
don't see anything going on, the first line of the body is a message.
The next line contains a `save-excursion' that returns Emacs's
attention to the current buffer when the function completes. This is
useful in case you embed this function in another function that
presumes point is restored to the original buffer.
In the varlist of the `let' expression, Emacs finds the file and
binds the local variable `buffer' to the buffer containing the file.
At the same time, Emacs creates `lengths-list' as a local variable.
Next, Emacs switches its attention to the buffer.
In the following line, Emacs makes the buffer read-only. Ideally,
this line is not necessary. None of the functions for counting words
and symbols in a function definition should change the buffer.
Besides, the buffer is not going to be saved, even if it were changed.
This line is entirely the consequence of great, perhaps excessive,
caution. The reason for the caution is that this function and those it
calls work on the sources for Emacs and it is very inconvenient if they
are inadvertently modified. It goes without saying that I did not
realize a need for this line until an experiment went awry and started
to modify my Emacs source files ...
Next comes a call to widen the buffer if it is narrowed. This
function is usually not needed--Emacs creates a fresh buffer if none
already exists; but if a buffer visiting the file already exists Emacs
returns that one. In this case, the buffer may be narrowed and must be
widened. If we wanted to be fully `user-friendly', we would arrange to
save the restriction and the location of point, but we won't.
The `(goto-char (point-min))' expression moves point to the
beginning of the buffer.
Then comes a `while' loop in which the `work' of the function is
carried out. In the loop, Emacs determines the length of each
definition and constructs a lengths' list containing the information.
Emacs kills the buffer after working through it. This is to save
space inside of Emacs. My version of Emacs 19 contains over 300 source
files of interest. Another function will apply `lengths-list-file' to
each of them. If Emacs visits all of them and deletes none, my
computer may run out of virtual memory.
Finally, the last expression within the `let' expression is the
`lengths-list' variable; its value is returned as the value of the
whole function.
You can try this function by installing it in the usual fashion.
Then place your cursor after the following expression and type `C-x
C-e' (`eval-last-sexp').
(lengths-list-file "../lisp/debug.el")
(You may need to change the pathname of the file; the one here works if
this Info file and the Emacs sources are in neighboring places, such as
`/usr/local/emacs/info' and `/usr/local/emacs/lisp'. To change the
expression, copy it to the `*scratch*' buffer and edit it. Then
evaluate it.)
On my version of Emacs, the lengths' list for `debug.el' takes seven
seconds to produce and looks like this:
(75 41 80 62 20 45 44 68 45 12 34 235)
Note that the length of the last definition in the file is first in
the list.
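The reversal comes from `cons', which always adds to the front of the
list. The sketch below uses made-up counts rather than a real file;
`my-collect-with-cons' is a hypothetical name. If the order of the
file matters, the built-in `reverse' function restores it.

```elisp
(defun my-collect-with-cons (numbers)
  "Collect NUMBERS one at a time with `cons'.
Hypothetical helper that mirrors the loop in `lengths-list-file'."
  (let ((lengths-list nil))
    (while numbers
      ;; Each new element goes on the front of the list.
      (setq lengths-list (cons (car numbers) lengths-list))
      (setq numbers (cdr numbers)))
    lengths-list))

(my-collect-with-cons '(10 20 30))            ; => (30 20 10)
(reverse (my-collect-with-cons '(10 20 30)))  ; => (10 20 30)
```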
File: emacs-lisp-intro.info, Node: Several files, Next: Several files recursively, Prev: lengths-list-file, Up: Words in a defun
Count Words in `defuns' in Different Files
==========================================
In the previous section, we created a function that returns a list of
the lengths of each definition in a file. Now, we want to define a
function to return a master list of the lengths of the definitions in a
list of files.
Working on each of a list of files is a repetitious act, so we can
use either a `while' loop or recursion.
The design using a `while' loop is routine. The argument passed to
the function is a list of files. As we saw earlier (*note Loop
Example::.), you can write a `while' loop so that the body of the loop
is evaluated if such a list contains elements, but the loop exits if
the list is empty. For this design to work, the body of the loop must
contain an expression that shortens the list each time the body is
evaluated, so that eventually the list is empty. The usual technique
is to set the value of the list to the value of the CDR of the list
each time the body is evaluated.
The template looks like this:
     (while TEST-WHETHER-LIST-IS-EMPTY
       BODY...
       SET-LIST-TO-CDR-OF-LIST)
Also, we remember that a `while' loop returns `nil' (the result of
evaluating the true-or-false-test), not the result of any evaluation
within its body. (The evaluations within the body of the loop are done
for their side effects.) However, the expression that sets the
lengths' list is part of the body--and that is the value that we want
returned by the function as a whole. To do this, we enclose the
`while' loop within a `let' expression, and arrange that the last
element of the `let' expression contains the value of the lengths'
list. (*Note Loop Example with an Incrementing Counter: Incrementing
Example.)
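A short sketch makes the point concrete. The function name
`my-sum-list' is an assumption for illustration. The `while' form by
itself evaluates to `nil'; it is the final expression of the `let'
that supplies the value.

```elisp
(defun my-sum-list (numbers)
  "Return the sum of NUMBERS, using a `while' loop inside a `let'.
Hypothetical helper for illustration."
  (let ((total 0))
    (while numbers
      (setq total (+ total (car numbers)))
      ;; Shorten the list so that the loop eventually stops.
      (setq numbers (cdr numbers)))
    ;; Last expression of the `let': this is the value returned.
    total))

(my-sum-list '(1 2 3 4))   ; => 10

;; The value of the `while' form itself is always `nil':
(let ((numbers '(1 2 3 4)))
  (while numbers
    (setq numbers (cdr numbers))))   ; => nil
```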
These considerations lead us directly to the function itself:
     ;;; Use `while' loop.
     (defun lengths-list-many-files (list-of-files)
       "Return list of lengths of defuns in LIST-OF-FILES."
       (let (lengths-list)
         ;; true-or-false-test
         (while list-of-files
           (setq lengths-list
                 (append
                  lengths-list
                  ;; Generate a lengths' list.
                  (lengths-list-file
                   (expand-file-name (car list-of-files)))))
           ;; Make files' list shorter.
           (setq list-of-files (cdr list-of-files)))
         ;; Return final value of lengths' list.
         lengths-list))
`expand-file-name' is a built-in function that converts a file name
to its absolute, long, path name form. Thus,
debug.el
becomes
/usr/local/emacs/lisp/debug.el
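When it is given only one argument, `expand-file-name' resolves a
relative name against the value of `default-directory', so the result
depends on where you evaluate it. To try the function with a
predictable result, you can pass the directory explicitly as the
optional second argument. The directory names below are only the
illustrative ones used in this chapter, and the results shown assume a
GNU/Linux or other Unix-like system.

```elisp
;; Resolve a bare file name against an explicit directory.
(expand-file-name "debug.el" "/usr/local/emacs/lisp/")
;; => "/usr/local/emacs/lisp/debug.el"

;; A leading `..' is resolved as well.
(expand-file-name "../lisp/debug.el" "/usr/local/emacs/info/")
;; => "/usr/local/emacs/lisp/debug.el"
```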
The only other new element of this function definition is the as yet
unstudied function `append', which merits a short section for itself.
* Menu:
* append:: Attaching one list to another.
File: emacs-lisp-intro.info, Node: append, Prev: Several files, Up: Several files
The `append' Function
---------------------
The `append' function attaches one list to another. Thus,
(append '(1 2 3 4) '(5 6 7 8))
produces the list
(1 2 3 4 5 6 7 8)
This is exactly how we want to attach two lengths' lists produced by
`lengths-list-file' to each other. The results contrast with `cons',
(cons '(1 2 3 4) '(5 6 7 8))
which constructs a new list in which the first argument to `cons'
becomes the first element of the new list:
((1 2 3 4) 5 6 7 8)
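In `lengths-list-many-files', the first call to `append' receives an
empty `lengths-list'. The sketch below, with made-up numbers, suggests
that this is harmless: appending to `nil' simply yields the other list.
The name `my-append-all' is hypothetical.

```elisp
(defun my-append-all (lists)
  "Append each list in LISTS to a growing result, in order.
Hypothetical helper that mirrors the loop in `lengths-list-many-files'."
  (let (result)                         ; starts out as nil
    (while lists
      (setq result (append result (car lists)))
      (setq lists (cdr lists)))
    result))

(append nil '(3 2 1))                   ; => (3 2 1)
(my-append-all '((3 2 1) (6 5 4)))      ; => (3 2 1 6 5 4)
```

Had the loop used `cons' here instead, each file's list would have
become a single nested element rather than being spliced into one flat
list of numbers.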