InfoMagic Source Code 1993 July

GNU Info File | 1993-05-31 | 47.0 KB | 1,113 lines

This is Info file elisp, produced by Makeinfo-1.55 from the input file elisp.texi. This is edition 2.0 of the GNU Emacs Lisp Reference Manual, for Emacs Version 19. Published by the Free Software Foundation, 675 Massachusetts Avenue, Cambridge, MA 02139 USA Copyright (C) 1990, 1991, 1992, 1993 Free Software Foundation, Inc. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the Foundation. File: elisp, Node: Case Changes, Next: Text Properties, Prev: Columns, Up: Text Case Changes ============ The case change commands described here work on text in the current buffer. *Note Character Case::, for case conversion commands that work on strings and characters. *Note Case Table::, for how to customize which characters are upper or lower case and how to convert them. - Command: capitalize-region START END This function capitalizes all words in the region defined by START and END. To capitalize means to convert each word's first character to upper case and convert the rest of each word to lower case. The function returns `nil'. If one end of the region is in the middle of a word, the part of the word within the region is treated as an entire word. When `capitalize-region' is called interactively, START and END are point and the mark, with the smallest first. ---------- Buffer: foo ---------- This is the contents of the 5th foo. 
---------- Buffer: foo ---------- (capitalize-region 1 37) => nil ---------- Buffer: foo ---------- This Is The Contents Of The 5th Foo. ---------- Buffer: foo ---------- - Command: downcase-region START END This function converts all of the letters in the region defined by START and END to lower case. The function returns `nil'. When `downcase-region' is called interactively, START and END are point and the mark, with the smallest first. - Command: upcase-region START END This function converts all of the letters in the region defined by START and END to upper case. The function returns `nil'. When `upcase-region' is called interactively, START and END are point and the mark, with the smallest first. - Command: capitalize-word COUNT This function capitalizes COUNT words after point, moving point over as it does. To capitalize means to convert each word's first character to upper case and convert the rest of each word to lower case. If COUNT is negative, the function capitalizes the -COUNT previous words but does not move point. The value is `nil'. If point is in the middle of a word, the part of the word before point (if moving forward) or after point (if operating backward) is ignored. The rest is treated as an entire word. When `capitalize-word' is called interactively, COUNT is set to the numeric prefix argument. - Command: downcase-word COUNT This function converts the COUNT words after point to all lower case, moving point over as it does. If COUNT is negative, it converts the -COUNT previous words but does not move point. The value is `nil'. When `downcase-word' is called interactively, COUNT is set to the numeric prefix argument. - Command: upcase-word COUNT This function converts the COUNT words after point to all upper case, moving point over as it does. If COUNT is negative, it converts the -COUNT previous words but does not move point. The value is `nil'. When `upcase-word' is called interactively, COUNT is set to the numeric prefix argument.
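As a minimal sketch of calling the case change commands from a Lisp program, the following helpers apply `capitalize-region' and `upcase-word' to text placed in a temporary buffer. The names `my-capitalize-string-via-region' and `my-upcase-first-word' are hypothetical, and the sketch assumes an Emacs that provides `with-temp-buffer', which this edition of the manual does not describe.

```elisp
;; Hypothetical helper (not part of Emacs): capitalize every word of
;; STRING by running `capitalize-region' over a scratch copy of it.
;; Assumes `with-temp-buffer' is available.
(defun my-capitalize-string-via-region (string)
  (with-temp-buffer
    (insert string)
    ;; The region is given explicitly as buffer positions.
    (capitalize-region (point-min) (point-max))
    (buffer-string)))

;; Hypothetical helper: upcase the first word of STRING and report
;; where point ends up, to show that `upcase-word' moves point over
;; the words it converts.
(defun my-upcase-first-word (string)
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (upcase-word 1)
    (list (buffer-string) (point))))
```

Because these commands return `nil', the helpers read the result back from the buffer rather than from the return value.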
File: elisp, Node: Text Properties, Next: Substitution, Prev: Case Changes, Up: Text Text Properties =============== Each character position in a buffer or a string can have a "text property list", much like the property list of a symbol. The properties belong to a particular character at a particular place, such as, the letter `T' at the beginning of this sentence or the first `o' in `foo'--if the same character occurs in two different places, the two occurrences generally have different properties. Each property has a name, which is usually a symbol, and an associated value, which can be any Lisp object--just as for properties of symbols (*note Property Lists::.). If a character has a `category' property, we call it the "category" of the character. It should be a symbol. The properties of the symbol serve as defaults for the properties of the character. Copying text between strings and buffers preserves the properties along with the characters; this includes such diverse functions as `substring', `insert', and `buffer-substring'. * Menu: * Examining Properties:: Looking at the properties of one character. * Changing Properties:: Setting the properties of a range of text. * Property Search:: Searching for where a property changes value. * Special Properties:: Particular properties with special meanings. * Not Intervals:: Why text properties do not use Lisp-visible text intervals. File: elisp, Node: Examining Properties, Next: Changing Properties, Up: Text Properties Examining Text Properties ------------------------- The simplest way to examine text properties is to ask for the value of a particular property of a particular character. For that, use `get-text-property'. Use `text-properties-at' to get the entire property list of a character. *Note Property Search::, for functions to examine the properties of a number of characters at once. These functions handle both strings and buffers. 
Keep in mind that positions in a string start from 0, whereas positions in a buffer start from 1. - Function: get-text-property POS PROP &optional OBJECT This function returns the value of the PROP property of the character after position POS in OBJECT (a buffer or string). The argument OBJECT is optional and defaults to the current buffer. If there is no PROP property strictly speaking, but the character has a category which is a symbol, then `get-text-property' returns the PROP property of that symbol. - Function: text-properties-at POSITION &optional OBJECT This function returns the list of properties held by the character at POSITION in the string or buffer OBJECT. If OBJECT is `nil', it defaults to the current buffer. File: elisp, Node: Changing Properties, Next: Property Search, Prev: Examining Properties, Up: Text Properties Changing Text Properties ------------------------ The primitives for changing properties apply to a specified range of text. The function `set-text-properties' (see end of section) sets the entire property list of the text in that range; more often, it is useful to add, change, or delete just certain properties specified by name. Since text properties are considered part of the buffer's contents, and can affect how the buffer looks on the screen, any change in the text properties is considered a buffer modification. Buffer text property changes are undoable. - Function: add-text-properties START END PROPS &optional OBJECT This function modifies the text properties for the text between START and END in the string or buffer OBJECT. If OBJECT is `nil', it defaults to the current buffer. The argument PROPS specifies which properties to change. It should have the form of a property list (*note Property Lists::.): a list whose elements include the property names followed alternately by the corresponding values. 
The return value is `t' if the function actually changed some property's value; `nil' otherwise (if PROPS is `nil' or its values agree with those in the text). For example, here is how to set the `comment' property to `t' for a range of text: (add-text-properties (region-beginning) (region-end) (list 'comment t)) - Function: put-text-property START END PROP VALUE &optional OBJECT This function sets the PROP property to VALUE for the text between START and END in the string or buffer OBJECT. If OBJECT is `nil', it defaults to the current buffer. - Function: remove-text-properties START END PROPS &optional OBJECT This function deletes specified text properties from the text between START and END in the string or buffer OBJECT. If OBJECT is `nil', it defaults to the current buffer. The argument PROPS specifies which properties to delete. It should have the form of a property list (*note Property Lists::.): a list whose elements include the property names followed by the corresponding values. The property names mentioned in PROPS are the ones deleted from the text. The values associated in PROPS with these names do not matter. The return value is `t' if the function actually changed some property's value; `nil' otherwise (if PROPS is `nil' or if none of the text had any of those properties). - Function: set-text-properties START END PROPS &optional OBJECT This function completely replaces the text property list for the text between START and END in the string or buffer OBJECT. If OBJECT is `nil', it defaults to the current buffer. The argument PROPS is the new property list. It should have the form of a list whose elements include the property names followed by the corresponding values. After `set-text-properties' returns, all the characters in the specified range have identical properties. If PROPS is `nil', the effect is to get rid of all properties from the specified range of text. 
Here's an example: (set-text-properties (region-beginning) (region-end) nil) File: elisp, Node: Property Search, Next: Special Properties, Prev: Changing Properties, Up: Text Properties Property Search Functions ------------------------- In typical use of text properties, most of the time several or many consecutive characters have the same value for a property. Rather than writing your programs to examine characters one by one, it is much faster to process chunks of text that have the same property value. Here are functions you can use to do this. In all cases, OBJECT defaults to the current buffer. - Function: next-property-change POS &optional OBJECT The function scans the text forward from position POS in the string or buffer OBJECT till it finds a change in some text property, then returns the position of the change. In other words, it returns the position of the first character beyond POS whose properties are not identical to those of the character just after POS. The value is `nil' if the properties remain unchanged all the way to the end of OBJECT. If the value is non-`nil', it is a position greater than POS, never equal. Here is an example of how to scan the buffer by chunks of text within which all properties are constant: (while (not (eobp)) (let ((plist (text-properties-at (point))) (next-change (or (next-property-change (point) (current-buffer)) (point-max)))) PROCESS TEXT FROM POINT TO NEXT-CHANGE... (goto-char next-change))) - Function: next-single-property-change POS PROP &optional OBJECT The function scans the text forward from position POS in the string or buffer OBJECT till it finds a change in the PROP property, then returns the position of the change. In other words, it returns the position of the first character beyond POS whose PROP property differs from that of the character just after POS. The value is `nil' if the properties remain unchanged all the way to the end of OBJECT. 
If the value is non-`nil', it is a position greater than POS, never equal. - Function: previous-property-change POS &optional OBJECT This is like `next-property-change', but scans back from POS instead of forward. If the value is non-`nil', it is a position always strictly less than POS. - Function: previous-single-property-change POS PROP &optional OBJECT This is like `next-single-property-change', but scans back from POS instead of forward. If the value is non-`nil', it is a position always strictly less than POS. File: elisp, Node: Special Properties, Next: Not Intervals, Prev: Property Search, Up: Text Properties Special Properties ------------------ If a character has a `category' property, we call it the "category" of the character. It should be a symbol. The properties of the symbol serve as defaults for the properties of the character. You can use the property `face' to control the font and color of text. *Note Faces::, for more information. This feature is temporary; in the future, we may replace it with other ways of specifying how to display text. The property `mouse-face' is used instead of `face' when the mouse is on or near the character. For this purpose, "near" means that all text between the character and where the mouse is has the same `mouse-face' property value. You can specify a different keymap for a portion of the text by means of a `local-map' property. The property's value, for the character after point, replaces the buffer's local map. *Note Active Keymaps::. If a character has the property `read-only', then modifying that character is not allowed. Any command that would do so gets an error. If a character has the property `modification-hooks', then its value should be a list of functions; modifying that character calls all of those functions. Each function receives two arguments: the beginning and end of the part of the buffer being modified.
Note that if a particular modification hook function appears on several characters being modified by a single primitive, you can't predict how many times the function will be called. Insertion of text does not, strictly speaking, change any existing character, so there is a special rule for insertion. It compares the `read-only' properties of the two surrounding characters; if they are non-`nil' and `eq' to each other, then the insertion is not allowed. Assuming insertion is allowed, it then gets the `modification-hooks' properties of those characters and calls all the functions in each of them. (If a function appears on both characters, it may be called once or twice.) See also *Note Change Hooks::, for other hooks that are called when you change text in a buffer. The special properties `point-entered' and `point-left' record hook functions that report motion of point. Each time point moves, Emacs compares these two property values: * the `point-left' property of the character after the old location, and * the `point-entered' property of the character after the new location. If these two values differ, each of them is called (if not `nil') with two arguments: the old value of point, and the new one. The same comparison is made for the characters before the old and new locations. The result may be to execute two `point-left' functions (which may be the same function) and/or two `point-entered' functions (which may be the same function). The `point-left' functions are always called before the `point-entered' functions. A primitive function may examine characters at various positions without moving point to those positions. Only an actual change in the value of point runs these hook functions. 
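The functions described in the preceding nodes can be combined as in the following sketch, which sets a property on a range of text, walks the buffer in runs of constant property value, and shows a `category' symbol supplying a default. The function names, the `comment' property and the symbol `my-note-category' are hypothetical names chosen for illustration, and `with-temp-buffer' is assumed to be available.

```elisp
;; Hypothetical helper: collect one (START END VALUE) entry for each
;; run of text in the current buffer over which PROP is constant.
(defun my-property-runs (prop)
  (let ((pos (point-min))
        (runs nil))
    (while (< pos (point-max))
      ;; `next-single-property-change' returns nil when PROP stays
      ;; the same up to the end, so fall back to the end of the buffer.
      (let ((next (or (next-single-property-change pos prop)
                      (point-max))))
        (setq runs (cons (list pos next (get-text-property pos prop))
                         runs))
        (setq pos next)))
    (nreverse runs)))

;; Mark the word "comment" (positions 7 up to 14) and list the runs.
(defun my-property-demo ()
  (with-temp-buffer
    (insert "plain comment plain")
    (put-text-property 7 14 'comment t)
    (my-property-runs 'comment)))

;; The character has no `comment' property of its own, so
;; `get-text-property' falls back to the property of its category.
(defun my-category-demo ()
  (put 'my-note-category 'comment 'from-category)
  (with-temp-buffer
    (insert "x")
    (put-text-property 1 2 'category 'my-note-category)
    (get-text-property 1 'comment)))
```

Scanning with the property search functions visits three runs here instead of nineteen individual characters, which is the reason for processing text in chunks.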
File: elisp, Node: Not Intervals, Prev: Special Properties, Up: Text Properties Why Text Properties are not Intervals ------------------------------------- Some editors that support adding attributes to text in the buffer do so by letting the user specify "intervals" within the text, and adding the properties to the intervals. Those editors permit the user or the programmer to determine where individual intervals start and end. We deliberately provided a different sort of interface in Emacs Lisp to avoid certain paradoxical behavior associated with text modification. If the actual subdivision into intervals is meaningful, that means you can distinguish between a buffer that is just one interval with a certain property, and a buffer containing the same text subdivided into two intervals, both of which have that property. Suppose you take the buffer with just one interval and kill part of the text. The text remaining in the buffer is one interval, and the copy in the kill ring (and the undo list) becomes a separate interval. Then if you undo the kill, you get two intervals with the same properties. Thus, the distinction can't be preserved when editing happens. But suppose we "fix" this problem by coalescing the two intervals when the text is inserted. That works fine if the buffer originally was a single interval. But if it was two intervals, and the killed text equals one of them, then undoing the kill yields just one interval. Again, the distinction can't be preserved. Insertion of text at the border between intervals also raises questions that have no satisfactory answer. However, it is easy to arrange for editing to behave consistently for questions of the form, "What are the properties of this character?" So we have decided these are the only questions that make sense; we have not implemented asking questions about where intervals start or end. For practical purposes, the property search functions serve in place of explicit interval boundaries. 
You can think of them as finding the boundaries of intervals, assuming that intervals are always coalesced whenever possible. *Note Property Search::. Emacs also provides explicit intervals as a presentation feature; see *Note Overlays::. File: elisp, Node: Substitution, Next: Underlining, Prev: Text Properties, Up: Text Substituting for a Character Code ================================= The following functions replace characters within a specified region based on their character codes. - Function: subst-char-in-region START END OLD-CHAR NEW-CHAR &optional NOUNDO This function replaces all occurrences of the character OLD-CHAR with the character NEW-CHAR in the region of the current buffer defined by START and END. If NOUNDO is non-`nil', then `subst-char-in-region' does not record the change for undo and does not mark the buffer as modified. This feature is useful for changes which are not considered significant, such as when Outline mode changes visible lines to invisible lines and vice versa. `subst-char-in-region' does not move point and returns `nil'. ---------- Buffer: foo ---------- This is the contents of the buffer before. ---------- Buffer: foo ---------- (subst-char-in-region 1 20 ?i ?X) => nil ---------- Buffer: foo ---------- ThXs Xs the contents of the buffer before. ---------- Buffer: foo ---------- - Function: translate-region START END TABLE This function applies a translation table to the characters in the buffer between positions START and END. The translation table TABLE is a string; `(aref TABLE OCHAR)' gives the translated character corresponding to OCHAR. If the length of TABLE is less than 256, any characters with codes larger than the length of TABLE are not altered by the translation. The return value of `translate-region' is the number of characters which were actually changed by the translation. This does not count characters which were mapped into themselves in the translation table. 
This function is available in Emacs versions 19 and later. File: elisp, Node: Underlining, Next: Registers, Prev: Substitution, Up: Text Underlining =========== The underlining commands are somewhat obsolete. The `underline-region' function actually inserts `_^H' before each appropriate character in the region. This command provides a minimal text formatting feature that might work on your printer; however, we recommend instead that you use more powerful text formatting facilities, such as Texinfo. - Command: underline-region START END This function underlines all nonblank characters in the region defined by START and END. That is, an underscore character and a backspace character are inserted just before each non-whitespace character in the region. The backspace characters are intended to cause overstriking, but in Emacs they display as either `\010' or `^H', depending on the setting of `ctl-arrow'. There is no way to see the effect of the overstriking within Emacs. The value is `nil'. - Command: ununderline-region START END This function removes all underlining (overstruck underscores) in the region defined by START and END. The value is `nil'. File: elisp, Node: Registers, Next: Change Hooks, Prev: Underlining, Up: Text Registers ========= A register is a sort of variable used in Emacs editing that can hold a marker, a string, a rectangle, a window configuration (of one frame), or a frame configuration (of all frames). Each register is named by a single character. All characters, including control and meta characters (but with the exception of `C-g'), can be used to name registers. Thus, there are 255 possible registers. A register is designated in Emacs Lisp by a character which is its name. The functions in this section return unpredictable values unless otherwise stated. - Variable: register-alist This variable is an alist of elements of the form `(NAME . CONTENTS)'. Normally, there is one element for each Emacs register that has been used.
The object NAME is a character (an integer) identifying the register. The object CONTENTS is a string, marker, or list representing the register contents. A string represents text stored in the register. A marker represents a position. A list represents a rectangle; its elements are strings, one per line of the rectangle. - Command: view-register REG This command displays what is contained in register REG. - Function: get-register REG This function returns the contents of the register REG, or `nil' if it has no contents. - Function: set-register REG VALUE This function sets the contents of register REG to VALUE. A register can be set to any value, but the other register functions expect only certain data types. The return value is VALUE. - Command: point-to-register REG This command stores both the current location of point and the current buffer in register REG as a marker. - Command: jump-to-register REG - Command: register-to-point REG This command restores the status recorded in register REG. If REG contains a marker, it moves point to the position stored in the marker. Since both the buffer and the location within the buffer are stored by the `point-to-register' function, this command can switch you to another buffer. If REG contains a window configuration or a frame configuration, `jump-to-register' restores that configuration. - Command: insert-register REG &optional BEFOREP This command inserts the contents of register REG into the current buffer. Normally, this command puts point before the inserted text, and the mark after it. However, if the optional second argument BEFOREP is non-`nil', it puts the mark before and point after. You can pass a non-`nil' second argument BEFOREP to this function interactively by supplying any prefix argument. If the register contains a rectangle, then the rectangle is inserted with its upper left corner at point. This means that text is inserted in the current line and underneath it on successive lines.
If the register contains something other than saved text (a string) or a rectangle (a list), currently useless things happen. This may be changed in the future. - Command: copy-to-register REG START END &optional DELETE-FLAG This command copies the region from START to END into register REG. If DELETE-FLAG is non-`nil', it deletes the region from the buffer after copying it into the register. - Command: prepend-to-register REG START END &optional DELETE-FLAG This command prepends the region from START to END into register REG. If DELETE-FLAG is non-`nil', it deletes the region from the buffer after copying it to the register. - Command: append-to-register REG START END &optional DELETE-FLAG This command appends the region from START to END to the text already in register REG. If DELETE-FLAG is non-`nil', it deletes the region from the buffer after copying it to the register. - Command: copy-rectangle-to-register REG START END &optional DELETE-FLAG This command copies a rectangular region from START to END into register REG. If DELETE-FLAG is non-`nil', it deletes the region from the buffer after copying it to the register. - Command: window-configuration-to-register REG This function stores the window configuration of the selected frame in register REG. - Command: frame-configuration-to-register REG This function stores the current frame configuration in register REG. File: elisp, Node: Change Hooks, Prev: Registers, Up: Text Change Hooks ============ These hook variables let you arrange to take notice of all changes in all buffers (or in a particular buffer, if you make them buffer-local). See also *Note Special Properties::, for how to detect changes to specific parts of the text. - Variable: before-change-function If this variable is non-`nil', then it should be a function; the function is called before any buffer modification. Its arguments are the beginning and end of the region that is going to change, represented as integers. 
The buffer that's about to change is always the current buffer. - Variable: after-change-function If this variable is non-`nil', then it should be a function; the function is called after any buffer modification. It receives three arguments: the beginning and end of the region just changed, and the length of the text that existed before the change. (To get the current length, subtract the region beginning from the region end.) All three arguments are integers. The buffer that has just changed is always the current buffer. Both of these variables are temporarily bound to `nil' during the time that either of these hooks is running. This means that if one of these functions changes the buffer, that change won't run these functions. If you do want the hook function to be run recursively, write your hook functions to bind these variables back to their usual values. - Variable: first-change-hook This variable is a normal hook; its hook functions are run using `run-hooks' whenever a buffer is changed that was previously in the unmodified state. The variables described in this section are meaningful only starting with Emacs version 19. File: elisp, Node: Searching and Matching, Next: Syntax Tables, Prev: Text, Up: Top Searching and Matching ********************** GNU Emacs provides two ways to search through a buffer for specified text: exact string searches and regular expression searches. After a regular expression search, you can identify the text matched by parts of the regular expression by examining the "match data". * Menu: * String Search:: Search for an exact match. * Regular Expressions:: Describing classes of strings. * Regexp Search:: Searching for a match for a regexp. * Replacement:: Internals of `query-replace'. * Match Data:: Finding out which part of the text matched various parts of a regexp, after regexp search. * Standard Regexps:: Useful regexps for finding sentences, pages,... * Searching and Case:: Case-independent or case-significant searching.
File: elisp, Node: String Search, Next: Regular Expressions, Up: Searching and Matching Searching for Strings ===================== These are the primitive functions for searching through the text in a buffer. They are meant for use in programs, but you may call them interactively. If you do so, they prompt for the search string; LIMIT and NOERROR are set to `nil', and REPEAT is set to 1. - Command: search-forward STRING &optional LIMIT NOERROR REPEAT This function searches forward from point for an exact match for STRING. If successful, it sets point to the end of the occurrence found, and returns the new value of point. If no match is found, the value and side effects depend on NOERROR (see below). In the following example, point is positioned at the beginning of the line. Then `(search-forward "fox")' is evaluated in the minibuffer and point is left after the last letter of `fox': ---------- Buffer: foo ---------- -!-The quick brown fox jumped over the lazy dog. ---------- Buffer: foo ---------- (search-forward "fox") => 20 ---------- Buffer: foo ---------- The quick brown fox-!- jumped over the lazy dog. ---------- Buffer: foo ---------- The argument LIMIT specifies the upper bound to the search. (It must be a position in the current buffer.) No match extending after that position is accepted. If LIMIT is omitted or `nil', it defaults to the end of the accessible portion of the buffer. What happens when the search fails depends on the value of NOERROR. If NOERROR is `nil', a `search-failed' error is signaled. If NOERROR is `t', `search-forward' returns `nil' and does nothing. If NOERROR is neither `nil' nor `t', then `search-forward' moves point to the upper bound and returns `nil'. (It would be more consistent now to return the new position of point in that case, but some programs may depend on a value of `nil'.) If REPEAT is non-`nil', then the search is repeated that many times. Point is positioned at the end of the last match.
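The effect of the NOERROR argument can be sketched as follows. The helper names `my-count-occurrences' and `my-search-demo' are hypothetical, the search string is assumed to be non-empty, and `with-temp-buffer' is assumed to be available.

```elisp
;; Hypothetical helper: count the non-overlapping occurrences of
;; STRING in the current buffer.  STRING is assumed to be non-empty,
;; since an empty string would match without advancing point.
(defun my-count-occurrences (string)
  (save-excursion
    (goto-char (point-min))
    (let ((count 0))
      ;; With NOERROR = t a failed search returns nil instead of
      ;; signaling `search-failed', which ends the loop.
      (while (search-forward string nil t)
        (setq count (1+ count)))
      count)))

;; Return (FOUND MISSING POINT COUNT) for the example buffer text.
(defun my-search-demo ()
  (with-temp-buffer
    (insert "The quick brown fox jumped over the lazy dog.")
    (goto-char (point-min))
    (let* ((found (search-forward "fox"))          ; new value of point
           (missing (search-forward "cat" nil t))  ; nil, point unmoved
           (point-after-miss (point)))
      (list found missing point-after-miss
            (my-count-occurrences "o")))))
```

Passing `t' for NOERROR is the usual choice in loops like this one, because the failed search that terminates the loop leaves point where the last match ended.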
- Command: search-backward STRING &optional LIMIT NOERROR REPEAT This function searches backward from point for STRING. It is just like `search-forward' except that it searches backwards and leaves point at the beginning of the match. - Command: word-search-forward STRING &optional LIMIT NOERROR REPEAT This function searches forward from point for a "word" match for STRING. If it finds a match, it sets point to the end of the match found, and returns the new value of point. A word search differs from a simple string search in that a word search *requires* that the words it searches for are present as entire words (searching for the word `ball' does not match the word `balls'), and punctuation and spacing are ignored (searching for `ball boy' does match `ball. Boy!'). In this example, point is first placed at the beginning of the buffer; the search leaves it between the `y' and the `!'. ---------- Buffer: foo ---------- -!-He said "Please! Find the ball boy!" ---------- Buffer: foo ---------- (word-search-forward "Please find the ball, boy.") => 35 ---------- Buffer: foo ---------- He said "Please! Find the ball boy-!-!" ---------- Buffer: foo ---------- If LIMIT is non-`nil' (it must be a position in the current buffer), then it is the upper bound to the search. The match found must not extend after that position. If NOERROR is `t', then `word-search-forward' returns `nil' when a search fails, instead of signaling an error. If NOERROR is neither `nil' nor `t', then `word-search-forward' moves point to LIMIT (or the end of the buffer) and returns `nil'. If REPEAT is non-`nil', then the search is repeated that many times. Point is positioned at the end of the last match. - Command: word-search-backward STRING &optional LIMIT NOERROR REPEAT This function searches backward from point for a word match to STRING. This function is just like `word-search-forward' except that it searches backward and normally leaves point at the beginning of the match.
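The two rules of word search (whole words only, punctuation and spacing ignored) can be sketched as follows. The function name `my-word-search-demo' and the buffer text are hypothetical, `with-temp-buffer' is assumed to be available, and `case-fold-search' is bound to `t' so that `boy' can match `Boy'.

```elisp
;; Return (HIT BEFORE AFTER PLURAL-HIT), where HIT says whether
;; "ball boy" was found, BEFORE and AFTER are the characters around
;; point after that search, and PLURAL-HIT is the result of looking
;; for the word "ball" in text that contains only "balls".
(defun my-word-search-demo ()
  (with-temp-buffer
    (let ((case-fold-search t))
      (insert "He threw the ball. Boy! Then two balls.")
      (goto-char (point-min))
      (let* ((hit (word-search-forward "ball boy" nil t))
             (before (char-before))
             (after (char-after)))
        ;; Skip past the first sentence so that only "balls" remains
        ;; ahead of point.
        (goto-char (point-min))
        (search-forward "Then")
        (list (and hit t)
              before
              after
              (word-search-forward "ball" nil t))))))
```

The first search succeeds across the period and the change of case and leaves point between the `y' and the `!'; the second returns `nil' because `ball' is not present as an entire word in the remaining text.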
File: elisp, Node: Regular Expressions, Next: Regexp Search, Prev: String Search, Up: Searching and Matching Regular Expressions =================== A "regular expression" ("regexp", for short) is a pattern that denotes a (possibly infinite) set of strings. Searching for matches for a regexp is a very powerful operation. This section explains how to write regexps; the following section says how to search for them. * Menu: * Syntax of Regexps:: Rules for writing regular expressions. * Regexp Example:: Illustrates regular expression syntax. File: elisp, Node: Syntax of Regexps, Next: Regexp Example, Up: Regular Expressions Syntax of Regular Expressions ----------------------------- Regular expressions have a syntax in which a few characters are special constructs and the rest are "ordinary". An ordinary character is a simple regular expression which matches that character and nothing else. The special characters are `$', `^', `.', `*', `+', `?', `[', `]' and `\'; no new special characters will be defined in the future. Any other character appearing in a regular expression is ordinary, unless a `\' precedes it. For example, `f' is not a special character, so it is ordinary, and therefore `f' is a regular expression that matches the string `f' and no other string. (It does *not* match the string `ff'.) Likewise, `o' is a regular expression that matches only `o'. Any two regular expressions A and B can be concatenated. The result is a regular expression which matches a string if A matches some amount of the beginning of that string and B matches the rest of the string. As a simple example, we can concatenate the regular expressions `f' and `o' to get the regular expression `fo', which matches only the string `fo'. Still trivial. To do something more powerful, you need to use one of the special characters. Here is a list of them: `. (Period)' is a special character that matches any single character except a newline. 
     Using concatenation, we can make regular expressions like `a.b'
     which matches any three-character string which begins with `a'
     and ends with `b'.

`*'
     is not a construct by itself; it is a suffix that means the
     preceding regular expression is to be repeated as many times as
     possible. In `fo*', the `*' applies to the `o', so `fo*' matches
     one `f' followed by any number of `o's. The case of zero `o's is
     allowed: `fo*' does match `f'.

     `*' always applies to the *smallest* possible preceding
     expression. Thus, `fo*' has a repeating `o', not a repeating
     `fo'.

     The matcher processes a `*' construct by matching, immediately,
     as many repetitions as can be found. Then it continues with the
     rest of the pattern. If that fails, backtracking occurs,
     discarding some of the matches of the `*'-modified construct in
     case that makes it possible to match the rest of the pattern.
     For example, matching `ca*ar' against the string `caaar', the
     `a*' first tries to match all three `a's; but the rest of the
     pattern is `ar' and there is only `r' left to match, so this try
     fails. The next alternative is for `a*' to match only two `a's.
     With this choice, the rest of the regexp matches successfully.

`+'
     is a suffix character similar to `*' except that it must match
     the preceding expression at least once. So, for example, `ca+r'
     will match the strings `car' and `caaaar' but not the string
     `cr', whereas `ca*r' would match all three strings.

`?'
     is a suffix character similar to `*' except that it can match the
     preceding expression either once or not at all. For example,
     `ca?r' will match `car' or `cr'; nothing else.

`[ ... ]'
     `[' begins a "character set", which is terminated by a `]'. In
     the simplest case, the characters between the two form the set.
     Thus, `[ad]' matches either one `a' or one `d', and `[ad]*'
     matches any string composed of just `a's and `d's (including the
     empty string), from which it follows that `c[ad]*r' matches `cr',
     `car', `cdr', `caddaar', etc.
     Character ranges can also be included in a character set, by
     writing two characters with a `-' between them. Thus, `[a-z]'
     matches any lower case letter. Ranges may be intermixed freely
     with individual characters, as in `[a-z$%.]', which matches any
     lower case letter or `$', `%' or a period.

     Note that the usual special characters are not special any more
     inside a character set. A completely different set of special
     characters exists inside character sets: `]', `-' and `^'.

     To include a `]' in a character set, make it the first character.
     For example, `[]a]' matches `]' or `a'. To include a `-', write
     `-' as the first or last character in the set. To include `^',
     make it other than the first character in the set.

`[^ ... ]'
     `[^' begins a "complement character set", which matches any
     character except the ones specified. Thus, `[^a-z0-9A-Z]' matches
     all characters *except* letters and digits.

     `^' is not special in a character set unless it is the first
     character. The character following the `^' is treated as if it
     were first (thus, `-' and `]' are not special there).

     Note that a complement character set can match a newline, unless
     newline is mentioned as one of the characters not to match.

`^'
     is a special character that matches the empty string, but only at
     the beginning of a line in the text being matched. Otherwise it
     fails to match anything. Thus, `^foo' matches a `foo' which
     occurs at the beginning of a line.

     When matching a string, `^' matches at the beginning of the
     string or after a newline character `\n'.

`$'
     is similar to `^' but matches only at the end of a line. Thus,
     `x+$' matches a string of one `x' or more at the end of a line.

     When matching a string, `$' matches at the end of the string or
     before a newline character `\n'.

`\'
     has two functions: it quotes the special characters (including
     `\'), and it introduces additional special constructs.

     Because `\' quotes special characters, `\$' is a regular
     expression which matches only `$', and `\[' is a regular
     expression which matches only `[', and so on.

     Note that `\' also has special meaning in the read syntax of Lisp
     strings (*note String Type::.), and must be quoted with `\'. For
     example, the regular expression that matches the `\' character is
     `\\'. To write a Lisp string that contains the characters `\\',
     Lisp syntax requires you to quote each `\' with another `\'.
     Therefore, the read syntax for a regular expression matching `\'
     is `"\\\\"'.

   *Please note:* for historical compatibility, special characters are
treated as ordinary ones if they are in contexts where their special
meanings make no sense. For example, `*foo' treats `*' as ordinary
since there is no preceding expression on which the `*' can act. It is
poor practice to depend on this behavior; better to quote the special
character anyway, regardless of where it appears.

   For the most part, `\' followed by any character matches only that
character. However, there are several exceptions: characters which,
when preceded by `\', are special constructs. Such characters are
always ordinary when encountered on their own. Here is a table of `\'
constructs:

`\|'
     specifies an alternative. Two regular expressions A and B with
     `\|' in between form an expression that matches anything that
     either A or B matches.

     Thus, `foo\|bar' matches either `foo' or `bar' but no other
     string.

     `\|' applies to the largest possible surrounding expressions.
     Only a surrounding `\( ... \)' grouping can limit the grouping
     power of `\|'.

     Full backtracking capability exists to handle multiple uses of
     `\|'.

`\( ... \)'
     is a grouping construct that serves three purposes:

       1. To enclose a set of `\|' alternatives for other operations.
          Thus, `\(foo\|bar\)x' matches either `foox' or `barx'.

       2. To enclose a complicated expression for a suffix character
          such as `*' to operate on.
          Thus, `ba\(na\)*' matches `bananana', etc., with any (zero
          or more) number of `na' strings.

       3. To record a matched substring for future reference.

     This last application is not a consequence of the idea of a
     parenthetical grouping; it is a separate feature which happens to
     be assigned as a second meaning to the same `\( ... \)' construct
     because there is no conflict in practice between the two
     meanings. Here is an explanation of this feature:

`\DIGIT'
     matches the same text that was matched by the DIGITth
     `\( ... \)' construct.

     In other words, after the end of a `\( ... \)' construct, the
     matcher remembers the beginning and end of the text matched by
     that construct. Then, later on in the regular expression, you can
     use `\' followed by DIGIT to mean "match the same text matched by
     the DIGITth `\( ... \)' construct."

     The strings matching the first nine `\( ... \)' constructs
     appearing in a regular expression are assigned numbers 1 through
     9 in the order that the open parentheses appear in the regular
     expression. So you can use `\1' through `\9' to refer to the text
     matched by the corresponding `\( ... \)' constructs.

     For example, `\(.*\)\1' matches any newline-free string that is
     composed of two identical halves. The `\(.*\)' matches the first
     half, which may be anything, but the `\1' that follows must match
     the same exact text.

`\`'
     matches the empty string, provided it is at the beginning of the
     buffer.

`\''
     matches the empty string, provided it is at the end of the
     buffer.

`\='
     matches the empty string, provided it is at point.

`\b'
     matches the empty string, provided it is at the beginning or end
     of a word. Thus, `\bfoo\b' matches any occurrence of `foo' as a
     separate word. `\bballs?\b' matches `ball' or `balls' as a
     separate word.

`\B'
     matches the empty string, provided it is *not* at the beginning
     or end of a word.

`\<'
     matches the empty string, provided it is at the beginning of a
     word.

`\>'
     matches the empty string, provided it is at the end of a word.
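
     The grouping, back-reference and word-boundary constructs can be
     tried on strings with `string-match'. The following is only a
     sketch: the helper names `my-match-start' and `my-repeated-half'
     are hypothetical, and it assumes an Emacs that provides `when'
     and the two-argument form of `match-string'. In Lisp read syntax
     each `\' of the regexp is written as `\\'. When matching a
     string, `\`' and `\'' match at the beginning and end of the
     string.

```elisp
;; Hypothetical helpers for experimenting with `\' constructs.
(defun my-match-start (regexp string)
  "Return the index where REGEXP first matches STRING, or nil."
  (let ((case-fold-search nil))
    (string-match regexp string)))

(defun my-repeated-half (string)
  "If STRING consists of two identical halves, return one half.
Otherwise return nil."
  ;; \(.*\) records the first half; \1 must match the same text.
  (when (string-match "\\`\\(.*\\)\\1\\'" string)
    (match-string 1 string)))

;; Alternatives inside a group, followed by an ordinary character.
(my-match-start "\\(foo\\|bar\\)x" "barx")

;; A group repeated by `*'.
(my-match-start "\\`ba\\(na\\)*\\'" "bananana")

;; Word boundaries: matches the separate word, not part of a word.
(my-match-start "\\bballs?\\b" "the balls")
(my-match-start "\\bball\\b" "football")

;; Back-reference to the first group.
(my-repeated-half "abcabc")
(my-repeated-half "abcabd")
```

     Anchoring the back-reference pattern with `\`' and `\'' forces
     the whole string, not just some substring, to be two identical
     halves.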

`\w'
     matches any word-constituent character. The editor syntax table
     determines which characters these are. *Note Syntax Tables::.

`\W'
     matches any character that is not a word-constituent.

`\sCODE'
     matches any character whose syntax is CODE. Here CODE is a
     character which represents a syntax code: thus, `w' for word
     constituent, `-' for whitespace, `(' for open parenthesis, etc.
     *Note Syntax Tables::, for a list of the codes.

`\SCODE'
     matches any character whose syntax is not CODE.

   Not every string is a valid regular expression. For example, any
string with unbalanced square brackets is invalid, and so is a string
that ends with a single `\'. If an invalid regular expression is
passed to any of the search functions, an `invalid-regexp' error is
signaled.

 - Function: regexp-quote STRING
     This function returns a regular expression string which matches
     exactly STRING and nothing else. This allows you to request an
     exact string match when calling a function that wants a regular
     expression.

          (regexp-quote "^The cat$")
               => "\\^The cat\\$"

     One use of `regexp-quote' is to combine an exact string match
     with context described as a regular expression. For example,
     this searches for the string which is the value of `string',
     surrounded by whitespace:

          (re-search-forward
           (concat "\\s " (regexp-quote string) "\\s "))

File: elisp, Node: Regexp Example, Prev: Syntax of Regexps, Up: Regular Expressions

Complex Regexp Example
----------------------

   Here is a complicated regexp, used by Emacs to recognize the end of
a sentence together with any whitespace that follows. It is the value
of the variable `sentence-end'.

   First, we show the regexp as a string in Lisp syntax to enable you
to distinguish the spaces from the tab characters. The string constant
begins and ends with a double-quote. `\"' stands for a double-quote as
part of the string, `\\' for a backslash as part of the string, `\t'
for a tab and `\n' for a newline.

     "[.?!][]\"')}]*\\($\\|\t\\|  \\)[ \t\n]*"

   In contrast, if you evaluate the variable `sentence-end', you will
see the following:

     sentence-end
     =>
     "[.?!][]\"')}]*\\($\\|	\\|  \\)[ 	
     ]*"

In this case, the tab and newline are the actual characters.

   This regular expression contains four parts in succession and can
be deciphered as follows:

`[.?!]'
     The first part of the pattern consists of three characters, a
     period, a question mark and an exclamation mark, within square
     brackets. The match must begin with one of these three
     characters.

`[]\"')}]*'
     The second part of the pattern matches any closing braces and
     quotation marks, zero or more of them, that may follow the
     period, question mark or exclamation mark. The `\"' is Lisp
     syntax for a double-quote in a string. The `*' at the end
     indicates that the immediately preceding regular expression (a
     character set, in this case) may be repeated zero or more times.

`\\($\\|\t\\|  \\)'
     The third part of the pattern matches the whitespace that follows
     the end of a sentence: the end of a line, or a tab, or two
     spaces. The double backslashes mark the parentheses and vertical
     bars as regular expression syntax: in the Lisp string each `\\'
     stands for a single `\', which turns the parenthesis or vertical
     bar after it into a special construct. The parentheses mark the
     group and the vertical bars indicate that the patterns to either
     side of them are alternatives. The dollar sign is used to match
     the end of a line. The tab character is written using `\t' and
     the two spaces are written as themselves.

`[ \t\n]*'
     Finally, the last part of the pattern indicates that the end of
     the line or the whitespace following the period, question mark or
     exclamation mark may, but need not, be followed by additional
     whitespace.
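
   To see the four parts working together, the regexp can be rebuilt
piece by piece and applied to a few strings. This is only a sketch:
`my-sentence-end' and `my-sentence-end-position' are hypothetical
names chosen so as not to disturb the real `sentence-end' variable,
and the sketch assumes an Emacs that provides `when'.

```elisp
;; The four parts described above, concatenated in order.
;; `my-sentence-end' is a made-up name used only for illustration.
(defvar my-sentence-end
  (concat "[.?!]"               ; sentence-ending punctuation
          "[]\"')}]*"           ; any closing brackets or quotes
          "\\($\\|\t\\|  \\)"   ; end of line, a tab, or two spaces
          "[ \t\n]*")           ; optional additional whitespace
  "Sample regexp with the same structure as `sentence-end'.")

(defun my-sentence-end-position (string)
  "Return the index just past the first sentence end in STRING.
Return nil if STRING contains no sentence end."
  (when (string-match my-sentence-end string)
    (match-end 0)))

;; Two spaces after the period: the match covers `.' and both spaces.
(my-sentence-end-position "Hello.  World")

;; Closing quote and parenthesis between the `?' and the spaces.
(my-sentence-end-position "Is it?\")  Yes")

;; A period at the very end of the string: `$' matches there.
(my-sentence-end-position "Done.")

;; Only one space after the period, so this is not a sentence end.
(my-sentence-end-position "No. break")
```

   Building the string with `concat' keeps each part next to its
explanation; the resulting value is the same string as the single
constant shown at the start of this example.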