\section{Built-in Module \code{regex}}

This module provides regular expression matching operations similar to those found in Emacs. It is always available.

By default the patterns are Emacs-style regular expressions; the syntax can be changed to match that of several well-known utilities (see \code{set\_syntax} below).

This module is 8-bit clean: both patterns and strings may contain null bytes and characters whose high bit is set.

Please note: Because of the way Python treats string literals, you usually do not have to double backslashes, even though they escape special characters both in string literals and in regular expressions. Python does not remove a backslash from a string literal when it is followed by an unrecognized escape character. However, to include a literal backslash in a regular expression written as a string literal, you have to quadruple it. For example, to extract LaTeX \verb|\section{...}| headers from a document, you can use this pattern: \verb|'\\\\section{\(.*\)}'|.
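The quadrupling rule can be checked by looking at the string a literal actually produces. The following is a minimal sketch that does not call this module at all; the names \code{pattern} and \code{BACKSLASH} are illustrative only. The backslashes in front of the parentheses are doubled here because recent Python versions warn about unrecognized escapes in string literals.

```python
# Four backslashes in the literal become two backslash characters in the
# string; the regular expression compiler then reads those two characters
# as one literal backslash.
pattern = '\\\\section{\\(.*\\)}'

BACKSLASH = chr(92)  # a single backslash character

# Show the text that would reach the regular expression compiler.
print(pattern)       # prints \\section{\(.*\)}
print(len(pattern))  # prints 17
```

The printed string contains two backslashes before \code{section} and one before each parenthesis, four in total.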

The module defines these functions, and an exception:


\begin{funcdesc}{match}{pattern\, string}
Return how many characters at the beg...
...match the pattern (this is different from a
zero-length match!).
\end{funcdesc}


\begin{funcdesc}{search}{pattern\, string}
Return the first position in \var{st...
... pattern (this is different from a zero-length match
anywhere!).
\end{funcdesc}
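The difference between a failed match and a zero-length match, which both \code{match} and \code{search} above point out, can be illustrated with a small sketch. It uses the standard \code{re} module of current Python versions as a stand-in, since this module may not be available there. The helpers \code{matchlen} and \code{searchpos} are hypothetical, and the use of \code{-1} to report failure is an assumption of this sketch.

```python
import re

def matchlen(pattern, string):
    """Return how many characters at the start of string match, or -1."""
    m = re.match(pattern, string)
    return -1 if m is None else m.end()

def searchpos(pattern, string):
    """Return the index where the first match starts, or -1."""
    m = re.search(pattern, string)
    return -1 if m is None else m.start()

print(matchlen('a*', 'aaab'))   # 3: three characters matched
print(matchlen('a*', 'bbb'))    # 0: a zero-length match, not a failure
print(matchlen('a+', 'bbb'))    # -1: no match at all
print(searchpos('b', 'aaab'))   # 3: first match starts at index 3
print(searchpos('z', 'aaab'))   # -1: no match anywhere
```

Returning a distinct failure value keeps a result of \code{0} meaningful: it says the pattern matched, but consumed no characters.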


\begin{funcdesc}{compile}{pattern\, translate}
Compile a regular expression pat...
...ion at a time needn't worry about compiling regular
expressions.)
\end{funcdesc}


\begin{funcdesc}{set\_syntax}{flags}
Set the syntax to be used by future calls t...
...tax}; read the file \file{regex_syntax.py} for
more information.
\end{funcdesc}


\begin{funcdesc}{symcomp}{pattern\, translate}
This is like \code{compile}, but ...
...piled regular expression object, like this:
\code{p.group('id')}.
\end{funcdesc}
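Looking up a group by a symbolic name, as in \code{p.group('id')} above, can be sketched with the standard \code{re} module of current Python versions, which offers a comparable feature. The \code{(?P<id>...)} spelling is the syntax of \code{re}, not of this module, and the sample text is made up for illustration.

```python
import re

# Two named groups: an identifier and a numeric value.
p = re.compile(r'(?P<id>[a-z][a-z0-9]*)=(?P<value>[0-9]+)')

# In re the match result is a separate object; it is not stored on p.
m = p.match('width=80')

ident = m.group('id')          # 'width'
both = m.group('id', 'value')  # ('width', '80')
index = p.groupindex['id']     # 1, the numeric index behind the name

print(ident, both, index)
```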


\begin{excdesc}{error}
Exception raised when a string passed to one of the func...
...t is
never an error if a string contains no match for a pattern.)
\end{excdesc}


\begin{datadesc}{casefold}
A string suitable to pass as \var{translate} argument...
... to map all upper case characters to their lowercase
equivalents.
\end{datadesc}
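What a case-folding translation table looks like can be sketched as follows, assuming the table has 256 entries indexed by character code. This is written for current Python versions, where such a table is a \code{bytes} object; it only folds the ASCII letters, and the name \code{table} is illustrative.

```python
# Entry i of the table is the character that byte value i is mapped to.
# bytes.lower() changes only the ASCII letters A-Z.
table = bytes(range(256)).lower()

# Applying the table folds upper case letters and leaves the rest alone.
folded = b'Hello, WORLD'.translate(table)

print(len(table))  # prints 256
print(folded)      # prints b'hello, world'
```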

Compiled regular expression objects support these methods:


\begin{funcdesc}{match}{string\, pos}
Return how many characters at the beginni...
...line, not necessarily at the index where the search
is to start.
\end{funcdesc}


\begin{funcdesc}{search}{string\, pos}
Return the first position in \var{string...
...d parameter has the same meaning as for the
\code{match} method.
\end{funcdesc}


\begin{funcdesc}{group}{index\, index\, ...}
This method is only valid when the ...
...ments may also be strings
identifying groups by their group name.
\end{funcdesc}
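Retrieving one or several groups by index can be sketched with the standard \code{re} module of current Python versions. Note one difference that this sketch assumes: in \code{re}, \code{group} is a method of the match result rather than of the compiled expression. The sample text is made up for illustration.

```python
import re

p = re.compile(r'(\d+)-(\d+)')
m = p.search('pages 10-25')

whole = m.group(0)    # '10-25': index 0 is the whole match
first = m.group(1)    # '10'
pair = m.group(1, 2)  # ('10', '25'): several indices give a tuple

print(whole, first, pair)
```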

Compiled regular expressions support these data attributes:


\begin{datadesc}{regs}
When the last call to the \code{match} or \code{search} m...
...ttern. When the last match or search failed, this is
\code{None}.
\end{datadesc}
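The kind of information such an attribute carries can be sketched as a tuple of (start, end) index pairs, one for the whole match and one per group. That layout is an assumption of this sketch, which builds the tuple from the standard \code{re} module of current Python versions; the name \code{spans} is illustrative.

```python
import re

p = re.compile(r'(\d+)-(\d+)')
m = p.search('pages 10-25')

# Pair 0 covers the whole match; pair i covers group i.
spans = tuple(m.span(i) for i in range(p.groups + 1))
print(spans)  # prints ((6, 11), (6, 8), (9, 11))

# A failed search yields None instead of a match result.
failed = p.search('no digits here')
print(failed)  # prints None
```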


\begin{datadesc}{last}
When the last call to the \code{match} or \code{search} m...
...ethod. When the
last match or search failed, this is \code{None}.
\end{datadesc}


\begin{datadesc}{translate}
This is the value of the \var{translate} argument to...
...as omitted in the \code{regex.compile}
call, this is \code{None}.
\end{datadesc}


\begin{datadesc}{givenpat}
The regular expression pattern as passed to \code{compile} or
\code{symcomp}.
\end{datadesc}


\begin{datadesc}{realpat}
The regular expression after stripping the group names...
... compiled with \code{symcomp}. Same as \code{givenpat}
otherwise.
\end{datadesc}


\begin{datadesc}{groupindex}
A dictionary giving the mapping from symbolic group...
... expressions compiled with \code{symcomp}.
\code{None} otherwise.
\end{datadesc}