Regex is a backtracking regular expression matcher that incorporates a traditional Nondeterministic Finite Automation (NFA) engine such as used by Perl, Python, Emacs, and Tcl. This distinguishes it from faster, but more limited, pure regular expression Deterministic Finite Automation (DFA) engines such as those found in awk, egrep, or lex. This also distinguishes it from standardized, but slower, POSIX NFAs.
The three types of regular expression engines
Why does Regex implement a traditional NFA matcher? Here is an overview of the three types of engines.
DFA engines run in linear time because they do not require backtracking (and thus they will never test the same character twice). They can also guarantee matching the longest possible string. However, since a DFA contains only finite state, it cannot match a pattern with backreferences, and because it does not construct an explicit expansion, it cannot capture subexpressions.
Traditional NFA engines run "greedy match" backtracking algorithms, testing all possible expansions of a regular expression in a specific order, and accepting the first match. Because a traditional NFA constructs a specific expansion of the regular expression for a successful match, it can capture subexpression matches and matching backreferences. However, because a traditional NFA backtracks, it can visit exactly the same state multiple times if the state is arrived at over different paths. As a result, it can run exponentially slowly in the worst caseBecause a traditional NFA accepts the first match it finds, it can also leave other (possibly longer) matches undiscovered.
POSIX NFA engines are like traditional NFA engines, except that they are "patient" — they continue to backtrack until they can guarantee that they have found the longest match possible. As a result, a POSIX NFA engine is slower than a traditional NFA engine, and when using a POSIX NFA it is not possible to favor a shorter match over a longer one by changing the order of the backtracking search.
Traditional NFA engines are favored by programmers because they are more expressive than either DFA or POSIX NFA expressions. Although in the general worst-case they can have exponential running time, by using patterns that reduce ambiguities and limit backtracking, they can be "steered" to find matches in linear or polynomial time.
Regex provides the best steering
Taking the benefits of a traditional NFA engines to heart, Regex includes the most complete set of constructs to allow programmers to steer the backtracking engine. These constructs can be used to find matches faster or to favor specific expansions over others
Other Regex features include: