Colorer library regular expressions syntax description
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. What it is.

All the work of the Colorer library is based on regular expressions
(regexps). They let you create universal syntax highlighting rules, but
a regexp is an independent tool that can be used in many other
applications. Here you'll find a description of my regexps, their
limitations, secrets, and other pitfalls of using them.

I assume that you already know a little about what regexps are, and why
and how they are used. First I'll describe the regexp syntax, and then
try to help you understand it. You can also read other documents, such
as the Perl regexp manual (man perlre).

My regexps have one main difference from Perl's: all operators are
written after the pattern they apply to. This means that the Perl-like
regexp /foo(?=bar)/ is written in my variant as:

    /foo(bar)?=/

It may look less familiar, but it is more logical. All standard regexp
operators work as in Perl.

You may ask: why do we need all these extended operators? My answer:
try editing HRC files for an hour, or two, or three, and you will
understand.

2. So, let's start.

Every regexp must be enclosed in slashes: /.../
The closing slash may be followed by modifiers:

    i      Ignore case.
    x      Ignore literal spaces in the pattern (for readability).

Each symbol of a regexp is compared in turn with the target string.
Every character not listed in the tables below is treated as a literal
character.

2.1. Special metacharacters:

    ^      Match the beginning of a line
    $      Match the end of a line
    .      Match any character
    [ ]    Match any character in the set
    [^ ]   Match any character not in the set
           Inside a set all operators are disabled, but you can use
           other metacharacters and the range operator: a-z means all
           characters from a to z.
    \#     The character after the backslash (shown here as #), taken
           literally (except a-z and 1-9)
    \b     Start of word
    \B     End of word
    \xNN   Character with ASCII code NN (hex)
    \n     0x0A (lf)
    \r     0x0D (cr)
    \t     0x09 (tab)
    \s     Tab or space
    \S     Non-space
    \w     Word character (letters, digits, _)
    \W     Non-word character
    \d     Digit
    \D     Non-digit
    \u     Uppercase letter
    \l     Lowercase letter

2.2. Extended metacharacters:

    \c     The preceding character is a non-word character
    \m     Move the start of the match to this position
    \M     Move the end of the match to this position
    \N     Backreference inside the regexp to one of its bracket pairs;
           N is the number of the bracket pair. This works only if the
           bracket contains plain (non-operator) symbols.
    \yN    Backreference to a bracket pair of another regexp (used in
           the End parameter to refer to the Start parameter). N is the
           number of the bracket pair.

2.3. Operators.

Here we are. Operators cannot be used on their own: each operator must
apply to the preceding character, metacharacter, or bracketed group of
them.

    ( )      Group and remember characters to form one pattern.
    |        Match the previous or the next pattern.
    *        Match the previous pattern 0 or more times.
    +        Match the previous pattern 1 or more times.
    ?        Match the previous pattern 0 or 1 times.
    {n}      Match exactly n times.
    {n,}     Match n or more times.
    {n,m}    Match from n to m times.

Adding ? after an operator makes it nongreedy. For example, * becomes
nongreedy when written as *?. Greedy operators try to take as much of
the string as they can; nongreedy ones take as little as possible.

2.4. Extended operators.

    ?#N    Look-behind. N is the number of symbols.
    ?~N    Inverted look-behind.
    ?=     Look-ahead.
    ?!     Inverted look-ahead.

Horrible, isn't it?

3. And, at the end, some examples.

    /foobar/            will match "foobar", "foobar barfoo"
    / FOO bar /ix       will match "foobar", "FOOBAR",
                        "foobar and two other foos"
    /(foo)?bar/         will match "foobar", "bar"
    /^foobar$/          will match _only_ "foobar"
    /([\d\.])+/         any number (a run of digits and dots)
    /((foo)|(bar))+/    will match "foofoofoobarfoobar", "bar"
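The postfix convention is the main habit to unlearn when moving patterns between Colorer and Perl-style engines. The sketch below uses Python, whose re module follows Perl syntax for the features used here. The helper colorer_lookahead_to_perl is hypothetical, written only for this illustration and not part of the Colorer library: it handles just the look-ahead operators ?= and ?!, and assumes they always follow a closing bracket. The remaining asserts check the section 3 examples and the greedy/nongreedy rule from section 2.3 in their Perl-style form; they exercise Python's engine, not Colorer's.

```python
import re


def colorer_lookahead_to_perl(rx: str) -> str:
    """Rewrite Colorer postfix look-ahead into Perl-style prefix form.

    Hypothetical helper (not part of Colorer): turns '(X)?=' into '(?=X)'
    and '(X)?!' into '(?!X)'. It assumes '?=' / '?!' right after a closing
    bracket is always the look-ahead operator, skips escaped characters
    and [...] sets, and does not handle look-behind (?#N, ?~N).
    """
    out: list[str] = []
    opens: list[int] = []  # positions in `out` of still-unclosed '('
    in_class = False
    i = 0
    while i < len(rx):
        c = rx[i]
        if c == "\\" and i + 1 < len(rx):
            out.append(rx[i:i + 2])  # copy an escape pair unchanged
            i += 2
            continue
        if in_class:
            if c == "]":
                in_class = False
        elif c == "[":
            in_class = True
        elif c == "(":
            opens.append(len(out))
        elif c == ")" and opens:
            start = opens.pop()
            op = rx[i + 1:i + 3]
            if op in ("?=", "?!"):
                out.insert(start + 1, op)  # move the operator to the front
                out.append(")")
                i += 3
                continue
        out.append(c)
        i += 1
    return "".join(out)


# The operator moves from after the group to its front.
assert colorer_lookahead_to_perl("foo(bar)?=") == "foo(?=bar)"
assert colorer_lookahead_to_perl("foo(bar)?!") == "foo(?!bar)"

# Perl-style equivalents of the examples in section 3.
assert re.search(r"foobar", "foobar barfoo")
spaced = re.compile(r" FOO bar ", re.I | re.X)  # i + x modifiers
assert all(spaced.search(s) for s in ["foobar", "FOOBAR", "foobar and two other foos"])
assert re.fullmatch(r"(foo)?bar", "foobar") and re.fullmatch(r"(foo)?bar", "bar")
assert re.search(r"^foobar$", "foobar") and not re.search(r"^foobar$", "foobar barfoo")
assert re.search(r"([\d\.])+", "pi is 3.14").group() == "3.14"
assert re.fullmatch(r"((foo)|(bar))+", "foofoofoobarfoobar")

# Greedy vs nongreedy (section 2.3).
assert re.search(r"<.*>", "<a><b>").group() == "<a><b>"
assert re.search(r"<.*?>", "<a><b>").group() == "<a>"
```

Look-behind (?#N, ?~N) is deliberately left out: Perl writes it as (?<=X) and (?<!X) with no explicit length, so a faithful translation depends on the exact meaning of N, which this sketch does not assume.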