NGWS SDK Documentation  

This is preliminary documentation and subject to change.

Example: Scanning for HREFS

The following example searches an input string and prints each href="…" value it finds, along with the value's position in the string. It does this by constructing a compiled Regex object and then using a Match object to iterate over all the matches in the string.

To understand the example, it is important to know that the metacharacter \s matches any white-space character and that \S matches any non-white-space character.

void DumpHrefs(String inputString)
{
   Regex r;
   Match m;

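   // Group 1 receives the href value from whichever alternative matches:
   // the quoted form or the unquoted form.
   r = new Regex("href\\s*=\\s*(?:\"(?<1>[^\"]*)\"|(?<1>\\S+))", "i");
   for (m = r.Match(inputString); m.Success; m = m.NextMatch()) {
      Console.WriteLine("Found href " + m.Group(1) + " at " + m.Group(1).Index);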
   }
}
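
As a usage sketch, the same scan can be tried against a sample string. This version assumes the released System.Text.RegularExpressions API (RegexOptions.IgnoreCase and the Groups indexer) rather than the preliminary signatures shown above. The class name HrefScanDemo, the method FindHrefs, and the sample input are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefScanDemo
{
    // Same pattern as the example above; group 1 receives the href value
    // whether or not it was quoted. (Assumes the released RegexOptions API.)
    private static readonly Regex HrefPattern = new Regex(
        "href\\s*=\\s*(?:\"(?<1>[^\"]*)\"|(?<1>\\S+))",
        RegexOptions.IgnoreCase);

    // Returns each href value paired with its index in the input string.
    public static List<KeyValuePair<string, int>> FindHrefs(string input)
    {
        var results = new List<KeyValuePair<string, int>>();
        for (Match m = HrefPattern.Match(input); m.Success; m = m.NextMatch())
        {
            Group g = m.Groups[1];
            results.Add(new KeyValuePair<string, int>(g.Value, g.Index));
        }
        return results;
    }

    // Prints one line per href found in a hypothetical sample string.
    public static void Demo()
    {
        string sample = "<a href=\"a.htm\">A</a> <a HREF = b.htm >B</a>";
        foreach (KeyValuePair<string, int> hit in FindHrefs(sample))
        {
            Console.WriteLine("Found href " + hit.Key + " at " + hit.Value);
        }
    }
}
```

With this sample, the quoted value comes from the first alternative and the unquoted value from the second, yet both are read from group 1.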

Compiled Pattern

In the example, a Regex object was created before beginning the loop to search the string. The Regex object is where the compiled pattern is stored. Because parsing, optimizing, and compiling a regular expression takes time, we want to do this work outside the loop so that it is only done once.
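
The idea of doing the construction work once can be sketched as follows. This is a minimal illustration, assuming the released System.Text.RegularExpressions API; the class CompileOnceDemo and the method CountHrefs are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CompileOnceDemo
{
    // Constructed once. Construction is where the pattern is parsed
    // and compiled, so it is kept out of the loop below.
    private static readonly Regex HrefPattern =
        new Regex("href\\s*=", RegexOptions.IgnoreCase);

    // Counts href attributes across many input strings.
    public static int CountHrefs(string[] pages)
    {
        int total = 0;
        foreach (string page in pages)
        {
            // Only matching happens here; no new Regex is built per page.
            total += HrefPattern.Matches(page).Count;
        }
        return total;
    }
}
```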

Objects of class Regex are immutable; each one corresponds to a single pattern and is stateless. This allows a single Regex instance to be shared between functions or even between different threads.
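
Sharing one instance between threads might look like the sketch below. It assumes the released System.Text.RegularExpressions API together with Parallel.ForEach from System.Threading.Tasks, neither of which appears in the text above; SharedRegexDemo and CountNumbersInParallel are illustrative names.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Threading;
using System.Threading.Tasks;

public static class SharedRegexDemo
{
    // One immutable instance, used by every worker thread.
    private static readonly Regex Digits = new Regex("\\d+");

    // Counts runs of digits across all inputs, processing inputs in parallel.
    public static int CountNumbersInParallel(string[] inputs)
    {
        int total = 0;
        Parallel.ForEach(inputs, input =>
        {
            // Each call yields its own match results; the Regex itself
            // holds no per-search state, so no lock is needed around it.
            int found = Digits.Matches(input).Count;

            // The shared counter, unlike the Regex, does need synchronization.
            Interlocked.Add(ref total, found);
        });
        return total;
    }
}
```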

Match Result Class

After a search is done, the results are stored in a Match object. This object provides access to all the substrings extracted by the search. It also remembers the string being searched and the regular expression being used, so it can be used to do another search, starting where the last one ended.
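
A short sketch of reading results from a match and then continuing the search from where it ended. It assumes the released System.Text.RegularExpressions API (the Value, Index, and Groups members); the class MatchResultDemo, the method Describe, and the pattern are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchResultDemo
{
    // Returns one descriptive line per user@host pair found in the input.
    public static List<string> Describe(string input)
    {
        var lines = new List<string>();
        Regex r = new Regex("(\\w+)@(\\w+)");

        Match m = r.Match(input);
        while (m.Success)
        {
            // The whole match, its position, and each captured substring
            // are all read from the same Match object.
            lines.Add(m.Value + " at " + m.Index +
                      ": user=" + m.Groups[1].Value +
                      ", host=" + m.Groups[2].Value);

            // Continue from where this match ended; the input string and
            // the pattern do not have to be supplied again.
            m = m.NextMatch();
        }
        return lines;
    }
}
```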

Explicitly Named Captures

In traditional regular expressions, capturing parentheses are automatically numbered sequentially. This leads to two problems. First, if a regular expression is modified by inserting or removing a set of parentheses, all the code that refers to the captures must be rewritten to reflect the new numbering. Second, because different sets of parentheses are often used to match two alternative expressions for the same concept, the result we are seeking is sometimes in one of two result buckets, and it usually requires some trickery to determine which one.

Regex supports the syntax (?<name>…) for capturing a match into a specified slot (the slot can be named using a string or an integer; integers can be recalled faster). That way, alternative matches for the same thing can all be directed to the same place. In case of a conflict, the last match dropped into a slot wins. (Actually, there is more: in case there are multiple matches for a single slot, either due to repeated quantifiers or aliasing, a complete list of all of them is available. See the Group.Captures collection for details.)
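
Both behaviors can be sketched as follows: two alternatives writing into one named slot, and a repeated group whose full history is read from Group.Captures. The sketch assumes the released System.Text.RegularExpressions API; the class NamedCaptureDemo, its methods, and the slot names key, val, and word are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedCaptureDemo
{
    // Both alternatives direct their result to the same slot, "val",
    // so the caller never has to check which alternative matched.
    private static readonly Regex Attr = new Regex(
        "(?<key>\\w+)\\s*=\\s*(?:\"(?<val>[^\"]*)\"|(?<val>\\S+))");

    // A repeated group: every iteration drops a capture into "word".
    private static readonly Regex Words = new Regex("^(?:(?<word>\\w+)\\s*)+$");

    // Returns the attribute value, quoted or not, or "" if nothing matched.
    public static string ValueOf(string input)
    {
        Match m = Attr.Match(input);
        return m.Success ? m.Groups["val"].Value : "";
    }

    // The group's Value is the last capture dropped into the slot.
    public static string LastWord(string input)
    {
        Match m = Words.Match(input);
        return m.Success ? m.Groups["word"].Value : "";
    }

    // The Captures collection lists every capture for the slot, in order.
    public static string[] AllWords(string input)
    {
        Match m = Words.Match(input);
        if (!m.Success) return new string[0];

        CaptureCollection captures = m.Groups["word"].Captures;
        string[] words = new string[captures.Count];
        for (int i = 0; i < captures.Count; i++)
        {
            words[i] = captures[i].Value;
        }
        return words;
    }
}
```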