Filtering

Microsoft Index Server filters documents by inserting data from the document files into content indexes. Content filters break documents into words (keys) and create word lists, which supply raw data for the index. Filtering is a three-step process:

  1. A filter DLL (dynamic-link library) extracts the text and properties from a document.
  2. A word-breaker DLL parses the text and textual properties into words.
  3. Noise words (also known as stop words) are removed from the data extracted from the document, and the remaining words are stored in the index.
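
The three steps above can be pictured as a small pipeline. The following Python sketch is illustrative only and is not the Index Server API: the function names, the dictionary-based document, the regular-expression word breaking, and the short noise-word list are all assumptions made for the example.

```python
import re

# Hypothetical noise-word list, chosen for this example only.
NOISE_WORDS = {"a", "an", "and", "the", "of", "is", "in", "to"}


def extract_text(document):
    """Step 1 (stand-in for a filter DLL): pull text and textual properties from a document."""
    parts = [document.get("body", "")]
    parts.extend(str(value) for value in document.get("properties", {}).values())
    return " ".join(parts)


def break_words(text):
    """Step 2 (stand-in for a word-breaker DLL): split the text into lowercase words."""
    return [word.lower() for word in re.findall(r"[A-Za-z0-9]+", text)]


def remove_noise(words):
    """Step 3: drop noise words; the remaining words form the word list."""
    return [word for word in words if word not in NOISE_WORDS]


def filter_document(document):
    """Run the three steps in order and return the word list for one document."""
    return remove_noise(break_words(extract_text(document)))


# Hypothetical document with a body and one textual property.
doc = {
    "body": "The filter extracts the text of a document.",
    "properties": {"title": "Filtering in Index Server"},
}
word_list = filter_document(doc)
print(word_list)
```

In this sketch the word list keeps duplicates and document order; an actual content index would store additional data for each word, which the example does not attempt to model.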

© 1997 by Microsoft Corporation. All rights reserved.