Understanding Indexes

This section describes the types of indexes that Index Server creates: word lists, shadow indexes, and the master index.

Words and properties extracted from a document first appear in a word list, then move to a shadow index, and finally move to the master index. This organization is designed to keep queries responsive and to use resources efficiently. Although there are multiple indexes internally, these details are completely hidden from the user, who sees only a list of documents that satisfy the submitted query.
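To illustrate how several internal indexes can sit behind a single result list, here is a minimal Python sketch. It assumes each index can be modeled as a dictionary that maps a word to the set of document IDs containing it. The Catalog class, its field names, and the sample data are hypothetical and are not part of Index Server.

```python
from dataclasses import dataclass, field

# Hypothetical model: an index maps each word to the IDs of documents containing it.
Index = dict[str, set[int]]


@dataclass
class Catalog:
    word_lists: list[Index] = field(default_factory=list)      # in-memory indexes
    shadow_indexes: list[Index] = field(default_factory=list)  # stand-ins for on-disk indexes
    master_index: Index = field(default_factory=dict)          # stand-in for the on-disk master

    def query(self, word: str) -> list[int]:
        """Return the IDs of documents containing `word`, consulting every index."""
        hits: set[int] = set()
        for index in [*self.word_lists, *self.shadow_indexes, self.master_index]:
            hits |= index.get(word, set())
        return sorted(hits)


catalog = Catalog(
    word_lists=[{"merge": {7}}],
    shadow_indexes=[{"merge": {3, 4}, "index": {3}}],
    master_index={"merge": {1}, "index": {1, 2}},
)
print(catalog.query("merge"))  # [1, 3, 4, 7]
```

The caller receives one sorted list of document IDs and never learns which kind of index supplied each hit.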

Word Lists

Word lists are small, in-memory indexes. Each word list contains data for a small number of documents. As soon as a document is filtered, its data is stored in a word list. Creating a word list is very quick and does not require updating any on-disk data. Word lists serve as a temporary staging area during indexing.

Several registry parameters control word list behavior. All of them are located under the following registry path:

HKEY_LOCAL_MACHINE
\SYSTEM
 \CurrentControlSet
  \Control
   \ContentIndex
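As a rough illustration of how a parameter under this path could be inspected, the following Python sketch reads a value with the standard winreg module. The helper name read_content_index_parameter and the fallback behavior are assumptions made for this example. On a machine that is not running Windows, or where the key or value is absent, the helper simply returns the supplied default.

```python
import sys

# Registry path from the text, relative to HKEY_LOCAL_MACHINE.
CONTENT_INDEX_KEY = r"SYSTEM\CurrentControlSet\Control\ContentIndex"


def read_content_index_parameter(name, default=None):
    """Return the named registry value, or `default` if it cannot be read.

    Hypothetical helper for illustration; not part of Index Server.
    """
    if sys.platform != "win32":
        return default  # winreg is only available on Windows
    import winreg

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CONTENT_INDEX_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, name)
            return value
    except OSError:
        # Raised when the key or the value is missing, or access is denied.
        return default


print(read_content_index_parameter("MaxWordLists"))
```

Reading is harmless, but changing these values affects indexing behavior, so any write access should be handled with the usual care for registry edits.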

When the number of word lists exceeds the value of the MaxWordLists parameter, the word lists are merged into a shadow index. This process is called a shadow merge. The data in word lists is only lightly compressed because word lists are temporary structures. Because word lists exist only in memory, documents in a word list must be refiltered whenever the Content Index service is restarted. The Index Server engine detects the need for refiltering and performs it automatically.
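The trigger for a shadow merge can be sketched in a few lines of Python. This is a simplified model under stated assumptions: every word list is folded into one new shadow index at merge time, the threshold value of 3 is purely illustrative, and the Indexer class and merge_indexes function are hypothetical names rather than Index Server APIs.

```python
MAX_WORD_LISTS = 3  # illustrative value only, not an Index Server default


def merge_indexes(indexes):
    """Combine several word -> document-ID indexes into a single index."""
    merged = {}
    for index in indexes:
        for word, doc_ids in index.items():
            merged.setdefault(word, set()).update(doc_ids)
    return merged


class Indexer:
    def __init__(self, max_word_lists=MAX_WORD_LISTS):
        self.max_word_lists = max_word_lists
        self.word_lists = []      # in-memory; would be lost on restart
        self.shadow_indexes = []  # stand-in for indexes persisted on disk

    def add_word_list(self, word_list):
        self.word_lists.append(word_list)
        # Shadow merge: runs once the number of word lists exceeds the limit.
        if len(self.word_lists) > self.max_word_lists:
            self.shadow_indexes.append(merge_indexes(self.word_lists))
            self.word_lists = []


indexer = Indexer()
documents = ["alpha beta", "beta gamma", "gamma", "alpha"]
for doc_id, text in enumerate(documents, start=1):
    indexer.add_word_list({word: {doc_id} for word in text.split()})

print(len(indexer.word_lists), len(indexer.shadow_indexes))  # 0 1
```

The fourth word list pushes the count past the limit, so all four are merged into one shadow index and the in-memory list is emptied.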

Persistent Index

An index whose data is stored on disk is called a persistent index. Unlike word lists, which are in-memory indexes, a persistent index survives shutdowns and restarts. Persistent-index data is stored in a highly compressed format. There are two types of persistent indexes: shadow indexes and master indexes.

Shadow Index

A shadow index is a persistent index created by merging word lists and sometimes other shadow indexes into a single index. There can be multiple shadow indexes in the catalog.

Master Index

A master index is a persistent index that contains the indexed data for a large number of documents. This is usually the largest persistent data structure. In an ideal state, this is the only index present, because all the indexed data is stored in the master index and there are no shadow indexes or word lists. The data is highly compressed.

A master index is created by a master merge, which merges all the shadow indexes and the current master index (if any) into a new master index. After the master merge, all the source indexes are deleted and only the new master index remains. In this state, queries are resolved most efficiently.

The total number of persistent indexes (shadow indexes and master index) in a catalog cannot exceed 255.
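The master merge and the limit on persistent indexes can be modeled with a short Python sketch. The PersistentIndexes class is hypothetical, indexes are modeled as dictionaries from words to sets of document IDs, and raising an error when the limit is reached is an assumption made for this example; the text does not say how Index Server itself reacts at the limit.

```python
MAX_PERSISTENT_INDEXES = 255  # limit stated in the text


class PersistentIndexes:
    def __init__(self):
        self.shadow_indexes = []
        self.master_index = None  # no master index until the first master merge

    def count(self):
        """Number of persistent indexes: shadow indexes plus the master, if any."""
        return len(self.shadow_indexes) + (0 if self.master_index is None else 1)

    def add_shadow_index(self, index):
        # Assumption: refuse new shadow indexes once the limit is reached.
        if self.count() >= MAX_PERSISTENT_INDEXES:
            raise RuntimeError("persistent index limit reached; merge first")
        self.shadow_indexes.append(index)

    def master_merge(self):
        """Merge all shadow indexes and the current master into a new master."""
        sources = list(self.shadow_indexes)
        if self.master_index is not None:
            sources.append(self.master_index)
        new_master = {}
        for index in sources:
            for word, doc_ids in index.items():
                new_master.setdefault(word, set()).update(doc_ids)
        # The source indexes are discarded; only the new master index remains.
        self.shadow_indexes = []
        self.master_index = new_master


store = PersistentIndexes()
store.add_shadow_index({"index": {1}})
store.add_shadow_index({"index": {2}, "merge": {2}})
store.master_merge()
print(store.count())  # 1
```

After the merge, a query needs to consult only one persistent index, which matches the ideal state described above.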


© 1997 by Microsoft Corporation. All rights reserved.