Support for Multiple Languages

Many Web pages today are written in English, but others are not. Because IIS can serve documents in many languages, multilingual indexing and querying are standard features of Index Server. The query system was built with localization in mind: it is completely modular and can dynamically load and unload language-specific utilities, including word breakers, stemmers, and normalizers. These linguistic components are available for several languages.

Index Server can index multilingual documents and switch between languages as required (for example, index an English paragraph, index a French paragraph, and switch back to English). All index information is stored as Unicode characters, and all queries are converted to Unicode before they are processed.
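To illustrate why converting queries to Unicode matters, the following minimal Python sketch decodes the same French word from two different byte encodings. The function name to_unicode and the choice of code pages are assumptions made for this example; they are not part of Index Server.

```python
def to_unicode(raw_query: bytes, code_page: str) -> str:
    """Decode a query received as bytes in the client's code page into Unicode text."""
    return raw_query.decode(code_page)

# The same word arrives as different bytes depending on the client's encoding.
ansi_query = b"fran\xe7ais"        # Windows-1252 bytes
utf8_query = b"fran\xc3\xa7ais"    # UTF-8 bytes

a = to_unicode(ansi_query, "cp1252")
b = to_unicode(utf8_query, "utf-8")

# After conversion both queries are the same Unicode string,
# so they can be compared against Unicode index entries in the same way.
same = (a == b)
```

Once both queries are in Unicode, a single comparison against the stored index terms is enough, regardless of how the client encoded the text.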

Index Server does not distinguish between languages once the words have been entered into the index, so a query can return documents written in a language different from the language of the query. In many cases this is appropriate. For example, a query for Windows 95 will return a French document that contains the English phrase Windows 95. In other cases it is not: some words, known as homologues, are spelled the same in two or more languages but have very different meanings. Index Server cannot tell these cases apart because it does not perform any language translation.
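The behavior described above can be sketched with a toy inverted index that stores words without any language tag. This is only an illustration under simplifying assumptions (whitespace word breaking, all-words matching rather than phrase matching); the document names and the functions build_index and search are hypothetical and do not reflect the actual Index Server implementation.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercased word to the set of documents containing it (no language stored)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

docs = {
    "en.htm": "Install Windows 95 on your computer",
    "fr.htm": "Installer Windows 95 sur votre ordinateur",
    "en2.htm": "The pain went away",
    "fr2.htm": "Le pain est sur la table",
}
index = build_index(docs)

windows_hits = search(index, "Windows 95")  # matches the English and the French page
pain_hits = search(index, "pain")           # English "pain" and French "pain" (bread) both match
```

The first query shows the useful case; the second shows a word spelled the same in English and French being matched in both languages, because the index holds only the characters of each word.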

Finally, you should know that mixing languages can cause unpredictable results. For example, if you set the multi-language form to German and query for English words, the results will likely differ from those returned when the identical query is posted with the language set to English. This is because Index Server uses the German linguistic modules to analyze the query field and determine which words and phrases to search for (that is, it tries to perform German word breaking on English text). The German word breaker assumes German grammar when it breaks text into words, so it often produces incorrect word breaks for non-German text.
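The effect of applying the wrong word breaker can be sketched with a deliberately simplified example. It assumes a toy German breaker that splits compound words using a tiny, hypothetical lexicon (GERMAN_PARTS); the real Index Server word breakers are not described here and are certainly more sophisticated.

```python
# Hypothetical mini-lexicon of German words that may appear inside compounds.
GERMAN_PARTS = {"art", "ist", "rest", "haus", "tür"}

def break_english(text):
    """Toy English breaker: split on whitespace only."""
    return text.lower().split()

def _tile(word, parts):
    """Return lexicon entries that exactly cover `word`, or None if there is no such cover."""
    if word == "":
        return []
    for end in range(len(word), 0, -1):      # try the longest prefix first
        if word[:end] in parts:
            rest = _tile(word[end:], parts)
            if rest is not None:
                return [word[:end]] + rest
    return None

def break_german(text):
    """Toy German breaker: split each word into lexicon parts when an exact cover exists."""
    terms = []
    for word in text.lower().split():
        pieces = _tile(word, GERMAN_PARTS)
        terms.extend(pieces if pieces else [word])
    return terms

german_ok = break_german("Haustür")                  # a German compound is split as intended
english_terms = break_english("restart the artist")
wrong_terms = break_german("restart the artist")     # English words are split as if they were German
```

With this toy lexicon, the English words "restart" and "artist" are broken into German-looking parts, so the terms searched for no longer match the terms an English breaker would have produced.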


© 1997 by Microsoft Corporation. All rights reserved.