Searching Documents in Multiple Languages

The following test drive demonstrates a query form that can search in different languages and that uses the stemming features of Index Server.

Note   Do this test drive only if you have installed Index Server with all of its supported languages.

This topic takes you on a test drive through querying in another language and then summarizes the results.

Querying in Another Language

The sample corpus contains some forms and scripts that are not installed by default during catalog server setup. To use these scripts you must first configure IIS.

To configure IIS for this test drive
  1. On the Taskbar, click Start, point to Programs, point to Microsoft Internet Server (Common), and click Internet Service Manager.
  2. Double-click the WWW Service for the local computer.
  3. On the WWW Service property sheet, select the Directories tab.
  4. Double-click the entry for your Corpus directory to display the properties.

    The entry should have an alias of /Corpus.

  5. At the bottom of the property sheet, in the control group named Access Flags, select the Execute check box. This setting tells IIS to execute scripts from within the Corpus directory.
  6. Verify that the Read check box is already selected.

  7. Click OK to return to the WWW Service dialog box.
  8. Click OK.
  9. Close Internet Service Manager.

You have configured IIS to execute scripts from within the Corpus subdirectories. The steps so far have not involved Index Server. They have merely set up IIS to serve the proper query form.

To test your query form
  1. Open a browser and go to http://server_name/corpus/scripts/query2.htm, where server_name is the name of your Web server. A page titled Multiple Language Query Form opens.
  2. In the Enter your query below field, type the phrase Exchange Client.
  3. Make sure that the drop-down list box on the right side of the form is set to English – United States.

  4. Click Execute Query.
  5. The query should return eight results, which confirms that the form is set up correctly.

  6. Click New Query to return to the query form.

Now you’re ready to query in another language.

To query in German

  1. Click Clear to clear the query form.
  2. In the drop-down list box on the right side of the form, click Deutsche.
  3. In the Enter your query below field, type gehen** (be sure to include both asterisks).
  4. Click Execute Query.

This query looks for all documents that contain the German word gehen ("to go" in English). The two asterisks instruct Index Server to stem the word. Stemming is a linguistic process that reduces a given word to its root form; for example, the English stem of swam is swim. After stemming the word, Index Server inflects the stem into all of its grammatically correct variants. For English, stemming swam would produce the root form swim, which would then be inflected into variants such as swims, swimming, swam, and swum. In this query, Index Server stems gehen, inflects it into all of its forms, and issues a query for all of those variants. Index Server uses German linguistic rules for this word because you selected German in the drop-down list.
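To make the stem-then-inflect idea concrete, here is a minimal Python sketch. It assumes a tiny hand-built lexicon (STEM_OF, hypothetical) in place of real linguistic data; Index Server's actual German and English modules are far more complete and work differently internally. The names stem, inflect, expand_query, and matches are illustrative only.

```python
# Toy illustration of the expansion a term** query triggers: stem the word to
# its root, inflect the root into all known variants, then match any variant.
# The lexicon is a hypothetical stand-in, NOT Index Server's word lists.

# Each known form maps to its root (stem).
STEM_OF = {
    "gehen": "gehen", "gehe": "gehen", "gehst": "gehen", "geht": "gehen",
    "ging": "gehen", "gingen": "gehen", "gegangen": "gehen",
    "swim": "swim", "swims": "swim", "swimming": "swim",
    "swam": "swim", "swum": "swim",
}

def stem(word: str) -> str:
    """Reduce a word to its root; unknown words are treated as their own root."""
    return STEM_OF.get(word.lower(), word.lower())

def inflect(root: str) -> set[str]:
    """Return every known form that shares the given root."""
    return {form for form, r in STEM_OF.items() if r == root}

def expand_query(term: str) -> set[str]:
    """Emulate a term** query: stem the term, then inflect the stem."""
    return inflect(stem(term))

def matches(document: str, term: str) -> bool:
    """True if the document contains any variant of the stemmed term."""
    variants = expand_query(term)
    words = {w.strip(".,;:!?").lower() for w in document.split()}
    return not variants.isdisjoint(words)
```

For example, sorted(expand_query("swam")) gives ['swam', 'swim', 'swimming', 'swims', 'swum'], and matches("Er ist nach Hause gegangen.", "gehen") is True even though the sentence never contains gehen itself.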

The first German query may take some time because Index Server must load the German linguistic modules. Subsequent German queries run much faster because the modules are already loaded.
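The slow first query followed by faster queries is consistent with a load-once, cache-afterward pattern. The Python sketch below illustrates that pattern with functools.lru_cache; load_linguistics, run_query, and the language codes are hypothetical names, and the sketch is not a description of Index Server's internals.

```python
import functools

# Records how many times each language module was actually loaded,
# so the caching behavior can be observed.
load_count: dict[str, int] = {}

@functools.lru_cache(maxsize=None)
def load_linguistics(language: str) -> dict:
    """Hypothetical loader for a language's word breaker and stemmer.

    The first call for a language performs the (slow) load; later calls
    for the same language return the cached result without reloading.
    """
    load_count[language] = load_count.get(language, 0) + 1
    # A real loader would read large linguistic data files here.
    return {"language": language}

def run_query(text: str, language: str) -> str:
    """Run a (pretend) query, loading the language module only if needed."""
    module = load_linguistics(language)  # cheap after the first call
    return f"{module['language']}: {text}"
```

Running two German queries in a row calls the loader body only once for "de"; a query in another language triggers a separate, one-time load for that language.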

Examining the Query Results

Index Server returns four documents. The first is a stock sample document included to demonstrate the query results more clearly; it contains only the text you see in the abstract. Note that it does not contain the word gehen anywhere in the text. It does, however, contain gegangen, the past participle of gehen. Index Server stemmed gehen and inflected it into its linguistic forms, which in this case include gegangen.

Note also that the numeric values and the time and date stamps in the results have been formatted according to German conventions (for example, a period instead of a comma as the thousands separator, and a comma instead of a period as the decimal separator).
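The separator swap described in this note can be illustrated with a small Python sketch that formats values by German conventions without depending on the operating system's locale settings. The helper names are hypothetical, and the timestamp pattern (day.month.year with a 24-hour clock) is one common German convention assumed here, not necessarily the exact format Index Server produces.

```python
from datetime import datetime

# Swap ',' and '.' in a single pass so neither replacement clobbers the other.
_DE_SEPARATORS = str.maketrans(",.", ".,")

def format_number_de(value: float, decimals: int = 0) -> str:
    """Format a number with '.' for thousands and ',' for decimals."""
    us_style = f"{value:,.{decimals}f}"  # e.g. '1,234.50'
    return us_style.translate(_DE_SEPARATORS)

def format_timestamp_de(ts: datetime) -> str:
    """Format a timestamp as day.month.year followed by a 24-hour time."""
    return ts.strftime("%d.%m.%Y %H:%M:%S")
```

With these helpers, format_number_de(1234567) returns "1.234.567", format_number_de(1234.5, 2) returns "1.234,50", and format_timestamp_de(datetime(1997, 3, 14, 9, 5, 0)) returns "14.03.1997 09:05:00".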

Index Server can be configured to use a default locale and language so that individual queries and query forms need not specify a language. For the purposes of this exercise, this form also lets the user override the default locale and language settings. For more information, see Support for Multiple Languages.
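The precedence described here (an explicit choice on the query form wins; otherwise the configured default applies) can be sketched in a few lines of Python. DEFAULT_LOCALE and effective_locale are hypothetical names, and the sketch says nothing about where or how Index Server actually stores its default locale.

```python
from typing import Optional

# Hypothetical server-wide default, standing in for whatever default
# locale an administrator has configured.
DEFAULT_LOCALE = "en-US"

def effective_locale(requested: Optional[str], default: str = DEFAULT_LOCALE) -> str:
    """Use the locale chosen on the query form if any, else the default."""
    return requested if requested else default
```

A form that leaves the language unset (None or an empty string) falls back to the default, while a form that sends "de-DE" overrides it for that query only.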


© 1997 by Microsoft Corporation. All rights reserved.