Sorting Entries

Entries in the raw index file are sorted primarily on their index keys and secondarily on their page numbers: within the same index key, page numbers are sorted numerically. Sort keys and numeric page values are used in the comparison, while actual keys and literal page fields are entered into the resulting index. In our design, a complete index key is an aggregate of one or more sort keys plus the same or a smaller number of actual keys. The comparison is based on sort keys, but if two aggregates have identical sort keys and page numbers, the actual keys can be used to distinguish their order.
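
The three-level comparison described above can be sketched in Python as a tuple-valued sort key. This is only an illustration under stated assumptions: the names Entry, entry_order, and the field names are hypothetical, and the sample entries are invented for the example rather than taken from the processor itself.

```python
from typing import NamedTuple, Tuple


class Entry(NamedTuple):
    """Hypothetical record for one raw index entry (names are assumptions)."""
    sort_keys: Tuple[str, ...]    # used only for comparison
    actual_keys: Tuple[str, ...]  # printed in the index; may be fewer than sort_keys
    page_value: int               # numeric page value, used only for comparison
    page_literal: str             # literal page field, printed in the index


def entry_order(entry: Entry):
    # Compare sort keys first, then numeric page values; actual keys
    # only break ties between otherwise identical aggregates.
    return (entry.sort_keys, entry.page_value, entry.actual_keys)


entries = [
    Entry(("tex",), ("\\TeX",), 12, "12"),
    Entry(("index",), (), 4, "iv"),
    Entry(("index",), (), 2, "ii"),
    Entry(("tex",), ("TeX",), 12, "12"),
]

ordered = sorted(entries, key=entry_order)
for e in ordered:
    # The printed form uses the actual key when present, else the sort key.
    shown = e.actual_keys or e.sort_keys
    print(", ".join(shown), e.page_literal)
```

Note that the comparison never looks at page_literal: the literal field is carried along only so that it can be written to the resulting index.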

Index keys can be categorized into three groups: strings, numbers, and symbols. A string is a pattern whose leading character is a letter of the English alphabet. A number is a pattern consisting entirely of digits. A symbol is a pattern that either begins with a character that is neither an English letter nor an Arabic digit, or begins with a digit but also contains non-digits. Members of the same group should appear in sequence. Hence there are two issues concerning ordering: one is the ordering of entries within a group; the other is the global precedence among the three groups. Details of sorting index keys can be found in Reference [18].
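
The grouping rules above can be expressed as a small classifier. The sketch below is in Python and rests on several assumptions that the text does not settle: the function names classify and key_order are hypothetical, the group precedence shown (symbols, then numbers, then strings) is chosen only for illustration, empty keys are treated as symbols, and the within-group ordering (numeric value for numbers, plain character comparison otherwise) is a placeholder for the rules in Reference [18].

```python
import string


def classify(key: str) -> str:
    """Assign an index key to one of the three groups described in the text."""
    if key and key[0] in string.ascii_letters:
        return "string"   # leading character is an English letter
    if key and all(ch in string.digits for ch in key):
        return "number"   # consists entirely of digits
    return "symbol"       # anything else, including digit-led mixed keys


# Assumed global precedence among the groups (illustration only).
GROUP_PRECEDENCE = {"symbol": 0, "number": 1, "string": 2}


def key_order(key: str):
    group = classify(key)
    if group == "number":
        # Numbers are compared by value, so "9" precedes "10".
        return (GROUP_PRECEDENCE[group], int(key), "")
    # Placeholder within-group rule: plain character comparison.
    return (GROUP_PRECEDENCE[group], 0, key)


keys = ["zeta", "10", "3D", "9", "@home", "alpha"]
print(sorted(keys, key=key_order))
```

Keeping the group rank as the first component of the sort key guarantees that members of the same group appear in sequence, whatever rule is used inside each group.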

There are three basic types of numerals for page numbers: roman, alphabetic, and arabic. The sorting of arbitrary combinations of these three types of numerals (e.g., 112, iv, II-12, A.1.3, etc.) must be based on their numeric values and relative precedence. The page_precedence attribute in Table 2, for instance, specifies this precedence. Again, details of sorting page numbers can be found in Reference [18].
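
One way to realize such a comparison is to split a composite page number into components, convert each component to a (precedence, numeric value) pair, and compare the resulting sequences. The Python sketch below is a simplification under stated assumptions: the precedence string "rnaRA" (lowercase roman, arabic, lowercase alphabetic, uppercase roman, uppercase alphabetic) is an assumed value for page_precedence, the separators "-" and "." are assumed, the helper names are hypothetical, and any component made up only of roman-numeral letters is treated as roman, so a lone "c" is read as 100 rather than as the third letter.

```python
import re

ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_value(numeral: str) -> int:
    """Convert a roman numeral to its value (subtractive forms included)."""
    total, prev = 0, 0
    for ch in reversed(numeral.lower()):
        value = ROMAN[ch]
        if value < prev:
            total -= value   # e.g. the "i" in "iv"
        else:
            total += value
            prev = value
    return total


def component_key(part: str, precedence: str = "rnaRA"):
    """Map one page component to a (precedence rank, numeric value) pair."""
    if not part or not part.isascii():
        raise ValueError(f"unsupported page component: {part!r}")
    if part.isdigit():
        kind, value = "n", int(part)
    elif all(ch in ROMAN for ch in part.lower()):
        kind = "r" if part.islower() else "R"
        value = roman_value(part)
    elif len(part) == 1 and part.isalpha():
        kind = "a" if part.islower() else "A"
        value = ord(part.lower()) - ord("a") + 1
    else:
        raise ValueError(f"unsupported page component: {part!r}")
    return (precedence.index(kind), value)


def page_key(page: str, precedence: str = "rnaRA"):
    # Assumed separators between components: "-" and ".".
    return tuple(component_key(p, precedence) for p in re.split(r"[-.]", page))


pages = ["112", "iv", "II-12", "A.1.3", "ii", "7"]
print(sorted(pages, key=page_key))
```

Under the assumed precedence, lowercase roman pages sort before arabic pages, which in turn sort before composite pages that begin with an uppercase roman or uppercase alphabetic component. A different precedence string passed to page_key changes only the relative order of the numeral types, not the ordering by value within each type.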