Standard authors' guides, such as Reference [4], recommend that index terms be marked on page proofs or on a separate set of galley proofs. The traditional method is to use 3×5 cards, appropriately called index cards. A page number is added to a card each time a reference to its term is encountered. Sorting is done by hand, and the process is tedious and error-prone. Computers offer an opportunity to significantly reduce the amount of labor invested in this process while noticeably improving the quality of the resulting index.
This paper shows how the indexing process can be automated in a way that is largely independent of any specific typesetting system and of the format being used. Specifically, we develop a framework for placing index commands in the document. In addition, we describe the design of a general-purpose index processor that transforms a raw index into an alphabetized version. These concepts have been implemented as part of an extensive authoring environment [5]. This environment includes a suite of Lisp programs for the index placing facility and a C program for the index processor. The resulting system has been successfully used in producing indexes for some books [6,7] and a number of technical reports and manuals.
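As a rough illustration of what transforming a raw index into an alphabetized version involves, the following minimal sketch merges repeated references to a term and sorts the result. It is not the C program mentioned above: the function name `build_index`, the representation of the raw index as (term, page) pairs, and the case-insensitive ordering are all assumptions made for this example.

```python
from collections import defaultdict


def build_index(raw_entries):
    """Turn (term, page) pairs into an alphabetized list of (term, pages).

    Hypothetical helper for illustration only; the raw-index format and
    the ordering rule are assumptions, not the paper's actual design.
    """
    pages_by_term = defaultdict(set)
    for term, page in raw_entries:
        # Analogous to adding a page number to the term's index card;
        # the set discards repeated references to the same page.
        pages_by_term[term].add(page)

    # Alphabetize case-insensitively; the term itself is a tie-breaker so
    # the order is deterministic when two terms differ only in case.
    ordered_terms = sorted(pages_by_term, key=lambda t: (t.lower(), t))
    return [(term, sorted(pages_by_term[term])) for term in ordered_terms]


raw = [("sorting", 12), ("index", 3), ("Lisp", 7), ("index", 9), ("index", 3)]
for term, pages in build_index(raw):
    print(f"{term}, {', '.join(map(str, pages))}")
```

In this sketch the three references to "index" collapse into a single entry listing pages 3 and 9, which is the bookkeeping that the index-card method performs by hand.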
Indexing issues under both source-language and direct-manipulation [8,9] paradigms are considered. In a source-language-based system, the user specifies the document as text with interspersed commands; this source is then passed to a formatter, which produces the output. The source language usually provides some abstraction and control mechanisms, such as procedures, macros, conditionals, and variables, in much the same way as a high-level programming language does. In a direct-manipulation environment, the user manipulates the document's output appearance directly by invoking built-in operators available through menus and buttons. These systems are highly interactive; the result of invoking an operation is observed instantaneously, thereby creating an illusion that the user is "directly" manipulating the underlying object.
The document attributes in direct-manipulation systems are usually specified by a declarative language encapsulated as form-based property sheets. These property sheets correspond to textual markup tags that can be imported to or exported from the direct-manipulation system for document interchange purposes. While a source representation is maintained explicitly in the source-language model, the notion of a document semantics specification language is somewhat implicit in direct-manipulation systems.
Some people have called direct-manipulation systems WYSIWYG (what-you-see-is-what-you-get). The two concepts are not equivalent, however. WYSIWYG refers to the correspondence between what is presented on a video display and what can be generated on some other device. In the context of electronic publishing, WYSIWYG means that there is a very close relationship in terms of document appearance between a screen representation and the final hardcopy. Direct manipulation is a more general concept that models user interfaces. A direct-manipulation document preparation system may not have a WYSIWYG relationship between its display representation and the final hardcopy. Conversely, a batch-oriented, source-language based formatter may be coupled with a previewer that is WYSIWYG.
This paper presents some general indexing problems and our solutions in a top-down fashion. First, we discuss the desirable features of an index processor and then some design and implementation considerations for such a processor. Our goal is to arrive at general-purpose solutions. Next, a framework is introduced under which an author enters index commands or tags with greatly reduced overhead. The examples shown are in LaTeX [10], a high-level document preparation language based on TeX [11]. The model, however, is not restricted to any particular formatting language, nor to the source-language paradigm. We also examine some unique indexing issues in electronic document development environments that do not seem to have appropriate counterparts in traditional printed material. Finally, our indexing facilities are evaluated against those available in other formatting systems such as Scribe [12], troff [13], and some direct-manipulation environments.