WAIS Client for OS/2 Documentation

An online version of this help file is available from the help menu within WAIS or by entering the 'view os2wais.inf' command from a command window.

Introduction

The WAIS OS/2 Client is a Public Domain software product developed at the Library of Congress. The Client allows OS/2 users to connect to WAIS Servers on the Internet and to search for and retrieve documents from those Servers. Documents returned can be text, pictures, or other types of data, depending on the type of server being accessed.

The Client and WAIS Servers communicate using the WAIS Protocol. This allows a single user to query many different data servers without having to learn a new query language or interface. The Client can also be used to access local WAIS Servers across a local area network (LAN).

This manual describes how to contact a WAIS Server, search its database, and retrieve documents. The Client user first selects a SOURCE (a pointer to a WAIS Server) to query. Then the user forms a QUERY (a string of words) to search the database and receives document HEADLINES (document titles or descriptions). Using RELEVANCE FEEDBACK (defined below), the user can refine her search to find a desired document. Finally, the user retrieves the document, either to display it or to store it on her machine for later reference.

Installation

After unzipping the os2wais.zip file using unzip.exe (available from ftp-os2.cdrom.com), start the os2wais program by typing 'os2wais'. Some sample source files are included, and some default viewers are selected.

To add an icon to a folder or to the desktop, right-click on another application and choose 'create another' -> 'program'. In the path name field of the setup for the new icon, enter the path of the os2wais.exe program and the program name.

Network Requirements

The OS/2 Client runs on top of IBM's TCP/IP for OS/2 network software.
The user must be able to open a socket connection to a remote WAIS Server machine on the network. The WAIS Client will work with either the 16-bit or 32-bit flavor of IBM's TCP/IP for OS/2 product. The Client will not work with non-IBM TCP/IP products, but conversion should not be difficult. Since the Client is Public Domain software, source code is available for porting, modification, or improvement.

Sources

The first step in beginning a search is to select a source to contact. The user lists the currently known sources by clicking on the "Sources" button. A window listing the currently known sources will appear. You can select one or more of these sources with the mouse and then hit the "Use Selected Sources" button, or double click on a source. The selected sources will then appear in the "Look in these Sources:" window, ready to be searched.

Most searches are single-source, but there are times when it is desirable to search multiple sources simultaneously. If you want to stop searching a source, select the source in the "Look in these Sources:" window and execute the "Stop Using Source" command in the "Sources" pull-down menu.

Known sources are described in files with ".src" extensions. The first time the user lists sources, the Client loads all the .src files in the local directory. To see what these files contain, select a source in the "Known Sources" window (just one) and then click on the "Edit Source" button. A window will appear, showing all the information associated with that source. Typically, the source description provides information on how to search that source, how to obtain more information on that source, whether or not the server service costs money, and the email address of the source administrator.

Be careful: you can click in any of these windows and edit the contents. If you change the network information, you may not be able to contact that source in the future.

You can also select sources and hit the "Delete Selected Sources" button.
This erases all the information related to that source and erases the .src file in the local directory.

Queries

Once you have selected a source to use, the Client should put you back into the Query window. This is the window labeled "Tell me about:". You can now enter a natural language question in this window, or just type a set of words and phrases that are relevant to the type of information you are seeking from the selected source.

The general algorithm for weighting words and phrases is as follows: if a word is rarely used in the database, it gets more weight; if a phrase matches exactly, it gets more weight; and if a word appears in the document title, it gets more weight. Once you have entered your query, hit return or click on the "Search" button to begin a search.

You can also enter more complex queries, depending on the type of server you are contacting. For example, WAIS Inc. commercial servers allow you to enter boolean queries by using logical words in capital letters, like AND, OR, and NOT. The source description should tell you what kind of server it is and what kinds of queries it supports. Also, the server description often contains a method for getting a help document about that server.

Results

Search results are displayed in the results window, the largest window in the display, with the column headings "Score Size HEADLINES". The server should return a number of document titles or headlines, along with their score and size. The score runs from 0 to 1000. The highest scoring documents are listed first, at the top of the display.

By default, the file size is the number of bytes or characters the document contains. If the file is large, the size will be expressed in multiples of 1024. If the size is followed by a "k", the units are 1024 bytes. "M" stands for megabytes, or units of 1024 squared (slightly more than a million). "G" stands for gigabytes, or units of 1024 cubed (slightly more than a billion).
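The score ordering and size units described above can be sketched in a few lines of Python. This is only an illustration, not the Client's actual code: the function names, the truncating integer division, and the exact points at which the display switches from bytes to "k", "M", or "G" are all assumptions.

```python
def format_size(num_bytes):
    """Render a byte count as plain bytes, or in units of 1024 (k),
    1024 squared (M), or 1024 cubed (G). Rounding down is an assumption."""
    units = [("G", 1024 ** 3), ("M", 1024 ** 2), ("k", 1024)]
    for suffix, factor in units:
        if num_bytes >= factor:
            return f"{num_bytes // factor}{suffix}"
    return str(num_bytes)


def format_results(results):
    """Take (score, size_in_bytes, headline) tuples and return display
    lines with the highest scoring documents first."""
    ordered = sorted(results, key=lambda r: r[0], reverse=True)
    return [
        f"{score:>5} {format_size(size):>6} {headline}"
        for score, size, headline in ordered
    ]


if __name__ == "__main__":
    # Hypothetical headlines, for illustration only.
    sample = [
        (420, 153600, "Second best match"),
        (1000, 900, "Best match"),
    ]
    for line in format_results(sample):
        print(line)
```

With this sketch, a 153600-byte document is shown as "150k", which is the kind of size the manual uses when discussing download times.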
Retrieving Documents

You can double click on any displayed headline in the results window to retrieve and display the document. Before retrieving a document, it is wise to check its size to get an idea of how long the retrieval will take. A 150k file will take anywhere from 10 seconds to a minute to download, depending on network traffic, network bandwidth, and server workload.

The Client retrieves the document, puts it into a file called "new_doc.tmp", and launches a viewer to display the document. The type of viewer depends on the type of document retrieved. The user can select which type of viewer to launch with each type of document by selecting the "Document Viewers" menu item from the "Options" menu list. Typically, editors are used for text documents, while an image viewer is used to display GIF, JPEG, or TIFF documents.

The Client comes with default viewer settings. The OS/2 epm editor is called on text documents. Also included on the Client distribution disk is a Public Domain image viewer which is called by the Client for GIF and JPEG images. The user can substitute his preferred editors and viewers for these default values.

The document viewer runs as a separate program. When you are done viewing a document, simply quit or close the editor or viewer. The WAIS Client will still be running.

Saving Documents

Each document retrieval erases the previous contents of "new_doc.tmp". If the user wishes to permanently store a document, she should copy the file "new_doc.tmp" to another file before retrieving another document. In the case of text documents, simply use the "Save As" command in the editor to save the file under another name. With images, the user may have to go to another OS/2 command window to copy the file, unless the viewer has a "Save As" command.

Finding New Sources

The Client disk comes with a few .src files, but these are only for demonstration purposes.
The one source which is essential to have is the Directory of Servers. This is a WAIS Server which is a database of databases. Begin your search with this source in order to locate sources which are relevant to your query.

The Directory of Servers functions like a normal WAIS Server, except that the documents it returns are source descriptions, not ordinary documents. To examine a source description, simply double click on the headline in the Results window. The "document" will be retrieved and displayed. At this point you have the option to discard the source description ("Cancel") or to save it for future use ("Save").

If you wish to save the source, be sure to edit the "Filename" field to indicate the filename to use. The default name is "new-src", which will be overwritten the next time you save a source description without changing the file name. The Client will append a ".src" extension to the source filename. The new source should now appear in the known sources window, listed under the filename you chose, ready to be used.

If you are running WAIS on a FAT formatted disk, you will get an error if you specify a filename longer than eight characters.

Creating Source Pointers

You can also create source descriptions if you know the database name, the internet address, and the port number of the Server you are trying to contact. Call the "Create a New Source" command under the "Sources" pull-down menu. Then fill out the necessary information by clicking in each field. The IP Number is not required, but if you know it, put it in, as it will save lookup time. The rest of the information is optional.

You must enter the exact Database Name; the machine name and port number are not sufficient. Servers run under the UNIX Operating System. The Database Name is actually a UNIX path name which the Server uses to access the database. UNIX is case sensitive. This means that the database name must have the correct capitalization.
Also note that UNIX pathnames use "/", not "\" as in DOS or OS/2.

Relevance Feedback

One of the most powerful aspects of WAIS is the ability to say to a server, "find me more documents like this one." This is called relevance feedback. It is a quick, intuitive way of searching large databases to obtain the documents you are looking for.

If you find a document that you want to use for relevance feedback, select the document headline and execute the "Use Document for Relevance Feedback" command under the "Documents" menu list. The document headline, along with the source it comes from, will appear in the relevance feedback window, which is titled "Similar to:".

You can now run the search again (by hitting the "Search" button), but this time, in addition to your query, the document pointers in the relevance feedback window will be passed to the server to refine your search. Relevance feedback can be used iteratively, adding and deleting documents until you find what you are looking for.

Relevance Feedback and Multiple Source Searches

Relevance feedback works best with single-source searches, using documents which come from that source. If you are doing a multiple-source query, relevance feedback becomes more complicated. For those of you who want to know how it really works, read on.

Although all relevance feedback document IDs are sent to all the servers being searched, only those servers that can access the relevance feedback documents on their own file systems will use them; the other servers will ignore them. That is, relevance feedback documents from Server X cannot be used by Server Y unless Servers X and Y are on the same file system. Thus, if you are simultaneously searching two servers (X and Y) with relevance feedback documents from both, and the servers are not on the same file system, then each server will perform its search only with the relevance feedback documents from its own database.
Also, when a user removes a source from the "Look in these Sources:" window (via the "Stop Using Source" command), all the relevance feedback documents from that source are placed at the bottom of the list, with the label "These documents may be ignored:" to indicate that their source is no longer being used. If they exist on a file system that is still in use, they may still be used, but otherwise they will be ignored.
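For readers who find the multiple-source rules easier to follow as code, here is a minimal Python sketch of the behavior described in this section. It is a model for illustration only, not the Client's or any Server's implementation: the function names, the (document ID, source) tuples, and the `filesystem_of` mapping from source name to file system label are all hypothetical.

```python
def usable_feedback(server, feedback_docs, filesystem_of):
    """Return the feedback documents that `server` would actually use:
    those whose source shares a file system with it. Every server is sent
    every document ID; the rest are simply ignored by that server."""
    return [
        (doc_id, source)
        for doc_id, source in feedback_docs
        if filesystem_of[source] == filesystem_of[server]
    ]


def split_feedback(active_sources, feedback_docs):
    """Split the feedback list the way the "Similar to:" window does:
    documents from sources still in use, then documents that would be
    listed under "These documents may be ignored:"."""
    current = [d for d in feedback_docs if d[1] in active_sources]
    may_be_ignored = [d for d in feedback_docs if d[1] not in active_sources]
    return current, may_be_ignored


if __name__ == "__main__":
    # Hypothetical setup: servers X and Z share a file system, Y does not.
    filesystem_of = {"X": "fs1", "Y": "fs2", "Z": "fs1"}
    docs = [("d1", "X"), ("d2", "Y"), ("d3", "Z")]

    print(usable_feedback("X", docs, filesystem_of))
    print(usable_feedback("Y", docs, filesystem_of))
    print(split_feedback({"X"}, docs))
```

In this example, after sources Y and Z are removed, document d3 is moved to the "may be ignored" group, yet Server X can still use it because Z's file system is the one X is on. Document d2 is ignored by X in every case.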