Text File | 1992-06-29 | 3.7 KB | 96 lines

════════════════════════════════════════════════
1.   INTRODUCTION TO MIR TUTORIAL ONE
════════════════════════════════════════════════

════════════════════════════
1.1  Project overview
════════════════════════════

The Mass Indexing and Retrieval (MIR) project deals with the
technical side of enabling people to find information within large
quantities of data.  Output from the project takes the form of five
sets of printed tutorials, plus related software and source code
under these headings:

     ONE     Database Analysis
     TWO     Secrets of Data Preparation
     THREE   Keys to Automated Indexing
     FOUR    Search Engines and Information Retrieval
     FIVE    Related Topics and Applications

The tutorials are addressed to Directors of Information Services,
custom software providers, information publishers, government
information distributors, educators, trainers, and programmers.

The software is distributed under "copyleft" rules of the Free
Software Foundation.  Improvements are invited and will be shared in
a final volume and in an accompanying CD-ROM.

You may wish to print the five introductory topics together with
Tutorial ONE and include them in a three ring binder.  For best
formatting, use the WordPerfect 5.1 version of the files provided on
diskettes.  Printed copies are also available from Marpex Inc. for a
nominal cost; see the files ORDRINFO and ORDRFORM.

═════════════════════════════════
1.2  Tutorial ONE overview
═════════════════════════════════

The purpose of MIR Tutorial ONE is to enable you to analyze
computerized data from an indexing perspective.

The first topic, source code guidelines, explains the perspectives
that have been built into the software that is provided with the
tutorials.  People who wish to improve on the technology are shown
how to share their insights and C language source code.

Methods of data gathering affect the cost, the quality and the
complexity of the task of indexing.  An index adds value to data, so
we pay attention to some marketing considerations.

Data analysis has to do with recognizing the various forms in which
data is accumulated, and detecting the inconsistencies (common in
large sets of data) that make indexing more challenging.  Data
format offers possibilities and imposes limitations that will face
searchers who wish to extract information.  How might the data be
structured in a way that better suits the needs of searchers?  The
reader is provided with a variety of software tools for this
critical data analysis function.

The ability to identify patterns in byte sequences quickly is
critical to keeping indexing costs low.  We examine a series of
software tools for this purpose.

Worked examples of the analysis stage are provided.  These topics
are at a "nuts and bolts" level... use such and such a program, here
is the input, here is the output, and here is what the results mean.
The sequence is from simplest to most complex... simple ASCII text,
ASCII with markup, fielded text, fixed length records, the addition
of packed numbers, then various forms of binary data.

Data deblocking is explained at this stage since it may be required
in order to finish analysis of the data.

At the end of TUTORIAL ONE, the participant has detailed exposure to
the techniques of data analysis, and is able to use a selection of
analysis tools (source code provided) to recognize and interpret a
wide range of data types.
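
To give a feel for what "identifying patterns in byte sequences" can
mean in practice, here is a minimal C sketch of one very simple kind
of analysis: tallying how often each byte value occurs in a buffer,
then counting how many bytes look like plain ASCII text.  This is an
illustration only and is not one of the MIR programs; the function
names byte_histogram and count_textlike are hypothetical, and the
choice of "text-like" bytes (0x20 to 0x7E plus tab, LF and CR) is an
assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the MIR tool set:
 * tally how often each of the 256 possible byte values occurs in buf. */
void byte_histogram(const unsigned char *buf, size_t len,
                    unsigned long counts[256])
{
    size_t i;

    assert(counts != NULL);
    assert(buf != NULL || len == 0);

    for (i = 0; i < 256; i++)
        counts[i] = 0;
    for (i = 0; i < len; i++)
        counts[buf[i]]++;
}

/* Hypothetical helper: from a histogram, count the bytes that are
 * printable ASCII (0x20..0x7E) or common whitespace (tab, LF, CR).
 * The definition of "text-like" is an assumption for this sketch. */
unsigned long count_textlike(const unsigned long counts[256])
{
    unsigned long n;
    int b;

    assert(counts != NULL);

    n = counts['\t'] + counts['\n'] + counts['\r'];
    for (b = 0x20; b <= 0x7E; b++)
        n += counts[b];
    return n;
}
```

A caller would read a block of the file into a buffer, call
byte_histogram on it, and compare count_textlike with the block
length.  A count close to the block length suggests simple ASCII
text; a large share of other byte values hints at packed numbers or
binary data, which would call for closer inspection.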