-
- ════════════════════════════════
-
- 10. GLOSSARY AND INDEX
-
- ════════════════════════════════
-
-
- Each item is indexed by topic and section number. The
- first reference is to the topic and section in which the item is
- explained or first discussed. The index indicates other notable
- places in which the item is mentioned. For example, "accented
- characters" below are discussed in some detail in topic 4, section
- 8. Section 10 of topic 4 and four sections within topic 5 touch on
- accented characters further.
-
-
- ≡≡≡≡->> QUESTION:
- What terms or expressions are used in Tutorial ONE that
- puzzle you or that should be included in this glossary
- and index?
- <<-≡≡≡≡
-
-
- ═════
- A
- ═════
-
- accented characters
- 4.8 indicated in DOS with high-bit-set characters
- 4.10 distribution frequencies
- 5.1 displayed in MIR program HEAD
- 5.3 displayed in MIR program F_PRINT
- 5.4 displayed in MIR program DUMP
- 5.5 displayed in MIR program FRAGMENT
-
- ANSI C (American National Standards Institute)
- 2.3 The ANSI draft standard for the C programming
- language proposes a basic set of functions and
- characteristics; adhering to ANSI C is the best
- way to assure maximum portability of C programs
-
- ASCII (The American Standard Code for Information Interchange)
- 4.8 an agreed-upon assignment of bit patterns to
- letters, digits, punctuation, control characters
- etc. Mentioned in 64 other sections.
-
-
-
- ASCII text
- 4.8 a file made up of the printable subset of ASCII,
- entered from a normal keyboard and displayable on
- terminals in ASCII-based operating systems such as
- DOS and UNIX
- 4.10 byte distribution in example of ASCII text file
- 5.1 MIR program HEAD reports if file not ASCII text
- 5.3 MIR program F_PRINT extracts ASCII text
- 6.1 programs for analyzing ASCII text
- 7.1 fixed length ASCII text records
- 9.2 MIR program F_TRAIL to remove trailing blanks
-
- ASCII text, extended
- 4.8 to the printable set of ASCII characters, the
- extended set adds accented characters commonly
- found in various European languages
-
- A_BYTES
- 4.7 MIR program to analyze the distribution of byte
- frequencies within any file
- 4.8 and analysis of data types
- 4.9 and analysis of data presentation
- 4.10 worked example on extended ASCII with markup
- 5.5 use locations to examine context
- 7.1 recognizing printable subsets
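The idea behind a byte frequency analysis can be sketched in a few lines of C. This is only an illustration of the technique, not the A_BYTES source; the function name count_bytes is an assumption made for this example.

```c
#include <stddef.h>

/* Count how often each of the 256 possible byte values occurs in buf.
 * counts must have room for 256 entries; it is cleared first.
 * count_bytes is a hypothetical name, not part of the MIR programs. */
void count_bytes(const unsigned char *buf, size_t len,
                 unsigned long counts[256])
{
    size_t i;

    for (i = 0; i < 256; i++)
        counts[i] = 0;
    for (i = 0; i < len; i++)
        counts[buf[i]]++;
}
```

A table filled this way shows at a glance whether a file stays within the printable subset or contains high-bit-set and control bytes.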
-
- A_LEN
- 6.1 MIR program to analyze the distribution of line
- lengths up to 1024 bytes within any file.
-
- A_OCCUR
- 5.8 MIR program to count the frequency of occurrence
- of identical lines in ASCII text
-
- A_OCCUR2
- 5.8 MIR program to calculate cumulative frequency of
- merged A_OCCUR outputs
-
- A_OCCUR3
- 5.8 MIR program to reverse an A_OCCUR file back to
- repeated lines of ASCII text
-
- A_PATTRN
- 5.6 MIR program to list every occurrence of a key
- character or string in a file
- 5.7 the power of sorting A_PATTRN outputs
-
-
-
- ═════
- B
- ═════
-
- batch file
- 4.5 a text file in DOS containing an orderly series of
- commands, each of which runs a program or process
- as part of a larger task
-
- BCD (Binary Coded Decimal)
- 4.8 a set of codes in which a combination of 4 bits is
- assigned each digit 0 through 9 (0000, 0001, ...
- 1001); each 8 bit byte can hold two BCD digits
- 7.4 used within EBCDIC COBOL records for packing
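As a minimal sketch of how two BCD digits share one byte, the C fragment below packs and unpacks a pair of digits. The names bcd_pack and bcd_unpack are assumptions for this example; packed fields in real records may also carry other information, such as a sign, which the sketch ignores.

```c
/* Pack two decimal digits (each 0..9) into one byte: the first digit
 * goes in the high 4 bits, the second in the low 4 bits.
 * bcd_pack and bcd_unpack are hypothetical helper names. */
unsigned char bcd_pack(unsigned int high_digit, unsigned int low_digit)
{
    return (unsigned char)(((high_digit & 0x0F) << 4) | (low_digit & 0x0F));
}

/* Recover the decimal value 0..99 held in one packed BCD byte. */
unsigned int bcd_unpack(unsigned char packed)
{
    return (unsigned int)(packed >> 4) * 10u + (unsigned int)(packed & 0x0F);
}
```

For example, the digits 4 and 7 pack into the single byte hexadecimal 47, which unpacks to decimal 47.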
-
- bit
- 2.2 the smallest measure of computer memory; a single
- off/on characteristic that is interpreted as a
- zero or a one. A series of bits can be mapped to
- binary arithmetic. Example... 10110 =
- 1 X 2 to the fourth power (1 X 16) +
- 0 X 2 to the third power (0 X 8) +
- 1 X 2 squared (1 X 4) +
- 1 X 2 to the power 1 (1 X 2) +
- 0 X 2 to the power zero (0 X 1)
- which is decimal 22.
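The same arithmetic can be written as a short C function. This is an illustrative sketch; bits_to_value is a name invented for this example.

```c
#include <stddef.h>

/* Convert a string of '0' and '1' characters to its unsigned value.
 * Conversion stops at the first character that is neither '0' nor '1'.
 * bits_to_value is a hypothetical helper name. */
unsigned long bits_to_value(const char *bits)
{
    unsigned long value = 0;
    size_t i;

    for (i = 0; bits[i] == '0' || bits[i] == '1'; i++) {
        /* shifting left by one multiplies the running total by 2 */
        value = (value << 1) | (unsigned long)(bits[i] - '0');
    }
    return value;
}
```

Called with "10110", the function works through the five bits from left to right and returns 22.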
-
- blocked records
- 4.9 a method of data presentation in which successive
- records are grouped in a logically consistent manner
- for convenience of reading, writing or storage
- 9. topic on how to deblock records
-
- BPI (bits per inch)
- a measure of the quantity of information held on
- magnetic tape; normal measures are 1600 and 6250
- BPI
-
- byte
- 8 bits; one byte can represent 256 different values
-
- byte stream
- 4.9 the crudest form of file; sequence of bytes which
- a program reads sequentially and manipulates
- according to content rather than according to
- position within the file
- 6.4 contrast to hierarchical text
-
-
- ═════
- C
- ═════
-
- C language
- "a general purpose programming language which
- features economy of expression, modern control
- flow and data structures, and a rich set of
- operators" (Kernighan and Ritchie, The C
- Programming Language, page ix), in which source
- code requires little or no adaptation to be used
- on a wide variety of computers
-
- Canada
- the home of the GST (Grab and Squander Tax) and
- the place where cold weather comes from; a country
- in which natives huddle in their igloos and write
- superlative software in vain attempt to stay warm
-
- CD-ROM (Compact Disc Read Only Memory)
- a computer optical storage medium, closely related
- to the compact discs used for music, holding 660
- million bytes of data, with random access to any
- point on the disc in less than two seconds
-
- COBOL (COmmon Business-Oriented Language)
- a computer programming language favored in
- commercial applications in the 1960s and later,
- particularly in mainframe (large computer)
- installations
-
- COLRM
- 5.8 MIR program to remove a specified range of columns
- from each line of an ASCII text file.
- 7.3 extracting a single field from a file consisting
- of fixed length ASCII records
-
- compiler
- 2.3 computer program used to translate source code
- into a machine language program, suitable for
- executing on compatible computers with the same
- operating system
-
- CompuServe Information System
- an electronic information and communication system
- with over 900,000 subscribers, widely used for
- electronic mail; sometimes abbreviated CIS or CI$;
- CompuServe is a registered trademark of
- CompuServe, Inc.
-
- concatenate
- 4.7 to link together, as in a chain; to place several
- text files one after another within a combined
- file
-
- copyleft
- refers to the Free Software Foundation GNU General
- Public License in which persons receiving source
- code can do almost anything with it except put it
- under copyright or patent
-
- CPB
- 4.6 MIR program to copy any portion of any file to a
- new file
- 5.5 use to get a more detailed, but less convenient,
- display than that produced by FRAGMENT
-
-
- ═════
- D
- ═════
-
- DEBLOC_A
- 9.4 MIR program to remove blocking and insert line
- feeds in a variable length blocked ASCII text file
-
- DEBLOC_B
- 9.5 MIR program to deblock two level binary blocked
- files
-
- DIR
- a DOS command to list files and their sizes within
- a directory
-
- DOS (Disk Operating System)
- the most widely used operating system for IBM
- compatible personal computers; MS-DOS is a
- registered trademark of Microsoft Corporation
-
- DOS executable form
- 2.3 selected for widest spectrum of potential users
- 4.5 program in PC compatible machine language ready
- for use in a MS DOS or PC DOS environment
-
- DOSIFY
- 5.2 MIR program to replace a UNIX-style text file with
- a DOS version in which each line feed is preceded
- by one carriage return, and the file ends with one
- CTL-Z byte
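A conversion of this kind might look like the C sketch below. It is not the DOSIFY source: the function name to_dos_text and the choice to drop every existing carriage return and CTL-Z before rebuilding them are assumptions made for this example.

```c
#include <stddef.h>

/* Sketch of a UNIX-to-DOS text conversion. Existing CR (0x0D) and
 * CTL-Z (0x1A) bytes are dropped, every LF (0x0A) is written as CR LF,
 * and a single CTL-Z is appended. out must hold 2 * len + 1 bytes.
 * Returns the number of bytes written.
 * to_dos_text is a hypothetical name, not part of the MIR programs. */
size_t to_dos_text(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t i, n = 0;

    for (i = 0; i < len; i++) {
        if (in[i] == 0x0D || in[i] == 0x1A)
            continue;               /* re-created below where needed */
        if (in[i] == 0x0A)
            out[n++] = 0x0D;        /* exactly one CR before each LF */
        out[n++] = in[i];
    }
    out[n++] = 0x1A;                /* exactly one CTL-Z at the end */
    return n;
}
```

Dropping the old carriage returns first means a file that is already in DOS form comes through unchanged rather than gaining a second CR on each line.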
-
- DUMP
- 5.4 MIR program to list the contents of a specified
- portion of any file, reporting 16 bytes per line
- in hexadecimal and (where feasible) printable form
- 5.5 detailed way to display context at a location
- 8.2 use to examine file signatures
- 8.4 use to verify binary blocking
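One line of such a listing could be built as in the C sketch below. The layout (hex pairs, then the printable form with '.' for unprintable bytes) and the name format_dump_line are assumptions for this example, not the actual DUMP output format.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Format up to 16 bytes as one dump line: hex pairs separated by
 * spaces, one extra space, then the same bytes in printable form with
 * '.' standing in for anything unprintable. line must hold 70 bytes.
 * format_dump_line is a hypothetical name; the layout is illustrative. */
void format_dump_line(const unsigned char *buf, size_t len, char *line)
{
    size_t i;
    char *p = line;

    if (len > 16)
        len = 16;
    for (i = 0; i < len; i++)
        p += sprintf(p, "%02X ", (unsigned int)buf[i]);
    *p++ = ' ';
    for (i = 0; i < len; i++)
        *p++ = isprint(buf[i]) ? (char)buf[i] : '.';
    *p = '\0';
}
```

Showing both forms side by side lets the reader spot text embedded in binary data, and binary values embedded in text.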
-
-
- ═════
- E
- ═════
-
- EBCDIC (Extended Binary Coded Decimal Interchange Code)
- 4.8 an agreed-upon assignment of bit patterns to
- letters, digits, punctuation, control characters;
- an alternate to ASCII, common on IBM mainframes
- 7.4 may need to re-convert to identify packed values
- 9.5 DEBLOC_B program
-
- EBC_ASC
- 4.8 MIR program to convert an EBCDIC file to ASCII
- 9.5 distorts binary values when converting files
-
-
- ═════
- F
- ═════
-
- field
- 4.1 unit of data that takes on meaning according to
- location or an identifying code; examples...
- purchase order number, street address, quantity,
- cost per unit, etc.
- 5.6, 5.7 recognizing field separators
- 6.5 fielded variable length text
- 6.6 sequence of data within a field as an analysis aid
- 7.2 field layouts
- 7.3 extracting a single field from fixed length data
-
-
- fixed length records
- 4.9 a file consists entirely of equal size segments,
- and within each segment, fields have specific byte
- range assignments which do not vary from one
- record to the next
- 7. topic on worked examples of fixed length records
- 8.5 binary data within fixed length records
- 9.3 deblocking fixed length records
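Because every field sits at a known byte range, extracting one is simple arithmetic. The C sketch below illustrates this; the name get_field and its parameter layout are assumptions made for this example.

```c
#include <stddef.h>

/* Copy one field out of record number rec (counting from 0) in a
 * buffer of fixed length records. The field occupies field_len bytes
 * starting field_off bytes into each record. field must hold
 * field_len + 1 bytes. Returns 0 on success, -1 if the field does not
 * lie wholly inside the record or the buffer.
 * get_field is a hypothetical helper name. */
int get_field(const char *data, size_t data_len, size_t rec_len,
              size_t rec, size_t field_off, size_t field_len, char *field)
{
    size_t start, i;

    if (field_off + field_len > rec_len)
        return -1;
    start = rec * rec_len + field_off;
    if (start + field_len > data_len)
        return -1;
    for (i = 0; i < field_len; i++)
        field[i] = data[start + i];
    field[field_len] = '\0';
    return 0;
}
```

With 10 byte records laid out as a 4 byte number followed by a 6 byte name, the name in the second record starts at byte 1 X 10 + 4 = 14 of the file.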
-
- FORtran (FORmula TRANslation)
- a procedure oriented programming language
- developed in the 1950s for solving problems in
- mathematics, science and engineering; Fortran is
- still in use
-
- FRAGMENT
- 5.5 MIR program to display a five line fragment of a
- file in printable form, providing a quick view of
- context
-
- F_PRINT
- 5.3 MIR program to filter/reduce a file to printable
- characters only
-
- F_TRAIL
- 9.2 MIR program to remove trailing blanks from lines
- of ASCII text
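Trimming trailing blanks from a single line can be sketched in C as follows. This is an illustration only, not the F_TRAIL source; the name trim_trailing_blanks is an assumption for this example.

```c
#include <stddef.h>

/* Remove trailing blanks from one line held in a NUL-terminated buffer.
 * A final line feed, if present, is kept. Returns the new length.
 * trim_trailing_blanks is a hypothetical helper name. */
size_t trim_trailing_blanks(char *line)
{
    size_t len = 0;
    int had_newline = 0;

    while (line[len] != '\0')
        len++;
    if (len > 0 && line[len - 1] == '\n') {
        had_newline = 1;
        len--;
    }
    while (len > 0 && line[len - 1] == ' ')
        len--;
    if (had_newline)
        line[len++] = '\n';
    line[len] = '\0';
    return len;
}
```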
-
- ═════
- G
- ═════
-
- gigabyte
- 1,073,741,824 (2 to the power 30) characters of data
-
- GNU (GNU's Not UNIX)
- a recursive acronym for the Free Software
- Foundation's alternative to the UNIX operating
- system; a diabolical threat to mental health if
- one is asked too frequently: "What's GNU?"
-
- ═════
- H
- ═════
-
- hard copy
- 4.4, 7.4 data printed on paper as an aid to analysis
-
- hardware
- 2.3 the physical components of a computer (case, disk
- drives, boards, chips, etc.) and its peripheral
- equipment (printer, external drives, terminal,
- cables, etc.); what you can see, feel, hear, and
- (when the terminal has been on too many hours)
- smell
-
- HEAD
- 5.1 MIR program to display lines at the beginning or
- end of a text file
- 5.2 use to recognize non-DOS text
-
- hexadecimal notation
- 4.7 Arithmetic to the base 16; the rightmost digit in
- a hexadecimal number is a multiple of 16 to the power 0
- (i.e., 1), the next digit 16 to the power 1, the
- third digit from the right 16 to the power 2, etc.
- The hexadecimal digits are 0 1 2 3 4 5 6 7 8 9 A B
- C D E and F. Example: hexadecimal 6D is 6 X 16
- plus 13 X 1 which in decimal arithmetic is 109 and
- in ASCII code is the letter 'm'. The 256 possible
- values in one byte are hexadecimal 00 through FF.
- Note one hexadecimal digit represents 4 bits.
- 5.4 output from DUMP program
- 5.6 output from A_PATTRN program when /x argument used
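The worked example for hexadecimal 6D can be checked with a small C sketch. The function names hex_digit_value and hex_pair_value are assumptions made for this example.

```c
/* Value of one hexadecimal digit, or -1 if c is not a hex digit.
 * hex_digit_value and hex_pair_value are hypothetical helper names. */
int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Value (0..255) of a two-digit hexadecimal pair such as "6D",
 * or -1 if either character is not a hex digit. */
int hex_pair_value(const char *pair)
{
    int high = hex_digit_value(pair[0]);
    int low;

    if (high < 0)
        return -1;
    low = hex_digit_value(pair[1]);
    if (low < 0)
        return -1;
    return high * 16 + low;     /* left digit counts sixteens */
}
```

Called with "6D", the second function computes 6 X 16 plus 13 and returns 109.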
-
- HEX_BIN
- 8.4 MIR program to create test files with any
- combination of printable and binary characters
-
- high-bit-set
- 4.8 the first (leftmost) of eight bits in a byte is turned on
- 4.9 bytes show up in binary length blocked records
- 4.10 used in DOS for accented characters
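Testing for the high bit takes a single mask operation in C. The sketch below counts such bytes in a buffer; the name count_high_bit is an assumption for this example.

```c
#include <stddef.h>

/* Count the bytes in buf whose high (leftmost) bit is on, that is,
 * whose value is 128 (hexadecimal 80) or more.
 * count_high_bit is a hypothetical helper name. */
size_t count_high_bit(const unsigned char *buf, size_t len)
{
    size_t i, n = 0;

    for (i = 0; i < len; i++)
        if (buf[i] & 0x80)
            n++;
    return n;
}
```

A count of zero suggests plain 7 bit ASCII; a non-zero count points to accented characters, binary values, or both.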
-
- homonyms
- words of different meaning which share the same
- spelling (a significant problem in indexing)
-
- ═════
- I
- ═════
-
- IBM
- registered trademark of International Business
- Machines Corporation
-
- ISO 9660
- Standard controlling the headers and file
- references on CD-ROM that permits any computer
- program written to standard to access files in
- conforming CD-ROM readers of any manufacturer; ISO
- = International Organization for Standardization
-
-
- ═════
- J
- ═════
-
- ═════
- K
- ═════
-
- ═════
- L
- ═════
-
- line records
- 4.9 segments of text padded to a fixed length
- 9.2 reducing line records
-
- LINES
- 6.1 MIR program to provide a quick count of the number
- of lines in each of one or more text files
-
- LINE_NUM
- 6.1 MIR program to assign a sequence number to each
- line in a text file
-
- ═════
- M
- ═════
-
- markup codes
- 4.8 embedded signals which direct how data should be
- displayed
- 3.6 and standards; and SGML
- 6.2 ASCII markup patterns
- 6.3 Standard Generalized Markup Language
- 8.1 binary markup
-
- media
- alternate methods of storing data so that it may
- be entered readily into computer memory; examples
- are hard disk, floppy diskette, optical disk,
- magnetic tape, laser card, punched card, punched
- tape
-
- media independent
- describes a technique in which the selection of
- data storage technology has no bearing on how the
- technique works
-
-
-
- MIR (Mass Indexing and Retrieval)
- project whose output is a set of tutorials, plus
- extensive C language source code under copyleft
- rules, aimed at enabling technical people to write
- or adapt software leading to high speed retrieval
- in any size database
-
- mouse
- 2.1 a hand operated device to point to objects or text
- on a computer screen; a mouse-click on an object
- or piece of text acts as a command to a program
-
- ═════
- N
- ═════
-
- NEWLINES
- 7.1 MIR program to insert carriage returns and line
- feeds at regular intervals, to deblock data
- received in line blocks
- 7.3 use to extract a field from a fixed length ASCII
- text file
- 9.2 use to deblock line records
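Inserting line breaks at regular intervals can be sketched in C as below. This is not the NEWLINES source; the name break_lines and the choice to end a short final segment with a line break as well are assumptions for this example.

```c
#include <stddef.h>

/* Copy in to out, writing CR LF after every width bytes and after a
 * shorter final segment. out must hold
 * len + 2 * (len / width + 1) bytes.
 * Returns the number of bytes written, or 0 if width is 0.
 * break_lines is a hypothetical name, not part of the MIR programs. */
size_t break_lines(const unsigned char *in, size_t len, size_t width,
                   unsigned char *out)
{
    size_t i, n = 0;

    if (width == 0)
        return 0;
    for (i = 0; i < len; i++) {
        out[n++] = in[i];
        if ((i + 1) % width == 0 || i + 1 == len) {
            out[n++] = 0x0D;    /* carriage return */
            out[n++] = 0x0A;    /* line feed */
        }
    }
    return n;
}
```

Once each record sits on its own line, line-oriented tools can be applied to data that arrived as one unbroken block.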
-
- ═════
- O
- ═════
-
- octal notation
- 4.7 Arithmetic to the base 8; the rightmost digit in
- an octal number is a multiple of 8 to the power 0
- (i.e., 1), the next digit 8 to the power 1, the
- third digit from the right 8 to the power 2, etc.
- Example: octal 376 is 3 X 64 plus 7 X 8 plus 6 X 1
- which in decimal arithmetic is 254. The 256
- possible values in one byte are octal 000 through
- 377. Note one octal digit represents 3 bits.
- 7.3 used by the UNIX utility TR
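The link between octal digits and groups of 3 bits shows up clearly in C shift and mask operations. The sketch below is illustrative; byte_to_octal is a name invented for this example.

```c
/* Split one byte into its three octal digits. Each digit covers 3
 * bits, except the leftmost, which covers only the top 2 bits of the
 * byte (which is why the largest byte value is octal 377, not 777).
 * byte_to_octal is a hypothetical helper name. */
void byte_to_octal(unsigned char b, char digits[4])
{
    digits[0] = (char)('0' + ((b >> 6) & 07));
    digits[1] = (char)('0' + ((b >> 3) & 07));
    digits[2] = (char)('0' + (b & 07));
    digits[3] = '\0';
}
```

Given decimal 254, the function produces the string "376".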
-
- OCR (Optical Character Recognition)
- 3.5 computer software and a scanning device interact
- to convert text on paper into machine-readable
- form
- 3.7 human checking for validity
-
- open architecture
- describes hardware and software in which the
- technical detail is made generally available
-
- operating system
- 2.3 the software and data that initiates, coordinates
- and directs the components of a computer; serves
- as an intermediary between the user's programs and
- the computer hardware
-
- ═════
- P
- ═════
-
- preprocessing
- the use of a wide variety of techniques to bring
- data into a standardized form; used in MIR in
- preparation for automated indexing
-
- P_FIXED
- 9.3 MIR program to convert a fixed record length file
- to ASCII with field numbers
-
- P_MARC
- 9.5 Program source code, untested, to deblock MARC
- library records
-
- ═════
- Q
- ═════
-
- ═════
- R
- ═════
-
- RAM (Random Access Memory)
- 2.2 making do with little high speed memory
- 8.6 use of RAM in decompression
-
- reboot
- 2.1 restart a computer by pressing a reset button or
- (on a PC compatible) by pressing the three keys
- CTL-ALT-DEL at the same time; an inelegant way to
- escape from a badly written computer program
-
- REPLACE1
- 7.3 table-driven MIR program to replace every byte in
- an input file with exactly one alternate byte
- (passing reference; full write-up in Tutorial TWO)
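A table-driven replacement needs only a 256 entry lookup table. The C sketch below shows the general technique, not the REPLACE1 source; the names identity_table and replace_bytes are assumptions for this example.

```c
#include <stddef.h>

/* Fill table so that every byte maps to itself.
 * identity_table and replace_bytes are hypothetical helper names. */
void identity_table(unsigned char table[256])
{
    int i;

    for (i = 0; i < 256; i++)
        table[i] = (unsigned char)i;
}

/* Replace every byte in buf with the byte the table assigns to it. */
void replace_bytes(unsigned char *buf, size_t len,
                   const unsigned char table[256])
{
    size_t i;

    for (i = 0; i < len; i++)
        buf[i] = table[buf[i]];
}
```

Starting from the identity table and changing only the entries of interest keeps every other byte intact, and the cost per byte is one array lookup no matter how many substitutions the table holds.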
-
-
- ═════
- S
- ═════
-
- SFQL (Structured Full text Query Language)
- proposed standard to enable "interoperability" of
- CD-ROMs and software interfaces by different
- vendors
-
- SGML (Standard Generalized Markup Language)
- 6.3 introduction to SGML
- 3.6 user control over format
-
- SORT2
- 5.8 MIR program to sort large text files using the
- memory-bound DOS SORT routine in multiple passes
-
- source code
- 2. the form in which computer programs are normally
- written and changed, in a "language" which a
- compiler program can translate into machine
- language for high speed use; without access to
- source code it is very difficult to make changes
- to a program to accommodate it to new needs
-
- stdin (standard input)
- instead of taking data from a named file, a
- program receives data directly from another
- program or from a terminal; risky in DOS for non-
- text files
-
- stdout (standard output)
- the result of a program is sent to another program
- or to a terminal; risky in DOS for non-text files
-
- ═════
- T
- ═════
-
- ═════
- U
- ═════
-
- UNIX
- a computer operating system and trademark of Bell
- Laboratories
-
- ═════
- V
- ═════
-
- ═════
- W
- ═════
-
- WordPerfect
- the word processor used to create the topics on
- the MIR diskettes; WordPerfect is a registered
- trademark of WordPerfect Corporation
- 8.3 converting a file to ASCII
-
- WYSIWYG (What You See Is What You Get)
- 6.2 the simplest form of text file
- 6.3 untagged SGML
- 8.3 WordPerfect ASCII conversion
-
- ═════
- X
- ═════
-
- ═════
- Y
- ═════
-
- ═════
- Z
- ═════
-