Text File | 1992-07-01 | 6KB | 266 lines
══════════════════════════════════════
MIR TUTORIAL ONE
DATABASE ANALYSIS
Detailed Table of Contents
══════════════════════════════════════
0. MIR TUTORIAL ONE Table of Contents

1. Introduction to MIR TUTORIAL ONE
    1.1 Project overview
    1.2 Tutorial ONE overview

2. Source code guidelines
    2.1 Needs of the information searcher
        The value of time
        Simplicity, simplicity, simplicity
        Control
        Freedom from a ticking clock
        Freedom from obscure error messages
        Freedom from the curse of codes
        Language of choice
        Context-sensitive help
        More bang per computer dollar
    2.2 Design background
        Squeezing each bit... the conservationist start
        The gigabyte years
        Unix influence
        C with a Fortran accent
    2.3 Design decisions
        Language
        Hardware
        Operating system and compiler
        Avoiding code that blows up
    2.4 Conventions
        Humans use programs
        Humans read programs
    2.5 Use It, Improve It

3. Data gathering
    3.1 Some definitions
        Datum
        Data
        Record
        Information
        Knowledge
    3.2 Why gather data?
    3.3 Who are data gatherers?
    3.4 Keyboard data input
    3.5 Scanned data input
    3.6 Formats, standards and common sense
    3.7 Data quality
        Accuracy
        Timeliness
        Consistency
    3.8 Value of data
        Market capacity
        Cost recovery strategy
        Educating the market
        Perception of value
        Value added through combination
    3.9 Data ownership
    3.10 Summary

4. First steps in data analysis
    4.1 Objectives
        Extract searchable content
        Recognize record separations
        Recognize field separations
        Recognize formatting aids
    4.2 Learn how the data was accumulated
    4.3 Learn how the data will be used
    4.4 Access to samples and hard copy
        Media
        Representativeness
        Hard copy
    4.5 Access to software tools
    4.6 Extracting samples from larger files
        Use CPB to get subsets
    4.7 Byte surveys - a worked example
        A_BYTES to analyze bytes
        Sorting byte analysis reports
        A_BYTES -L for locations data
    4.8 Data types
        ASCII text
        Extended ASCII text
        Text with ASCII markup codes
        Text with binary markup codes
        Text with packed numbers
        Text with compression substitutions
        EBCDIC
        EBC_ASC to convert EBCDIC to ASCII
        Binary data
    4.9 Data presentation
        Byte stream
        Line records
        Fixed length records
        Blocked records with ASCII lengths
        Blocked records with binary lengths
    4.10 Byte distributions
        English text
        European languages text
        Significance of byte frequencies

5. Patterns in byte sequences
    5.1 Heads and tails... first impressions of a file
        HEAD to see the beginning of a file
        HEAD ## to see ## lines
        HEAD -t to see the tail end of a file
        HEAD -a to see accented characters
    5.2 Non-DOS files
        DOSIFY to insert carriage returns
    5.3 Displaying printable data
        F_PRINT filter
    5.4 Detailed data dumps
        DUMP to display hex and ASCII
    5.5 Convenient display of fragments
        FRAGMENT to show context
    5.6 Viewing patterns throughout a file
        A_PATTRN to extract byte patterns
    5.7 The power of sorting patterns
    5.8 Sorting large files
        SORT2 for files over 60k
        COLRM to reduce large files before sorting
        A_OCCUR to analyze occurrences
        A_OCCUR2 and A_OCCUR3 utilities

6. Worked Examples - Variations in ASCII text
    6.1 Other analysis tools
        LINES for a quick line count
        A_LEN for a distribution of line lengths
        LINE_NUM to insert line numbers
    6.2 ASCII markup patterns
    6.3 Standard Generalized Markup Language (SGML)
    6.4 Free versus hierarchical text
    6.5 Fielded variable length text
    6.6 Independent versus continuous data

7. Worked Examples - Fixed length records
    7.1 Recognizing fixed length ASCII text
        NEWLINES to separate records
    7.2 Field layouts
    7.3 Extracting a single field
    7.4 Packed numbers in fixed length records

8. Worked Examples - Binary data
    8.1 The preprocessing option
    8.2 File signatures
    8.3 Converting word processing files
    8.4 Binary deblocking lengths
        HEX_BIN to create test files
    8.5 Binary data in fixed length records
    8.6 Compressed data

9. Data Deblocking
    9.1 An aid in analysis
    9.2 Reducing line records
        F_TRAIL
    9.3 Handling fixed length records
        P_FIXED
    9.4 Blocked records with ASCII lengths
        DEBLOC_A
    9.5 Blocked records with binary lengths
        DEBLOC_B
        P_MARC

10. Glossary and index of terms

END OF MIR TUTORIAL ONE