Copyright (C) 1986, 2001 by Pinnacle Software (Toronto)

+-------------------------- WIDE APPLICABILITY ---------------------------+

   Runs under Windows (3.x, 95, 98, NT, Me, 2000), OS/2,
   and Novell (DR-DOS).

   Can be invoked by other Windows, DOS or OS/2 programs
   such as FoxPro, Pascal, Visual Basic, C++, Delphi and
   so on, with success verification (return code or log)

   Can be run under DOS emulators (e.g. Macintosh, Unix)
   for convenient cross-platform conversion.

+------- HERE ARE A FEW OF THE THINGS PARSE-O-MATIC CAN DO FOR YOU -------+

   Importing              Exporting              Automated Editing
   Text Extraction        Data Conversion        Table Lookup
   Retabulation           Info Weeding           Selective Copying
   Binary-File to Text    Report Reformatting    Wide-Text Folding
   Auto-Batch Creation    Comm-log Trimming      Tab Replacement
   Character Filtering    Column Switching       DBF Interpretation
   De-uppercasing         Name Properization     And much more!

+---- INPUT AND OUTPUT METHODS CURRENTLY SUPPORTED BY PARSE-O-MATIC ------+

   Input:   Text (any format), Binary, DBF (DBase), Fixed-Record-Length,
            Variable-Record-Length, EBCDIC

   Output:  Text (e.g. flat, comma-delimited, paginated, hex), Binary,
            Fixed-Record-Length, Variable-Record-Length, EBCDIC,
            Generic output devices (e.g. COM1: or LPT2:)

+-------------------------------------------------------------------------+

+------------ UNSOLICITED TESTIMONIALS (USED WITH PERMISSION) ------------+

   "Parse-O-Matic is absolutely great. I use it when I collect data
   from the McDonald's restaurants in Switzerland. POM has paid for
   itself so many times ..."                         -- Chris Friedli

+-------------------------------------------------------------------------+

   "Parse-O-Matic is a wonderful time saver .... Each report that I
   can convert from our ... accounting system saves our company about
   500 man hours per year."                          -- R. Brooker

+-------------------------------------------------------------------------+

   "In 30 years of working with computers, this is by far the easiest
   way I have found to extract data from files. I was very surprised
   that the program just took a few seconds to chew its way through
   1MB of data. You ought to mention it's FAST."     -- Koenraad Rutgers

+-------------------------------------------------------------------------+

   "Parse-O-Matic is THE greatest parsing package .... I wrote my own
   software for parsing ... then I was introduced to POM and have been
   using it ever since. Good job!"   -- Jeff Tallent, Vestax Securities

+-------------------------------------------------------------------------+

+---------------------- PARSE-O-MATIC IS VERSATILE -----------------------+

   This manual was formatted by Parse-O-Matic, from plain text files.
   The table of contents was also generated by Parse-O-Matic.

+-------------------------------------------------------------------------+

+------------------------ WHO USES PARSE-O-MATIC? ------------------------+

   Some of our distinguished customers include:

   Bankers Trust           HBO (Home Box Office)     Philip Morris
   Berliner Volksbank      Harris Semiconductor      Pitney Bowes
   Boeing                  Home Box Office           Prentice Hall
   Bridgestone             Hughes                    Procter and Gamble
   CIBA Vision             Ingram                    Rank Xerox
   Calcomp Canada          Carrefour France          Kodak
   Royal Bank              Champion Int'l            Lipton Tea
   Royal Caribbean         CompUSA                   May Department Stores
   SmithKline Beecham      Degussa                   McCain Foods
   Southwest Bancorp       Dresdner Bank             McDonald's
   Sun Life                Eaton                     Monsanto France
   Sundstrand Aerospace    Eddy                      NEC
   Target                  European American Bank    Nestle
   Visa International      First Bank System         Nike
   Yale University Press   First Federal             Novell
   Ziff-Davis              Georgia Gulf              Pacific Gas & Electric

+-------------------------------------------------------------------------+

+-------------------------------------------------------------------------+

   The above list includes our best known customers, but omits
   city, state or provincial government offices and hospitals.

+-------------------------------------------------------------------------+

============================================================================
                             TABLE OF CONTENTS
============================================================================

INTRODUCTION
   What is Parse-O-Matic?
   Parse-O-Matic Versus Automatic Converters
   Why You Need Parse-O-Matic -- An Example
   Parse-O-Matic to the Rescue!
   How It Works
   How To Contact Us

FUNDAMENTALS
   The Parse-O-Matic Command
   The POM File
   Padding for Clarity
   A Simple Example

QUICK REFERENCE
   Command Descriptions
   Basic Commands
   Output Commands
   Input Commands
   Input Filters
   Flow Control Commands
   Variable Modifiers
   Free-Form Commands
   Positional Commands
   Date Commands
   Calculation Commands
   Input Preprocessors
   Lookup Commands
   Data Converters
   Miscellaneous Commands
   Command Formats

BASIC COMMANDS
   The SET Command
   Basic Usage
   The Trimming String
   Non-Keyable Characters
   The Null-Handling String
   The IF Command

OUTPUT COMMANDS
   The KEEP Command
   The OFILE Command
   Basic Usage
   Closing the Output File
   Strong Deduction
   Weak Deduction
   Appended Deductions
   The OUT and OUTEND Commands
   Generating a Blank Line
   Missing Output
   The OUTHDG Command
   The OUTPAGE Command
   The PAGELEN Command

INPUT COMMANDS
   The GET Command
   Variable Length Records
   Delimiter-Terminated Data
   Handling Long Delimiter-Terminated Data
   Using GET with Text Files
   End-of-File Considerations
   The GETTEXT Command
   The READNEXT Command
   End of File Conditions
   Optional Comparisons
   Ignoring Null Lines
   Saving the Previous Line
   The PEEK Command
   The REWIND Command
   Why REWIND is Necessary
   Using REWIND
   Example

INPUT FILTERS
   The MINLEN Command
   The IGNORE Command
   The ACCEPT Command
   Clustered Accepts

FLOW CONTROL COMMANDS
   The BEGIN Command
   The CALL Command
   Variable CALLs
   Making Your Technique Obvious
   Avoiding Unknown Code Sections
   The CODE Command
   Performance Considerations
   Nested Subroutine Calls
   Variable Code Sections
   A Note to Experienced Programmers
   The ELSE Command
   The END Command
   The AGAIN Command
   Using AGAIN for Variable-Length Data
   Pointless Command Combinations
   Examples
   The DONE Command
   The NEXTFILE Command
   The HALT Command
   The SETERROR Command
   The PROLOGUE Command
   The EPILOGUE Command

VARIABLE MODIFIERS
   The TRIM Command
   The PAD Command
   The CHANGE Command
   The CVTCASE Command
   Control Settings
   The PROPER Command
   The INSERT Command
   The APPEND Command
   The OVERLAY Command
   Simple Arrays
   The MAPFILE Command
   What is a Map File?
   Sample Map Files
   Map File Format
   Search Order
   Case Matching
   Reverse Mapping
   Irreversible Mapping
   Memory Limitations
   An Example of Remapping
   The REMAP Command
   REMAP Versus CHANGE
   Using REMAP

FREE-FORM COMMANDS
   What are Free-Form Commands?
   The PARSE Command
   Decapsulators
   Sample Application
   The Occurrence Number
   Finding the Last Occurrence
   Unsuccessful Searches
   The Control Setting
   The Plain Decapsulator
   The Null Decapsulator
   Why Null Decapsulators Work Differently
   Overlapping Decapsulators
   Parsing Empty Fields
   Additional Examples
   The PEEL Command
   The Control Setting
   Parsing Empty Fields
   The Left-Peeling Technique
   The Leftover Technique
   The PEELX Command

POSITIONAL COMMANDS
   General Discussion
   What are Positional Commands?
   Why Use Positional Commands?
   A Cautionary Note
   Negative Positional Indices
   The SETLEN Command
   The DELETE Command
   The COPY Command
   The EXTRACT Command
   The FINDPOSN Command
   The Plain String Find
   Using a Single Decapsulator
   The Encapsulated String Find
   Control Settings
   Insoluble Searches
   Null Decapsulators
   Finding The Last Word
   Who Needs This?
   The SCANPOSN Command
   The Scanlist
   Accommodating Variation
   Handling Prefixes and Suffixes
   Controlling the Search
   Leftmost, Rightmost, Find-Any
   The Best Match Principle

DATE COMMANDS
   General Discussion
   The POMDATE.CFG File
   Date Formats
   The TODAY Command
   The DATE Command
   The MONTHNUM Command
   The ZERODATE Command

CALCULATION COMMANDS
   The CALC Command
   The CALCREAL Command
   The INC and DEC Commands
   The ROUNDING Command
   The CALCBITS Command

INPUT PREPROCESSORS
   The SPLIT Command
   Indicating Actual Input Length
   Non-Contiguous Splits
   The CHOP Command
   Manual Reading

LOOKUP COMMANDS
   The LOOKUP Command
   Search Method
   Limitations
   Null Lines and Comments
   Multiple Columns
   LOOKUP Versus REMAP
   The LOOKFILE Command
   The LOOKCOLS Command
   The LOOKSPEC Command

DATA CONVERTERS
   The MAKEDATA Command
   Creating Binary Data
   Converting Dates
   Practical Considerations
   The MAKETEXT Command
   Converting Binary Data
   Converting Dates
   Practical Considerations

MISCELLANEOUS COMMANDS
   The ERASE Command
   The FILESIZE Command
   The GETENV Command
   Disappearing Environment Variables
   Examples
   The LOG Command
   The MSGWAIT Command
   Standard Behavior
   Setting a Time-Out Delay
   Color Cues
   Key Stacking
   Exceptions
   A Word of Caution
   The PAUSE Command
   The RANDOM Command
   The SHOWNOTE Command
   Other Notes
   Slowing Down
   The SOUND Command
   The LISTEN Utility
   Changing the Error Message Sound
   The TRACE Command

TERMS
   Values
   Variables
   Predefined Values
   $FLINE
   $FLUPC
   $LASTFLINE
   $SPLIT
   $LINECOUNTER
   Running Out of Variables, Literals or Lines
   Delimiters
   Illegal Characters
   Using Comparators
   Literal Comparators
   Numerical Comparators
   Length Comparators
   Literal Comparisons and Sort Order
   Numeric Comparisons
   Upgrading from Earlier Versions
   Predefined Data Types
   Interpreting Data Formats in a File

DEDUCED VARIABLES
   Deduced Variables
   Definition
   The Look-Up Process
   Restrictions
   Usage Guidelines
   Array Variables
   Multidimensional Arrays
   Eponymous Variables
   Drawbacks and Advantages

VALUE TECHNIQUES
   Uninitialized and Persistent Variables
   Example
   Inline Incrementing and Decrementing
   Line Counters
   The SHOWNUM Utility
   Quick Reference Screen
   Converting a Number
   Converting a Character
   Windows Considerations

PROGRAMMING TECHNIQUES
   Tracing
   Logging

COMMAND-LINE TECHNIQUES
   Quiet Mode
   User-Specified Command-Line Parameters
   Case Considerations
   Spaces in Values
   Command-Line Switches
   Hex and Decimal Code Strings
   Summary

FILE HANDLING
   How Parse-O-Matic Searches for a File
   How Parse-O-Matic Opens an Output File
   Appending to an Output File
   Sending Output to a Device
   COM Ports
   DBF Files
   POM and Wildcards

OPERATIONAL TECHNIQUES
   Parse-O-Matic Job (POJ) Files
   Simple Usage
   Commenting a Job File
   Prompting for File Names
   Suggesting File Names
   Optional Parameters
   Examples
   Encrypted (Scrambled) POM Files
   The SCRAMBLE Utility
   Why Scramble a POM File?
   Support Considerations
   Batch Files
   Unattended Operation
   Examples

OPERATIONAL CONSIDERATIONS
   Running Parse-O-Matic on 8088 and 8086 Machines
   Running Parse-O-Matic from Another Program
   Solving Memory Problems

RUNNING UNDER WINDOWS
   Compatibility
   Setting Up for Windows 95
   Setting Up an Association for the POM File
   Setting Up an Association for a POJ (Job) File
   Setting Up an Association for the BAT File (Optional)
   Setting Up for Windows 98, Me, 2000
   Additional Associations (Optional)
   Installing the ShowNum Utility
   Long File Names in Windows
   The WINUTIL Utility
   Limitations
   Quick Reference Screen
   Sample Batch File
   Detecting if Windows is Running
   Copying the Clipboard to a Text File
   Copying a Text File to the Clipboard
   The SEE Utility

============================================================================
                               INTRODUCTION
============================================================================

----------------------
What is Parse-O-Matic?
----------------------

Parse-O-Matic is a programmable file-parser. Simple enough for even a
non-programmer to master, it can help out in countless ways. If you have a
file you want to edit, manipulate, or change around, Parse-O-Matic may be
just the tool you need. Parse-O-Matic can also speed up or automate long or
repetitive editing tasks.

-----------------------------------------
Parse-O-Matic Versus Automatic Converters
-----------------------------------------

Parse-O-Matic is not an "automatic file converter". It will not, for
example, convert WordPerfect files to MS-Word format, or convert Lotus 1-2-3
Spreadsheets DIRECTLY to Excel files -- although it can read reports from
one program and convert them to another format (e.g. comma-delimited), which
can be imported by the other program.

One advantage of this method (as opposed to automatic file conversion) is
that you can create an "intelligent" importing procedure, which can make
decisions and modify data. You could, for example, eliminate certain types
of records, tidy up names, convert case, unify fields, make calculations,
and so on.

----------------------------------------
Why You Need Parse-O-Matic -- An Example
----------------------------------------

There are plenty of programs out there that have valuable data locked away
inside them. How do you get that data OUT of one program and into another
one?

Some programs provide a feature which "exports" a file into some kind of
generic format. Perhaps the most popular of these formats is known as a
"comma-delimited file", which is a text file in which each data field is
separated by a comma. Character strings -- which might themselves contain
commas -- are surrounded by double quotes. So a few lines from a
comma-delimited file might look something like this (an export from a
hypothetical database of people who owe your company money):

  "JONES","FRED","1234 GREEN AVENUE","KANSAS CITY","MO",293.64
  "SMITH","JOHN","2343 OAK STREET","NEW YORK","NY",22.50
  "WILLIAMS","JOSEPH","23 GARDEN CRESCENT","TORONTO","ON",16.99

Unfortunately, not all programs export or import data in this format. Even
more frustrating is a program that exports data in a format that is ALMOST
what you need!

If that's the case, you might decide to spend a few hours in a text editor,
modifying the export file so that the other program can understand it. Or
you might write a program to do the editing for you. Both solutions are
time-consuming.

An even more challenging problem arises when a program which has no export
capability does have the ability to "print" reports to a file. You can write
a program to read these files and convert them to something you can use, but
this can be a LOT of work!

----------------------------
Parse-O-Matic to the Rescue!
----------------------------

Parse-O-Matic is a utility that reads a file, interprets the data, and
outputs the result to another file. It can help you "boil down" data to its
essential information. You can also use it to convert NEARLY compatible
import files, or generate printable reports.

------------
How It Works
------------

You need three things:

  1) The Parse-O-Matic program
  2) A Parse-O-Matic "POM" file (to tell Parse-O-Matic what to do)
  3) The input file

The input file might be a report or data file from another program, or text
captured from a communications session. Parse-O-Matic can handle many types
of input.

We've provided several sample input files. For example, the file
XMPDAT02.TXT comes from the AccPac accounting software. AccPac is a great
program, but its export capabilities leave something to be desired.
Parse-O-Matic can help!

To see detailed demonstrations of how various files can be parsed, enter
INFO at the DOS prompt (or run INFO.BAT from Windows or OS/2), then select
TUTORIAL from the menu.

-----------------
How To Contact Us
-----------------

If you have any questions about Parse-O-Matic, you can write to us at the
following address:

  Pinnacle Software, 80 Mornelle Court #310, Toronto, ON, Canada M1E 4P8

You can also contact us electronically at the following addresses:

  Voice Line:      416-287-8892
  Internet Email:  psoftinfo@aol.com
  Web Site:        http://members.aol.com/psoftinfo

============================================================================
                               FUNDAMENTALS
============================================================================

This documentation assumes that you are an experienced computer user. If you
have trouble, you might ask a programmer to help you -- POM file creation is
a little like programming!

-------------------------
The Parse-O-Matic Command
-------------------------

The basic format of the Parse-O-Matic command line is:

  POM pom-file input-file output-file

Here is an example, as you would type it at the DOS command line, or as a
command in a batch file:

  POM POMFILE.POM REPORT.TXT OUTPUT.TXT

For a more formal description of the command line, start up POM by typing
this command at the DOS prompt:

  POM

Another method of calling the POM command is to specify a job (.POJ) file.
This is explained later, in the "Operational Techniques" chapter -- see
"Parse-O-Matic Job (POJ) Files". Briefly, a job file lets you save the
Parse-O-Matic command-line specifications in a text file.

------------
The POM File
------------

The POM file is a text file with a .POM extension. The following conventions
are used when interpreting the POM file:

  - Null lines and lines starting with a semicolon (comments) are ignored.

  - A POM file may contain up to 750 lines of specifications. Comment lines
    do not count in this total.

A POM file does not rely on "loops" (to use the programming term). Each line
or record of the input file is processed by the entire POM file. If you
would like this expressed in terms of programming languages, here is what
POM does:

+-------------------------------------------------------------------------+
   START:  If there's nothing left in the input file, go to QUIT.
           Read a line from the input file
           Do everything in the POM file
           Go to START
   QUIT:   Tell the user you are finished!
+-------------------------------------------------------------------------+

The method by which Parse-O-Matic finds the POM file is discussed in the
section "How Parse-O-Matic Searches for a File".

-------------------
Padding for Clarity
-------------------

Spaces and tabs between the words and variables in a POM file line are
generally ignored (except in the case of the "output picture" of the OUT and
OUTEND commands). You can use spaces to make the commands in your POM files
easier to read.

Additionally, in any line in the POM file, the following terms are ignored:

  THEN   ELSE

(There is also a POM command named ELSE, but Parse-O-Matic can tell the
command apart from the "padding" word.)

Finally, the equals ("=") character is ignored if it is found in a place
where no comparison is taking place. This will be demonstrated below.

You can use these techniques to make your POM files easier to read. For
example, the IF command can be written in several ways:

  Very terse:          IF PRICE = "0.00" BONUS "0.00" "1.00"
  Padded with spaces:  IF PRICE = "0.00"      BONUS   "0.00"      "1.00"
  Fully padded:        IF PRICE = "0.00" THEN BONUS = "0.00" ELSE "1.00"

In the last example, the first equals sign ("=") is a "comparator". (For
details about comparators, see the section entitled "Using Comparators".)
The second equals sign is not really required, but it does make the line
easier to understand.

----------------
A Simple Example
----------------

Let's say you have a text file called NAMES.TXT that looks like this:

  WILLIAMS   JACK
  SMITH      JOHNNY
  JOHNSON    MARY
  :          :
  Column 1   Column 12

Now let's say you want to switch the columns, so that the first name appears
first. Your first step is to create a file using a text editor. The file
would look like this:

  SET    last  = $FLINE[ 1 10]
  SET    first = $FLINE[12 17]
  PAD    first "R" " " "10"
  OUTEND |{first} {last}

The first two lines tell Parse-O-Matic which text to extract from each input
line. For the first line of the input file, the variable named 'last' will
be given the value "WILLIAMS  ". You will notice there are two spaces at the
end. That is because we take every character from position 1 to position 10
-- which in this case includes two spaces.

The PAD line adds enough spaces on the right side of the variable named
'first' to make sure that it is 10 characters long.

The OUTEND command sends the two variables to the output file.

Save the file with the name TEST.POM and exit your text editor.
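
If you are more familiar with general-purpose programming languages, the
four POM lines above can be modeled with the rough Python sketch below. It
is only an illustration and is not part of Parse-O-Matic: the function names
switch_columns and run are hypothetical, and the sketch assumes (as
described above) that $FLINE[a b] takes characters a through b counting from
1, and that PAD with "R" adds spaces on the right. How Parse-O-Matic itself
treats input lines shorter than the requested range is not modeled here.

```python
def switch_columns(line: str) -> str:
    """Model of the TEST.POM example for one input line (illustrative only)."""
    last = line[0:10]         # SET last  = $FLINE[ 1 10]  (positions 1-10)
    first = line[11:17]       # SET first = $FLINE[12 17]  (positions 12-17)
    first = first.ljust(10)   # PAD first "R" " " "10"     (pad on the right)
    return f"{first} {last}"  # OUTEND |{first} {last}


def run(lines):
    """Apply the whole 'POM file' to each input line, one line at a time."""
    return [switch_columns(line) for line in lines]


if __name__ == "__main__":
    # Hypothetical stand-in for the contents of NAMES.TXT
    sample = ["WILLIAMS   JACK", "SMITH      JOHNNY", "JOHNSON    MARY"]
    for out in run(sample):
        print(out)
```

The run function mirrors the way Parse-O-Matic applies the entire POM file to
every input line in turn; no explicit loop appears in the POM file itself.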
At the DOS prompt, enter this command:

  POM TEST.POM NAMES.TXT OUTPUT.TXT

This will run the POM file (TEST.POM) on every line of the input file
(NAMES.TXT) and place the output in the file OUTPUT.TXT, which will then
look like this:

  JACK       WILLIAMS
  JOHNNY     SMITH
  MARY       JOHNSON
  :          :
  Column 1   Column 12

Of course, for such a simple task, it would be easier to switch the columns
yourself, using a text editor. But when you are dealing with large amounts
of data, and want to guard against typing errors, Parse-O-Matic can save you
a lot of time, effort and risk. It also lets you automate editing operations
that you perform frequently.

============================================================================
                              QUICK REFERENCE
============================================================================

--------------------
Command Descriptions
--------------------

This manual's explanations of the commands are grouped by related functions,
in the following order:

---------------------------------------------------------------------------
Basic Commands
---------------------------------------------------------------------------
SET       Assigns a value to a variable
IF        Conditionally assigns a value to a variable

---------------------------------------------------------------------------
Output Commands
---------------------------------------------------------------------------
KEEP      Ejects the page if fewer than the specified number of lines remain
OFILE     Specifies the output file or device
OUT       Sends text and variables to the output file
OUTEND    Like OUT but adds a new line at end (Carriage Return/Linefeed)
OUTHDG    Sets up title lines to appear at the top of a report or each page
OUTPAGE   Starts a new page
PAGELEN   Sets the page length for a report

---------------------------------------------------------------------------
Input Commands
---------------------------------------------------------------------------
GET       Manually reads bytes from the input file
GETTEXT   Manually reads bytes from the input file, converts them to text
READNEXT  Moves to next input line but retains your place in the POM file
REWIND    Backs up a specified number of bytes in an input file
PEEK      Looks at the next line in a text input file

---------------------------------------------------------------------------
Input Filters
---------------------------------------------------------------------------
MINLEN    Sets minimum length required for an input line to be processed
IGNORE    Ignores an input line that meets the specified condition
ACCEPT    Accepts an input line that meets the specified condition

---------------------------------------------------------------------------
Flow Control Commands
---------------------------------------------------------------------------
BEGIN     Defines the conditions for processing the code block
CALL      Calls a subroutine code block
CODE      Marks the start of a subroutine code block
ELSE      Defines the start of code to be processed if the BEGIN fails
END       Ends a code block (BEGIN/END, CODE/END, PROLOGUE/END etc.)
AGAIN     Conditionally returns to the corresponding BEGIN command
DONE      Reads the next input line and starts at the top of the POM file
NEXTFILE  Skips the current input file and proceeds to the next (if any)
HALT      Terminates all processing if a given condition exists
SETERROR  Sets the program return code
PROLOGUE  Defines code block to run before any input lines are processed
EPILOGUE  Defines code block to run after all input lines are processed

---------------------------------------------------------------------------
Variable Modifiers
---------------------------------------------------------------------------
TRIM      Removes a character from the left, right or all of a variable
PAD       Centers, or left/right-justifies variable to a specified width
CHANGE    Replaces all occurrences of a string in a variable
PROPER    Properizes a variable (e.g. "JOHN SMITH" becomes "John Smith")
INSERT    Inserts a string on the left or right, or at a "found" position
APPEND    Concatenates several variables into one variable
OVERLAY   Extends or overwrites portions of a variable
CVTCASE   Converts a value to uppercase or lowercase
MAPFILE   Reads a file containing data transformations for REMAP
REMAP     Transforms sub-strings into other strings

---------------------------------------------------------------------------
Free-Form Commands
---------------------------------------------------------------------------
PARSE     Obtains a variable found between delimiters in free-form data
PEEL      Works like PARSE, but removes the "found" text from the data
PEELX     Works like PEEL, but assumes presence of delimiter on last search

---------------------------------------------------------------------------
Positional Commands
---------------------------------------------------------------------------
SETLEN    Sets a variable according to the length of a value
DELETE    Removes a range of characters from a variable
COPY      Copies a range of characters from a value to a variable
EXTRACT   Like COPY, but removes the characters from the source variable
FINDPOSN  Finds the starting or ending position of a value in another
SCANPOSN  Finds "best match" in list of values, returns start/end positions

---------------------------------------------------------------------------
Date Commands
---------------------------------------------------------------------------
TODAY     Sets a variable to today's date, in a variety of formats
DATE      Sets a given year, month and day, in a variety of formats
MONTHNUM  Sets the month number of a given month, expressed as text

---------------------------------------------------------------------------
Calculation Commands
---------------------------------------------------------------------------
CALC      Performs arithmetic functions on integer values
CALCREAL  Performs arithmetic functions on decimal values
DEC       Subtracts a number from another (used for counters)
INC       Adds a number to another (used for counters)
ROUNDING  Controls rounding in CALCREAL operations

---------------------------------------------------------------------------
Input Preprocessors
---------------------------------------------------------------------------
SPLIT     Breaks up a wide text file (more than 255 characters)
CHOP      Breaks up a fixed-record-length file

---------------------------------------------------------------------------
Lookup Commands
---------------------------------------------------------------------------
LOOKUP    Looks up a word in another file and returns a corresponding value
LOOKFILE  Specifies the file that the LOOKUP command will use (see also /L)
LOOKCOLS  Specifies the format of the look-up file
LOOKSPEC  Controls the behavior of the LOOKUP command

---------------------------------------------------------------------------
Data Converters
---------------------------------------------------------------------------
MAKEDATA  Converts text into binary format
MAKETEXT  Converts binary format into text

---------------------------------------------------------------------------
Miscellaneous Commands
---------------------------------------------------------------------------
ERASE     Deletes a file
FILESIZE  Obtains the size of a file (in bytes)
GETENV    Obtains a system environment variable (e.g. PATH)
LOG       Adds a line to the processing log
MSGWAIT   Controls the behavior of error messages
PAUSE     Delays the specified number of milliseconds
RANDOM    Generates a random number
SHOWNOTE  Displays a message on the processing screen
SOUND     Makes a noise or sets the noise generated by error messages
TRACE     Traces a variable (results saved in the text file POM.TRC)

---------------
Command Formats
---------------

The following comment lines are for quick reference. You can copy them into
your own POM files to make programming easier.

------------------------------------ --------------------------------------
COMMAND FORMATS                      EXAMPLE
------------------------------------ --------------------------------------
ACCEPT comp                          ACCEPT $FLINE[1 3] = "YES"
AGAIN [comp]                         AGAIN linecntr #< "3"
APPEND var val val [val [val]]       APPEND name first last
BEGIN [comp]                         BEGIN linecntr #< "3"
CALL val                             CALL "Format Price Field"
CALC var num operation num           CALC total total "+" sold
CALCBITS var char operation char     CALCBITS z byte1 "XOR" $80
CALCREAL var num operation num       CALCREAL salary hours "*" rate
CHANGE var val val                   CHANGE date "/" "-"
CHOP from to [,from to] [...]        CHOP 1 250, 251 300
CODE val                             CODE "Format Price Field"
COPY var val ixfrom [ixto]           COPY x $FLINE "-2" "-1"
CVTCASE var val [ctl]                CVTCASE x $FLINE "L7"
DATE var num num num [ctl]           DATE x "98" "12" "31"
DEC var val                          DEC x "2"
DELETE var ixfrom [ixto]             DELETE x "3" "5"
DONE [comp]                          DONE $FLINE = "End Data"
ELSE                                 ELSE
END                                  END
EPILOGUE                             EPILOGUE
ERASE file                           ERASE "C:\MYFILES\OUTPUT.TXT"
EXTRACT var var ixfrom [ixto]        EXTRACT x $FLINE "15" "30"
FILESIZE var file                    FILESIZE x "C:\MYFILES\INPUT.TXT"
FINDPOSN var val left [right [ctl]]  FINDPOSN x $FLINE "2*/"
GET var ctl [ctl [ctl]]              GET x #0 "END" "I"
GETENV var val                       GETENV x "COMSPEC"
GETTEXT var ctl [ctl]                GETTEXT date "WORD" "DATE"
HALT comp val [ctl]                  HALT x = y "Item repeated"
IF comp var val [val]                IF x = "Y" THEN z = "N"
INC var val                          INC x "2"
IGNORE comp                          IGNORE price = "0.00"
INSERT var ctl val                   INSERT price "L" "$"
KEEP num                             KEEP "5"
LOG comp val [val [val]]             LOG x = y "Item repeated"
LOOKCOLS num num num num             LOOKCOLS "1" "3" "8" "255"
LOOKFILE file                        LOOKFILE "C:\TABLES\DATA.TBL"
LOOKSPEC ctl ctl ctl                 LOOKSPEC "Y" "N" "N"
LOOKUP var val                       LOOKUP phonenum "FRED JONES"
MAKEDATA var val ctl                 MAKEDATA x "255" "BYTE"
MAKETEXT var val ctl                 MAKETEXT z x "BYTE"
MAPFILE file val [ctl]               MAPFILE "XYZ.MPF" "XYZ" "ANYCASE"
MINLEN num [num]                     MINLEN "15" "1"
MONTHNUM var val                     MONTHNUM x "February"
MSGWAIT num                          MSGWAIT "60"
NEXTFILE [comp]                      NEXTFILE $FLINE = "End File"
OFILE file [val [ctl]]               OFILE "C:\MYFILES\OUT.TXT"
OUT [comp] |pic                      OUT z = "X" |{price}
OUTEND [comp] |pic                   OUTEND z = "X" |{$FLINE}
OUTHDG val                           OUTHDG "LIST OF EMPLOYEES"
OUTPAGE [comp]                       OUTPAGE partnum <> oldpartnum
OVERLAY var val from                 OVERLAY x "***" "15"
PAD var ctl char num                 PAD sernum "L" "0" "10"
PAGELEN num [ctl]                    PAGELEN "66" "N"
PARSE var val left right [ctl]       PARSE x $FLINE "2*(" "3*)" "I"
PAUSE num                            PAUSE "1000"
PEEK var                             PEEK nextline
PEEL var var left right [ctl]        PEEL x $FLINE "2*(" "3*)" "I"
PEELX var var left right [ctl]       PEELX word wordlist "" " "
PROLOGUE                             PROLOGUE
PROPER var [ctl [file]]              PROPER custname "I" "XY.PEF"
RANDOM var val val                   RANDOM roll "1" "6"
READNEXT [comp]                      READNEXT $FLINE[1 5] = "NOTE:"
REMAP var [val]                      REMAP $FLINE "BIN2CODE"
REWIND [num]                         REWIND "15"
ROUNDING val                         ROUNDING "N"
SCANPOSN var var val val [ctl]       SCANPOSN from to $FLINE "/MR/MISS/MRS"
SET var val [val [val]]              SET price $FLINE[20 26] "L$" "0"
SETERROR [comp] val                  SETERROR custname = "NONE" "123"
SETLEN var val                       SETLEN length custname
SHOWNOTE val [val] [val] [...]       SHOWNOTE "Processing record #" recnum
SOUND ctl                            SOUND "BUZZ"
SPLIT from to [,from to] [...]       SPLIT 1 250, 251 300
TODAY var [ctl]                      TODAY x "?y/?n/?d"
TRACE var                            TRACE price
TRIM var ctl char                    TRIM price "R" "$"
ZERODATE num num num                 ZERODATE "1753" "12" "31"
------------------------------------ --------------------------------------

The following conventions are used in the preceding table:

  comp    A comparison (Example: Name = "John")
  ctl     Variable or literal: command control specifications
  file    File name (see "How Parse-O-Matic Searches for a File")
  from    Variable or literal: a starting character position (see Note #1)
  ixfrom  Variable or literal: character position (beginning from...)
  ixto    Variable or literal: character position (ending at...)
  left    Variable or literal: a delimiter-search parameter (decapsulator)
  num     Variable or literal: must contain a number (see Note #1)
  pic     Output picture used by OUT and OUTEND
  right   Variable or literal: a delimiter-search parameter (decapsulator)
  to      Variable or literal: an ending position (see Note #1)
  val     Variable or literal whose value is being read
  var     Variable that is being set
  [xxx]   Square brackets indicate optional items

  Note #1: Tabs, spaces and commas are stripped from numeric values

The commands are explained in detail in the following sections.

A summary of the commands and default settings appears, as comments, in the
file QUICKREF.POM. You can copy these comments into your own POM file as a
convenient quick reference.

============================================================================
                              BASIC COMMANDS
============================================================================

---------------
The SET Command
---------------

FORMAT:       SET var1 value1 [value2 [value3]]

PURPOSE:      SET assigns value1 to the variable var1.

PARAMETERS:   var1   is the variable being set
              value1 is the value being read
              value2 is the optional trimming string
              value3 is the optional null-handling string

ALTERNATIVES: The COPY command, or just about any command that sets a
              variable.

SEE ALSO:     The TRIM Command

-----------
Basic Usage
-----------

The usual reason to use the SET command is to set a variable from the input
line (represented by the variable $FLINE) prior to cleaning it up with TRIM.
For example, if the input line looked like this:

   JOHN       SMITH     555-1234   322 Westchester Lane    Architect
   |          |         |          |                       |
   Column 1   Col 12    Col 22     Col 33                  Col 57

then we could extract the last name from the input line with these two POM
commands:

   SET name1 = $FLINE[12 21]  <-- Sets the variable name1 from the input line
   TRIM name1 "R" " "         <-- Trims any spaces on the right side

SET would first assign the variable name1 this value:    "SMITH     "
After the TRIM, the variable name1 would have the value: "SMITH"

You will also use SET if you plan to include a portion of a text string in
the output, since the OUT and OUTEND commands do not recognize substrings
(e.g. myvar[10 20]) after the "|" marker; they only recognize plain text and
complete variables.

-------------------
The Trimming String
-------------------

After SETting a variable, you may wish to use one or more TRIM commands to
"tidy up" the variable by removing leading and trailing spaces, extraneous
commas, and so on.  However, you can do this at the same time as you do the
SET command, using the optional "trimming string".

The trimming string is a list of pairs of characters.  The first character
in the pair is the TRIM specification, while the second is the character
being trimmed.  Consider the following POM code:

   SET xyz = "xx$3.00xx"
   SET price1 = xyz          <-- Sets price1 to "xx$3.00xx"
   TRIM price1 "B" "x"       <-- Sets price1 to "$3.00"
   TRIM price1 "L" "$"       <-- Sets price1 to "3.00"

This achieves the desired result (i.e. it gets rid of the "x" characters and
the dollar sign), but it takes up four lines of code.  You can use SET's
trimming string to accomplish the same thing:

   SET xyz = "xx$3.00xx"
   SET price1 = xyz "BxL$"   <-- Sets price1 to "3.00"

The "BxL$" trimming string (made up of the character pairs "Bx" and "L$")
instructs Parse-O-Matic as follows:

   CHAR   TRIMMING
   PAIR   OPERATION
   ----   ---------
   Bx     Strip away "x" on Both sides of the value, yielding "$3.00", then...
L$ Strip the dollar sign on the Left side, yielding "3.00" The trimming string is interpreted from left to right. Thus, the following commands would NOT produce the same result as the previous example: SET xyz = "xx$3.00xx" SET price1 = xyz "L$Bx" Parse-O-Matic will try to strip the dollar signs on the left, but since there are none there, it will move on to the next step without changing anything. It will then remove the "x" characters. The final result will be "$3.00", not "3.00". 14 ---------------------- Non-Keyable Characters ---------------------- In the trimming string, the second character in each character-pair must be a keyable character (i.e. something you can type on your keyboard and see on your screen). You can not use the $hex or #decimal representations (see "Values" in the "Terms" chapter). The following command is valid: SET num1 = $FLINE[10 20] "B A," This will remove the spaces on both sides and eliminate any commas. The following command is NOT valid: SET hex1 = $FLINE[10 20] "B$00R#13" The SET command can not translate the $00 code into "hex character zero". Similarly, it does not recognize the #13 as "decimal character 13" (which is, by the way, the ASCII "carriage return" character). To strip non-keyable characters, use the TRIM command. ------------------------ The Null-Handling String ------------------------ Sometimes you want to assign a "default" value to a variable if it turns out to be null (i.e. empty). Here is one way to do this: SET xyz = "" <-- This is a null string SET price1 = xyz <-- Sets price1 to "" IF price1 = "" THEN price1 = "0" <-- Sets price1 to "0" It is easier to do this by using the null-handling string, as in this example: SET xyz = "$3.00" <-- This is a normal price SET price1 = xyz "L$" "0.00" <-- Sets price1 to "3.00" SET xyz = "$" <-- This is just a dollar sign! 
   SET price2 = xyz "L$" "0.00"  <-- Sets price2 to "0.00"

In the first case (xyz = "$3.00"), all we had to do was strip away the
dollar sign to obtain the price.  In the second case (xyz = "$"), we ended
up with a null value after stripping away the dollar sign.  In such case,
the null-handling string ("0.00") specified the default value.

Sometimes you don't want to do any trimming, but you do want to check for a
default value.  Nevertheless, the trimming string must be included in the
command, as a "place holder".  For example:

   SET xyz = "Fred Smith"         <-- This is a normal name
   SET name1 = xyz "" "Unknown"   <-- Sets name1 to "Fred Smith"
   SET xyz = ""                   <-- This is a null string
   SET name2 = xyz "" "Unknown"   <-- Sets name2 to "Unknown"

As you can see, no trimming was necessary, but we nevertheless had to
include the trimming string, even though it was null (meaning, "Don't do any
trimming").

--------------
The IF Command
--------------

FORMAT:       IF value1 [comparator] value2 var1 value3 [value4]

PURPOSE:      If value1 equals value2, var1 is set to value3.  Otherwise, it
              is set to value4 (if value4 is missing, nothing is done, and
              var1 is not changed).

NOTES:        For an explanation of comparators, see "Using Comparators".
              In the following explanation, we will demonstrate the command
              using only the "literally identical" ("=") comparator.

ALTERNATIVES: The BEGIN command

Here is an example of the IF command...

   SET EARNING = $FLINE[20 23]
   IF EARNING = "0.00" THEN BONUS = "0.00" ELSE "1.00"

This obtains the value between columns 20 and 23, then checks if it equals
"0.00".  If it does, the variable BONUS is set to "0.00".  If not, BONUS is
set to "1.00".  The "THEN" and "ELSE" are "padding" and can be omitted.
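
The SET and IF rules described in this chapter can be modelled in a few
lines of Python.  This is only a sketch for checking your understanding of
the trimming string, the null-handling string and the IF fallback; it is not
Parse-O-Matic's own code.  The function names (apply_trim, pom_set, pom_if)
are hypothetical, and the "R" and "A" codes are assumed to behave as they do
in the TRIM command.

```python
def apply_trim(value, spec, char):
    """Apply one trimming pair: spec is L(eft), R(ight), B(oth) or A(ll)."""
    if spec == "L":
        return value.lstrip(char)
    if spec == "R":
        return value.rstrip(char)
    if spec == "B":
        return value.strip(char)
    if spec == "A":
        return value.replace(char, "")
    raise ValueError("unknown trim specification: " + spec)


def pom_set(value, trimming="", null_default=""):
    """Model of SET var value [trimming [null-handling]]."""
    if len(trimming) % 2 != 0:
        raise ValueError("trimming string must be made of character pairs")
    # Pairs are applied from left to right, as the manual describes.
    for i in range(0, len(trimming), 2):
        value = apply_trim(value, trimming[i], trimming[i + 1])
    # The null-handling string is used only if the result is empty.
    return value if value != "" else null_default


def pom_if(value1, value2, current, value3, value4=None):
    """Model of IF value1 = value2 var1 value3 [value4].

    'current' is the existing value of var1; it is kept when the
    comparison fails and value4 is omitted.
    """
    if value1 == value2:
        return value3
    return current if value4 is None else value4


print(pom_set("xx$3.00xx", "BxL$"))      # 3.00
print(pom_set("xx$3.00xx", "L$Bx"))      # $3.00
print(pom_set("$", "L$", "0.00"))        # 0.00
print(pom_if("0.00", "0.00", "", "0.00", "1.00"))  # 0.00
```

The second call shows why the order of the pairs matters: the dollar sign is
not yet on the left edge when "L$" is applied, so it survives.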
============================================================================
                              OUTPUT COMMANDS
============================================================================

----------------
The KEEP Command
----------------

FORMAT:       KEEP value1

PURPOSE:      KEEP does a page eject if fewer than the specified number of
              lines remain on the page.

PARAMETERS:   value1 is the minimum number of lines required to avoid doing
              a page eject.

ALTERNATIVES: The OUTPAGE command, used with $LINECOUNTER

NOTES:        KEEP has no effect unless the page length is set, using the
              PAGELEN command.

SEE ALSO:     "The OutHdg Command"

When you are sending output to a file that will be printed (or sending
output directly to a printer -- see "How Parse-O-Matic Opens an Output
File"), you sometimes want to ensure that certain lines of data are kept
together on the same page.  The most common situation involves a heading
that precedes some associated data; you do not want to have the heading by
itself at the bottom of one page, with the data on the next.

Consider this POM file:

   PAGELEN "55"                    <-- Set the output page length
   SET part     = $FLINE[ 1 10]    <-+
   SET type     = $FLINE[12 20]      | Extract the fields
   SET quantity = $FLINE[22 30]    <-+
   BEGIN lasttype <> type          <-- Detect a change in part type
     SET lasttype = type           <-- Remember this part type
     KEEP "4"                      <-- Make sure we have 4 lines left
     OUTEND |                      <-- Output a blank line
     OUTEND |PART TYPE: {type}     <-- Output a header
   END                             <-- End of code block
   OUTEND |{part} {quantity}       <-- Output part data

This POM file will always make sure that at least two part numbers follow
the heading; a heading will never be "stranded" by itself at the end of a
page.

-----------------
The OFILE Command
-----------------

FORMAT:       OFILE value1
              OFILE value1 [value2 [value3]]

PURPOSE:      OFILE specifies a new output file or device.

PARAMETERS:   value1 is the name of the output file or device
              value1 can also be a subcommand (i.e.
a command for OFILE) value2 is the default extension for the DEDUCE subcommand value3 is the control value for the DEDUCE subcommand DEFAULTS: value2 defaults to "" for the DEDUCE subcommand value3 defaults to "WEAK" for the DEDUCE subcommand ALTERNATIVES: Specify the name of the output file on the POM command line SEE ALSO: "How Parse-O-Matic Opens an Output File" "Sending Output to a Device" When you start up Parse-O-Matic, you can specify the name of the output file on the command line. For example: POM MYPOM.POM INPUT.TXT OUTPUT.TXT In this case, the output file is named OUTPUT.TXT. All data from the output commands (OUT, OUTEND etc.) are sent to this file. If you omit the output file name from the POM command, like this: POM MYPOM.POM INPUT.TXT then Parse-O-Matic assumes the output file is named POMOUT.TXT (in the current directory). 18 ----------- Basic Usage ----------- Once the name of the output file has been determined, Parse-O-Matic will use that file until it is changed, using the OFILE command. For example: OFILE "C:\XYZ.TXT" This will change the output file to C:\XYZ.TXT. If the file already exists, it will be renamed with a BAK extension. However, you can tell Parse-O-Matic to append to the end of an existing file by placing a plus sign in front of the file name: OFILE "+C:\XYZ.TXT" (See "Appending to Output Files" and "POM and Wildcards" for additional details on appending to output files). ----------------------- Closing the Output File ----------------------- Sometimes you want your POM file to explicitly close the output file before doing additional processing. By closing the file explicitly, you ensure that all output has been written to the disk. To explicitly close the output file, use OFILE's CLOSE subcommand: OFILE "(CLOSE)" It is rarely necessary to explicitly close the output file, since this is done automatically when Parse-O-Matic finishes processing. 
You only need to do this if you suspect that a failure may occur between an OUT[END] command and the next OFILE command (or the completion of processing). ---------------- Strong Deduction ---------------- Sometimes you want to relate the name of the output file to the input file. For example, if the input file is XYZ.TXT, you might want to name the output file XYZ.OUT -- in other words, the same root name (XYZ) but a different extension (OUT). The OFILE command can do this with the DEDUCE subcommand, as follows: OFILE "(DEDUCE)" "OUT" "STRONG" The (DEDUCE) subcommand tells OFILE that you want it to use the same root name as the input file. The "OUT" part of this example is the extension you want to use. "STRONG" means that you want to override the existing output file, even if the user specified one on the command line. 19 Here is another example: OFILE "(DEDUCE)" "CSV" "STRONG" This uses the same root name as the input file, but uses the CSV extension (CSV usually signifies a comma-separated-value file). Once again, the "STRONG" parameter means that the new output file will be opened even if the user specified an output file on the POM command line. -------------- Weak Deduction -------------- Sometimes you want to set the output file name only if the user has not specified one on the POM command line. You can do this with OFILE's DEDUCE subcommand: OFILE "(DEDUCE)" "CSV" "WEAK" If the user ran POM with this command: POM MYFILE.POM ACCOUNT.DAT then the output file would be set to ACCOUNT.CSV. However, if the user explicitly specified an output file: POM MYFILE.POM ACCOUNT.DAT MYOUTPUT.TXT then the weak OFILE command would be ignored and output would continue to be directed to the MYOUTPUT.TXT file. Weak deduction is generally used in the PROLOGUE section of a POM file. It is particularly useful when wildcards are used (See "POM and Wildcards"). 
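
The difference between strong and weak deduction can be summarized as a
small decision function.  The Python sketch below is a model of the rules
described above, not Parse-O-Matic's implementation.  The function name
deduce_output is hypothetical, and keeping any directory part of the input
file name in the deduced name is an assumption.

```python
import ntpath  # DOS/Windows path rules, whatever system this runs on


def deduce_output(input_file, extension, cmdline_output=None, mode="WEAK"):
    """Model of OFILE "(DEDUCE)" extension mode.

    cmdline_output is the output file the user typed on the POM command
    line, or None if the user did not type one.
    """
    if mode.upper() == "WEAK" and cmdline_output is not None:
        # Weak deduction gives way to the user's own choice.
        return cmdline_output
    # Strong deduction (or no user choice): same root name, new extension.
    root, _old_extension = ntpath.splitext(input_file)
    return root + "." + extension


print(deduce_output("XYZ.TXT", "OUT", mode="STRONG"))             # XYZ.OUT
print(deduce_output("ACCOUNT.DAT", "CSV"))                        # ACCOUNT.CSV
print(deduce_output("ACCOUNT.DAT", "CSV", "MYOUTPUT.TXT"))        # MYOUTPUT.TXT
print(deduce_output("ACCOUNT.DAT", "CSV", "MYOUTPUT.TXT", "STRONG"))  # ACCOUNT.CSV
```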
Consider this POM command: POM MYFILE.POM *.DAT You could put the following OFILE command in the PROLOGUE: OFILE "(DEDUCE)" "TXT" <-- "WEAK" is the default deduction This would create a separate output file for each file that is processed. So if you had three DAT files, named A.DAT, B.DAT and C.DAT, you would create three output files, named A.TXT, B.TXT and C.TXT. Because this is a weak deduction, the user is not forced to use your output method. If the user typed the POM command this way: POM MYFILE.POM *.DAT MYOUTPUT.TXT then all the output (from all of the input files) would go to MYOUTPUT.TXT. 20 NOTE: When a user does not specify an output file, Parse-O-Matic temporarily assumes that the output file is named POMOUT.TXT. If the user actually types POMOUT.TXT on the command line, Parse-O-Matic treats it as if no output file name had been typed. ------------------- Appended Deductions ------------------- An alternative to the DEDUCE subcommand is the +DEDUCE subcommand. This will figure out the output file name as before, but all output will be appended to the end. This is useful for daily processing when you want to accumulate data in your output file or files. For example: OFILE "(+deduce)" "txt" "strong" <-- Note that case does not matter If the user started POM with this command: POM MYFILE.POM ACCOUNT.DAT then the output file would be set to ACCOUNT.TXT, and all output would be placed at the end of the existing file of that name. --------------------------- The OUT and OUTEND Commands --------------------------- FORMAT: OUT[END] [value1 [comparator] value2] |output-picture PURPOSE: The OUT command generates output without an end-of-line (i.e. carriage return and linefeed characters). The OUTEND command generates output and also adds an end-of-line. NOTES: For an explanation of comparators, see "Using Comparators". In the following explanation, we will demonstrate the command using only the "literally identical" ("=") comparator. 
When value1 equals value2, a line is sent to the output file, according to the output picture. Within the output picture, all text is taken literally (i.e. " is taken to mean literally that -- a quotation mark character). The only exception to this is variable names, which are identified by the { and } characters. For example, a POM file that contained the following single line: OUTEND "X" = "X" |{$FLINE} would simply output every line from the input file (not very useful!). 21 The "X" = "X" part of the command is the comparison which controls when output occurs. In the example above, both values being compared are the same, so output will always occur. You can not use substrings after the "|" marker. Thus, the following line is NOT legal: OUTEND $FLINE[1 3] = "IBM" |{$FLINE[1 15]} The correct way to code this is as follows: SET CODE = $FLINE[1 15] OUTEND $FLINE[1 3] = "IBM" |{CODE} This outputs the first 15 characters of any line that contains the letters "IBM" in the first three positions. ----------------------- Generating a Blank Line ----------------------- To send a blank line to a text output file, specify OUTEND without any data following the | marker, as follows: OUTEND | -------------- Missing Output -------------- If you find that an OUT or OUTEND command is not displaying a variable, but puts "nothing" in its place, check the spelling of the variable name in each place it is used. Consider this example: SET varablex = $FLINE[1 12] OUTEND |XX{variablex}ZZ The SET command spelled "variablex" incorrectly -- it left out the "i". When OUTEND encounters the variable named "variablex", it sees that it does not have a value, so it replaces it with "nothing", yielding the result "XXZZ". ------------------ The OUTHDG Command ------------------ FORMAT: OUTHDG value1 PURPOSE: OUTHDG is used to place text headers in your output. 
ALTERNATIVES: The OUTEND command, used with PROLOGUE SEE ALSO: "The PageLen Command" and "The OutPage Command" 22 If you were parsing data to create an employee report, you might use OUTHDG like this: SET EMPNUM = $FLINE[ 1 5] SET NAME = $FLINE[10 28] SET PHONE = $FLINE[30 45] OUTHDG "EMPL# NAME PHONE NUMBER" OUTHDG "----- ------------------- ------------" OUTEND |{EMPNUM} {NAME} {PHONE} The value following the OUTHDG command is sent to the output file only once. That is to say, after an OUTHDG sends a value to the output file, subsequent encounters with that OUTHDG command are ignored -- unless the PAGELEN command is used. To specify a blank line in a heading, use the following command: OUTHDG "" If your output is bound for a continuous-paper printer (e.g. a dot-matrix printer with tractor feed), you may find it useful to use one or more blank lines at the beginning of the header, in order to skip over the perforation in the paper. ------------------- The OUTPAGE Command ------------------- FORMAT: OUTPAGE [value1 [comparison] value2] PURPOSE: Sends a page eject to the output file (or device). NOTES: For an explanation of comparators, see "Using Comparators". SEE ALSO: "The Pagelen Command", "The OutHdg Command", "$LINECOUNTER" If the comparison in the OUTPAGE command is true, or if it is omitted, OUTPAGE will send a "page eject" to the output file or device. (See "Sending Output to a Device") Some exceptions apply, however. The page eject is not sent under the following circumstances: - If the comparison is false (e.g. OUTPAGE "Y" = "N") - If the page length is set to "0" (the default). Use the PAGELEN command to specify a different page length. - If the output file is not yet open. That is to say, if no output has been sent to the output via one of the other output commands (e.g. OUT, OUTEND, OUTHDG), then OUTPAGE will do nothing. (See "How Parse-O-Matic Opens an Output File") - If the output is already at the top of a page. 
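
The exceptions listed above amount to a simple test.  The following Python
sketch restates them as a single function; it is a model for reasoning about
OUTPAGE, not Parse-O-Matic's own logic, and the function and parameter names
are hypothetical.

```python
def outpage_sends_eject(comparison=None, page_length=0,
                        output_open=False, at_top_of_page=False):
    """Return True if OUTPAGE would send a page eject.

    comparison is None when the OUTPAGE command has no comparison.
    """
    if comparison is False:
        return False          # the comparison is false
    if page_length == 0:
        return False          # page length is still at its default
    if not output_open:
        return False          # nothing has been sent to the output yet
    if at_top_of_page:
        return False          # already at the top of a page
    return True


# A bare OUTPAGE in the middle of a 55-line page:
print(outpage_sends_eject(page_length=55, output_open=True))  # True
# The same command with no PAGELEN in effect:
print(outpage_sends_eject(output_open=True))                  # False
```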
If form feeds are enabled (via the PAGELEN command), OUTPAGE sends a page
eject by sending a Form Feed character (ASCII 12) to the output.  If form
feeds are not enabled, OUTPAGE sends blank lines (i.e. linefeeds) until the
requisite number of lines appear on the page.

OUTPAGE does NOT automatically place OUTHDG text at the top of the page.
OUTHDG text is not "stored"; it is executed in the POM file at the place it
occurs.  Here is an example of using OUTPAGE and OUTHDG together:

   PAGELEN "55" "Y"
   SET partnum = $FLINE[ 1 7]
   SET descrip = $FLINE[12 60]
   OUTPAGE partnum <> oldpartnum
   OUTHDG |PARTNUM DESCRIPTION
   OUTHDG |------- -----------
   OUTEND |{partnum} {descrip}
   SET oldpartnum = partnum

This will generate a new page, complete with headings, when the partnum
variable is different from the oldpartnum variable.  Also, because of the
interaction between OUTHDG and PAGELEN, the headings will appear on a new
page if you run out of room on the current page.

-------------------
The PAGELEN Command
-------------------

FORMAT:       PAGELEN value1 [value2]

PURPOSE:      The PAGELEN command specifies the length of the output page.

PARAMETERS:   value1 is the page length
              value2 specifies if form feeds should be used

NUMERICS:     Tabs, spaces and commas are stripped from value1

DEFAULTS:     value2 = "Y"

When text is sent to an output file by OUTHDG and OUTEND, the lines are
counted.  The default value for page length is zero, which means that the
output is a single page of infinite length.  As such, OUTHDG headings appear
only the first time they are encountered, and OUTPAGE commands are ignored.

If you specify a page length greater than zero, OUTHDG headings become
re-enabled once the specified number of output lines have been generated, or
after an OUTPAGE command is performed.  A typical value is as follows:

   PAGELEN "55"

This is an ideal page length for most laser printers.  Dot matrix printers
typically use a page length of 66.
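
The interaction between the page length and OUTHDG can be pictured with a
small simulation.  The Python sketch below is only a model built from the
description above: the class name PagedOutput is hypothetical, form feeds
are assumed to be enabled, and the exact moment at which the page break is
written (just before the first line of the next page) is an assumption, not
documented behavior.

```python
FORM_FEED = "\f"  # ASCII 12


class PagedOutput:
    def __init__(self, pagelen=0):
        self.pagelen = pagelen    # 0 means one page of infinite length
        self.lines = []           # everything sent to the output
        self.count = 0            # lines on the current page
        self.page_full = False
        self.issued = set()       # OUTHDG commands already shown on this page

    def _emit(self, text):
        if self.page_full:
            # Assumed timing: break the page just before the next line.
            self.lines.append(FORM_FEED)
            self.count = 0
            self.page_full = False
        self.lines.append(text)
        self.count += 1
        if self.pagelen > 0 and self.count >= self.pagelen:
            self.page_full = True
            self.issued.clear()   # headings become re-enabled

    def outhdg(self, key, text):
        """key identifies one OUTHDG command in the POM file."""
        if key in self.issued:
            return                # already shown: this encounter is ignored
        self.issued.add(key)
        self._emit(text)

    def outend(self, text):
        self._emit(text)


def run(pagelen, records):
    out = PagedOutput(pagelen)
    for record in records:        # the POM file runs once per input line
        out.outhdg("hdg1", "NAME")
        out.outend(record)
    return out.lines


print(run(0, ["a", "b", "c", "d"]))  # heading appears once
print(run(4, ["a", "b", "c", "d"]))  # heading repeats after a page break
```

With a page length of zero the heading is written once; with a page length
of 4 the model writes the heading again at the top of the second page.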
24 Parse-O-Matic inserts a "form feed" (ASCII 12) character between pages. You can turn this off, however, by specifying the page length this way: PAGELEN "66" "N" The "N" specification means, "No, don't use form feeds". Another acceptable value is "Y", meaning "Yes, use form feeds", but since this is the default, you do not have to specify it. 25 ============================================================================ INPUT COMMANDS ============================================================================ --------------- The GET Command --------------- ** ADVANCED COMMAND FOR EXPERIENCED USERS ** FORMAT: GET var1 value1 [value2] (Variable length records) GET var1 value1 "END" [value3] (Delimiter-terminated data) GET var1 "EOF" (Detects end-of-file) PURPOSE: Manually reads bytes from the input file. RESTRICTIONS: The input file must be described with CHOP or SPLIT. NOTES: Data is normally read automatically from the input file. GET is used only when you want precise control of the reading process. GET works only with files whose format is defined by a CHOP or SPLIT command. (You can read a file a byte at a time by using CHOP 1-1 in your POM file. You can also use CHOP 0 to do all reading manually.) SEE ALSO: "The Chop Command" and "The Split Command" The GET command is especially helpful for: 1) Variable length records 2) Delimiter-terminated data (such as zero-terminated text strings) 3) Text files with embedded binary data These methods are described in detail below. 26 ----------------------- Variable Length Records ----------------------- FORMAT: GET var1 value1 [value2] PURPOSE: Reads a variable-length record. 
PARAMETERS: var1 is the variable being set value1 specifies how many bytes to read, expressed as: A value in text format (example: GET x "10") A predefined data type (example: GET x "INTEGER") A value in byte format (example: GET x len "BYTE") value2 specifies the data representation used by value1 This can be "TEXT" (the default) or "BYTE" May also include "STRICT" (the default) or "LOOSE" NUMERICS: Tabs, spaces and commas are stripped from value1, if it is numeric, and in text format DEFAULTS: value2 = "TEXT" SEE ALSO: "Predefined Data Types" GET can read up to 255 bytes into a variable, as specified by value1. For example: GET xyz "10" This reads 10 bytes from the input file into the xyz variable, and advances the file pointer. That is to say, after the GET command shown above is executed, the next data Parse-O-Matic reads will be 10 bytes further along. If the requested number of bytes is not available in the input file, Parse-O-Matic terminates with an error message. In a typical application, variable-length data is preceded in the input file by a byte that gives its length. You can read the length, then use it directly, as follows: GET len "1" "TEXT" <-- Get the length byte GET xyz len "BYTE" <-- Read in the data In the first command, the word "TEXT" means that the length specification (i.e. "1") is plain text. ("TEXT" is the default, so you can omit it.) In the second command, GET reads len bytes from the input file. The word "BYTE" means that the length specification is a binary number, not a text string. 27 To clarify this, let us assume that the input file contains a length byte (say hex 4F, which equals 79 in decimal). This is followed by 79 bytes of data. The first GET command (GET len "1") reads in the length byte (hex 4F or decimal 79). The second GET command (GET xyz len "BYTE") reads 79 bytes and places the result in the xyz variable. The maximum variable length that a single GET command can handle is 255 bytes (i.e. 
the largest number represented by a single byte). Here are some additional examples of the GET command: SAMPLE COMMAND EXPLANATION ----------------- ----------- GET x "5" "TEXT" Reads 5 bytes into the x variable GET x "5" Same as above (since "TEXT" is the default) GET x len In this case, len must contain a text number (e.g. "7") GET x len "BYTE" In this case, len must be a byte (i.e. binary format) When the number is in "TEXT" format, spaces and tabs are ignored. Thus, the following command is valid: GET abc " 5 " "TEXT" You can also specify the length of the data as a predefined data type (see "Predefined Data Types" and "The MakeData Command"). Some examples... SAMPLE COMMAND EXPLANATION ----------------- ----------- GET x "INTEGER" Reads in an integer value (2 bytes long) GET x "SHORTINT" Reads in a short integer value (1 byte long) GET x "BYTE" Reads in a byte value (1 byte long) GET x "LONGINT" Reads in a long integer (4 bytes long) GET x "REAL" Reads in a real value (6 bytes long) GET x "REAL 2" Same as above (the decimal precision value 2 is ignored) TECHNICAL NOTE: In some applications, you will find that a variable-length record may be followed by a "noise" byte. This can occur if the program that created the input file "aligns data to word boundaries" and the record you are reading has an odd number of bytes. In such case, your POM file must determine (using CALC commands) if the length byte is odd or even, and react accordingly. 28 ------------------------- Delimiter-Terminated Data ------------------------- FORMAT: GET var1 value1 "END" [value3] PURPOSE: Reads delimiter-terminated data from the input file. 
PARAMETERS: var1 is the variable being set value1 is the terminating character you are searching for "END" means you are searching for a terminating character value3 is "I" (for Include) or "X" (for eXclude) DEFAULTS: value3 = "X" ALTERNATIVES: The PARSE and PEEL commands The FINDPOSN command used with the COPY command One common way to represent variable-length text data in a file is to terminate the text string with the null (ASCII 0) character. You can read in this kind of data with the GET command, as follows: GET abc #0 "END" <-- #0 means ASCII zero (See "Values") This reads the input file until the null (ASCII 0) character is found, or until 255 characters have been read in (whichever comes first). The terminating character is not included in the string unless you explicitly request it. There are two forms of GET command that control this behavior: GET abc #0 "END" "X" <-- Exclude the terminating character (default) GET abc #0 "END" "I" <-- Include the terminating character Here is a sample POM file that reads a data file that consists entirely of zero-terminated strings: CHOP 0 <-- This means you will handle all file reading GET abc #0 "END" <-- Read in the data OUTEND |{abc} <-- Send the data to the output file 29 --------------------------------------- Handling Long Delimiter-Terminated Data --------------------------------------- If some of the data is more than 255 characters long, you can handle it as follows: CHOP 0 <-- Handle all file reading manually GET data #0 "END" "I" <-- Include the terminating character SETLEN len data <-- Get the length of the string COPY lastchar data len <-- Get the last character BEGIN lastchar = #0 <-- Test the last character DELETE data len <-- Remove the last character (the terminator) OUTEND |{data} <-- Output the string, and start a new line ELSE OUT |{data} <-- Output the string, but stay on the same line END All of the examples given above assume that the terminating character is ASCII 0 (i.e. 
#0), because this is by far the most common terminator. However, you can use other values, if required: GET data "X" "END" In actual usage, it is not likely that you will find data strings that are terminated by an "X" character, but the capability is there if the need arises. ------------------------- Using GET with Text Files ------------------------- While the GET command is normally used with a file which is CHOPped (see "The Chop Command"), you may occasionally find it useful with ordinary text files. For example, an input file may be almost entirely text (i.e. each line ends with a carriage return and a linefeed), but may also contain some binary data. In such cases, you may find it useful to use the GET command to process the binary data. However, since GET is not available under standard text processing, you must describe the file with the SPLIT command. Here is an example which detects and extracts binary data until the character $FF is encountered: 30 SPLIT 1-255 <-- Process this as a SPLIT file BEGIN $FLINE[1 10] = "Binary:" <-- Detect the start of binary data BEGIN <-- Start of loop GET x "1" <-- Get a byte OUT |{x} <-- Send it to output AGAIN x <> $FF <-- See if we are finished ELSE <-- Handle ordinary text OUTEND |{$FLINE} <-- Output ordinary text END <-- End of BEGIN/ELSE/END code block Note that SPLIT 1-255 does not mean that each text line is 255 characters long; it means that each line is UP TO 255 characters long. -------------------------- End-of-File Considerations -------------------------- When you ask the GET command to get something (e.g. a certain number of bytes), it will normally cause Parse-O-Matic to terminate if it reaches the end of the input file before it has fulfilled its mission. The assumption here is that when you ask for something, you want precisely that -- not something less. However, in some parsing applications you do not know precisely what "lies ahead" in the input file. 
In such cases, you may run up against the end of file unexpectedly. GET provides two ways to handle this. The first method simply checks to see if you are already at the end of the file: GET x "EOF" This sets the x variable to "Y" (for Yes) if you are at the end of the input file. Otherwise, it sets the x variable to "N" (for No). Here is an example, which simply copies a file: CHOP 0 <-- Handle all file-reading manually BEGIN GET x "EOF" <-- Check if we're at the end of the input file BEGIN x = "N" GET byte "1" <-- Get a byte OUT |{byte} <-- Send it to the output file END AGAIN x = "N" An alternative method is to use a "LOOSE" GET command, as in this example: 31 CHOP 0 BEGIN GET x "1" "TEXT LOOSE" OUT |{x} AGAIN x <> "" The LOOSE parameter tells Parse-O-Matic "don't terminate if you have less than the specified number of bytes". (The default is "STRICT", but since it IS the default, you never actually have to include it in the command.) Thus, all of the following commands are valid: GET x $05 "BYTE LOOSE" <-- Get from 0 to 5 bytes of data GET x $05 "BYTE STRICT" <-- Get exactly 5 bytes of data, or terminate GET x "5" "TEXT LOOSE" <-- Get from 0 to 5 bytes of data GET x "5" "LOOSE" <-- Same as "TEXT LOOSE" GET x "1" <-- Same as "TEXT STRICT" ------------------- The GETTEXT Command ------------------- ** ADVANCED COMMAND FOR EXPERIENCED USERS ** FORMAT: GETTEXT var1 value1 [value2] PURPOSE: Manually reads bytes from the input file, then converts them into text format. PARAMETERS: var1 is the variable being set value1 is the predefined data type in the input file value2 is the MAKETEXT "convert from" parameter DEFAULTS: If value2 is omitted, it is assumed to be the same as value1 NOTES: Before studying this command, you should already be familiar with the GET and MAKETEXT commands. SEE ALSO: "Predefined Data Types" When reading a binary file, you frequently need to read numeric values then convert them to text. 
For example: GET x "WORD" <-- Read a two-byte number from the file MAKETEXT y x "WORD" <-- Convert it into text form You can do both of these operations at once, using the GETTEXT command: GETTEXT y "WORD" This reads a "WORD" (two binary bytes) from the input file, and then converts it into text (e.g. "1234"). 32 You only need to use value2 if you are converting a number to a text-based data type such as "DATE". For example: ZERODATE "1936" "1" "1" <-- Set "day zero" GETTEXT date "LONGINT" "DATE Y/M/?d" <-- Get and convert a date The GETTEXT command is also helpful if you are reading text data from a fixed-length field, but it is padded with spaces or nulls: GETTEXT x "80" "TRIMMED" This reads in 80 characters, then removes tabs, spaces and nulls from either end of the string. -------------------- The READNEXT Command -------------------- FORMAT: READNEXT [value [comparator] value] PURPOSE: The READNEXT command gets the next line of the input file (in other words, it replaces the current $FLINE), while maintaining your place in the POM file. NOTES: For an explanation of comparators, see "Using Comparators". SEE ALSO: "The MinLen Command" and "Line Counters" READNEXT is helpful if you know for certain what type of information the next line will contain. Here is an example: SET note = "" SET customer = $FLINE[1 20] BEGIN $FLINE ^ "See note below" READNEXT SET note = $FLINE[1 20] END OUTEND |{customer} {note} If the input line contains the words "See note below", Parse-O-Matic will read the next line of the input file (replacing the current $FLINE), thus obtaining the comment about the customer. ---------------------- End of File Conditions ---------------------- If you do a READNEXT at the end of the input file, READNEXT will set $FLINE to null (""). The POM file will continue processing. 33 -------------------- Optional Comparisons -------------------- READNEXT can make a comparison. This is useful for skipping extraneous lines of input. 
For example: READNEXT $FLINE[1 5] = "NOTE:" This obtains the next input line if the current input line starts with "NOTE:". ------------------- Ignoring Null Lines ------------------- By default, READNEXT will read null lines from the input file. If you want it to ignore null lines, you can use an optional parameter of the MINLEN command to specify a minimum length for the READNEXT command. For details, see "The MinLen Command". If you are reading a DBF (DBase) file, you can not "ignore null lines", because the data is not in line format. In such case, you must check a particular field to see if it is null. (See "DBF Files") If you are using the CHOP or SPLIT commands, it may not be particularly useful to "ignore null lines", since by definition you are requesting a particular number of bytes each time the input is read. Nevertheless, if you do a READNEXT at the end of the input file, READNEXT will set $FLINE to null (""), and continue processing the POM file. ------------------------ Saving the Previous Line ------------------------ When you do a READNEXT, there is no simple way to return to the previous line of the input file. You could use the REWIND command, but if you need a line for other work, it is usually much easier to save a copy: SET note = "" SET customer = $FLINE[1 20] SET saveline = $FLINE BEGIN $FLINE ^ "See note below" READNEXT SET note = $FLINE[1 20] END SET custnum = saveline[22 25] OUTEND |{custnum} {customer} {note} The example above is not very efficient; it would make more sense to extract custnum BEFORE you use READNEXT. However, in some applications you may find it necessary to save $FLINE before doing a READNEXT. 34 ---------------- The PEEK Command ---------------- FORMAT: PEEK var1 PURPOSE: Looks at the next input line in a simple text file. PARAMETERS: var1 is the variable being set RESTRICTIONS: PEEK can not be used with SPLIT, CHOPped or DBF files. For SPLIT or CHOPped files, you can simulate PEEK by using the GET and REWIND commands. 
NOTES: If you are at the end of the input file, PEEK will set var1 to a null value (""). WARNING: You can use PEEK only once for each pass through the POM file. If you use it more than once, you will lose data. ALTERNATIVES: The READNEXT command If you are processing a simple text file (i.e. a text file in which no input line exceeds 255 characters), you can find out what is on the next input line by using the PEEK command. Consider this input file: Fred Jones Pencil $ 0.25 Pen $ 1.25 Mary Smith Protractor $ 1.00 Compass $ 2.50 Pen $ 1.25 Calligraphy Kit $ 15.30 In this input file, a customer's name is followed by one or more items that he or she purchased. The input file does not contain a number telling us how many of each item there are. If it is our intention to add up the prices to obtain a total, it would be helpful if we had some warning that we have reached the last item. Fortunately, a null line (or end of file) follows each list of items. We can use the PEEK command to take advantage of this, as in the following POM file: 35 BEGIN $FLINE[18] <> "$" <-- Is this a price line? OUTEND |Customer Name: $FLINE <-- Output customer's name DONE <-- We're finished with this line END PEEK nextline <-- Find out what the next line contains BEGIN nextline = "" <-- If it's null, we have no more items OUTEND |Total: {total} <-- Output the total for all purchases SET total = "0.0" <-- Reset the total ELSE CALCREAL total = total "+" $FLINE[19 24] <-- Accumulate the total END ------------------ The REWIND Command ------------------ FORMAT: REWIND [value1] PURPOSE: REWIND backs up to an earlier point in the input file. NUMERICS: Tabs, spaces and commas are stripped from value1 DEFAULTS: If value1 is omitted, input file is rewound to the beginning RESTRICTIONS: The input file must be described with CHOP or SPLIT. 
SEE ALSO: "Using GET with Text Files" ----------------------- Why REWIND is Necessary ----------------------- Normally, when you process an input file, you read forward in the file. However, on occasion you may find it necessary to back up to an earlier point in the file. Here is a typical situation where this is necessary: you are looking for one of several strings of data, and when you find one of them, you want to: - Back up in the input file, to the beginning of the string you found - Use DONE to start processing the POM file from the top Because you rewound, the processing will include the text that you found. This is a handy alternative to saving the text and appending it to the front of $FLINE -- see "The Leftover Technique" for an example of that method. In another situation, you need to know something near the middle or end of the file, and once you have found out what that is, you want to rewind to the beginning and start processing again. This is known as "multiple-pass" processing, because you pass through the input file more than once. 36 ------------ Using REWIND ------------ If REWIND is used without any parameters (or if you specify REWIND "0"), Parse-O-Matic resets the input file to the beginning. This will usually be done in a BEGIN/END block, because if you reset the file each time you process the POM file, it will run forever. If REWIND is given a numeric parameter, it will back up that many bytes. (If you are processing a text file, remember to include two bytes for carriage return and linefeed, as necessary.) If the REWIND command is asked to rewind before the beginning of the file, it simply resets to the top. For example, if you have read 15 bytes out of the input file, and you issue the command REWIND "99", the next byte you read will be the first byte of the file. 
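The rewinding rules described above can be modelled outside Parse-O-Matic.
The following Python sketch is an illustration only and is not part of
Parse-O-Matic: the InputFile class and its get and rewind methods are
hypothetical names invented for this example. It assumes that a REWIND with
no parameter (or "0") resets to the top of the file, and that a request to
rewind past the beginning simply stops at the first byte.

```python
import io


class InputFile:
    """Toy model of a CHOPped input file with a read position."""

    def __init__(self, data: bytes) -> None:
        self._stream = io.BytesIO(data)

    def get(self, count: int) -> bytes:
        """Read up to `count` bytes from the current position."""
        return self._stream.read(count)

    def rewind(self, count: int = 0) -> None:
        """Back up `count` bytes; 0 (or omitted) resets to the start."""
        if count == 0:
            self._stream.seek(0)
        else:
            # Clamp at the beginning of the file instead of failing.
            self._stream.seek(max(0, self._stream.tell() - count))


f = InputFile(b"ABCDEFGHIJKLMNOPQRSTUVWXYZ")
f.get(15)          # read 15 bytes out of the input file
f.rewind(99)       # ask to back up further than we have read
first = f.get(1)   # the next byte read is the first byte of the file
```

In this model, the position after the oversized rewind is the top of the
file, which mirrors the REWIND "99" case described above.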
------- Example ------- Consider this POM file: CHOP 0 <-- Handle all reading manually GET x "1" <-- Get a byte GET x "1" <-- Get another byte OUT |{x} <-- Output the byte REWIND "1" <-- Back up one byte GET x "1" <-- Get the same byte again OUT |{x} <-- Output the byte NEXTFILE <-- Stop processing this input file The code shown above will output the second character of an input file twice, then stop -- not very useful, except as a demonstration! 37 ============================================================================ INPUT FILTERS ============================================================================ ------------------ The MINLEN Command ------------------ FORMAT: MINLEN value1 [value2] PURPOSE: MINLEN specifies the minimum length an input line must be to be considered for parsing. PARAMETERS: value1 is the minimum input line length value2 is the minimum length for a READNEXT command NUMERICS: Tabs, spaces and commas are stripped from value1 and value2 DEFAULTS: value2 = "0" SEE ALSO: "The ReadNext Command" If you omit the MINLEN command, the minimum length is assumed to be 1. That is to say, all lines 1 character or longer will be processed and shorter lines (null lines in other words) will be ignored. MINLEN is useful for ignoring brief information lines that clutter up a report that you are parsing. For example, in the sample file EXAMPL02.POM, the MINLEN command is set to 85 to ensure that all lines shorter than 85 characters long will be ignored. This simplifies the coding considerably. The longest allowable input line is 255 characters, unless you use the SPLIT or CHOP command (see "The Split Command" and "The Chop Command"). The optional setting value2 specifies the minimum length for a READNEXT command. If omitted, this value is assumed to be "0", meaning that READNEXT will, by default, read null lines. If you set value2 to "1", READNEXT will keep reading until it finds an input line of 1 or more characters, or hits the end of file. 
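As a rough model of the two MINLEN settings, consider the following Python
sketch. It is not Parse-O-Matic code: the function names lines_to_process
and read_next are hypothetical, and the sketch assumes only what is stated
above (value1 defaults to 1, value2 defaults to 0, and a READNEXT at the end
of the file yields a null line).

```python
from typing import Iterable, Iterator, List


def lines_to_process(lines: Iterable[str], minlen: int = 1) -> List[str]:
    """Keep only the input lines that MINLEN value1 would let through."""
    return [line for line in lines if len(line) >= minlen]


def read_next(remaining: Iterator[str], readnext_minlen: int = 0) -> str:
    """Model READNEXT: skip lines shorter than MINLEN value2."""
    for line in remaining:
        if len(line) >= readnext_minlen:
            return line
    return ""  # end of file: the line is set to null


report = ["", "Total", "x" * 90, "Page 2", "y" * 85]
kept = lines_to_process(report, minlen=85)   # only the two long lines remain

stream = iter(["", "", "See the note"])
note = read_next(stream, readnext_minlen=1)  # skips the two null lines
```

With readnext_minlen left at 0, read_next returns the very next line even
if it is null, which corresponds to the default behaviour described above.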
The value2 setting has no effect if you are reading a DBF (DBase) file.

------------------
The IGNORE Command
------------------

FORMAT:        IGNORE value1 [comparator] value2

PURPOSE:       When the comparison is true, the input line is ignored and all
               further processing on the input line stops.

NOTES:         For an explanation of comparators, see "Using Comparators".

ALTERNATIVES:  The ACCEPT and BEGIN commands

Here is a typical application of the IGNORE command:

   IGNORE $FLINE[3 9] ^ "Date"

This skips any input line that contains the word "Date" between columns 3
and 9 ($FLINE is the line just read from the input file).

------------------
The ACCEPT Command
------------------

FORMAT:        ACCEPT value1 [comparator] value2

PURPOSE:       The ACCEPT command accepts the input line if the comparison is
               true. ACCEPT commands can be "clustered" to allow a series of
               related tests.

NOTES:         For an explanation of comparators, see "Using Comparators".
               In the following explanation, we will demonstrate the command
               using only the "literally identical" ("=") comparator.

ALTERNATIVES:  The IGNORE command

If the entire POM file reads as follows:

   ACCEPT $FLINE[15 17] = "YES"
   OUTEND "X" = "X" |{$FLINE}

then any input line that contains "YES" starting in column 15 is sent to the
output file. All other lines are ignored.

-----------------
Clustered Accepts
-----------------

Sometimes you have to check more than one value to see if the input line is
valid. You do this by using "clustered ACCEPTs", which are several ACCEPT
commands in a row.

Briefly stated, if you have several ACCEPTs in a row ("clustered"), they are
all processed to determine if the input line is acceptable or not. If even
one ACCEPT matches up, the line is accepted. To express this in more
detail...

When the comparison is true, the line is accepted, and processing of the POM
file continues for that input line, even if the immediately following ACCEPTs
do NOT produce a match. After all, we've already got a match!
If the comparison is NOT true, Parse-O-Matic looks at the next command in the
POM file. If it is not another ACCEPT, the input line is ignored. If it is
another ACCEPT, maybe it will produce a match -- so Parse-O-Matic moves to
that command.

The following POM file uses clustered ACCEPTs to accept any line that
contains the name "FRED" or "MARY" between columns 5 and 8, or contains the
word "MEMBER" between columns 20 and 25.

   SET NAME = $FLINE[5 8]             <-- Set the variable
   ACCEPT NAME = "FRED"               <-- Look for FRED
   ACCEPT NAME = "MARY"               <-- Look for MARY
   ACCEPT $FLINE[20 25] = "MEMBER"    <-- Look for MEMBER
   OUTEND "X" = "X" |{$FLINE}         <-- Output the line if we get this far

The following example will NOT work, however:

   ACCEPT $FLINE[20 25] = "MEMBER"
   SET NAME = $FLINE[5 8]
   ACCEPT NAME = "FRED"
   ACCEPT NAME = "MARY"
   OUTEND "X" = "X" |{$FLINE}

It will not work because the ACCEPTs are not clustered; if the first ACCEPT
fails, the input line is rejected as soon as the SET command is encountered.
The next two ACCEPTs are not reached in such a case.

============================================================================
                           FLOW CONTROL COMMANDS
============================================================================

-----------------
The BEGIN Command
-----------------

FORMAT:        The basic format for the BEGIN command is as follows:

               BEGIN value1 [comparator] value2
                 :
                 Dependent code
                 :
               END

PURPOSE:       If the comparison is true (e.g. value1 equals value2), then
               the dependent code (the POM lines between the BEGIN and the
               END) is executed. If the comparison is false, then the
               dependent code is skipped.

NOTES:         For an explanation of comparators, see "Using Comparators".
               In the following explanation, we will demonstrate the command
               using only the "literally identical" ("=") comparator.

SEE ALSO:      "The Else Command" and "The Again Command"

It is traditional in programming to indent code that appears in blocks such
as Parse-O-Matic's BEGIN/END technique.
This makes the logic of the POM file easier for us to understand. For example: BEGIN datatype = "Employee" SET phone = $FLINE[ 1 10] SET address = $FLINE[12 31] END BEGIN/END blocks can be nested. That is to say, you can have BEGIN/END blocks inside other BEGIN/END blocks. Here is an example, with arrows to indicate the levels of each BEGIN/END block... BEGIN datatype = "Employee" <--------------------- SET phone = $FLINE[ 1 10] | SET address = $FLINE[12 31] | SET areacode = phone[1 3] | First BEGIN areacode = "514" <------- Second | Level SET local = "Y" | Level | Block SET tax = "Y" <------- Block | END | END <--------------------- 41 In this case, the "inner" block (starting with BEGIN areacode = "514") is reached only if the "outer" block (BEGIN datatype = "Employee") is true. If the outer block is false, the inner block is ignored. A nested BEGIN/END block must always be completely inside the outer block. Study the following (incorrect) example: BEGIN datatype = "Employee" <---- SET phone = $FLINE[ 1 10] | First SET areacode = phone[1 3] | Level BEGIN areacode = "514" <--- | Block? SET local = "Y" | | END | <---- SET tax = "Y" | END <--- Second Level Block? Parse-O-Matic does not pay attention to the indenting -- it is only a tradition we use to make the file easier to read. The code will be understood this way: BEGIN datatype = "Employee" <--------------------- SET phone = $FLINE[ 1 10] | First SET areacode = phone[1 3] | Level BEGIN areacode = "514" <--- Second | Block SET local = "Y" | Level | END <--- Block | SET tax = "Y" | END <--------------------- You can nest BEGIN/END blocks up to 25 deep -- although it is unlikely you will ever need that much nesting. 
Here is an example of code that uses nesting up to three deep:

   BEGIN datatype = "Dog"            <----------------------------------
     SET breed = $FLINE[1 10]                                          | First
     BEGIN breed = "Collie"          <-----------------------          | Level
       SET noise = "Woof"                                   | Second   | Block
       BEGIN name = "Spot"           <------ Third          | Level    |
         SET attitude = "Friendly"         | Level          | Block    |
       END                           <------ Block          |          |
     END                             <-----------------------          |
     BEGIN breed = "Other"           <----------------------- Another  |
       SET noise = "Arf"                                    | Second   |
       SET attitude = "Unknown"                             | Level    |
     END                             <----------------------- Block    |
   END                               <----------------------------------

Once again, the indentation is for clarity only and does not affect the way
the POM file runs. However, you will find that it makes your POM file much
easier to understand.

----------------
The CALL Command
----------------

FORMAT:        CALL value1

PURPOSE:       Executes a subroutine (see "The Code Command")

PARAMETERS:    value1 is the name of the subroutine

The basic concepts of the CALL and CODE commands are discussed in the section
entitled "The Code Command".

--------------
Variable CALLs
--------------

value1 will usually be a literal (e.g. CALL "Output Field"). However, it can
also be a variable. For example:

   IF $FLINE[1] = "0" THEN routine = "Handle Type 0" ELSE "Handle Type 1"
   CALL routine

This calls the subroutine "Handle Type 0" if the first character of $FLINE is
"0". Otherwise, it calls the subroutine "Handle Type 1". An alternative to
this approach is:

   BEGIN $FLINE[1] = "0"
     (code to handle Type 0 records)
   ELSE
     (code to handle other types of records)
   END

Both methods accomplish the same thing, so the one you choose is largely a
matter of personal preference. In general, however, you should use CALL and
CODE only when the same code is required in several different places in the
POM file. There are some exceptions however, as described in the next
section...
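Readers who know other languages may recognize the variable CALL as a form
of "dispatch by name". The Python sketch below is an analogy only, not
Parse-O-Matic code: the ROUTINES table and the handler functions are
hypothetical names made up for this illustration. It shows a routine name
being chosen from the first character of the line and then looked up and
run, in the same spirit as setting a variable and then using CALL routine.

```python
from typing import Callable, Dict


def handle_type_0(line: str) -> str:
    """Stand-in for the subroutine named "Handle Type 0"."""
    return "type 0: " + line


def handle_type_1(line: str) -> str:
    """Stand-in for the subroutine named "Handle Type 1"."""
    return "other: " + line


# Maps subroutine names to code, much as CODE sections are found by name.
ROUTINES: Dict[str, Callable[[str], str]] = {
    "Handle Type 0": handle_type_0,
    "Handle Type 1": handle_type_1,
}


def process(line: str) -> str:
    # Choose the routine name from the first character of the line.
    routine = "Handle Type 0" if line[:1] == "0" else "Handle Type 1"
    # Look the name up and run it (the equivalent of CALL routine).
    return ROUTINES[routine](line)


result_a = process("0 first record")
result_b = process("7 second record")
```

The name lookup keeps the decision ("which routine?") separate from the work
each routine does, which is the main attraction of the variable CALL.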
----------------------------- Making Your Technique Obvious ----------------------------- Consider this example: APPEND routine "Type" $FLINE[1] CALL routine You can use this method to call subroutines named "Type0", "Type1", "Type2" and so on, according to the first character of $FLINE. This can make your POM file a lot easier to understand. Additionally, you may sometimes want to break up large parsing jobs into distinct pieces. Thus, the main part of your POM file might read as follows: CALL "Break Up Fields" CALL "Clean Up Data" CALL "Send to Output" If each of these steps is quite long and involved, your POM file may be easier to read and understand by using CALL, as shown, to portray the basic steps. ------------------------------ Avoiding Unknown Code Sections ------------------------------ If you attempt to call a subroutine that does not exist, Parse-O-Matic will terminate with an error message, saying it can not find the CODE section. It is a good idea to check for this possibility, as in this example: LOG "012345" ~ $FLINE[1] "Unknown record type: " $FLINE[1] HALT "012345" ~ $FLINE[1] "Unknown record type -- see processing log" APPEND routine "Type" $FLINE[1] CALL routine This will provide a more meaningful error message if the first character of $FLINE is not one that you expected (zero through five, in the example given above). 44 ---------------- The CODE Command ---------------- FORMAT: The format of a subroutine is as follows: CODE value1 : Code run when this subroutine is invoked by the CALL command : END PURPOSE: Defines a subroutine PARAMETERS: value1 is the name of the subroutine NOTES: value1 can be up to 255 characters long and may contain any characters (including spaces). Case is not important, so "Read New Data" is treated the same as "READ NEW DATA". Leading and trailing spaces and tabs are ignored. Subroutines are useful when a POM file has to perform the same series of commands in several different circumstances. 
Consider this POM file: SET field = $FLINE[10 20] <-- Get a field CALL "Output Date Field" <-- Run a subroutine SET field = $FLINE[30 40] <-- Get a field CALL "Output Date Field" <-- Run a subroutine CODE "Output Date Field" <-- Start of subroutine <-+ CHANGE field " " "/" <-- Correct date format | CHANGE field "\" "/" <-- Correct date format | CHANGE field "-" "/" <-- Correct date format | Subroutine CVTCASE field field <-- Convert to uppercase | OUTEND |{field} <-- Output the field | END <-- End of subroutine <-+ In this example, the subroutine (named "Output Date Field") tidies up the data before sending it to the output file. For example, if the date field is "10 31 98", it is converted to "10/31/98". As you can see, subroutines can reduce the number of lines in a POM file. Also, if you give your subroutines meaningful names, your POM file will be much easier to understand. Subroutines also make it easier to update a POM file. For example, if you noticed that the date field is sometimes delimited with the "plus" symbol, you could simply add the line CHANGE field "+" "/" to the "Output Date Field" subroutine. In other words, you would add a single line of code instead of several lines. Subroutines are ignored by Parse-O-Matic unless they are explicitly invoked by a CALL command. 45 -------------------------- Performance Considerations -------------------------- If you have a POM file with a lot of subroutines, you can improve its performance slightly by moving all the CODE sections to the end of the file, and placing a DONE command in front of them. This saves Parse-O-Matic the trouble of reading through them each time it processes a line. ----------------------- Nested Subroutine Calls ----------------------- You can call a subroutine from within a subroutine. 
For example:

   CODE "Clean Up Date Field"
     CHANGE field " " "/"
     CHANGE field "\" "/"
     CHANGE field "-" "/"
     CVTCASE field field
   END

   CODE "Output Date Field"
     CALL "Clean Up Date Field"
     OUTEND |{field}
   END

The second subroutine ("Output Date Field") calls the first subroutine
("Clean Up Date Field"). This feature is useful if several subroutines
require the same processing.

You must be careful not to have circular references (a subroutine that calls
a subroutine which in turn calls the first subroutine). This will cause
Parse-O-Matic to fail.

Subroutine calls can be nested up to 25 deep. That is to say, a subroutine
can call a second subroutine, which can call a third subroutine, and so on,
all the way up to the 24th subroutine calling the 25th subroutine. It is,
however, unlikely that you will ever nest subroutines more than two or three
deep.

----------------------
Variable Code Sections
----------------------

You may be wondering if you can place a variable in the value1 position of
the CODE command, just as you can have a variable CALL command. While it is
possible to do so (i.e. Parse-O-Matic will not fail), we do not officially
support this usage, as we see no reason for it. If you do find a reason for
it, make sure that the variable is first converted to uppercase.

---------------------------------
A Note to Experienced Programmers
---------------------------------

Experienced programmers of other languages (e.g. Pascal, C, Basic, etc.) may
wonder if there is a way to pass one or more variables to a subroutine.
Parse-O-Matic does not currently support traditional features such as
functions, variable-passing or local variables. These features (and other
niceties such as code libraries) may be added in a later release.
Parse-O-Matic's design is guided primarily by customer demand.
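Since subroutines take no parameters and have no local variables, they
communicate through ordinary POM variables. For programmers coming from
other languages, the following Python sketch models that arrangement. It is
an analogy only: the shared "variables" dictionary, the "output" list and
the function names are hypothetical, and it assumes (as the date example
suggests) that CHANGE replaces every occurrence and that CVTCASE converts to
uppercase.

```python
from typing import Dict, List

variables: Dict[str, str] = {}   # every variable is shared by all routines
output: List[str] = []           # stands in for the output file


def clean_up_date_field() -> None:
    """Model of CODE "Clean Up Date Field": works on the shared 'field'."""
    field = variables["field"]
    for separator in (" ", "\\", "-"):
        field = field.replace(separator, "/")   # CHANGE field <sep> "/"
    variables["field"] = field.upper()          # CVTCASE field field


def output_date_field() -> None:
    """Model of CODE "Output Date Field": a nested subroutine call."""
    clean_up_date_field()                # CALL "Clean Up Date Field"
    output.append(variables["field"])    # OUTEND |{field}


# The caller sets the shared variable before each call, instead of
# passing it as a parameter.
variables["field"] = "10 31 98"
output_date_field()
variables["field"] = "oct-31-98"
output_date_field()
```

The caller and the subroutine must agree on the variable name ("field" in
this case), which is the price of doing without parameters.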
---------------- The ELSE Command ---------------- FORMAT: The format of a BEGIN/ELSE/END block is as follows: BEGIN value1 [comparator] value2 : Code that is run if the comparison is true : ELSE : Code that is run if the comparison is false : END PURPOSE: The ELSE command tells Parse-O-Matic to execute the following block of code (up until the END command) if the corresponding BEGIN comparison is NOT true. NOTES: The ELSE command is not the same as the ELSE used to pad the IF statement (e.g. IF xyz = "3" THEN x = "Y" ELSE "N"). In the IF command, ELSE makes the statement more clear, but it can be omitted (e.g. IF $FLINE[1] "3" x "Y" "N"). Here is an example of a BEGIN/ELSE/END block: BEGIN $FLINE[1 10] = "JOHN SMITH" SET x = "This is John" ELSE SET x = "This is not John" END If you are using several levels of nesting, it is a good idea to indent your code to show the relationship of the BEGIN, ELSE and END statements: 47 BEGIN datatype = "Dog" <---------------------------------- SET breed = $FLINE[1 10] | First BEGIN breed = "Collie" <----------------------- | Level SET noise = "Woof" | Second | Block BEGIN name = "Spot" <------ Third | Level | SET attitude = "Friendly" | Level | Block | END <------ Block | | ELSE | | SET noise = "Arf" | | SET attitude = "Unknown" | | END <----------------------- | END <---------------------------------- The ELSE is at "Level 2". This is because there are three BEGINs ahead of it, but only one END (3 - 1 = 2). --------------- The END Command --------------- FORMAT: END PURPOSE: Marks the end of a BEGIN, PROLOGUE or EPILOGUE code block. The END command marks the end of a "code block". A code block is a series of lines in a POM file that may be run if the conditions are right. 
For a more detailed discussion of the END command, see the following sections: - "The Begin Command" - "The Prologue Command" - "The Epilogue Command" 48 ----------------- The AGAIN Command ----------------- FORMAT #1: BEGIN [value1 [comparator] value2] : Code executed if the BEGIN comparison is true or omitted : AGAIN [value1 [comparator] value2] FORMAT #2: BEGIN value1 [comparator] value2 : Code executed if the BEGIN comparison is true : ELSE : Code executed if the BEGIN comparison is false : AGAIN [value1 [comparator] value2] PURPOSE: Controls the repetition of a BEGIN block. NOTES: For an explanation of comparators, see "Using Comparators". SEE ALSO: "The ReadNext Command", "The Begin Command", "The Else Command" and "Uninitialized and Persistent Variables" DEFAULTS: If the comparison part of the AGAIN command is omitted: - AGAIN repeats if the BEGIN comparison was true or omitted - AGAIN does not repeat if the BEGIN comparison was false ADVISORY: If you are familiar with other computer languages, you may be tempted to use AGAIN to create loops when none are required. Remember that a POM file is repeated (i.e. looped) each time a record or line is read from the input file. The AGAIN command is most appropriate when you have input records with a variable number of items. The AGAIN command allows you to implement "loops". A loop is a section of code that can be repeated one or more times. AGAIN returns to the corresponding BEGIN if the comparison is true, or if it is omitted. 
Since the BEGIN can also have a comparison, and can be used with the ELSE command, this allows many variations: 49 COMMAND ARRANGEMENT EFFECT ------------------------------ ----------------------------------------- BEGIN AGAIN Loops forever BEGIN comp AGAIN Loops until the BEGIN comparison is false BEGIN AGAIN comp Loops until the AGAIN comparison is false BEGIN comp AGAIN comp Loops until either comparison is false BEGIN comp ELSE AGAIN comp Loops until either comparison is false BEGIN comp ELSE AGAIN Loops until the BEGIN comparison is false In the last two examples, the ELSE code is run when the BEGIN comparison is false, then processing continues on the POM line after the AGAIN command. When a BEGIN comparison is false, the comparison (if any) of the AGAIN command is not evaluated. To put it another way: the AGAIN comparison is considered only if the BEGIN comparison is true or omitted. ------------------------------------ Using AGAIN for Variable-Length Data ------------------------------------ Let us say you have a text file that contains the names of people belonging to various clubs. 
The file lists the name of the club, then the number of people in each club, and then the names: Chess Club 3 John Smith Mary Jones Fred Williams Hopscotch Club 0 Tennis Club 2 Jack Martin Debbie Harris 50 You could process this input file with the following POM file: PAD $FLINE "R" " " "17" <-- Pad the club name out with spaces OUT |{$FLINE} <-- Send the club name to the output file READNEXT <-- Get the number of members SET members = $FLINE <-- Remember this number BEGIN members = "0" <-- Check if we have any members OUT |(None) <-- Report if we have no members ELSE <-- If we have members, do the next part SET count = "0" <-- Initialize a counter BEGIN <-- Start the loop READNEXT <-- Get the person's name SET count = count+ <-- Count this person OUT |{$FLINE} <-- Send the name to the file OUT count #< members |/ <-- Add a separator if not the last name AGAIN count #< members <-- Go back if we have more members END <-- Corresponds to the first BEGIN OUTEND | <-- Start a new line after each club This POM file would generate the following output: Chess Club John Smith/Mary Jones/Fred Williams Hopscotch Club (None) Tennis Club Jack Martin/Debbie Harris ------------------------------ Pointless Command Combinations ------------------------------ Some combinations of BEGIN, ELSE and AGAIN are pointless. 
The following command arrangements contain code that is never run:

   COMMAND ARRANGEMENT             NOTE
   ------------------------------  ------------------------------------------
   BEGIN ELSE AGAIN comp           The ELSE portion is never executed
   BEGIN ELSE AGAIN                Loops forever; the ELSE portion never runs

--------
Examples
--------

Either of these two POM files will read a text file and ignore any lines that
contain the word "COW":

   Using the AGAIN Command       Using the IGNORE Command
   -----------------------       ------------------------
   BEGIN $FLINE ^ "COW"          IGNORE $FLINE ^ "COW"
     READNEXT                    OUTEND |{$FLINE}
   AGAIN
   OUTEND |{$FLINE}

The shorter POM file is more efficient, but the results would be roughly the
same for both. Remember that a POM file is processed each time Parse-O-Matic
reads an input record (or line), so the second version is, in effect, looping
as many times as there are records in the file.

The following POM file will read one line from a file, then send the string
[6]123[7]123[6]123[7]123 to the output file:

   SET z = "0"
   BEGIN                       <-----------------------------
     SET y = "5"                                            | First
     BEGIN y <> "7"            <------------------          | Level
       SET x = "0"                               | Second   | (Outermost)
       SET y = y+                                | Level    | Loop
       APPEND s s "[" y "]"                      | Loop     |
       BEGIN x <> "3"          <--------         |          |
         SET x = x+                    | Third   |          |
         APPEND s s x                  | Level   |          |
       AGAIN                   <--------         |          |
     AGAIN                     <------------------          |
     SET z = z+                                             |
   AGAIN z <> "2"              <-----------------------------
   OUTEND |{s}
   NEXTFILE

   The third level (innermost) loop ... generates 123
   The second level (middle) loop ..... generates [6]123[7]123
   The first level (outermost) loop ... generates [6]123[7]123[6]123[7]123

The following POM file will read one line from a file, then send the string
"XXXY" to the output file:

   SET z = "0"         <-- Initialize a counter
   SET s = ""          <-- Initialize the string we will output
   BEGIN z < "3"       <-- Check if the counter has reached "3"
     SET z = z+        <-- Add one to the counter
     APPEND s s "X"    <-- Add an "X" to the end of the output string
   ELSE                /__ The ELSE section is run when
     APPEND s s "Y"    \   the BEGIN comparison is false
   AGAIN               <-- Go back to the BEGIN
   OUTEND |{s}         <-- Continue here after the ELSE portion
   NEXTFILE            <-- Stop reading the input file

----------------
The DONE Command
----------------

FORMAT:        DONE [value1 [comparator] value2]

PURPOSE:       The DONE command will discontinue processing the POM file and
               proceed to the next input line, whereupon the POM file will
               restart at the top.

NOTES:         For an explanation of comparators, see "Using Comparators".
               In the following explanation, we will demonstrate the command
               using only the "literally identical" ("=") comparator.

ALTERNATIVES:  The NEXTFILE, IGNORE and ACCEPT commands

The DONE command is most useful when you have a long series of BEGIN/END
blocks which make a related comparison. For example:

   SET salesrep = $FLINE[11 50]
   SET region   = $FLINE[ 1  2]
   BEGIN region = "US"
     OUTEND |Sales representative for U.S.A.: {salesrep}
     DONE
   END
   BEGIN region = "CN"
     OUTEND |Sales representative for Canada: {salesrep}
     DONE
   END
   BEGIN region = "EU"
     OUTEND |Sales representative for Europe: {salesrep}
     DONE
   END
     :
    etc.

As you can see, if one of the BEGIN comparisons is true, all of the following
ones will inevitably be false. Rather than processing all the others, you can
use the DONE command to bail out and get ready for the next input line.

The DONE command provides two benefits:

   - It can speed up processing slightly
   - It makes full traces easier to understand

For an explanation of traces, see the section entitled "Tracing".
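The effect of DONE in a series of related comparisons resembles an early
return in other languages. The Python sketch below is an analogy only, not
Parse-O-Matic code: the REGIONS table and the process_line function are
hypothetical names. It counts how many comparisons are made for each line,
to show what is saved by bailing out after the first match.

```python
from typing import List, Tuple

REGIONS: List[Tuple[str, str]] = [
    ("US", "U.S.A."),
    ("CN", "Canada"),
    ("EU", "Europe"),
]


def process_line(fline: str, out: List[str]) -> int:
    """Handle one input line; return the number of comparisons made."""
    region = fline[0:2]       # columns 1 to 2
    salesrep = fline[10:50]   # columns 11 to 50
    comparisons = 0
    for code, name in REGIONS:
        comparisons += 1
        if region == code:
            out.append(f"Sales representative for {name}: {salesrep}")
            return comparisons   # like DONE: skip the remaining blocks
    return comparisons


out: List[str] = []
us_checks = process_line("US".ljust(10) + "Jane Doe", out)
eu_checks = process_line("EU".ljust(10) + "Al Smith", out)
```

A line for the first region is settled after one comparison, while a line
for the last region still needs all three, so the saving depends on how the
blocks are ordered.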
Unless you use a comparison (explained later), the DONE command is useful only inside BEGIN/ELSE/END blocks. If you write a POM file like this: SET custnum = $FLINE[ 1 10] SET custname = $FLINE[11 50] DONE OUTEND |{custname} {custnum} then the OUTEND statement will NEVER be reached. Here is how you specify a comparison for the DONE command: DONE $FLINE = "End of Data" This discontinues the POM file, and proceeds to the next input line, if the current input line ($FLINE) is "End of Data". 54 -------------------- The NEXTFILE Command -------------------- FORMAT: NEXTFILE [value [comparator] value] PURPOSE: NEXTFILE discontinues processing the current input file and proceeds to the next one, restarting the POM file from the top. NOTES: For an explanation of comparators, see "Using Comparators". In the following explanation, we will demonstrate the command using only the "literally identical" ("=") comparator. ALTERNATIVES: The HALT command The NEXTFILE command is useful when you process multiple input files (see "POM and Wildcards"). Here is an example, which we will call TEST.POM: BEGIN $FLINE = "End of Data" OUTEND |{numlines} lines of data printed SET numlines = "" NEXTFILE END SET numlines = numlines+ OUTEND |{$FLINE} Let's say you have three text files: DATA1.XYZ, DATA2.XYZ and DATA3.XYZ. The last line of each file says "End of Data". You could copy all three files to the file OUTPUT.TXT with this command: POM TEST.POM DATA?.XYZ OUTPUT.TXT This would copy the data from each file, but when it gets to the line reading "End of Data", it records the number of lines of data that were printed. Any lines after the "End of Data" line are skipped, because of the NEXTFILE command. The NEXTFILE command can specify a comparison. 
Here is an example:

   NEXTFILE $FLINE = "End of Data"
   OUTEND |{$FLINE}

Assuming the same input files (DATA1.XYZ etc.), and using the same POM
command as last time, this POM file would simply copy up to (but not
including) the line that reads "End of Data" in each input file.

----------------
The HALT Command
----------------

FORMAT:        HALT value1 comparison value2 value3 [value4]

PURPOSE:       The HALT command will terminate Parse-O-Matic processing if
               the comparison is true.

PARAMETERS:    value1 is any value
               value2 is any value
               value3 is the message to be displayed
               value4 is the optional error level (between 100 and 199)

NUMERICS:      Tabs, spaces and commas are stripped from value4

ALTERNATIVES:  The NEXTFILE and SETERROR commands

SEE ALSO:      "The MsgWait Command"

Here is an example of the HALT command:

   HALT sales = "0" "Zero sales!"

If the variable named sales is "0", Parse-O-Matic will display an error box
reading "Zero sales!" and terminate after you have pressed a key. A copy of
the message is also placed in the processing log POMLOG.TXT (see "Logging").

When a HALT condition occurs, Parse-O-Matic terminates with a DOS error level
of 100. You can specify a different value, using value4. This is useful if
you are calling Parse-O-Matic from a batch file or application program and
want to handle different errors in different ways.

You can set value4 to any number between 100 and 199. Consider these
examples:

   HALT sales = "0" "Zero sales" "150"
   HALT sales[1] = "-" "Negative sales" "160"

This terminates Parse-O-Matic with an error level of 150 if sales are zero.
If the first character of sales is a minus sign, Parse-O-Matic terminates
with an error level of 160.

When coding batch files, remember that the IF ERRORLEVEL command is
considered "True" if the error is the specified value or higher. This means
you should always test the higher value first. See your operating system
manual for details.
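The "specified value or higher" rule is easy to get wrong, so here is a
small Python model of a chain of IF ERRORLEVEL tests. It is an illustration
only: the first_match function and the labels NEGATIVE, ZERO and HALTED are
hypothetical, while the values 150 and 160 are taken from the HALT examples
above. Each test is treated as "greater than or equal to", and the first
test that succeeds wins.

```python
from typing import List, Tuple


def first_match(exit_code: int, checks: List[Tuple[int, str]]) -> str:
    """Return the label of the first IF ERRORLEVEL test that fires."""
    for threshold, label in checks:
        if exit_code >= threshold:   # IF ERRORLEVEL n means "n or higher"
            return label
    return "NONE"


# Correct order: the highest value is tested first.
high_first = [(160, "NEGATIVE"), (150, "ZERO"), (100, "HALTED")]

# Incorrect order: the lowest value is tested first.
low_first = list(reversed(high_first))

right = first_match(160, high_first)   # the test for 160 fires first
wrong = first_match(160, low_first)    # the test for 100 fires first
```

With the lowest value tested first, an error level of 160 is caught by the
test for 100, so the more specific tests are never reached.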

--------------------
The SETERROR Command
--------------------

FORMAT:        SETERROR [value1 comparison value2] value3

PURPOSE:       The SETERROR command will set the program return code if
               the comparison is true (or if it is omitted).

PARAMETERS:    value1 is any value
               value2 is any value
               value3 is the program return code value (between 100 and
                      199)

NUMERICS:      Tabs, spaces and commas are stripped from value3

ALTERNATIVES:  The HALT command

The SETERROR command sets the program return code, which you can check when
Parse-O-Matic has completed processing. This lets you pass information back
to the batch file or program that called Parse-O-Matic.

When coding batch files, remember that the IF ERRORLEVEL command is
considered "True" if the error is the specified value or higher. This means
you should always test the higher value first. See your operating system
manual for details.

Here is a sample POM file, which we will name ERRTEST.POM:

  SET custname = $FLINE[ 1 20]
  SET owed     = $FLINE[25 30]
  SET status   = $FLINE[62 68]
  SETERROR status = "OVERDUE" "100"
  OUTEND |{custname} {owed} {status}

Here is a sample batch file that uses ERRTEST.POM:

  @ECHO OFF
  POM ERRTEST.POM INPUT.TXT OUTPUT.TXT
  IF ERRORLEVEL 200 GOTO PERR
  IF ERRORLEVEL 100 GOTO OVERDUE
  GOTO QUIT
  :PERR
  ECHO A syntax or processing error was detected by Parse-O-Matic
  GOTO QUIT
  :OVERDUE
  ECHO We have at least one overdue account
  :QUIT

--------------------
The PROLOGUE Command
--------------------

FORMAT:        The format for PROLOGUE (used with the END command) is:

                 PROLOGUE
                   :
                   Dependent code
                   :
                 END

PURPOSE:       PROLOGUE defines dependent code which is run before the
               first line of the input file is read.

SEE ALSO:      "The Epilogue Command"

PROLOGUE can be used to set up some variables, or set up a heading --
anything you only want to do once per input file, at the very start.
Here is an example of the PROLOGUE command:

  PROLOGUE
    SET both  = "B"
    SET space = " "
  END
  SET firstname = $FLINE[ 1 10]
  SET lastname  = $FLINE[15 25]
  TRIM firstname both space
  TRIM lastname  both space
  OUTEND |{firstname} {lastname}

When the input file is first opened, the PROLOGUE section sets the
variables "both" and "space". Once they're set, you don't have to change
them (since you are just using them to make the code easier to read). Thus,
it makes sense to set them only at the beginning of processing and not
bother setting them each time the POM file is executed (i.e. each time an
input line is read).

If you are working with multiple files (see "POM and Wildcards"), the
PROLOGUE is run for each input file. If you want to run some code for the
first file only, you can set a "flag", as in this example:

  BEGIN firstfile = ""
    SET firstfile = "N"
    OUTEND |First file only
  ELSE
    OUTEND |Subsequent files
  END
  NEXTFILE

If you run this POM file on several files at once, using wildcards, the
first line of the output file will contain the words "First file only",
since the variable "firstfile" has not yet been assigned a value. On
subsequent files, the variable will have the value "N", so the following
lines of the output file will read "Subsequent files".

--------------------
The EPILOGUE Command
--------------------

FORMAT:        The format for EPILOGUE (used with the END command) is:

                 EPILOGUE
                   :
                   Dependent code
                   :
                 END

PURPOSE:       EPILOGUE defines dependent code which is run after the last
               line of the input file is read and the POM file is executed
               to process it. In other words, once all the input data is
               finished, the POM file runs one last time -- but only the
               code in the EPILOGUE section.

SEE ALSO:      "The Prologue Command"

You can use EPILOGUE to output final results.
Let's say your input file looks like this:

  DESCRIPTION      UNITS SOLD     UNIT PRICE
  Wildebeest food         325     $ 9.99
  Horse cologne            13     $ 3.25
  Moose alarm             210     $ 5.95
  :                :        :     :       :  (Column positions)
  1                18       27    33      41

You can find out the total number of units sold (of all types) with the
following POM file:

  IGNORE $FLINE[1 7] = "DESCRIP"
  CALC units = units "+" $FLINE[18 27]
  EPILOGUE
    OUTEND |Total units sold = {units}
  END

This POM file adds up the number of units sold. The only output is the
single line generated by the OUTEND in the EPILOGUE.

If you are processing multiple files (see "POM and Wildcards"), the
EPILOGUE is run after each input file is finished.

============================================================================
                            VARIABLE MODIFIERS
============================================================================

----------------
The TRIM Command
----------------

FORMAT:        TRIM var1 value1 value2

PURPOSE:       TRIM removes the character in value2 from var1.

PARAMETERS:    var1   is the variable being set
               value1 is "A" = All; "B" = Both ends; "L" = Left side only;
                         "R" = Right side only; "M" = Multiples
               value2 is the character to be removed

ALTERNATIVES:  The SET, CHANGE and REMAP commands

The TRIM command gets rid of unwanted characters in a variable. It is
typically used to remove blanks from either side of text, or leading zeros
from numeric data. For example, to remove commas and the leading dollar
sign from a number:

  SET  PRICE = $FLINE[20 26]   <-- Get the variable from the input file
  TRIM PRICE "A" ","           <-- Trim All
  TRIM PRICE "L" "$"           <-- Trim Left

Here is how it works:

  If the input contains the string:   "$25,783"
  The first TRIM changes it to:       "$25783"
  The second TRIM changes it to:      "25783"

You can also squeeze out multiple occurrences of a given character. For
example, to remove multiple spaces from a variable named xyz, use this
command:

  TRIM xyz "M" " "

If xyz has the value "   A   B   C   ", the command shown above changes it
to " A B C ".
You can then get rid of the spaces at both ends with this command:

  TRIM xyz "B" " "

This changes the xyz variable to "A B C".

---------------
The PAD Command
---------------

FORMAT:        PAD var1     value1  value2 value3
                   :        :       :      :
MEANING:           Variable Control Char   Number

PURPOSE:       PAD makes var1 a specified length, padded with a specified
               character.

PARAMETERS:    var1   is the variable being set
               value1 is "L", "R", or "C" (Left, Right or Center)
               value2 is the character used to pad the string
               value3 is the desired string length

NUMERICS:      Tabs, spaces and commas are stripped from value3

ALTERNATIVES:  The CHANGE command

Here is an example of the PAD command. If the variable ABC is already set
to "1234" ...

  PAD ABC "L" "0" "7"  left-pads it 7 characters wide with zeros ("0001234")
  PAD ABC "R" " " "5"  right-pads it 5 characters wide with spaces ("1234 ")
  PAD ABC "C" "*" "8"  centers it, 8 wide, with asterisks ("**1234**")

If the length is less than the length of the string, it is unchanged. For
example, if you set variable XYZ to "PINNACLE", then

  PAD XYZ "R" " " "3"

leaves the string as-is ("PINNACLE"). Thus, PAD can not be used to shorten
a string. If it is your intention to make XYZ 3 letters long, you can use
the SET command:

  SET XYZ = XYZ[1 3]

------------------
The CHANGE Command
------------------

FORMAT:        CHANGE var1 value1 value2

PURPOSE:       The CHANGE command replaces ALL occurrences of value1 with
               value2.

ALTERNATIVES:  The TRIM command. (The CHANGE command is more powerful than
               TRIM, but is not as efficient).

Here is an example of the CHANGE command in action:

  SET    DATE = $FLINE[31 38]
  CHANGE DATE "/" "--"

  If the SET command assigns DATE the value:   "93/10/15"
  Then the CHANGE command converts it to:      "93--10--15"

-------------------
The CVTCASE Command
-------------------

FORMAT:        CVTCASE var1 value1 [value2]

PURPOSE:       CVTCASE converts a value to uppercase or lowercase.

PARAMETERS:    var1   is the variable being set
               value1 is the value being converted
               value2 is the optional control setting

DEFAULTS:      value2 = "UI"

ALTERNATIVES:  The PROPER or REMAP commands; $FLUPC

CVTCASE converts value1 to uppercase or lowercase and places the result in
var1. Here are some examples:

  COMMAND                       DESCRIPTION
  ---------------------------   --------------------------------
  CVTCASE xyz "Test Case" "U"   Sets variable xyz to "TEST CASE"
  CVTCASE xyz "Test Case" "L"   Sets variable xyz to "test case"
  CVTCASE xyz "Test Case"       Sets variable xyz to "TEST CASE"

In the last example, the optional control parameter (value2) was omitted.
In such a case, CVTCASE will convert the value to uppercase.

----------------
Control Settings
----------------

The control setting (value2) can be one or two characters long, or it can
be omitted (in which case it is assumed to be "UI"). Here are the available
settings for value2:

  SETTING   CONVERT TO   CHARACTER SET
  -------   ----------   ------------------
  "L"       Lowercase    IBM Extended ASCII
  "LI"      Lowercase    IBM Extended ASCII
  "L7"      Lowercase    7-bit ASCII
  "U"       Uppercase    IBM Extended ASCII
  "UI"      Uppercase    IBM Extended ASCII
  "U7"      Uppercase    7-bit ASCII

The IBM Extended ASCII character set defines diacritical (accented)
characters such as "U Umlaut" and "C Cedille"; these are located in the
ASCII table above value 127. It is the standard character set used by
MS-DOS and PC-DOS.

The 7-bit ASCII character set is concerned only with the characters in the
original definition of ASCII (American Standard Code for Information
Interchange), which does not support diacritical characters. As such,
uppercasing and lowercasing affect only alphabetic characters ("A" to "Z",
and "a" to "z"). This character set is used by many mini-computers, and is
the standard character set of the Unix operating system.

The eighth bit is not ignored if you use CVTCASE with the 7-bit ASCII
character set.
If you wish to set the eighth bit to zero (perhaps because it is a parity
bit), you should use the REMAP command.

------------------
The PROPER Command
------------------

FORMAT:        PROPER var1 [value1 [value2]]

PURPOSE:       The PROPER command converts uppercase text (LIKE THIS) to
               mixed-case text (Like This).

PARAMETERS:    var1   is the variable being set
               value1 is the methods setting
               value2 is the name of the Properization Exception File

DEFAULTS:      value1 = "IW"

ALTERNATIVES:  The CHANGE command; $FLUPC (uppercase version of $FLINE)

The PROPER command is useful when you have a list of names of people and
addresses. You can also use PROPER to change text that has been typed in
uppercase into normal text, with capital letters at the beginning of
sentences.

The simplest way to convert a variable is as follows:

  PROPER CustName

If CustName contains "JOHN SMITH", it will be changed to "John Smith".

The conversion routine is fairly intelligent. For example, if it is
converting the words "JAGUAR XJS", it can tell that XJS is not a word
(since it does not contain any vowels) and so the end result will be
"Jaguar XJS". Other "strange-looking" items such as serial numbers can
often be recognized by the PROPER command, and left untouched.

Nevertheless, it is impossible to handle all situations, so the PROPER
command supports a "Properization Exceptions File" (known as a PEF file).
A PEF file lists unusual combinations of letters (typically abbreviations,
such as Dr.). The Parse-O-Matic package includes a file named GENERIC.PEF,
which you may find helpful. You can view it with the SEE program provided
with Parse-O-Matic.

A PEF file is prepared with a text editor and contains one "exception" per
line. Null or blank lines, or lines that start with a semicolon, are
ignored. The longest word that can be specified is 255 characters. Spaces
are permitted, but leading and trailing spaces and tabs are ignored.
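
The PEF file rules just listed (one exception per line; blank lines and
semicolon lines ignored; leading and trailing spaces and tabs ignored) can
be sketched in Python as follows. This is only a reading of those rules,
not Parse-O-Matic's own code. The function name parse_pef_lines is
hypothetical, the 255-character limit is not enforced, and the sketch
assumes that a semicolon counts as "starting" the line even when it is
preceded by spaces or tabs, which the text above does not specify.

```python
def parse_pef_lines(lines):
    """Return the list of exceptions found in the lines of a PEF file."""
    exceptions = []
    for raw in lines:
        # Leading and trailing spaces and tabs are ignored.
        entry = raw.strip(" \t\r\n")
        # Null or blank lines are ignored.
        if not entry:
            continue
        # Lines that start with a semicolon are ignored (assumption:
        # the check is made after leading whitespace is removed).
        if entry.startswith(";"):
            continue
        exceptions.append(entry)
    return exceptions


sample = [
    "; Titles and abbreviations",
    "",
    "  Dr.  ",
    "\tPh.D.",
    "van der",   # spaces inside an exception are permitted
]
print(parse_pef_lines(sample))  # ['Dr.', 'Ph.D.', 'van der']
```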
To use the PEF file in your PROPER command, place the file name after the
variable name and method setting. For example:

  PROPER CustName "W" "GENERIC.PEF"

The "W" is the method setting (explained later). "GENERIC.PEF" is the name
of the PEF file. When Parse-O-Matic looks for the PEF file, it looks for it
in the current directory unless an explicit path is specified, then
searches elsewhere, if necessary. (For details, see the section entitled
"How Parse-O-Matic Searches for a File".) If it can not find it there, it
looks in the directory where POM.EXE is located. You can, if you wish,
specify a complete path to the file, as in this example:

  PROPER Address "W" "C:\MYFILES\MYPEF.XYZ"

If you don't need an exceptions file, you should not use it, since it slows
down processing somewhat. Needless to say, the more items you have in the
PEF file, the more it slows down processing.

The method setting allows you to specify what PROPER does. There are
several kinds of controls, as follows:

  METHOD   DESCRIPTION
  ------   -----------
    I      Intelligent determination of non-words
    S      Upcase the first character of each sentence
    U      Upcase the first alphanumeric character of the line
    W      Upcase the first letter of each word

The default method setting is "IW", so if you omit the method setting, or
specify a null setting (e.g. PROPER CustName "" "XYZ.PEF"), PROPER will
upcase non-words, and the first letter of each word.

NOTE: If you specify a PEF file, you must also specify a method setting,
      even if it is null. The line PROPER x "GENERIC.PEF" would not be
      understood by Parse-O-Matic. The correct format would be:
      PROPER x "" "GENERIC.PEF"

The examples provided with Parse-O-Matic demonstrate some ways you can use
the PROPER command. To see the examples, enter INFO at the DOS prompt, or
run INFO.BAT from Windows, then select TUTORIAL.

------------------
The INSERT Command
------------------

FORMAT:        INSERT var1 value1 value2

PURPOSE:       The INSERT command inserts text on the left or right of
               var1, or at a "found text" position.

PARAMETERS:    var1   is the variable being set
               value1 is "L" or "R" (Left or Right) or a find-string (e.g.
               "One " "hour " sets xyz to "One hour a day"

The < prefix means "insert value1 before the found text".
The > prefix means "insert value1 after the found text".

If the find-string is not found, nothing is done.

NOTE: Prior to version 3.40 of Parse-O-Matic, the "insert before"
      operation was denoted by the @ prefix rather than the < prefix. This
      still works, so you do not have to change your POM files.

------------------
The APPEND Command
------------------

FORMAT:        APPEND var1 value1 value2 [value3 [value4]]

PURPOSE:       The APPEND command concatenates (adds together) two or more
               values and places the result in var1.

NOTES:         No variable can hold more than 255 characters.

ALTERNATIVES:  The INSERT command

Here is an example of the APPEND command:

  APPEND xyz "AB" "CD" "EF" "GHIJ"

This command sets the variable xyz to "ABCDEFGHIJ". The third and fourth
values (value3 and value4 in the FORMAT shown above) are optional. Thus,
you can use APPEND with only two values. For example:

  SET x1 = "AB"
  SET x2 = "CD"
  APPEND x3 x1 x2

This sets the variable x3 to "ABCD". You can concatenate a maximum of four
values with a single APPEND command. If you require additional
concatenations, you can use more APPEND commands:

  APPEND myvar "ABC" "DEF" "GHI" "JKL"
  APPEND myvar myvar "MNO" "PQR"

The first line sets the variable myvar to "ABCDEFGHIJKL". The second line
sets myvar to its previous value, plus "MNOPQR", so that its final value is
"ABCDEFGHIJKLMNOPQR".

-------------------
The OVERLAY Command
-------------------

FORMAT:        OVERLAY var1 value1 value2

PURPOSE:       The OVERLAY command overwrites a portion of a variable with
               a value, at the specified position.
               The length of the variable is padded with spaces if
               necessary.

PARAMETERS:    var1   is the variable to be overwritten
               value1 is the data which will be placed into var1
               value2 is the position, in var1, where value1 will be placed

NOTES:         No variable can hold more than 255 characters.

NUMERICS:      Tabs, spaces and commas are stripped from value2
               value2 must be between "1" and "255"

SEE ALSO:      "The Append Command" and "The Insert Command"

The OVERLAY command lets you replace part of a variable, starting at a
particular position. Here is an example:

  SET x = "ABCDEFG"
  SET y = "pom"
  OVERLAY x y "4"

This will change the x variable so it contains "ABCpomG", since it places
the "pom" at position "4".

If value2 is greater than the length of var1, var1 is padded with spaces to
allow value2 to be placed at the specified position. For example:

  SET x = "ABCDEFG"
  SET y = "pom"
  OVERLAY x y "10"

This will change the x variable to "ABCDEFG  pom". Since var1 was less than
10 characters, it had to be extended to allow the "pom" to start in
position "10".

Since no variable can be more than 255 characters, OVERLAY will truncate
the result if it extends beyond that:

  SET x = "ABCDEFG"
  SET y = "pom"
  OVERLAY x y "254"

The x variable will be 255 characters long (starting with "ABCDEFG",
followed by many spaces) and end with "po", since the "p" is in position
"254" and the "o" is in the last legal position (i.e. "255").

Here are some additional examples:

  SET x = ""            <-- Set x to a null (empty) string
  OVERLAY x "ABC" "1"   <-- Set x to "ABC" (i.e. the "A" is at position "1")
  OVERLAY x "DEF" "4"   <-- Change x from "ABC" to "ABCDEF"
  OVERLAY x "IJ"  "9"   <-- Change x to "ABCDEF  IJ"
  OVERLAY x "GH"  "7"   <-- Change x to "ABCDEFGHIJ"

-------------
Simple Arrays
-------------

You can use OVERLAY to perform some simple array processing, provided the
total length of the data items does not exceed 255 characters. For example,
you could store 25 items of up to 10 characters each in a 250-character
variable.
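
As a rough illustration of the OVERLAY rules and of the fixed-width array
idea, here is a Python model. It is a sketch of the behaviour described
above (pad with spaces, overwrite in place, truncate at 255 characters),
not Parse-O-Matic's implementation. The names overlay, store_item and
fetch_item are hypothetical, and raising an error for an out-of-range
position is an assumption.

```python
MAX_LEN = 255    # no variable can hold more than 255 characters
ITEM_SIZE = 10   # width of one array slot, as in the 25 x 10 example


def overlay(var, value, position):
    """Overwrite part of var with value, starting at a 1-based position."""
    if not 1 <= position <= MAX_LEN:
        raise ValueError("position must be between 1 and 255")  # assumption
    start = position - 1
    padded = var.ljust(start)                 # pad with spaces if too short
    result = padded[:start] + value + padded[start + len(value):]
    return result[:MAX_LEN]                   # truncate beyond 255 characters


def store_item(array, index, item):
    """Place item in slot number index (1-based) of a fixed-width array."""
    slot = item.ljust(ITEM_SIZE)[:ITEM_SIZE]
    return overlay(array, slot, (index - 1) * ITEM_SIZE + 1)


def fetch_item(array, index):
    """Return the contents of slot number index, including padding spaces."""
    return array[(index - 1) * ITEM_SIZE:index * ITEM_SIZE]


print(overlay("ABCDEFG", "pom", 4))    # ABCpomG
print(overlay("ABCDEFG", "pom", 10))   # ABCDEFG  pom

array = store_item("", 15, "moose")
print(fetch_item(array, 15).rstrip())  # moose
```

Slot 15 occupies positions 141 to 150 of the variable, so storing it in an
empty variable produces a 150-character string.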
Here is a helpful subroutine (see "The Code Command") for this kind of
operation:

  CODE "Calc Array Indexes"
    SET  ItemSize   = "10"
    CALC IndexStart = Index- "*" ItemSize
    CALC IndexEnd   = IndexStart "+" ItemSize
    SET  IndexStart = IndexStart+
  END

This will provide pointers for an array of 25 items of up to 10 characters
each, with indexes of 1 to 25. You can use this information with both the
OVERLAY and COPY commands. You could extract the 15th item as follows:

  SET Index = "15"
  CALL "Calc Array Indexes"
  COPY x MyArray IndexStart IndexEnd

For a more sophisticated and flexible method of handling arrays, see "Array
Variables".

-------------------
The MAPFILE Command
-------------------

FORMAT:        MAPFILE value1 value2 [value3]

PURPOSE:       MAPFILE reads a file containing data for the REMAP command

PARAMETERS:    value1 is the name of the map file
               value2 is the map name, used by the REMAP command
               value3 is the control settings (AnyCase/MatchCase/Transpose)

DEFAULTS:      value3 = "MATCHCASE"

NOTES:         The maximum length of a map name is 12 characters

SEE ALSO:      "The Remap Command", "How Parse-O-Matic Searches for a File"

The MAPFILE command reads in a file which contains data for the REMAP
command, and assigns a name to the collection of data so the REMAP command
can refer to it.

-------------------
What is a Map File?
-------------------

A map file is an ordinary text file; you can create or edit the file with a
standard text editor, or a word-processor in "generic text" mode. Map files
are usually given the .MPF extension.

A map file contains a list of "mappings". Here are some other words with
approximately the same meaning as "mapping":

  Translation   Correlation   Substitution   Equivalence   Replacement

In other words, the map file contains a list of data items that should be
replaced by other data items.

----------------
Sample Map Files
----------------

The following map files are included in the standard Parse-O-Matic package:

  FILE NAME     DESCRIPTION
  ------------  ----------------------------------------------------------
  BIN2CHAR.MPF  Converts binary data into printable characters and periods
  BIN2CODE.MPF  Converts binary data into hex codes (e.g. 3F 2C A3)
  ASC2EBCD.MPF  Converts ASCII data to EBCDIC and vice-versa

---------------
Map File Format
---------------

The map file contains one mapping per line. Each mapping consists of two
Parse-O-Matic literals, separated by one or more spaces or tabs. The first
value is the "find" column, while the second value is the "replace" column.
Here are some examples:

  "123"    "LOS ANGELES"   <-- Both values are "literal text strings"
  $39      "9"             <-- A hex code and a literal text string
  #48      "Zero"          <-- A decimal code and a literal text string
  $FF$30   #00#00          <-- Hex and decimal literals

These columns are lined up for clarity; there is no need to start in a
particular column. Any leading or trailing spaces are removed from a line,
and any number of spaces or tabs can appear between the columns.

The following line is NOT valid:

  123 LOS ANGELES

The line must use text, hex or decimal literals (e.g. "text", $FF, #FF).

Null or blank lines, or lines that start with a semicolon, are ignored. The
longest line that can be specified is 255 characters. The longest value
that can be specified is 80 characters (after translation, if it is in hex
or decimal mode).

------------
Search Order
------------

The REMAP command performs substitutions in the order that they appear in
the map file. In most cases, the longer "find" strings should appear first.
For example, let us say you create a map file named FORWARD.MPF, which
looks like this:

  "123" "in Los Angeles"
  "12"  "in Montreal"
  "1"   "in Town of Mount Royal"
  "2"   "in Podunk"

Now let's say you run the following POM file, named FORWARD.POM:

  MAPFILE "FORWARD"
  SET comment = "Forward the memo to office 123"
  REMAP comment
  OUTEND |{comment}

This will produce the following output:

  Forward the memo to office in Los Angeles

This happens because the string "123" is replaced by the string "in Los
Angeles". If the order of the lines in FORWARD.MPF is reversed, FORWARD.POM
will produce the following output:

  Forward the memo to office in Town of Mount Royalin Podunk3

This happens because the "1" is found (and replaced) first, followed by the
"2". Since there is no "3" in FORWARD.MPF, it is left alone.

Parse-O-Matic does NOT enforce the principle of "progressively shorter
'find' strings". If you are processing a lot of data, you can improve
processing speed slightly by placing a short, frequently-used "find" string
near the top of the list. As long as it is not a sub-string of (i.e.
contained within) one of the following strings, it will not cause any
problems.

-------------
Case Matching
-------------

You can set value3 to "AnyCase" or "MatchCase" (the default).

  ANYCASE:    The find string need not match in case ("John" = "JOHN")
  MATCHCASE:  The find string must match ("John" does not match "JOHN")

Processing is faster if you use the default setting (MatchCase).

---------------
Reverse Mapping
---------------

If you want the mapping process to work "backwards", you can use the
"Transpose" control setting in value3. For example:

  MAPFILE "MYFILE.MPF" "MYFILE" "AnyCase Transpose"

This reverses the mapping process: the "find" column is treated like the
"replace with" column, and vice-versa.

The standard Parse-O-Matic package contains a map file (ASC2EBCD.MPF) which
will translate ASCII files into EBCDIC files -- and vice-versa.

NOTE: EBCDIC is a character representation used on certain large mainframe
      computers. Both ASCII and EBCDIC characters are eight bits long, but
      EBCDIC uses different bit patterns for most characters.

Since both the "find" and "replace with" columns in ASC2EBCD.MPF are only
one character wide, and since there is no duplication within either column,
the translation process is perfectly reversible. For example:

  PROLOGUE
    CHOP 1 20                                   <-- Read 20 bytes at a time
    MAPFILE "ASC2EBCD.MPF" "EBCDIC"             <-+
    MAPFILE "ASC2EBCD.MPF" "ASCII" "TRANSPOSE"    | Set up maps
    MAPFILE "BIN2CODE.MPF" "CODE"               <-+
  END
  SET    x = $FLINE                             <-+
  REMAP  x "CODE"                                 | Display original text in
  OUTEND |[ORIGINAL] [{$FLINE}]                   | normal & hex-coded form
  OUTEND |[ORIGINAL] [{x}]                      <-+
  REMAP  $FLINE "EBCDIC"                        <-+
  SET    x = $FLINE                               | Convert to EBCDIC and
  REMAP  x "CODE"                                 | display in coded form
  OUTEND |[EBCDIC  ] [{x}]                      <-+
  REMAP  $FLINE "ASCII"                         <-+
  SET    x = $FLINE                               | Convert EBCDIC back to
  REMAP  x "CODE"                                 | ASCII; display hex code
  OUTEND |[ASCII   ] [{x}]                      <-+
  OUTEND |                                      <-- Output a separator line

You can run this POM file against any file, then view the output file. You
will see how the original text is converted into EBCDIC and then the EBCDIC
is converted back to ASCII. (Most of the data in the output file is
represented in "hex dump" format, since your computer is not designed to
display EBCDIC.)

TRANSPOSE will often let you use a single map file instead of two, but
before using this technique you should carefully consider how mapping will
take place (see "Irreversible Mapping", below).

--------------------
Irreversible Mapping
--------------------

Consider the following POM file:

  MAPFILE "MYMAP.MPF" "XYZ"
  MAPFILE "MYMAP.MPF" "ZYX" "TRANSPOSE"
  REMAP $FLINE "XYZ"
  REMAP $FLINE "ZYX"
  OUTEND |{$FLINE}

In many cases, this is equivalent to the following one-line POM file:

  OUTEND |{$FLINE}

because the first REMAP changes $FLINE one way, and the second REMAP
changes it back.

This is not true in ALL cases, however.
In some circumstances a REMAP is not reversible. Consider the following map
file:

  "XYZ"  "CAB"
  "ABC"  "C"
  "DEF"  "ABC"

Now consider the following sequence of events. (The * and # characters show
what gets replaced in each step.)

  Original string . . . . . . . . .  ABCDEF
                                     ***###
                                     *###
  Remap produces this result . . . . CABC
                                     ***
                                     ***
  Transposed remap of result . . . . XYZC

If you follow the steps of the substitutions, you will see where the
confusion arises. As a general rule, simple substitutions (with no
duplications in whole or in part) are reversible, but if you have any
doubts, you can always take the safe route and use a separate map file for
each direction. (See "Search Order", above, for additional insight into
this matter.)

------------------
Memory Limitations
------------------

The MAPFILE command reads the map data into RAM memory. You will normally
have sufficient memory for thousands of bytes worth of mappings. However,
if you do not have enough memory to hold the data, Parse-O-Matic will
display an error message, then terminate. (See "Solving Memory Problems")

To help you track memory usage, the MAPFILE command records memory status
(bytes used and bytes left) in the processing log (see "Logging").

-----------------------
An Example of Remapping
-----------------------

The standard Parse-O-Matic package contains two sample map files:

  BIN2CODE.MPF maps single bytes to hex codes (e.g. Hex $31 becomes "31 ")
  BIN2CHAR.MPF maps single bytes to either printable characters or periods

You can view these files with the SEE program (included with Parse-O-Matic)
or you can load them into a text editor program.
Here is a POM file that uses the sample map files to create a hex dump of a
binary file:

  CHOP 1 16                     <-- Read the file 16 bytes at a time
  SETLEN w $FLINE               <-- Get the actual number of bytes read
  BEGIN w <> "16"
    PAD $FLINE "R" #0 "16"      <-- If less than 16 bytes, pad with nulls
  END
  SET x = $FLINE                <-- Make a copy of $FLINE
  SET y = $FLINE                <-- Make a copy of $FLINE
  MAPFILE "BIN2CHAR" "CHAR"     <-- See Note
  MAPFILE "BIN2CODE.MPF" "CODE"
  REMAP x "CHAR"                <-- Change the bytes to printable characters
  REMAP y "CODE"                <-- Change the bytes to hex codes
  OUTEND |{x} {y}               <-- Output the line

Note: Since the file name (value1) does not have an extension,
      Parse-O-Matic will add the .MPF extension. Thus, the actual file name
      Parse-O-Matic looks for is "BIN2CHAR.MPF".

-----------------
The REMAP Command
-----------------

FORMAT:        REMAP var1 value1

PURPOSE:       REMAP transforms sub-strings into other strings

PARAMETERS:    var1   is the variable being transformed
               value1 is the map name (see "The MapFile Command")

ALTERNATIVES:  The LOOKUP and CHANGE commands

SEE ALSO:      "The MapFile Command"

The REMAP command performs intensive substitutions on a variable. It is
equivalent to a large number of CHANGE commands, but has the following
advantages:

- It is faster than using a large number of CHANGEs
- It does not expend your available values (see "Values")
- It prevents multiple substitutions

-------------------
REMAP Versus CHANGE
-------------------

The "multiple substitution" issue is the most important distinction between
CHANGE and REMAP. REMAP protects substituted text from being resubstituted.
Consider the following POM lines:

  SET x = "cat dog mouse"
  CHANGE x "cat" "dog"
  CHANGE x "dog" "cat"

You might expect these lines to change x to "dog cat mouse", but the actual
result is "cat cat mouse". The first CHANGE command sets the x variable to
"dog dog mouse". The next command changes the dogs into cats!
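
The resubstitution effect can be reproduced outside Parse-O-Matic. The
Python sketch below models CHANGE as a plain "replace all" and models the
protection that REMAP is described as giving with a single left-to-right
pass in which replaced text is never scanned again. This is an assumption
made for illustration: the algorithm Parse-O-Matic actually uses is not
given here and may differ in other respects. The function names change and
remap are hypothetical.

```python
def change(text, find, replace):
    """Model of CHANGE: replace ALL occurrences of find with replace."""
    return text.replace(find, replace)


def remap(text, mappings):
    """Single-pass substitution: output text is never substituted again.

    mappings is a list of (find, replace) pairs, tried in list order at
    each position of the input.
    """
    out = []
    i = 0
    while i < len(text):
        for find, replace in mappings:
            if find and text.startswith(find, i):
                out.append(replace)   # emit the replacement ...
                i += len(find)        # ... and skip past the found text
                break
        else:
            out.append(text[i])       # no mapping matched at this position
            i += 1
    return "".join(out)


x = "cat dog mouse"
x = change(x, "cat", "dog")   # "dog dog mouse"
x = change(x, "dog", "cat")   # both dogs become cats
print(x)                      # cat cat mouse

print(remap("cat dog mouse", [("cat", "dog"), ("dog", "cat")]))
# dog cat mouse
```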
You can avoid this problem by using intermediate substitutions or some such
work-around, but this ends up complicating the POM file considerably.
Moreover, this approach can be unwieldy if you have to perform a large
number of substitutions.

-----------
Using REMAP
-----------

To accomplish the "cat/dog" substitution mentioned earlier, you can create
a map file (named CATDOG.MPF) with a text editor. It will look like this:

  "cat" "dog"
  "dog" "cat"

Your POM file will then look like this:

  MAPFILE "CATDOG.MPF" "PETS"
  SET x = "cat dog mouse"
  REMAP x "PETS"

This will change the x variable to "dog cat mouse".

For another example of the REMAP command, see "The MapFile Command".

============================================================================
                            FREE-FORM COMMANDS
============================================================================

----------------------------
What are Free-Form Commands?
----------------------------

The free-form commands are used for extracting information from an input
line that does not have its data in precise columns. Consider the following
input file:

  Mouse     Gazelle   Mouse     Elephant
  Dog       Giraffe   Elk       Mongoose
  Monkey    Snake     Caribou   Trout
  |         |         |         |
  Column 1  Col 11    Col 21    Col 31

Extracting data that is arranged in tidy columns is simple -- all you need
is the SET command. However, you will need a more powerful command if the
data is "free-form", like this:

  Mouse,Gazelle,Mouse,Elephant
  Dog,Giraffe,Elk,Mongoose
  Monkey,Snake,Caribou,Trout

The data is not arranged in tidy columns. For tasks like this, you need the
free-form commands.

-----------------
The PARSE Command
-----------------

FORMAT:        PARSE var1     value1 value2 value3 [value4]
                     :        :      :      :      :
MEANING:             Variable Source From   To     Control

PURPOSE:       PARSE sets var1 to the text (found in value1) between text
               fragments specified by value2 and value3.

PARAMETERS:    var1   is the variable being set
               value1 is the source text being read
               value2 specifies the starting position (decapsulator)
               value3 specifies the ending position (decapsulator)
               value4 is the optional control setting

DEFAULTS:      value4 = "X"

ALTERNATIVES:  The PEEL command, and COPY used with FINDPOSN

Consider the following free-form data:

  Mouse,Gazelle,Mouse,Elephant
  Dog,Giraffe,Elk,Mongoose
  Monkey,Snake,Caribou,Trout

The PARSE command lets you extract the "Nth" item. For example, to extract
the third item in each line in the free-form example above, you could use
this command:

  PARSE xyz $FLINE "2*," "3*,"

This means "set the variable xyz by looking in $FLINE (the line just read
from the input file) and taking everything between the second comma and the
third comma". For the three lines in the sample input file, the variable
xyz is set to Mouse, then Elk, then Caribou.

-------------
Decapsulators
-------------

In the "From" specification in the previous example (i.e. the "2*," part of
the command):

  2  means "the second occurrence"
  *  is a delimiter to mark the end of the occurrence number
  ,  is the text you are looking for

Both the "From" and "To" specifications use this format. Commands using
this format are said to use "decapsulators", because you are extracting
text that is encapsulated (i.e. surrounded) by other text.

Decapsulators may be used to find more than a single character. The
surrounding text can be up to 80 characters long. Let's say the input file
looks like this:

  Mouse%:Gazelle%:Mouse%:Elephant
  Dog%:Giraffe%:Elk%:Mongoose
  Monkey%:Snake%:Caribou%:Trout

You can extract the third item in each line with this command:

  PARSE xyz $FLINE "2*%:" "3*%:"
        ___ ______  _ __   _ __
         |    |     | |    | |
         |    |     | |    | "To" text being sought
         |    |     | |    "To" occurrence number
         |    |     | "From" text being sought
         |    |     "From" occurrence number
         |    The value to parse
         Variable to set

This command sets the variable xyz to Mouse, then Elk, then Caribou.
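
The "occurrence number, asterisk, text" format can be modelled in a few
lines of Python. The sketch below covers only what has been described so
far: both decapsulators are written as N*text, and the text between them
is returned with the surrounding text excluded. It is an illustration, not
Parse-O-Matic's implementation. The function names are hypothetical, and
counting overlapping occurrences separately is an assumption.

```python
def find_nth(text, needle, n):
    """Return the index of the nth (1-based) occurrence of needle, or -1."""
    pos = -1
    for _ in range(n):
        pos = text.find(needle, pos + 1)
        if pos == -1:
            return -1
    return pos


def split_decapsulator(spec):
    """Split a spec such as '2*%:' into (2, '%:') at the first asterisk."""
    count, _, needle = spec.partition("*")
    return int(count), needle


def parse(source, from_spec, to_spec):
    """Return the text between the 'From' and 'To' decapsulators."""
    n_from, from_text = split_decapsulator(from_spec)
    n_to, to_text = split_decapsulator(to_spec)
    start = find_nth(source, from_text, n_from)
    end = find_nth(source, to_text, n_to)
    if start == -1 or end == -1:
        return ""   # this sketch returns an empty string if nothing is found
    return source[start + len(from_text):end]


print(parse("Mouse,Gazelle,Mouse,Elephant", "2*,", "3*,"))   # Mouse
print(parse("Dog%:Giraffe%:Elk%:Mongoose", "2*%:", "3*%:"))  # Elk
```

Splitting at the first asterisk is what allows the sought text itself to
be a digit, or to contain further asterisks.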

------------------
Sample Application
------------------

The PARSE command is particularly useful for extracting information from
comma-delimited files. Here is an example of a comma-delimited file:

  "Mouse","Gazelle","Mouse","Elephant"
  "Dog","Giraffe","Elk","Mongoose"
  "Monkey","Snake","Caribou","Trout"

You can extract all the fields with this series of commands (note the use
of doubled-up quotes to represent a single quotation mark -- see the
section "Delimiters" for details):

  PARSE field1 $FLINE "1*""" "2*"""
  PARSE field2 $FLINE "3*""" "4*"""
  PARSE field3 $FLINE "5*""" "6*"""
  PARSE field4 $FLINE "7*""" "8*"""

For the first line of the sample input file, field1 is set to Mouse, field2
is set to Gazelle, and so on.

---------------------
The Occurrence Number
---------------------

The occurrence number must be between 1 and 255. The following lines are
not valid PARSE commands:

  PARSE xyz $FLINE "0*," "1*,"     <-- "From" decapsulator invalid: uses 0
  PARSE xyz $FLINE "1*," "256*,"   <-- "To" decapsulator invalid: uses 256

The occurrence number must always be followed by a "*" so you can search
for a number. Consider the following example (the meaning of which would be
unclear without the "*" delimiter):

  PARSE xyz "XXX2YYY2ZZZ2" "1*2" "2*2"

This sets xyz to the text occurring between the first "2" and the second
"2". In other words, xyz is set to "YYY".

---------------------------
Finding the Last Occurrence
---------------------------

A decapsulator can refer to "the LAST occurrence":

  PARSE xyz "AaaBAbbBAccB" ">*A" ">*B"

In both decapsulators, the ">" symbol means "the last occurrence". Thus,
the command tells Parse-O-Matic, "Set the xyz variable to everything
between the last A and the last B". This sets the xyz variable to "cc".
You can also use the "<" character to mean "the FIRST occurrence", although
this is somewhat redundant, since the following commands are equivalent:

  PARSE xyz "AaaBAbbBAccB" "<*A" "<*B"
  PARSE xyz "AaaBAbbBAccB" "1*A" "1*B"
  PARSE xyz "AaaBAbbBAccB" "A" "B"

All three commands set the xyz variable to "aa".

---------------------
Unsuccessful Searches
---------------------

If PARSE does not find the search text, the variable will be set to a null
("").  Here are two examples:

  PARSE abc "ABCDEFGHIJ" "1*K" "1*J"    <-- There is no "K"
  PARSE abc "ABCDEFGHIJ" "1*A" "1*X"    <-- There is no "X"

If the "From" text is found after the "To" text (i.e. the starting position
is greater than the ending position), Parse-O-Matic will display an error
message, then terminate.  For example:

  PARSE abc "ABCDEFGHIJ" "1*J" "1*A"    <-- "J" comes after "A"

This kind of failure typically happens if the input data contains an odd
arrangement of text that you had not foreseen.

-------------------
The Control Setting
-------------------

The PARSE command has an optional "Control" parameter, which tells PARSE
whether to include or exclude the surrounding text that was found.

By default (as shown in all of the preceding examples), the delimiting text
is excluded.  However, if you want to include it, you can add "I" at the end
of the PARSE command, as in this example:

  PARSE xyz "aXcaYcaZc" "2*a" "2*c" "I"

This tells Parse-O-Matic to give you everything between the second "a" and
the second "c" -- including the "a" and "c".  In other words, this sets the
variable xyz to "aYc".

You can also set the Control specification to "X" (meaning "exclude"),
although since this is the default setting for PARSE, it really isn't
necessary.  Here is an example:

  PARSE xyz "a1ca2ca3c" "2*a" "2*c" "X"

This sets the variable xyz to "2".

----------------------
The Plain Decapsulator
----------------------

The occurrence number is not always needed.
Either the "From" or "To" decapsulator can be represented as a plain string,
as follows:

  PARSE xyz $FLINE "ABC" "XYZ"

This means:

  - Start at the first "ABC" found in the value being parsed
  - End with the first "XYZ" found in the value being parsed

---------------------
The Null Decapsulator
---------------------

Here is a helpful variation of the "From" decapsulator:

  "" means "Start from the first character in the value being parsed"

A similar variation can be used with the "To" decapsulator:

  "" means "End with the last character in the value being parsed"

If you use the null ("") decapsulator for "From" or "To", the "found" value
(the first character for "From", or the last character for "To") will always
be included (see "Overlapping Decapsulators" for the single exception to this
rule).  Here is an example:

  PARSE xyz "ABCABCABC" "" "2*C"

This sets the variable xyz to "ABCAB".  The "From" value (i.e. the first
character) is NOT excluded.  However, when PARSE finds the "To" value (i.e.
the second occurrence of the letter C) it IS excluded.  If you want to
include the second "C", you should write the command this way:

  PARSE xyz "ABCABCABC" "" "2*C" "I"

The following two commands accomplish the same thing:

  PARSE xyz "ABCD" "" ""
  SET xyz "ABCD"

They are equivalent because the PARSE command means "Set the variable xyz
with everything between (and including) the first and last character".

---------------------------------------
Why Null Decapsulators Work Differently
---------------------------------------

The reason that PARSE treats the null ("") decapsulator differently may not
be immediately obvious, since the examples given here are very simple, and
not representative of "real world" applications.  However, in day-to-day
usage, you will frequently find it helpful to be able to specify a command
that says, "Give me everything from the beginning of the line to just before
such-and-such".
Here is a command that means "Give me everything from just after the dollar sign, to the end of the line": PARSE xyz "I'd like to have $250.00" "1*$" "" This sets xyz to "250.00". If you want to include the dollar sign, write the command this way: PARSE xyz "I'd like to have $250.00" "1*$" "" "I" ------------------------- Overlapping Decapsulators ------------------------- Earlier, it was mentioned that the text found by the null decapsulator is "always included" and is not affected by the "X" (Exclude) control. There is one exception to this: if the null decapsulator's "found text" is contained in the text found by the other decapsulator, it WILL be affected. For example: PARSE x "ABCDEFABCDEF" "" "1*AB" "X" This command tells Parse-O-Matic "give me everything between the first character and the first occurrence of AB". Since the two items overlap (i.e. the first "AB" includes the first character), the first character does indeed get excluded. As a result, the x variable is set to an empty string (""). Here is another example: PARSE x "ABCDEFABCDEF" ">*F" "" "X" This command tells Parse-O-Matic "give me everything between the last occurrence of F and the last character". Both decapsulators refer to the same character (i.e. the final "F"), so it is excluded. As a result, the x variable is set to an empty string (""). NOTE: In some circumstances, the FINDPOSN command is NOT affected by this exception. It will do its best to make sense of your request if the decapsulators overlap, and one of them is a null decapsulator. For details, see "The FindPosn Command". 84 -------------------- Parsing Empty Fields -------------------- Consider the following command: PARSE x ",,,JOHN,SMITH" "2*," "3*," There is nothing between the second and third comma, so the x variable is set to "" (an empty string). 
Now consider this command: PARSE x ",,,JOHN,SMITH" "" "," You are asking for everything from the first character to the first comma (which also happens to be the first character). Obviously, there is nothing "between" the two characters, so the x variable would be set to "" (an empty string). ------------------- Additional Examples ------------------- For more examples of the PARSE command, see the demonstrations provided with Parse-O-Matic (type INFO at the DOS prompt, or run INFO.BAT from Windows, then select TUTORIAL). ---------------- The PEEL Command ---------------- FORMAT: PEEL var1 var2 value1 value2 [value3] : : : : : MEANING: Variable Source From To Control PURPOSE: The PEEL command works just like PARSE, but after setting var1, it REMOVES the parsed value (including the delimiters) from var2. PARAMETERS: var1 is the variable being set var2 is the source text being read value1 specifies the starting position (decapsulator) value2 specifies the ending position (decapsulator) value3 is the optional control setting DEFAULTS: value3 = "X" (See "The PARSE Command" for discussion) When you are breaking up a complex line into fields, PEEL can simplify matters considerably, because the line being interpreted gradually becomes less complex. 85 Here is a simple example. Let's say you have an input file containing a single line: AA/BB/CC/DD If you run this POM file against the input file: PEEL x $FLINE "" "/" <-- Strip out the AA and remove the / OUTEND |{x} PEEL x $FLINE "" "/" <-- Strip out the BB and remove the / OUTEND |{x} PEEL x $FLINE "" "/" <-- Strip out the CC and remove the / OUTEND |{x} OUTEND |{$FLINE} then the output file will look like this: AA BB CC DD What is happening is that $FLINE is gradually being stripped of the text that is being found. After the first PEEL, $FLINE contains "BB/CC/DD", and so on. After the final PEEL, $FLINE only contains "DD". 
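The gradual stripping shown in the PEEL example can be mimicked in Python. This is a minimal sketch of the left-peeling case only (a null "From" and a plain "To" delimiter). The helper names peel_left and peel_fields are hypothetical, and the behavior when the delimiter is missing (empty result, source unchanged) is an assumption based on the statement that PEEL works like PARSE.

```python
def peel_left(source, delimiter):
    """Mimic PEEL var source "" delimiter: return (peeled, remainder)."""
    head, found, tail = source.partition(delimiter)
    if not found:
        # Assumption: nothing is peeled and the source is left untouched.
        return "", source
    # The delimiter itself is dropped from both the result and the remainder.
    return head, tail


def peel_fields(line, delimiter, count):
    """Peel `count` fields from the left, then keep what is left over."""
    fields = []
    for _ in range(count):
        field, line = peel_left(line, delimiter)
        fields.append(field)
    fields.append(line)  # corresponds to the final OUTEND |{$FLINE}
    return fields


output = peel_fields("AA/BB/CC/DD", "/", 3)
```

After the three peels, output contains the four values written by the OUTEND commands in the example, and the last entry is the remainder that was left in the source.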
-------------------
The Control Setting
-------------------

The "I" and "X" control parameters behave the same way as they do in the
PARSE command: they specify whether or not the surrounding text is included
in var1.

Take note, however, that the starting and ending characters are always
removed from var2, along with the "found" text, regardless of the control
parameter.  In other words, the control parameter only affects the first
variable (x in the example above), not the second ($FLINE in the example).

--------------------
Parsing Empty Fields
--------------------

Consider the following commands:

  SET z = ",,,JOHN,SMITH"
  PEEL x z "2*," "3*,"

There is nothing between the second and third comma, so the x variable is
set to "" (an empty string).  After the PEEL command, the z variable will be
two commas shorter (",JOHN,SMITH").  If you are trying to extract data from a
comma-delimited line, this is probably not what you want (since it gets rid
of two commas).

When taking apart a delimited file, it often makes sense to start peeling
from the left side of the string.  Consider these commands:

  SET z = ",,,JOHN,SMITH"
  PEEL x z "" ","

You are asking for everything from the first character to the first comma
(which also happens to be the first character).  Obviously, there is nothing
"between" the two characters, so the x variable would be set to "" (an empty
string).  After the PEEL command, the z variable will be one comma shorter
(",,JOHN,SMITH").

--------------------------
The Left-Peeling Technique
--------------------------

You can use the "left-peeling" technique to take apart an entire line.  This
is especially useful when interpreting a comma-delimited file.
  SET z = ",,MARY,JONES,"
  PEEL a z "" ","          <-- Sets the a variable to ""
  PEEL b z "" ","          <-- Sets the b variable to ""
  PEEL c z "" ","          <-- Sets the c variable to "MARY"
  PEEL d z "" ","          <-- Sets the d variable to "JONES"
  SET e = z                <-- Sets the e variable to ""

The e variable is null because there is nothing after the last comma -- in
other words, the final field is empty.  If the initial value of the z
variable was ",,MARY,JONES,99" then the e variable would be set to "99".

----------------------
The Leftover Technique
----------------------

Sometimes you are faced with a parsing task in which the input lines are more
than 255 characters long, yet there is no way to know where each field
begins.  This makes it impossible to use the SPLIT command in the usual way.
This type of problem generally arises when processing comma-delimited or
tab-delimited files.

Here is a sample POM file which handles input lines up to 300 characters
long, provided that no field is more than 155 characters long.

  SPLIT 1-100, 101-200, 201-300    <-- Process input lines in three segments
  IGNORE $FLINE = ""               <-- Ignore any splits that yield nothing
  BEGIN leftover <> ""             <-- See if we have anything left over
    APPEND $FLINE leftover $FLINE  <-- Append what's left over
    SET leftover = ""              <-- We've used up the left over data
  END
  BEGIN                            <-- Loop through the parts we can extract
    FINDPOSN x $FLINE $09          <-- Look for a tab
    BEGIN x <> "0"                 <-- See if we found a tab
      SET foundtab = "Y"           <-- Yes, we found a tab
      PEEL x $FLINE "" #09         <-- Peel away everything up to the tab
      OUTEND |{x}                  <-- Process the text we peeled away
    ELSE
      SET foundtab = "N"           <-- No, we didn't find a tab
      BEGIN $SPLIT = "3"           <-- See if this is the last of the text
        OUTEND |{$FLINE}           <-- Output whatever is left over
      ELSE
        SET leftover = $FLINE      <-- Save this part for the next split
      END
    END
  AGAIN foundtab = "Y"             <-- Continue if this segment had a tab

-----------------
The PEELX Command
-----------------

FORMAT:       PEELX var1 var2 value1 value2 [value3]
                    :    :    :      :      :
MEANING:
Variable Source From To Control PURPOSE: The PEELX command works just like PEEL, but if it can not find the delimiters being sought, it sets var1 to var2, and var2 to null. PARAMETERS: var1 is the variable being set var2 is the source text being read value1 specifies the starting position (decapsulator) value2 specifies the ending position (decapsulator) value3 is the optional control setting DEFAULTS: value3 = "X" (See "The PARSE Command" for discussion) The PEELX command works exactly the same as the PEEL command, except that if it can not find the text being sought (as specified by value1 and value2), it sets var1 to var2 and sets var2 to null. This is very useful when you are stripping away one item at a time from the left side of a string. For example, let us say you had an input line that contained part numbers arranged like this: A100-34 A100-35 A202-34 A303-35 B143-99 B716-34 89 Now let us say that you wanted to output every part number that contained "-34". The obvious choice is to peel off each part number at the space, but the last part number does not have a space after it. You could take the extra step of appending a space, but PEELX saves you the trouble, as in this example: BEGIN PEELX word $FLINE "" " " OUTEND word ^ "-34" |{word} AGAIN $FLINE <> "" The PEELX command strips out the part numbers as "A100-34", "A100-35" and so on, until $FLINE contains only "B716-34". Then, when PEELX looks for a space, it does not find it. So it sets the variable "word" to "B716-34" and sets $FLINE to null. 90 ============================================================================ POSITIONAL COMMANDS ============================================================================ ------------------ General Discussion ------------------ NOTE: If you are a programmer, you may be tempted to use positional commands even when other Parse-O-Matic commands are more efficient. 
The positional approach is reminiscent of the parsing strategies used in traditional programming languages, so you may use them because of their familiarity. The following material discusses this issue, to help you to create shorter, faster POM files. ----------------------------- What are Positional Commands? ----------------------------- Parse-O-Matic's positional commands let you work with the numeric position of one text string in another. For example, if the variable xyz contains the value "ABCD": SEARCH POSITION STRING IN xyz COMMENTS ------ -------- ----------------------------------------- "A" "1" "A" appears in the 1st position of "ABCD" "AB" "1" "ABCD" "1" "C" "3" "C" appears in the 3rd position of "ABCD" "CD" "3" "D" "4" "AC" "0" "0" since "AC" does not appear in "ABCD" ---------------------------- Why Use Positional Commands? ---------------------------- Positional commands give you the precise control you need for certain difficult parsing tasks. For example, if you want to obtain the last three characters of a string of known length (e.g. "ABCDEFG"), the standard approach is: SET abc = "ABCDEFG" SET xyz = abc[5 7] However, if the length of the string is not known, you can not use the substrings in [square brackets]. (To make Parse-O-Matic run as fast as possible for standard parsing jobs, you can not use variables within square brackets.) 91 If the length of the string is not known, you can use positional commands to obtain the last three characters. Here is an example: SET abc = "Unknown" SETLEN len abc CALC lenminus = len "-" "2" COPY xyz abc lenminus len The SETLEN command finds the length (i.e. the last character position) of the abc variable. In this case, the answer is "7", since "Unknown" is seven characters long. The CALC command subtracts "2" from this length, setting the lenminus variable to "5". Finally, the COPY command copies from position "5" to "7", setting the variable xyz to "own" -- the last three characters of the abc variable. 
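For readers more familiar with general-purpose languages, the SETLEN / CALC / COPY sequence above maps onto the following Python sketch. The helper names copy_range and last_three are hypothetical; the only point is to show how Parse-O-Matic's 1-based, inclusive positions relate to Python's 0-based slices. The sketch assumes the string has at least three characters.

```python
def copy_range(source, start, end):
    """Mimic COPY with 1-based, inclusive start and end positions."""
    return source[start - 1:end]


def last_three(text):
    """Follow the SETLEN / CALC / COPY steps from the example."""
    length = len(text)        # SETLEN len abc
    lenminus = length - 2     # CALC lenminus = len "-" "2"
    return copy_range(text, lenminus, length)  # COPY xyz abc lenminus len


xyz = last_three("Unknown")
```

For "Unknown", length is 7 and lenminus is 5, so the copy covers positions 5 to 7, which matches the result described in the text.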
-----------------
A Cautionary Note
-----------------

Positional commands are useful for some applications, but many parsing jobs
do not require them.  The commands SET, IF, PARSE and PEEL can usually do the
same job with less effort.  For example, the following approaches are
equivalent:

  STANDARD APPROACH        POSITIONAL APPROACH
  -----------------        ---------------------
  SET abc "AB/CD"          SET abc = "AB/CD"
  PARSE xyz abc "/"        FINDPOSN n abc "/"
                           COPY xyz abc n+

The positional approach requires more lines than the standard approach to
extracting the characters after the "/" character.

Another problem is that because positional commands give you fine control of
the parsing process, it is up to you to guard against exceptional situations.
Consider this example:

  FINDPOSN x $FLINE "/"
  CALC x = x "+" "1"
  COPY xyz $FLINE x

If $FLINE (the current input line) contains the value "ABC/DEF":

  FINDPOSN sets x to "4" (the position of the "/" character)
  CALC increases x to "5"
  COPY sets xyz to "DEF" -- from position "5" to the end of $FLINE

Unfortunately, a problem occurs if $FLINE does not contain a slash:

  FINDPOSN sets x to "0" (meaning the "/" was not found)
  CALC increases x to "1"
  COPY copies from position "1" to the end of $FLINE

This may not be what you intended.  If you want to return a null string when
$FLINE does not contain a slash, you could use a single PARSE command:

  PARSE xyz $FLINE "/"

This copies anything after the slash to the xyz variable.  If $FLINE does not
contain a slash, xyz is set to "".

The precise control provided by Parse-O-Matic's positional commands makes
them indispensable for certain parsing applications.  Just remember that with
added power comes added responsibility: you will sometimes have to add extra
code to handle unusual situations.

---------------------------
Negative Positional Indices
---------------------------

In most cases, you will use the COPY, DELETE and EXTRACT commands with
absolute character positions.
For example:

  SET x = "ABCDE"
  COPY y x "2" "4"

This sets variable y to "BCD".

However, sometimes you need to work from the other end (i.e. the right-hand
side) of the variable.  For this reason, COPY, DELETE and EXTRACT support
negative indices, which count back from the right edge of the variable.

For example, let's say you want to find out what the last two characters of a
variable are.  You could do it this way:

  SET x = "ABCDE"            <-- Sets the variable
  SETLEN xlen x              <-- Finds out how long it is
  CALC x1 = xlen "-" "1"     <-- Calculates the 2nd-to-last position
  COPY z x x1 xlen           <-- Copies the last two characters

This sets the variable z to "DE", which are indeed the last two characters of
the x variable.

An easier method is to use negative positional indices (available in the
COPY, DELETE and EXTRACT commands).  In this case, the solution is much
simpler:

  SET x = "ABCDE"
  COPY z x "-2" "-1"

This sets the variable z to "DE".  The "from" specification of "-2" means
"the second-to-last character".  The "-1" specification means "the last
character".  (Strictly speaking, you could omit the "-1", since the default
"to" value is "copy to the end of the variable".)

In most parsing applications, it is unusual to work from the right-hand side
of the variable (see "The Left-Peeling Technique" for comparison) because
data tends to be read from left to right.  However, negative positional
indices do give you some additional flexibility that may be useful in
difficult parsing situations.

------------------
The SETLEN Command
------------------

FORMAT:       SETLEN var1 value1

PURPOSE:      SETLEN sets var1 to the length of value1.

Here is an example of the SETLEN command:

  SET x = "ABCD"
  SETLEN y x

This sets variable y to "4".

One handy application for SETLEN is to underline text.
For example: SET name = $FLINE[1 15] TRIM name "B" " " SETLEN nlen name SET uline = "" PAD uline "L" "-" nlen OUTEND |{name} OUTEND |{uline} If the input line contains the name "JOHN SMITH", the output would be: JOHN SMITH ---------- For another example that does underlining, see "POM and Wildcards". 94 ------------------ The DELETE Command ------------------ FORMAT: DELETE var1 value1 [value2] PURPOSE: The DELETE command removes a range of characters (specified as a starting and ending position) from a variable. PARAMETERS: var1 is the variable from which characters will be removed value1 is the starting position (e.g. "1" = First character) can be a negative index ("-3" = 3rd-to-last character) value2 is the optional ending position; if it is omitted, it is assumed to mean "the last character in var1" can be a negative index ("-2" = 2nd-to-last character) NOTES: If value1 is null or "0", value1 = "1" If value2 is null or "0", value2 = "last character in var1" ALTERNATIVES: The PEEL, TRIM, CHANGE, SET and APPEND commands Here is an example of the DELETE command: SET x = "ABC///DEF" DELETE x "4" "6" This deletes from position 4 to 6, so the variable x is set to "ABCDEF". If value2 is omitted, DELETE assumes you wish to delete everything from the starting position to the end of the string. For example: SET x = "ABC///DEF" DELETE x "4" This sets x to "ABC". 95 ---------------- The COPY Command ---------------- FORMAT: COPY var1 value1 value2 [value3] PURPOSE: The COPY command copies a range of characters (specified as a starting and ending position) from a value to a variable. PARAMETERS: var1 is the variable being set value1 is the source value, from which you will copy text value2 is the starting position (e.g. 
"1" = First character) can be a negative index ("-3" = 3rd-to-last character) value3 is the optional ending position; if it is omitted, it is assumed to mean "the last character in value1" can be a negative index ("-2" = 2nd-to-last character) NUMERICS: Tabs, spaces and commas are stripped from value2 and value3 NOTES: If value2 is null or "0", value1 = "1" If value3 is null or "0", value3 = "last char in value1" ALTERNATIVES: The SET command Here is an example of the COPY command: SET x = "ABC///DEF" COPY y x "4" "6" This copies from position 4 to 6, so the variable y is set to "///". If value2 is omitted, COPY assumes you wish to copy everything from the starting position to the end of the string. For example: SET x = "ABC///DEF" COPY y x "4" This sets y to "///DEF". To make your POM files easier to read, you might consider padding the COPY command with an equals sign to remind you that a variable is being set. For example: COPY y = x "4" "6" This emphasizes that the variable y is being set to a substring of x. For more information about padding, see "Padding for Clarity". 96 ------------------- The EXTRACT Command ------------------- FORMAT: EXTRACT var1 var2 value1 [value2] PURPOSE: The EXTRACT command works like COPY, but removes the characters from the source variable after copying them to a variable. PARAMETERS: var1 is the variable that will contain the characters extracted from var2 var2 is the variable from which characters will be copied to var1, then removed value1 is the starting position (e.g. 
"1" = First character) can be a negative index ("-3" = 3rd-to-last character) value2 is the optional ending position; if it is omitted, it is assumed to mean "the last character in var2" can be a negative index ("-2" = 2nd-to-last character) NUMERICS: Tabs, spaces and commas are stripped from value1 and value2 NOTES: If value1 is null or "0", value1 = "1" If value2 is null or "0", value2 = "last character in var2" ALTERNATIVES: The PEEL command Here is an example of the EXTRACT command: SET x = "ABC///DEF" EXTRACT y x "4" "6" This copies from position 4 to 6, so the variable y is set to "///". The characters copied to variable y are removed from x, so that it now contains the value "ABCDEF". If value2 is omitted, EXTRACT assumes you wish to extract everything from the starting position to the end of the string. For example: SET x = "ABC///DEF" EXTRACT y x "4" This sets y to "///DEF", while the variable x is set to "ABC" (i.e. the original value for x, with the extracted characters removed). 97 -------------------- The FINDPOSN Command -------------------- FORMAT: FINDPOSN var1 value1 value2 [value3 [value4]] : : : : : MEANING: 1) Variable Source Find : : 2) Variable Source From To Control PURPOSE: The FINDPOSN command finds one text string in another. It locates the starting or ending position of a string, or a string delimited by one or two other strings. PARAMETERS: var1 is the variable that will contain the position if the string is found (e.g. "2" means it was found in the second position of value1; "0" means the string was not found) value1 is the string being searched value2 is the string being sought, or... 
the left-most part of a string being sought
              value3 is the right-most part of the string being sought; if
              it is set to null (""), it is assumed to mean "the last
              character in value1"
              value4 is the control setting

DEFAULTS:     value4 = "IS"

ALTERNATIVES: The SCANPOSN command

SEE ALSO:     This section is much easier to understand if you have studied
              "The Parse Command".

There are two ways to use the FINDPOSN command: the "Plain String Find" and
the "Encapsulated String Find".  These are discussed below.

---------------------
The Plain String Find
---------------------

In its simplest form, the Plain String Find locates a string (value2) in
another string (value1) and assigns its position to a variable (var1).  Here
is an example:

  FINDPOSN x $FLINE "Fred"

This looks for the first occurrence of "Fred" in $FLINE (the current input
line).  If $FLINE contains "Hello Fred!", the command will set the variable x
to "7", since "Fred" starts in the seventh character position.

---------------------------
Using a Single Decapsulator
---------------------------

Sometimes you don't want to find the FIRST occurrence, but the second, third,
and so on.  You can use a single decapsulator (see "The Parse Command") to
specify this.  For example:

  SET z = "This is the way to demonstrate the FINDPOSN command"
  FINDPOSN x z "the"
  FINDPOSN y z "2*the"

The first FINDPOSN command finds the first occurrence of "the", using a plain
string, so it sets the variable x to "9", since the first "the" starts in the
ninth position.

The second FINDPOSN command uses a decapsulator with the occurrence number
"2*", which means "look for the second occurrence".  Thus, it sets the
variable y to "32", since the second "the" occurs in that position.

Incidentally, the first FINDPOSN could also have been written this way:

  FINDPOSN x z "1*the"

which is another way of saying, "Look for the first occurrence".  However, if
no occurrence number is specified, FINDPOSN assumes you are looking for the
first occurrence.
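The position arithmetic in the examples above can be checked with a short Python sketch. The function name findposn_plain is hypothetical, and this is only an approximation of the behavior described here: positions are 1-based, "0" means "not found", and a missing occurrence number means the first occurrence.

```python
def findposn_plain(source, spec):
    """Return the 1-based position of the text named by spec, or 0."""
    number, star, text = spec.partition("*")
    if not (star and number.isdigit()):
        # Plain string: no occurrence number, so look for the first one.
        number, text = "1", spec
    index = -1
    for _ in range(int(number)):
        index = source.find(text, index + 1)
        if index == -1:
            return 0  # not found
    return index + 1  # convert Python's 0-based index to a 1-based position


z = "This is the way to demonstrate the FINDPOSN command"
x = findposn_plain(z, "the")
y = findposn_plain(z, "2*the")
```

Here x and y come out as 9 and 32, in line with the values given for the two FINDPOSN commands.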
---------------------------- The Encapsulated String Find ---------------------------- NOTE: The Encapsulated String Find is very similar to the PARSE command. If you do not find the following discussion sufficiently instructive, you can gain some additional insight by reading the section of this manual entitled "The Parse Command". The Encapsulated String Find looks for a string that is encapsulated by (i.e. located between) two other strings. This is useful if your input data contains text that is surrounded by delimiters. One common example is the "comma-delimited" file (see "Why You Need Parse-O-Matic -- An Example" for a sample). Here is another situation where data is surrounded by delimiters: 99 |Mouse |Gazelle|Mouse |Elephant| |Dog |Giraffe|Elk |Mongoose| |Monkey|Snake |Caribou|Trout | One can imagine an application that would create tabular data like this -- cleverly (but annoyingly) reducing the column widths to the minimum. This would make the column starting and ending positions unpredictable. You could use the PARSE command to obtain values from each column, but if you have a lot of data, it would be more efficient to determine the starting and ending positions at the outset. Let's say you wanted to extract the third column. You could set up your POM file like this: BEGIN startposn = "" FINDPOSN startposn $FLINE "3*|" "4*|" "XS" FINDPOSN endposn $FLINE "3*|" "4*|" "XE" HALT startposn = "0" "Missing delimiter!" END COPY animal $FLINE startposn endposn OUTEND |{animal} The lines between the BEGIN and END are run only once for the entire parsing job, since they set the startposn variable to something other than a null ("") string. (See "Uninitialized and Persistent Variables") The first FINDPOSN command uses the decapsulators "3*|" and "4*|" to locate the text between the third and fourth "|" delimiters, but because of the "XS" control value (described later), startposn is set to the position AFTER the delimiter. 
(Briefly, "XS" means "exclude the found text, and refer to the starting position of the text that follows it.) Thus, the variable startposn is set to "12"; "Mouse" starts in the twelfth position. The second FINDPOSN command sets the ending position (endposn) in a similar way. It finds the third and fourth "|" delimiters, but because of the "XE" control setting, it sets endposn to the position BEFORE the fourth delimiter. (Briefly, "XE" means "exclude the found text, and refer to the ending position of the text that precedes it.) The HALT command is simply a safeguard to ensure that the input data follows the correct format. If the first FINDPOSN fails to find the third or fourth "|" delimiter, it will set startposn to "0" (meaning "not found"). The COPY command copies $FLINE (the current input line) from the starting position (startposn) to the ending position (endposn). This value is then output by the OUTEND command. 100 ---------------- Control Settings ---------------- The control settings give you precise control of the part of the string to which you are referring. Valid control settings are: SETTING MEANING ------- ------- IS Include found text and report where the entire text starts IE Include found text and report where the entire text ends XS Exclude found text and report where the delimited text starts XE Exclude found text and report where the delimited text ends NOTE: While FINDPOSN greatly resembles the PARSE command, the default control setting is different. In PARSE, the control setting is assumed to be "X" if it is omitted. In FINDPOSN, however, the control setting is assumed to be "IS" if it is omitted. 
Let us assume that we set the variable z as follows:

  SET z = "ABzzzCDEFzzzGH"

This produces the following results:

  COMMAND                               VALUE FOR x VARIABLE
  ---------------------------------     --------------------
  FINDPOSN x z "1*zzz" "2*zzz" "IS"     "3"
  FINDPOSN x z "1*zzz" "2*zzz" "XS"     "6"
  FINDPOSN x z "1*zzz" "2*zzz" "XE"     "9"
  FINDPOSN x z "1*zzz" "2*zzz" "IE"     "12"

The following illustration may make the results easier to understand:

+------------------------------------------------------------------------+
|                                                                        |
|  Measuring Scale:            12345678901234                            |
|                              --------------                            |
|  Command:        FINDPOSN x "ABzzzCDEFzzzGH" "zzz" "2*zzz" ""          |
|                                |  |  |  |                              |
|  Control Value:               IS XS XE IE                              |
|  Results:                      3  6  9 12                              |
|                                                                        |
+------------------------------------------------------------------------+

In the example, the control values have the following specific meanings:

  "IS" ("Include, Start") = start of entire text (from "1*zzz" to "2*zzz")
  "XS" ("Exclude, Start") = start of text after the "from" item ("1*zzz")
  "XE" ("Exclude, End")   = end of text before the "to" item ("2*zzz")
  "IE" ("Include, End")   = end of entire text (from "1*zzz" to "2*zzz")

------------------
Insoluble Searches
------------------

FINDPOSN returns "0" (zero) when it can not find a string, or if it is
presented with an insoluble dilemma.  Here are some examples:

  FINDPOSN x "CatDog" "Moose"        <-- "Moose" can not be found
  FINDPOSN x "ABCDEF" "A" "G"        <-- "G" can not be found
  FINDPOSN x "ABCDEF" "A" "2*E"      <-- There is no second "E"

Here is another insoluble search:

  FINDPOSN x "ABCDEF" "C" "D" "XS"
  FINDPOSN x "ABCDEF" "C" "D" "XE"

There is nothing between the "from" and "to" delimiters.  Since we are
excluding the delimiters themselves (with "XS" and "XE" specifications), we
can not provide a "start" or "end" value for what we found -- we didn't find
anything!  Hence, we have nothing for which to return a starting or ending
position.
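The four control settings and the "insoluble" cases can be summarized in a Python sketch. This is a rough model inferred from the examples above, not the actual FINDPOSN implementation: the names nth_index and findposn are hypothetical, null ("") decapsulators are not modeled, and the behavior when the "from" text lies after the "to" text is not addressed.

```python
def nth_index(source, spec):
    """Return (0-based index or -1, search text) for a spec such as "2*zzz"."""
    number, star, text = spec.partition("*")
    if not (star and number.isdigit()):
        number, text = "1", spec  # plain string: first occurrence
    index = -1
    for _ in range(int(number)):
        index = source.find(text, index + 1)
        if index == -1:
            break
    return index, text


def findposn(source, from_spec, to_spec, control="IS"):
    """Return a 1-based position according to the control setting, or 0."""
    from_index, from_text = nth_index(source, from_spec)
    to_index, to_text = nth_index(source, to_spec)
    if from_index == -1 or to_index == -1:
        return 0  # one of the delimiters was not found
    inner_start = from_index + len(from_text) + 1  # first char after "from"
    inner_end = to_index                           # last char before "to"
    if control in ("XS", "XE") and inner_start > inner_end:
        return 0  # nothing lies between the delimiters
    positions = {
        "IS": from_index + 1,           # start of the entire text
        "XS": inner_start,              # start of the delimited text
        "XE": inner_end,                # end of the delimited text
        "IE": to_index + len(to_text),  # end of the entire text
    }
    return positions[control]


z = "ABzzzCDEFzzzGH"
results = [findposn(z, "1*zzz", "2*zzz", c) for c in ("IS", "XS", "XE", "IE")]
```

The results list reproduces the four values in the table for the z variable, and the insoluble searches shown above all yield 0 under this model.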
------------------ Null Decapsulators ------------------ Consider these next two commands: FINDPOSN x "ABCDEF" "F" "" "XS" FINDPOSN x "ABCDEF" "F" "" "XE" What comes between "F" and the end of the string? Bear in mind, however, that when you use a null ("") to mean "the last character", it is not excluded (see "The Null Decapsulator" in the section entitled "The Parse Command", for a discussion). Thus, the two FINDPOSN commands "find" the final character "F", and both return "6". These both return "6" because the "F" is both the starting and ending position of what we found, and we included (rather than excluded) the starting and ending delimiters ("F" and the last character, respectively). Similarly, the following commands return a "1": FINDPOSN x "ABCDEF" "" "A" "XS" FINDPOSN x "ABCDEF" "" "A" "XE" Even though there is nothing between "A" and "the first character", the first character is not excluded, since we are using a null decapsulator. As a result, we find the string "A" and return its position, which is "1". 102 --------------------- Finding The Last Word --------------------- One common use for FINDPOSN is to find the last occurrence of a word in a line of text. Consider the following lines: SET z = "Parse-O-Matic is a fine product!" FINDPOSN x z ">* " "" "XS" This will set the x variable to 25 (the position of the final word). The command looks for the last "space" character (which is in position 24), then (because of the "XS" control) returns the position of the character following it. --------------- Who Needs This? --------------- At this point, you may be wondering, "Why do I need to have this kind of precise control?" Well, in most cases you don't, so you will tend to use the "Plain String Find" (described earlier). However, certain complex parsing applications demand that you make a distinction between the text that encapsulates a piece of text, and the encapsulated text itself. 
When faced with this kind of task, you will see that Parse-O-Matic's
FINDPOSN command lets you accomplish in one line what would take dozens of
lines in a traditional programming language.

                            --------------------
                            The SCANPOSN Command
                            --------------------

FORMAT:       SCANPOSN var1 var2 value1 value2   [value3]
                       :    :    :      :        :
MEANING:               from to   source scanlist control

PURPOSE:      SCANPOSN searches the source value for one of the scanterms
              in the scanlist (see "Terminology", below). SCANPOSN finds
              out which scanterm provides the best match, then returns the
              "from" (starting) and "to" (ending) positions of that
              scanterm in the source value.

TERMINOLOGY:  scanterm  An item in a scanlist; one of the things you are
                        searching the source value for.
              scanlist  A list of scanterms. Here is an example of a
                        scanlist: "/Mr/Mrs/Ms"

PARAMETERS:   var1   is the variable that will contain the starting
                     position if one of the scanterms is found (e.g. "2"
                     means it was found in the second position of value1;
                     "0" means SCANPOSN did not find any of the scanterms)
              var2   is the variable that will contain the ending position
                     if one of the scanterms is found
              value1 is the source string -- the string being searched
              value2 is the scanlist (see "Terminology", above)
              value3 is the optional control string

DEFAULTS:     value3 = "I" (i.e. Ignore case)

ALTERNATIVES: The FINDPOSN command

A common requirement in parsing is to find out if one of several strings
can be found in another string. For example, you might want to find out if
a name starts with a "salutation" (Mr., Mrs., Ms.). You can do this by
looping through the various strings and comparing each one, but SCANPOSN
lets you do all this with a single command.

For example, to search for a salutation in a string:

   SCANPOSN from to $FLINE "/Mr./Mrs./Miss/Ms."

If $FLINE (the line just read from the input file) contains one of the
scanterms in the scanlist, SCANPOSN will set the "from" and "to"
variables. Thus, if $FLINE contains "Ms.
Mary Jones", the "from" variable is set to "1" and the "to" variable is
set to "3" (since "Ms." goes from positions 1 to 3 in $FLINE).

If none of the scanterms is found, the "from" variable is set to "0".
Thus, if $FLINE contains "John Smith", no salutation is found, and the
SCANPOSN command shown above will set the "from" variable to "0".

                                ------------
                                The Scanlist
                                ------------

The scanlist can contain one or more scanterms. The FIRST character in the
scanlist is interpreted as the delimiter (separator) for the scanterms.
Thus, the following scanlists are all valid:

   "/Mr./Mrs./Miss/Ms."                          <-- Delimiter is: /
   "xMr.xMrs.xMissxMs."                          <-- Delimiter is: x
   "@Library@School@Gymnasium@Clinic/Hospital"   <-- Delimiter is: @
   "/Cow"                                        <-- Delimiter is: /

The first example ("/Mr./Mrs./Miss/Ms.") has already been demonstrated.

The second example uses the letter "x" as a delimiter. This would cause a
problem if one of the scanterms contained an "x", since it would be
treated as TWO scanterms. For example:

   "xJohnxTrixiexFred"

The name "Trixie" contains an "x", so it would be broken down into two
scanterms ("Tri" and "ie"). You should always choose a scanlist delimiter
that does not appear in the list of scanterms.

                          -----------------------
                          Accommodating Variation
                          -----------------------

When you design a scanlist, you should take into account the possibility
that the input might contain strange variations. Consider this command:

   SCANPOSN x y "Mr John Smith" "/Mr./Mrs./Ms."

This will set the x variable to "0" because the "Mr" is followed by a
space, not a period. A more "forgiving" command would be:

   SCANPOSN x y "Mr John Smith" "/Mr./Mrs./Ms./Mr /Mrs /Ms "

This would successfully locate the "Mr " string, and set x to "1" and y to
"3". (The "3" points to the space.)

                       ------------------------------
                       Handling Prefixes and Suffixes
                       ------------------------------

When designing a scanlist, you should consider that a scanterm might be
part of a word.
For example:

   SCANPOSN x y "Mississippi Sue" "/MR./MRS./MISS/MS."

This will find the "Miss" in Mississippi, even though this is not part of
a salutation. A more appropriate command would be:

   SCANPOSN x y "Mississippi Sue" "/MR./MRS./MISS /MS."

The space after "Miss" in the scanlist ensures that if it is found, it
will be separate from any word following it. The trailing space is not
necessary for the scanterm "MR.", since no word contains a period.
However, if you do include spaces after the periods (as in
"/MR. /MRS. /MISS /MS. ") it may simplify your subsequent parsing
operations.

You must also take suffixes into account. For example:

   SCANPOSN x y "Zinc Enterprises" "/INC/CO/ENTERPRISES"

This will find the "inc" in "Zinc". You can add a space in front of each
scanterm to ensure that it is separated from any other word:

   SCANPOSN x y "Zinc Enterprises" "/ INC/ CO/ ENTERPRISES"

You may be tempted to put spaces on both sides of a word, to handle both
prefixes and suffixes. However, consider this example:

   SCANPOSN x y "Wazoo Inc" "/ INC / CO / ENTERPRISES "

None of the scanterms is found, because the "Inc" in the source string
does not end in a space. You can address this kind of problem with the
control settings (described next).

                           ----------------------
                           Controlling the Search
                           ----------------------

Unless otherwise instructed, SCANPOSN will find the first scanterm that
appears anywhere in the source string, and return its start and end
positions. You can modify this behavior by using the optional control
parameter (value3).

The control parameter contains one or more characters, each of which has a
special meaning.

   CHARACTER   MEANING
   ---------   -------
       <       Find the leftmost match
       >       Find the rightmost match
       I       Ignore case (e.g. Xyz matches XYZ)
       M       Match case (e.g.
               Xyz does not match XYZ)

Here are some valid control settings:

   SETTING   MEANING
   -------   -------
   ">M"      Find the rightmost match, which must match case
   "M"       Find any match, but the case must be the same

NOTES:  If neither "I" nor "M" is specified, SCANPOSN assumes "I".
        If neither "<" nor ">" is specified, SCANPOSN does a "Find-Any"
        search (explained below).

                       -----------------------------
                       Leftmost, Rightmost, Find-Any
                       -----------------------------

The ">" (rightmost) control setting tells SCANPOSN to find the scanterm
that has the highest "to" value with the lowest "from" value. This means
that ALL of the scanterms are evaluated. Consider this command:

   SCANPOSN x y "SHREWxxxCATxxxMOUSExxx" "/CAT/DOGGY/MOUSE/ELK" ">"

SCANPOSN finds "CAT", but continues looking to see if there are any better
matches to the right. Eventually it finds MOUSE and sets x to "15" and y
to "19" (pointing at "MOUSE").

If you use the "<" (leftmost) parameter, SCANPOSN will check all the
scanterms to find out which one has the lowest "from" position with the
highest "to" value.

   SCANPOSN x y "SHREWxxxCATxxxMOUSExxx" "/CAT/DOGGY/MOUSE/ELK" "<"

This will set x to "9" and y to "11" (pointing at "CAT").

If you do not specify "<" or ">", SCANPOSN finds the first scanterm it
can, and ignores the rest.

   SCANPOSN x y "SHREWxxxCATxxxMOUSExxx" "/CAT/DOGGY/MOUSE/ELK"

The first scanterm is "CAT", and this can be found at positions 9 to 11.
SCANPOSN will return those values, and ignore the rest of the scanterms.

The absence of a "<" or ">" is known as a "Find-Any" search. You can use
this if you want to know if one of the scanterms appears in the source
string, but you are not interested in finding out which one.

                          ------------------------
                          The Best Match Principle
                          ------------------------

NOTE: The "Best Match" principle does not apply to the "Find-Any" search.
      It applies only to the Leftmost ("<") and Rightmost (">") searches.
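The three search modes described under "Leftmost, Rightmost, Find-Any" can be
summarized with a short model. This Python sketch is only an illustration of
the selection rules as the manual words them, not the program's actual
algorithm. The function name is hypothetical. Returning 0 for the "to" value
when nothing is found is an assumption (the manual specifies only the "from"
value), and so is using each scanterm's last occurrence for a ">" search.

```python
def scanposn_model(source, scanlist, control=""):
    """Hypothetical model of SCANPOSN; returns 1-based (from, to) or (0, 0)."""
    delimiter = scanlist[0]                 # first character is the delimiter
    scanterms = [t for t in scanlist[1:].split(delimiter) if t]
    match_case = "M" in control             # "I" (ignore case) is the default
    haystack = source if match_case else source.upper()
    rightmost = ">" in control
    leftmost = "<" in control

    hits = []
    for term in scanterms:
        needle = term if match_case else term.upper()
        idx = haystack.rfind(needle) if rightmost else haystack.find(needle)
        if idx == -1:
            continue
        hit = (idx + 1, idx + len(needle))
        if not (leftmost or rightmost):
            return hit                      # Find-Any: first scanterm found
        hits.append(hit)

    if not hits:
        return (0, 0)
    if rightmost:                           # highest "to", then lowest "from"
        return max(hits, key=lambda h: (h[1], -h[0]))
    return min(hits, key=lambda h: (h[0], -h[1]))   # lowest "from", highest "to"


src = "SHREWxxxCATxxxMOUSExxx"
terms = "/CAT/DOGGY/MOUSE/ELK"
print(scanposn_model(src, terms, ">"))
print(scanposn_model(src, terms, "<"))
print(scanposn_model(src, terms))
```

With the SHREW example, the model gives (15, 19) for ">" and (9, 11) for both
"<" and the Find-Any search, matching the values quoted above.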
To use the SCANPOSN command effectively, you must understand the concept
of "the best match". This can be illustrated with an example:

   SCANPOSN x y "MegaWhizco International" "/CO/WHIZCO/MEGAWHIZ" ">"

The SCANPOSN command finds the scanterm CO at positions 9 to 10. However,
it continues looking for an even better match. It finds that WHIZCO is
just as far to the right (i.e. it ends at position 10), but has a lower
starting position. This makes it a "better" match.

The next scanterm (MEGAWHIZ) has a lower starting position, but its ending
position is not as good (i.e. not as far to the right). It is rejected
because we are looking for the rightmost string. As a result, SCANPOSN
will set x to "5" and y to "10".

In other words, when SCANPOSN is looking for the rightmost scanterm, it
will first identify the "found" scanterms which have the highest ending
position, and then choose the longest one.

Here is an example using a leftmost search:

   SCANPOSN x y "Our catalog is enclosed" "/CAT/MOOSE/CATALOG/DOG" "<"

SCANPOSN finds CAT at positions 5 to 7, but as it continues checking the
scanterms, it finds that CATALOG is just as far to the left (i.e. it
starts at position 5), but it is a better match since it has a higher
ending position. As a result, SCANPOSN will set x to "5" and y to "11".

The "Best Match" principle does not affect "Find-Any" searches. For
example:

   SCANPOSN x y "Our catalog is enclosed" "/CAT/MOOSE/CATALOG/DOG"

This sets x to "5" and y to "7". Since this is a "Find-Any" search (i.e.
neither "<" nor ">" is specified in the control settings), SCANPOSN stops
looking as soon as it has found a match.

When doing a Find-Any search, you can not be sure if any of the other
scanterms appear in the source string. For example:

   SCANPOSN x y "Our cat and dog are upstairs" "/CAT/DOG"

This will find CAT and stop looking for additional matches.
If you change the order of the scanlist, you will get different values:

   SCANPOSN x y "Our cat and dog are upstairs" "/DOG/CAT"

Thus, a Find-Any search is useful only for detecting if one of the
scanterms appears in the source string.

After doing a Find-Any search, you can check if the "from" value is "0"
(meaning no scanterms were found). If it is not "0", it means one of the
terms WAS found. For example:

   SET source   = "Our cat catalog is enclosed"
   SET scanlist = "/CATALOG/MOOSE/CAT/DOG"
   SCANPOSN from to source scanlist
   BEGIN from = "0"
     OUTEND |None of the scanterms appeared in the string {source}
   ELSE
     OUTEND |At least one of the scanterms appears in the string {source}
   END

============================================================================
                               DATE COMMANDS
============================================================================

                             ------------------
                             General Discussion
                             ------------------

Parse-O-Matic's date-oriented commands provide you with a convenient way
to work with dates. While you can accomplish the same thing using other
Parse-O-Matic commands (LOOKUP, PAD etc.), the date functions are
optimized for speed, so if your parsing job does a lot of date-format
conversions, it will run faster.

                            --------------------
                            The POMDATE.CFG File
                            --------------------

When a date command is first executed, Parse-O-Matic reads in a file named
POMDATE.CFG. (The method by which Parse-O-Matic finds the file is
discussed in the section "How Parse-O-Matic Searches for a File".)

POMDATE.CFG is a self-documenting text file that contains the default date
format string (explained later), and the names of the twelve months. You
can edit this file with a standard text editor, or a word-processor in
"generic text" mode.

As originally supplied with Parse-O-Matic, the default date format string
is "?y/?n/?d", which produces YY/MM/DD dates (e.g. July 1 1998 becomes
98/07/01). You can change this to reflect your own preference.
If you are parsing data in a language other than English, you can also
change the names of the months.

                                ------------
                                Date Formats
                                ------------

A date format is a sequence of characters that briefly describes the
appearance of a date. For example, the format "Y-T-?d" describes a
year/month/day format that looks like this: 1998-JULY-02

The following characters have a special meaning in the date format string:

   d  M  m  n  T  t  Y  y  ?

For these special characters, uppercase and lowercase are important. For
example, "T" is not the same as "t".

All characters other than the special characters are interpreted "as-is",
and are included in the final date string.

The following table explains the meaning of the special characters used to
specify year, month and day, using the date July 2, 1998 for the examples:

   CHAR  MEANING                     SAMPLE FORMAT  SAMPLE RESULT
   ----  --------------------------  -------------  -------------
    Y    4-digit year                d-m-Y          2-Jul-1998
    y    1- or 2-digit year          d-m-y          2-Jul-98
    n    1- or 2-digit month         d/n/y          2/7/98
    m    3-letter month              d/m/y          2/Jul/98
    M    3-letter month (uppercase)  d M y          2 JUL 98
    t    Month                       t d, Y         July 2, 1998
    T    Month (uppercase)           T d Y          JULY 2 1998
    d    Day                         y/n/d          98/7/2

The ? character can be used in the date format to pad out one-digit values
to two digits. The following table uses the date February 3, 2001 for the
examples:

   SAMPLE DATE FORMAT  SAMPLE RESULT
   ------------------  -------------
   y-?n-?d             1-02-03
   ?y/m/?d             01/Feb/03
   ?n/?d Y             02/03 2001
   t '?y               February '01

As the last example shows, it is not necessary to use month, day and year;
you can omit any item to obtain an abbreviated date.

                             -----------------
                             The TODAY Command
                             -----------------

FORMAT:       TODAY var1 [value1]

PURPOSE:      The TODAY command sets a variable (var1) to today's date, in
              a variety of formats.

DEFAULTS:     If value1 is not specified, TODAY uses the default date
              format, which is specified in the file POMDATE.CFG.
NOTES:        For a discussion of date formats (including the default date
              format), see the "General Discussion" section at the
              beginning of this chapter.

SEE ALSO:     "The Date Command"

Assuming today's date is July 1 1998, here are some examples:

   COMMAND               THE VARIABLE xyz IS SET TO...
   -------------------   -----------------------------
   TODAY xyz             The default date format
   TODAY xyz ""          The default date format
   TODAY xyz "Y-M-?d"    1998-JUL-01
   TODAY xyz "t d Y"     July 1 1998
   TODAY xyz "t 'y"      July '98

As the last example shows, it is not necessary to use month, day and year;
you can omit any item to obtain an abbreviated date.

                              ----------------
                              The DATE Command
                              ----------------

FORMAT:       DATE var1 value1 value2 value3 [value4]

PURPOSE:      The DATE command sets a variable (var1) to a given year
              (value1), month (value2) and day (value3), or a subset of
              these items, in a variety of formats, as specified by the
              format string (value4).

PARAMETERS:   var1   is the variable being set
              value1 is the year  (e.g. "1998" or "98")
              value2 is the month (e.g. "1" = January)
              value3 is the day   (1 to 31)
              value4 is the date format

NUMERICS:     Tabs, spaces and commas are stripped from value1, 2 and 3

DEFAULTS:     If value4 is omitted, DATE uses the default date format,
              which is specified in the file POMDATE.CFG.

NOTES:        For a discussion of date formats (including the default date
              format), see the "General Discussion" section at the
              beginning of this chapter.

SEE ALSO:     "The Today Command"

Assuming the date being set is July 1 1998, here are some examples:

COMMAND                               THE VARIABLE xyz IS SET TO...
----------------------------------    -----------------------------
DATE xyz "98" "07" "01"               The default date format
DATE xyz "1998" "07" "01" ""          The default date format
DATE xyz "98" "7" "1" "Y-M-?d"        1998-JUL-01
DATE xyz "98" "07" "01" "t d Y"       July 1 1998
DATE xyz "98" "7" "01" "t 'y"         July '98
DATE xyz "98" "7" "" "t 'y"           July '98

As the last two examples show, it is not necessary to use month, day and
year; you can omit any item to obtain an abbreviated date.

If a date is outside a valid range, Parse-O-Matic halts with an error.
Acceptable value ranges are:

   Year 0 to 9999;  Month 1 to 12;  Day 1 to 31

If the year is between 0 and 99, Parse-O-Matic makes the following
assumptions:

   - If the number is between 80 and 99, it means 1980 to 1999
   - If the number is between  0 and 79, it means 2000 to 2079

Parse-O-Matic does not check that a date is "possible", so you could set a
date to "February 31, 2001", even though February never has 31 days.

                            --------------------
                            The MONTHNUM Command
                            --------------------

FORMAT:       MONTHNUM var1 value1

PURPOSE:      The MONTHNUM command sets a variable (var1) to the month
              number of a given month name (value1).

ALTERNATIVES: The LOOKUP command

Here is an example of the MONTHNUM command:

   MONTHNUM xyz "February"

This will set the variable xyz to "2". The comparison is performed on the
basis of the number of characters available, without regard to case, so
the following would also work:

   MONTHNUM xyz "FEB"

If the result is ambiguous, Parse-O-Matic returns the first match. For
example:

   MONTHNUM xyz "JU"

This will set xyz to "6", although it could refer to either June or July.

If MONTHNUM can not find a match, it will return a null ("") string. For
example:

   MONTHNUM xyz "ZZZ"

Since no month starts with "ZZZ", this will set xyz to "".

If you are writing a Parse-O-Matic application that will be run in several
languages (using different POMDATE.CFG files), you should carefully study
the names of the months in each language to avoid problems.
In English, it is always sufficient to provide the first three letters. In
French, however, you need at least four letters, to distinguish between
"Juin" (June) and "Juillet" (July).

Parse-O-Matic can use only one POMDATE.CFG file at a time, so the MONTHNUM
command can not be used to translate month names from one language to
another. You can, however, accomplish the same thing with the LOOKUP
command.

                            --------------------
                            The ZERODATE Command
                            --------------------

                ** ADVANCED COMMAND FOR EXPERIENCED USERS **

FORMAT:       ZERODATE value1 value2 value3

PURPOSE:      Specifies "day zero" for the date serial number used by the
              MAKETEXT command when it uses the DATE predefined data type.

PARAMETERS:   value1 is the year  (e.g. "1900")
              value2 is the month (e.g. "12" for December)
              value3 is the day   (e.g. "5")

NUMERICS:     Tabs, spaces and commas are stripped from value1, 2 and 3

DEFAULTS:     If the ZERODATE command is omitted, the zero date is assumed
              to be Jan. 1, 1753 (equivalent to ZERODATE "1753" "1" "1").

SEE ALSO:     "The MakeText Command" and "Predefined Data Types"

A "date serial number" is a common method of representing a date in a data
file. It works by counting the number of days since a given date, taking
into account the extra days for leap years. Leap years occur in every year
that is divisible by four, with the exception of century years -- unless
they are divisible by 400. Thus, 1900 is not a leap year, but 2000 is.

The ZERODATE command specifies "Day 0". For example, if you specify
ZERODATE "1918" "11" "11" (November 11, 1918), you get the following:

   DATE                DATE SERIAL NUMBER
   -----------------   ------------------
   November  9, 1918          -2
   November 10, 1918          -1
   November 11, 1918           0
   November 12, 1918           1
   November 13, 1918           2

... and so on. Most programs set the zero date far enough back that
negative numbers are not encountered in normal usage.
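The date serial number and the leap-year rule described above are easy to
check by hand. The Python sketch below is an illustration only: the function
names are hypothetical, and it relies on Python's standard datetime module,
which uses the same Gregorian leap-year rule, rather than on anything inside
Parse-O-Matic.

```python
from datetime import date


def is_leap_year(year):
    """Divisible by 4, except century years not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def date_serial(day, zero_date=date(1753, 1, 1)):
    """Number of days from the zero date to the given day (may be negative)."""
    return (day - zero_date).days


armistice = date(1918, 11, 11)      # the zero date used in the table above
for d in (9, 10, 11, 12, 13):
    print(date(1918, 11, d), date_serial(date(1918, 11, d), armistice))
```

The default zero_date mirrors the manual's default of January 1, 1753; the
loop reproduces the serial numbers -2 to 2 from the table.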
ZERODATE will not accept a starting year before "1753", which was the
first full year in which most of the Western world used the Gregorian
calendar.

============================================================================
                            CALCULATION COMMANDS
============================================================================

                              ----------------
                              The CALC Command
                              ----------------

FORMAT:       CALC var1 value1 operation value2

PURPOSE:      The CALC command performs an integer arithmetic operation on
              the two values and assigns the answer to var1.

NUMERICS:     Tabs, spaces and commas are stripped from value1 and value2

ALTERNATIVES: The CALCREAL command

SEE ALSO:     "Inline Incrementing and Decrementing"

Integer arithmetic refers to whole numbers. 1, 10 and 10000 are integers,
while 2.0, 3.14159 and 98.5 are not.

Let's say your input file looks like this:

DESCRIPTION      UNITS SOLD    UNIT PRICE
---------------- ----------    ----------
Dog collar               15    $     3.00
Cat collar               25    $     2.50
Cat caller                3    $     7.25
Birdie num-nums       1,305    $     6.25
---------------- ----------    ----------
End of Data
:                :        :     :       :
:                :        :     :       :
1                18       27    33      41   (Column positions)

You can find out the total number of units sold (of all types) with the
following POM file:

   IGNORE $FLINE[1 7] = "DESCRIP"
   IGNORE $FLINE[1 7] = "-------"
   BEGIN $FLINE = "End of Data"
     OUTEND |Total units sold = {units}
   ELSE
     CALC units = units "+" $FLINE[18 27]
   END

As you can see from the example, all spaces and commas are stripped from
the number. Tab characters (ASCII 09) are also stripped.

You will also notice that CALC can not be used for the prices, since they
are not integer data. To add up the prices, you must use the CALCREAL
command (see "The CalcReal Command").

Note in particular that the operation ("+" in this case) is in quotes. If
you omit the quotes, Parse-O-Matic will report an error.
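For readers who want to check the arithmetic, the totalling job above can be
mimicked outside Parse-O-Matic. The Python sketch below is only a model: the
helper name to_int and the way the sample rows are built are assumptions made
for illustration. It treats a null value as 0, in line with the manual's
remark that null numerics are treated as equal to "0".

```python
def to_int(text):
    """Strip spaces, commas and tabs, then convert; a null value counts as 0."""
    cleaned = text.replace(" ", "").replace(",", "").replace("\t", "")
    return int(cleaned) if cleaned else 0


def make_row(desc, units, price):
    # Description in columns 1-17, units right-aligned in columns 18-27
    return f"{desc:<17}{units:>10}    ${price:>9}"


lines = [
    "DESCRIPTION      UNITS SOLD    UNIT PRICE",
    "---------------- ----------    ----------",
    make_row("Dog collar", "15", "3.00"),
    make_row("Cat collar", "25", "2.50"),
    make_row("Cat caller", "3", "7.25"),
    make_row("Birdie num-nums", "1,305", "6.25"),
    "---------------- ----------    ----------",
    "End of Data",
]

units = 0
for line in lines:
    if line[0:7] in ("DESCRIP", "-------"):   # like the two IGNORE commands
        continue
    if line == "End of Data":
        print(f"Total units sold = {units}")
        break
    units += to_int(line[17:27])              # like $FLINE[18 27]
```

Note that Python slices are 0-based and exclude the end index, so line[17:27]
covers the same characters as columns 18 to 27.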
The following operations can be performed with CALC:

   SYMBOL      DESCRIPTION
   ---------   --------------------------------------------
   "+"         value1 plus value2
   "-"         value1 minus value2
   "*"         value1 times value2
   "/"         value1 divided by value2 (remainder ignored)
   "HIGHEST"   the larger number (value1 or value2)
   "LOWEST"    the smaller number (value1 or value2)

Here are some more examples of the CALC command:

   COMMAND                            ANSWER
   --------------------------------   ------
   CALC answer = "12" "/" "4"         "3"
   CALC answer = "12" "HIGHEST" "4"   "12"
   CALC answer = "12" "LOWEST" "4"    "4"
   CALC answer = "12" "-" "4"         "8"
   CALC answer = "12" "+" "4"         "16"
   CALC answer = "12" "*" "4"         "48"

CALC can handle numbers between -2,147,483,648 and 2,147,483,647.

                            --------------------
                            The CALCREAL Command
                            --------------------

FORMAT:       CALCREAL var1 value1 operation value2 [fixed-decimals]

PURPOSE:      CALCREAL works the same way as CALC, except that it handles
              decimal numbers.

NUMERICS:     Tabs, spaces and commas are stripped from value1, value2,
              and the "fixed-decimals" value

ALTERNATIVES: The CALC command

SEE ALSO:     "The Rounding Command"

Using the sample data given in the CALC section, you could write the
following POM file:

   IGNORE $FLINE[1 7] = "DESCRIP"
   IGNORE $FLINE[1 7] = "-------"
   BEGIN $FLINE = "End of Data"
     OUTEND |Total units sold = {units}
     OUTEND |Total value sold = {value}
   ELSE
     CALC     units = units "+" $FLINE[18 27]
     CALCREAL value = value "+" $FLINE[33 41]
   END

CALCREAL can handle values up to +/- 99,999,999,999, but its accuracy
decreases when you are dealing with large numbers, as approximated below:

   Accurate to 1 decimal place  between +/- 9,999,999,999
   Accurate to 2 decimal places between +/-   999,999,999
   Accurate to 3 decimal places between +/-    99,999,999
   Accurate to 4 decimal places between +/-     9,999,999
   Accurate to 5 decimal places between +/-       999,999

You can specify a fixed number of decimal positions in the answer by using
the optional "fixed-decimals" value.
For example:

   SET z = "3.14159"
   CALCREAL x = z "+" "0" "2"     <-- This sets x to "3.14"
   CALCREAL x = z "+" "0" "4"     <-- This sets x to "3.1415"

You will notice, in the second example, that no "rounding" takes place.
The number is simply truncated at the requested decimal position.

Here are some more examples of the CALCREAL command:

   COMMAND                                           ANSWER
   -----------------------------------------------   --------
   CALCREAL answer = "12.0" "*" "4.0" "2"            "48.00"
   CALCREAL answer = "12.0" "HIGHEST" "4.0" "2"      "12.00"
   CALCREAL answer = "12" "LOWEST" "4" "1"           "4.0"
   CALCREAL answer = "12" "-" "4" "3"                "8.000"
   CALCREAL answer = "12" "+" "4" "1"                "16.0"
   CALCREAL answer = "7" "/" "2" "2"                 "3.50"
   CALCREAL answer = "7" "/" "2"                     "3.5"
   CALCREAL answer = "7" "*" "2"                     "14.0"

As shown in the examples, if you do not use the optional fixed-decimal
value, calculations are in "floating point". That is to say, the answer
has as many decimal places as necessary. (Bear in mind the accuracy
restrictions mentioned earlier.) Trailing zeros are removed, unless there
are no digits after the decimal point, in which case a 0 is added.

                          ------------------------
                          The INC and DEC Commands
                          ------------------------

FORMAT:       INC var1 [value1]
              DEC var1 [value1]

PURPOSE:      INC: Adds 1 (or whatever value1 is set to) to var1
              DEC: Subtracts 1 (or whatever value1 is set to) from var1

NUMERICS:     Tabs, spaces and commas are stripped from var1 and value1.
              Both var1 and value1 must be integers.
              var1 must be between -2147483640 and 2147483647.
              value1 must be between -100000000 and 100000000.

DEFAULTS:     If value1 is not specified, it is assumed to be "1".

ALTERNATIVES: The CALC command

SEE ALSO:     "Inline Incrementing and Decrementing"

The INC command adds "1" to var1, while the DEC command subtracts "1". You
can also specify a number other than "1" by specifying the optional
value1. Here is a sample sequence of commands...
   SET x = ""   <-- Parse-O-Matic treats null numerics as equal to "0"
   INC x        <-- Sets x to "1"
   INC x        <-- Sets x to "2"
   DEC x        <-- Sets x to "1"
   INC x "2"    <-- Sets x to "3"
   DEC x "3"    <-- Sets x to "0"

Note that if you specify a negative value for value1, INC and DEC work the
opposite way. For example, if you subtract "-1" from "2", you would get
"3". Here are some examples:

   SET x = "10"
   DEC x "-2"   <-- Sets x to "12"
   INC x "-3"   <-- Sets x to "9"

Incrementing or decrementing a variable by "0" leaves it unchanged.

                            --------------------
                            The ROUNDING Command
                            --------------------

FORMAT:       ROUNDING "Y"   or   ROUNDING "N"

PURPOSE:      Controls rounding of answers given by the CALCREAL command
              when it is in "fixed-decimal" mode.

PARAMETERS:   ROUNDING "Y" turns on rounding (the default)
              ROUNDING "N" turns off rounding

ALTERNATIVES: The CALCREAL command can be used in "floating point" mode,
              and you can perform rounding and truncation operations
              yourself, within the POM file.

Due to the way that real numbers (as opposed to integers) are calculated
in binary, CALCREAL can sometimes return unexpected results. For example:

   CALCREAL x = "400.00" "-" "390.60"

produces the answer "9.399999" rather than the expected answer of "9.4".
This discrepancy is due to the nature of real-mode calculations in binary
(i.e. inside the computer).

The answer is actually very close indeed to the correct answer, yet it
could cause problems if you specify the actual number of digits of
precision ("fixed-decimals" mode instead of "floating point" mode). For
example:

   ROUNDING "N"                             <-- This is explained later
   CALCREAL x = "400.00" "-" "390.60" "2"   <-- 2 digits of precision

This sets the x variable to "9.39", which is clearly wrong. What has
happened is that the remaining digits of the floating-point answer were
simply truncated (i.e. removed).
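The truncation effect can be reproduced in any language that uses binary
floating point. The Python sketch below is a model only; the function name is
hypothetical, it handles positive values only, and it does not claim to match
CALCREAL's internal arithmetic or precision. Its rounding branch adds half a
unit of the last requested decimal place before truncating, which is the
kind of correction that ROUNDING "Y" is described as applying.

```python
def fixed_decimals(value, decimals, rounding=True):
    """Format a positive value with a fixed number of decimals.

    rounding=False truncates the extra digits; rounding=True first adds
    half a unit of the last kept decimal place, then truncates.
    """
    scale = 10 ** decimals
    if rounding:
        value += 0.5 / scale
    truncated = int(value * scale)          # drops the remaining digits
    return f"{truncated / scale:.{decimals}f}"


difference = 400.00 - 390.60                # slightly below 9.4 in binary
print(fixed_decimals(difference, 2, rounding=False))
print(fixed_decimals(difference, 2, rounding=True))
```

With rounding switched off the sketch yields "9.39", and with rounding
switched on it yields "9.40".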
Normally, Parse-O-Matic's built-in rounding will add a small value
appropriate to the number of digits of fixed-decimal precision you have
specified:

   PRECISION   SAMPLE NUMBER   ROUNDING VALUE
   ---------   -------------   --------------
       1           9.9            + 0.05
       2           9.99           + 0.005
       3           9.999          + 0.0005

... and so on. Thus, CALCREAL x = "400.00" "-" "390.60" "2" first
generates the floating-point answer "9.3999999999", then adds "0.005",
yielding "9.4049999999". This is then truncated to two digits, yielding
"9.40", which is the correct answer.

You can turn this rounding behavior off with the following command:

   ROUNDING "N"

You can turn it back on with this command:

   ROUNDING "Y"

By default, rounding is enabled, so unless you explicitly turn it off,
there is no need to use the ROUNDING "Y" command.

No rounding is performed when CALCREAL is used in "floating-point" mode.

                            --------------------
                            The CALCBITS Command
                            --------------------

                ** ADVANCED COMMAND FOR EXPERIENCED USERS **

FORMAT:       CALCBITS var1 value1 operation value2

PURPOSE:      CALCBITS performs logical operations

SEE ALSO:     "The MakeData Command"

The CALCBITS command performs "bit-wise" operations on single bytes. The
following operations can be performed with CALCBITS:

   SYMBOL      DESCRIPTION
   ---------   ---------------------------------
   "AND"       value1 AND value2
   "OR"        value1 OR value2
   "XOR"       value1 XOR value2
   "SHR"       Shift value1 right by value2 bits
   "SHL"       Shift value1 left by value2 bits

Let us say you want to strip the high bit from all of the bytes in an
input file. You could accomplish this with the following POM file:

   CHOP 1-1                      <-- Read the input file one byte at a time
   CALCBITS z $FLINE "AND" $7F   <-- Remove the high bit from the byte
   OUT |{z}                      <-- Send the result to the output file

Note that because we are reading the file one byte at a time, $FLINE is
always one byte long. Parse-O-Matic will terminate with an error message
if you attempt to use CALCBITS with a value longer than one byte.
Thus, assuming the variable xyz contains "ABCDEF", the following line is
valid:

   CALCBITS answer = xyz[3] "AND" $7F

However, the following line would not be permitted because it refers to
more than one byte:

   CALCBITS answer = xyz[3 4] "AND" $7F

Here are some more examples of the CALCBITS command:

   COMMAND                           ANSWER  COMMENTS
   -------------------------------   ------  ------------------------------
   CALCBITS answer = $FF "AND" $7F   $7F
   CALCBITS answer = "9" "AND" $39   $39     $39 is the character "9"
   CALCBITS answer = $F0 "OR"  $0F   $FF
   CALCBITS answer = $7F "XOR" $08   $77
   CALCBITS answer = $80 "SHR" $01   $40     $80 = 10000000; $40 = 01000000
   CALCBITS answer = $01 "SHR" $01   $00     $01 = 00000001; $00 = 00000000
   CALCBITS answer = $01 "SHL" $01   $02     $01 = 00000001; $02 = 00000010
   CALCBITS answer = $80 "SHL" $01   $00     $80 = 10000000; $00 = 00000000

In most of these examples, we use hex notation (e.g. $01), but you can
also use single characters (e.g. "3" which is equivalent to $33) or
decimal notation (e.g. #64 which is equivalent to $40). However, you
should always bear in mind that you are working with the underlying bit
pattern. The following lines are NOT equivalent:

   CALCBITS answer = $7F "SHL" $01   <-- Shifts left one bit
   CALCBITS answer = $7F "SHL" "1"   <-- This is not the same!

The second line interprets "1" as hex $31 (decimal 49). There is obviously
no point in shifting an eight-bit byte 49 positions to the left.

============================================================================
                            INPUT PREPROCESSORS
============================================================================

                             -----------------
                             The SPLIT Command
                             -----------------

FORMAT:       SPLIT from-position to-position [,from-pos'n to-pos'n] [...]

SEE ALSO:     "The Leftover Technique"

IMPORTANT:    This command is analyzed at compile time, which means it can
              not be used conditionally (i.e. in a BEGIN/END block).

The maximum length of an input line from a text file is 255 characters.
If your input file is wider than that, you must break up the file into
manageable chunks, using the SPLIT command.

This command lets you specify the way in which each input line is broken
up so that it will look like several SEPARATE lines. For example, if your
input lines were up to 300 characters wide, you could specify:

   SPLIT 1 255, 256 300

This breaks up each line as if it were two lines. (If some of the lines
are shorter than 256 characters, they will still be treated as two lines,
although the second line will be null (i.e. empty).)

You can specify up to 130 splits (use multiple SPLIT commands if
necessary). With SPLIT, Parse-O-Matic can handle large input records, up
to a maximum total length of 32767 characters.

The best way of handling SPLIT or CHOPped files is to use a combination of
the $SPLIT variable (explained in more detail later) and BEGIN/END. For
example:

   SPLIT 1 250, 251 300
   BEGIN $SPLIT = "1"
     SET a = $FLINE[ 1 10]
     SET b = $FLINE[11 20]
   END
   BEGIN $SPLIT = "2"
     SET x = $FLINE[ 1 10]
     SET y = $FLINE[11 20]
     OUTEND |{a} {b} {x} {y}
   END

This outputs the data which appears (in the input file) in columns 1-10,
11-20, 251-260 and 261-270.

                       ------------------------------
                       Indicating Actual Input Length
                       ------------------------------

The final split must indicate the maximum length of the line. Thus, if you
have a text file with a maximum line length of 275, you still have to
indicate this, even if you are only interested in the first 100
characters:

   SPLIT 1-100, 101-275

You could use IGNORE $SPLIT = "2" to get rid of the additional text lines.

                           ---------------------
                           Non-Contiguous Splits
                           ---------------------

Your splits do not have to be contiguous. For example, the following SPLIT
command is legal:

   SPLIT 5 39, 41 100, 247 285

The first four characters of each line would be ignored, so your first
split would contain only the characters at positions 5 to 39 of the line.
Similarly, the second split would contain the 41st through 100th
character, and the third split would contain the 247th through 285th
character.

                              ----------------
                              The CHOP Command
                              ----------------

FORMAT:       CHOP from-position to-position [,from-pos'n to-pos'n] [...]
              CHOP 0

PURPOSE:      Controls the number of bytes Parse-O-Matic will read from
              the input file each time it processes the POM file.

SEE ALSO:     "The Get Command"

IMPORTANT:    This command is analyzed at compile time, which means it can
              not be used conditionally (i.e. in a BEGIN/END block).

The CHOP command works the same way as the SPLIT command, with one
exception: it informs Parse-O-Matic that the input is a fixed-record-
length file. In other words, it means that the input records are
distinguished by having a particular (and exact) length, rather than being
separated by end-of-line characters (Carriage Return, Linefeed) as is the
case for a standard text file.

Thus, if you have an input file containing fixed-length records, each of
which is 200 characters wide, you could specify it like this:

   CHOP 1 200

If the input record is more than 255 characters, you must break it up into
smaller chunks. For example, if the input record was 300 characters wide,
you could break it up like this:

   CHOP 1 250, 251 300

By using CHOP, Parse-O-Matic can handle input records up to 32767
characters wide.

You can use the $SPLIT variable to manage your use of CHOP. See the
example in the section describing the SPLIT command.

                               --------------
                               Manual Reading
                               --------------

There is a special form of the CHOP command, which looks like this:

   CHOP 0

This tells Parse-O-Matic that you will handle all file reading yourself.
In that case, $FLINE is always null. The only way to get data from the
input file is with the GET command.

When you use CHOP 0 for manual reading, the MINLEN and READNEXT commands
have no meaning. If you place them in the POM file, they are ignored.
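The way SPLIT and CHOP carve up their input, as described in the sections
above, can be pictured with two short functions. This Python sketch is a
conceptual model only, with hypothetical function names; it says nothing
about how Parse-O-Matic reads files internally. Positions are 1-based and
inclusive, as in the SPLIT and CHOP commands.

```python
def split_line(line, ranges):
    """Cut one input line into pieces, one per (from, to) pair."""
    return [line[start - 1:end] for start, end in ranges]


def chop_records(data, record_length):
    """Yield fixed-length records from data that has no line separators."""
    for offset in range(0, len(data), record_length):
        yield data[offset:offset + record_length]


wide_line = "A" * 250 + "B" * 50                      # a 300-character line
pieces = split_line(wide_line, [(1, 250), (251, 300)])
print(len(pieces[0]), len(pieces[1]))

short_line = "C" * 100                                # shorter than 251
print(split_line(short_line, [(1, 250), (251, 300)]))

records = list(chop_records("x" * 200 + "y" * 200, 200))
print(len(records))
```

The short line still produces two pieces, the second of which is empty, which
corresponds to the null second line mentioned in the SPLIT section.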
============================================================================
                              LOOKUP COMMANDS
============================================================================

------------------
The LOOKUP Command
------------------

FORMAT:       LOOKUP var1 value1

PURPOSE:      The LOOKUP command searches for value1 in a text file (the
              name of which is specified either by the LOOKFILE command or
              the /L startup parameter). When POM finds it, it sets var1 to
              another value found on the same line.

ALTERNATIVES: The REMAP command

Let us suppose you created a text file, named NAMES.TBL, like this:

  R. REAGAN        Ronald Reagan
  D. EISENHOWER    Dwight Eisenhower
  G. BUSH          George Bush
  B. CLINTON       Bill Clinton
  :                :
  Column 1         Column 18

This file can be used to look up a name, as in this POM file:

  LOOKFILE "NAMES.TBL"
  LOOKCOLS "1" "17" "18" "34"
  SET oldname = $FLINE[21 37]
  TRIM oldname "R" " "
  LOOKUP newname = oldname
  OUTEND |{oldname} {newname}

The LOOKFILE command specifies the name of the look-up file. The LOOKCOLS
command specifies the starting and end columns for both the
"text-to-look-for" field (known as the key field) and the "text-to-get"
field (known as the data field).

The LOOKUP command will look for oldname in NAMES.TBL. If oldname is set to
"G. BUSH", LOOKUP sets newname to "George Bush". If, however, oldname is set
to "G. WASHINGTON", which doesn't appear in NAMES.TBL, newname is set to ""
(that is to say, an empty string).

-------------
Search Method
-------------

When searching for the key field, LOOKUP compares text according to the
length of the string you are looking for. If your LOOKUP file looks like
this:

  ABCDEF 456
  ABC    678
  XYZABC 345
  XYZ    123

then the command LOOKUP x = "XYZ" would match on "XYZABC".
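
The matching rule can be modeled with a short Python sketch. This is an
illustration under stated assumptions, not Parse-O-Matic's implementation:
the function name lookup is hypothetical, the table is assumed to hold the
key in columns 1-6 and the data in columns 8-10, and the comparison ignores
case and trims the data field.

```python
def lookup(table_lines, key, key_cols=(1, 10), data_cols=(12, 255)):
    """Return the data field of the first line whose key field starts with `key`.

    Columns are 1-based and inclusive. Only len(key) characters are compared,
    so a short key can match a longer key field. Returns "" when nothing
    matches.
    """
    k_start, k_end = key_cols
    d_start, d_end = data_cols
    wanted = key.upper()
    for line in table_lines:
        if not line.strip() or line.startswith(";"):
            continue                      # skip null lines and comments
        # Pad the key field to its full width so padded keys compare cleanly
        key_field = line[k_start - 1:k_end].ljust(k_end - k_start + 1)
        if key_field[:len(wanted)].upper() == wanted:
            return line[d_start - 1:d_end].strip()
    return ""


TABLE = [
    "ABCDEF 456",
    "ABC    678",
    "XYZABC 345",
    "XYZ    123",
]

if __name__ == "__main__":
    print(lookup(TABLE, "XYZ", key_cols=(1, 6), data_cols=(8, 10)))     # 345
    print(lookup(TABLE, "XYZ   ", key_cols=(1, 6), data_cols=(8, 10)))  # 123
```

The first call stops at "XYZABC" because only three characters are compared.
The second call uses a key padded to the full width of the key field, so it
can only match the "XYZ" entry.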
If this search procedure is a problem for you, there are two ways you can
deal with it:

1) Pad your search strings before searching, as in this example:

     PAD search "R" " " "6"
     LOOKUP x = search

   If the search variable was originally set to "XYZ", the PAD command
   would set it to "XYZ   ", which would not match XYZABC.

2) Put the shorter key fields in the lookup file ahead of the longer ones
   (of which they are a sub-string), as in this example:

     ABC    678
     ABCDEF 456
     XYZ    123
     XYZABC 345

   It is worth pointing out that this look-up file is sorted in ASCII order
   (whereas the example given earlier was not). A sorted file can be more
   efficient, as explained in "The LookSpec Command".

-----------
Limitations
-----------

There is no limit to the number of lines that you can put in a look-up
file. However, the more lines there are, the longer it will take to process
(because there is more to search).

The maximum length of a line in a look-up file is 255 characters.

-----------------------
Null Lines and Comments
-----------------------

In the look-up file, null (empty) lines are ignored. You can also include
comments in the file by starting the line with a semi-colon:

  ; Some of the Presidents of the United States
  R. REAGAN        Ronald Reagan
  D. EISENHOWER    Dwight Eisenhower
  G. BUSH          George Bush

The LOOKUP command can be used for more than just names, of course. You
could use it to look up prices, phone numbers, addresses and so on.

----------------
Multiple Columns
----------------

You can use the same lookup file to find different items that are related
to the same key field.
For example, let's say you have created a lookup file, named EMPLOYEE.TBL,
which looks like this:

  ; EMPLOYEE#   NAME           PHONE
    00001       John Smith     555-1212
    00002       Mary Jones     555-2121
    00003       Fred Johnson   555-1122

You could look up an employee's name and phone number as follows:

  LOOKFILE "EMPLOYEE.TBL"
  LOOKCOLS "3" "7" "15" "37"
  LOOKSPEC "N" "Y" "N"
  LOOKUP empdata = "00002"
  SET name  = empdata[ 1 12]
  SET phone = empdata[16 23]
  TRIM name  "B" " "
  TRIM phone "B" " "

You could, of course, specify a different LOOKCOLS prior to each LOOKUP,
but that would mean reading the disk twice. In most cases, it is faster to
obtain the data all at once, then extract it.

-------------------
LOOKUP Versus REMAP
-------------------

If you have only a few thousand bytes of lookup data, you might be able to
use the REMAP command instead of LOOKUP. However, you can not simply
replace LOOKFILE and LOOKUP with MAPFILE and REMAP. REMAP does not return a
null value if it can not find the item being sought, so you will have to
change your POM file to compare the original string with the revised
string, in order to see if it has changed (i.e. it was found). Even with
this test, REMAP might "fool you" if it finds a partial match.

If you are processing a lot of input data, using REMAP may speed up
processing, since REMAP works in RAM memory, while LOOKUP reads the disk.
However, if your disk uses "caching", the performance improvement may be
negligible.

--------------------
The LOOKFILE Command
--------------------

FORMAT:       LOOKFILE value1

PURPOSE:      The LOOKFILE command specifies the name of the look-up file
              for the next LOOKUP command.

SEE ALSO:     "How Parse-O-Matic Searches for a File"

LOOKFILE lets you use several look-up files in one POM file.
For example:

  SET name = $FLINE[1 20]
  ; Look up the name
  LOOKFILE "NAMES.TBL"
  LOOKCOLS "1" "25" "30" "50"
  LOOKUP fullname = name
  ; Look up phone number
  LOOKFILE "PHONE.TBL"
  LOOKCOLS "1" "25" "30" "40"
  LOOKUP phone = name
  ; Output result
  OUTEND |{name} {fullname} {phone}

If you only have one look-up file, you may omit the LOOKFILE command and
specify the file name on the command line, using the /L parameter. For
example, you could write a POM file like this:

  SET name = $FLINE[1 20]
  ; Look up the name
  LOOKCOLS "1" "25" "30" "50"
  LOOKUP fullname = name
  ; Output result
  OUTEND |{name} {fullname}

Your POM command could then look like this:

  POM MYPOM.POM INPUT.TXT OUTPUT.TXT /LC:\MYFILES\NAMES.TBL

This technique allows you to use several different look-up files with the
same POM file, simply by changing the command line. (The method by which
Parse-O-Matic finds the file is discussed in the section "How Parse-O-Matic
Searches for a File".)

The longest line allowed in a look-up file is 255 characters.

If you specify a null look-up file name (e.g. LOOKFILE ""), Parse-O-Matic
closes the current look-up file (if one is open). This is necessary if you
wish to delete the file, using the ERASE command.

--------------------
The LOOKCOLS Command
--------------------

FORMAT:       LOOKCOLS value1 value2 value3 value4

PURPOSE:      The LOOKCOLS command specifies the starting and ending
              columns for the key and data fields in a look-up file (see
              the explanation of the LOOKUP command for an overview of
              look-up files).

PARAMETERS:   value1 specifies the starting column for the key field
              value2 specifies the ending column for the key field
              value3 specifies the starting column for the data field
              value4 specifies the ending column for the data field

NUMERICS:     Tabs, spaces and commas are stripped from value1, 2, 3 and 4

You can specify a null value to indicate "same as last time".
For example:

  SET name = $FLINE[1 20]
  LOOKFILE "NAMES.TBL"
  LOOKCOLS "1" "25" "30" "50"
  LOOKUP fullname = name
  LOOKFILE "PHONE.TBL"
  LOOKCOLS "" "" "" "40"
  LOOKUP phonenum = name
  OUTEND |{name} {fullname} {phonenum}

The second LOOKCOLS command uses the same numbers for the first three
values that the first LOOKCOLS command used.

If you do not specify a LOOKCOLS command, the default values are:

  Key Field:   Starting column = 1    Ending column = 10
  Data Field:  Starting column = 12   Ending column = 255

This is equivalent to LOOKCOLS "1" "10" "12" "255".

--------------------
The LOOKSPEC Command
--------------------

FORMAT:       LOOKSPEC value1 value2 value3

PURPOSE:      The LOOKSPEC command configures the way the next LOOKUP
              command will work.

PARAMETERS:   value1 = Trim           ("Y" or "N" -- default "Y")
              value2 = Sorted         ("Y" or "N" -- default "N")
              value3 = Case-sensitive ("Y" or "N" -- default "N")

The Trim setting specifies whether or not the data field should have spaces
stripped off both ends.

The Sorted setting specifies whether or not the look-up file is sorted by
the key field. A sorted file is much faster than an unsorted file. This is
especially noticeable if you have a large look-up file and a lot of input
to process.

The Case-sensitive setting specifies whether or not LOOKUP should
distinguish between upper and lower case when searching. The default
setting is "N" (No), so that LOOKUP would find "John Smith", even if it
appeared in the look-up file as "JOHN SMITH". It is usually safest to set
Case-sensitivity to "N", but if you set it to "Y", searching is slightly
faster.

You can specify a null value to indicate "same as last time".
For example:

  SET name = $FLINE[1 20]
  LOOKFILE "DATA.TBL"
  LOOKCOLS "1" "25" "30" "50"
  LOOKSPEC "Y" "Y" "Y"
  LOOKUP fullname = name
  LOOKCOLS "" "" "60" "70"
  LOOKSPEC "N" "" ""
  LOOKUP phonenum = name
  OUTEND |{name} {fullname} {phonenum}

The second LOOKSPEC command uses the same settings for Sorted and
Case-sensitivity as the first one, but specifies a different Trim setting.

============================================================================
                              DATA CONVERTERS
============================================================================

--------------------
The MAKEDATA Command
--------------------

** ADVANCED COMMAND FOR EXPERIENCED USERS **

FORMAT:       MAKEDATA var1 value1 value2

PURPOSE:      MAKEDATA converts text data into a binary format.

PARAMETERS:   var1   is the variable being set
              value1 is the text data you want to convert
              value2 is the predefined data type you want to create

NUMERICS:     Tabs, spaces and commas are stripped from value1, if it is
              numeric (as indicated by value2)

SEE ALSO:     "Predefined Data Types" and "The CalcBits Command"

When you are writing to a binary file (using the OUT command), you often
need to convert text information to a binary representation. MAKEDATA
recognizes many standard data formats (see "Predefined Data Types").

--------------------
Creating Binary Data
--------------------

Let us say you have a four-line text file that looks like this:

  1234
  -456
  23
  90211

Here is a POM file that reads the numbers from the file, then outputs them
in binary format, as 16-bit signed integers:

  MAKEDATA z $FLINE "INTEGER"  <-- Convert the number to an integer
  OUT |{z}                     <-- Send the integer to the output file

We use OUT instead of OUTEND, since OUTEND would put an end-of-line
(Carriage Return, Line Feed) after the data.

If the POM file shown in the example was run with the input data shown, it
would create an output file containing four integers. In other words, the
file would be eight bytes long (four integers of two bytes each).
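
To see what a two-byte signed integer looks like on disk, here is a minimal
Python sketch. It is not a description of MAKEDATA's internals: the name
make_integer is hypothetical and the little-endian byte order is an
assumption. The sketch uses only the first three sample values, because
90211 lies outside the 16-bit signed range (-32768 to 32767) and the
treatment of such a value is not covered here.

```python
import struct


def make_integer(text):
    """Convert a decimal text number to a 2-byte signed integer.

    Tabs, spaces and commas are stripped first. Little-endian byte order is
    an assumption of this sketch.
    """
    cleaned = text.replace("\t", "").replace(" ", "").replace(",", "")
    return struct.pack("<h", int(cleaned))


if __name__ == "__main__":
    lines = ["1234", "-456", "23"]
    output = b"".join(make_integer(line) for line in lines)
    print(len(output))      # 6 (three integers of two bytes each)
    print(output.hex())     # d20438fe1700
```

No separator is written between the values, which is the reason the POM
example uses OUT rather than OUTEND.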
----------------
Converting Dates
----------------

In some files, a date serial number (see "The ZeroDate Command") might be
represented by a numeric format such as INTEGER or LONGINT. To write a date
serial number to the output file, you must first convert the date with
MAKEDATA, then use MAKEDATA again to convert the resulting number to the
appropriate data type.

The value1 part of the MAKEDATA must be in a precise format:

  "YYYY [M]M [D]D"    <-- Square brackets indicate optional digits

That is to say:

  1) A four-digit year
  2) A space
  3) A one or two digit month (January = 1 or 01, December = 12)
  4) A space
  5) A one or two digit day of the month (e.g. 1 or 01 or 31)

You can assemble the date string from various other data, using the DATE
command. Let us say you have a one-line text file that contains the date in
Month-Day-Year format:

  01-01-2001

You can read this file and output a date serial number as a long integer
(LONGINT) with the following POM file:

  ZERODATE "2000" "1" "1"           <-- Set "day zero"
  SET year  = $FLINE[7 10]          <-- Get the year
  SET month = $FLINE[1 2]           <-- Get the month
  SET day   = $FLINE[4 5]           <-- Get the day of the month
  DATE x year month day "Y ?n ?d"   <-- Set x to "2001 01 01"
  MAKEDATA y x "DATE"               <-- Set y to "366"
  MAKEDATA z y "LONGINT"            <-- Set z to a long integer
  OUT |{z}                          <-- Place it in the output file

A typical problem with date data is that the year does not include the
first two digits (e.g. you have "97" instead of "1997"). In such cases,
your POM file has to make a decision as to which century the date belongs
to. Here is one way to handle this situation:

  BEGIN year #>= "50"
    CALC year = year + "1900"
  ELSE
    CALC year = year + "2000"
  END

This works around the problem as follows:

  Any year between    is placed in        Examples
  ----------------    ----------------    --------------
  "50" and "99"       the 20th century    "1950"  "1999"
  "00" and "49"       the 21st century    "2000"  "2049"

You have to be careful when choosing the "cut-off date" (1950, in the
example above).
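
The century decision and the serial number from the examples above can be
checked with a minimal Python sketch. The names expand_year and date_serial
are hypothetical, and treating a date serial number as the count of days
elapsed since the zero date is an assumption, although it agrees with the
"366" shown in the POM example.

```python
from datetime import date


def expand_year(two_digit_year, cutoff=50):
    """Expand a two-digit year: cutoff..99 -> 19xx, 00..cutoff-1 -> 20xx."""
    year = int(two_digit_year)
    return year + 1900 if year >= cutoff else year + 2000


def date_serial(year, month, day, zero_date=date(2000, 1, 1)):
    """Number of days between the zero date and the given date (assumed)."""
    return (date(year, month, day) - zero_date).days


if __name__ == "__main__":
    print(expand_year("97"))          # 1997
    print(expand_year("03"))          # 2003
    print(date_serial(2001, 1, 1))    # 366 (the year 2000 has 366 days)
```

Changing the cutoff argument moves the boundary between the two centuries.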
You should make your decision only after studying your input data
carefully.

------------------------
Practical Considerations
------------------------

The examples shown here assume that your input file contains only one kind
of data. In most cases, you will use the CHOP command to obtain complete
data records of fixed length, then use SET to extract portions thereof. If
you are reading a file with variable-length records, you can use CHOP 0
(manual reading) and the GET command.

--------------------
The MAKETEXT Command
--------------------

** ADVANCED COMMAND FOR EXPERIENCED USERS **

FORMAT:       MAKETEXT var1 value1 value2

PURPOSE:      MAKETEXT converts binary data into text format.

PARAMETERS:   var1   is the variable being set
              value1 is the data you want to convert
              value2 is the predefined data type of value1

NOTES:        value1 is normally in binary (i.e. it looks like "garbage
              characters" if you output it to a text file). However, if
              value2 specifies the DATE data type, value1 must be in text
              form (e.g. "1234"). The reason for this difference is
              described in the "Converting Dates" section below.

SEE ALSO:     "Predefined Data Types"

When reading a binary file (using the CHOP command), you often need to
convert binary information to a text representation. MAKETEXT recognizes
many standard data formats (see "Predefined Data Types").

----------------------
Converting Binary Data
----------------------

Let us say you have a binary file that contains several WORD values
(unsigned integers, each of which is 2 bytes long). You can read and decode
them with the following POM file:

  CHOP 1-2                   <-- Read the file two bytes at a time
  MAKETEXT x $FLINE "WORD"   <-- Convert the WORD to text format
  OUTEND |{x}                <-- Output the data to a text file

----------------
Converting Dates
----------------

MAKETEXT can convert a date serial number (see "The ZeroDate Command") to a
formatted date.
Since there is no standard data format for date serial numbers, you must
use MAKETEXT to convert the number into text form, and then use MAKETEXT
again to format the date.

Let us say you have a binary file that contains dates, represented as
LONGINTs (4-byte signed integers). You could convert them to dates with the
following POM file:

  CHOP 1-4                      <-- Read 4 bytes at a time
  ZERODATE "1936" "1" "1"       <-- Set the "zero date"
  MAKETEXT x $FLINE "LONGINT"   <-- Convert the binary data to a text number
  MAKETEXT y x "DATE Y-M-?d"    <-- Convert to text date (e.g. "1998-JUL-01")
  OUTEND |{y}                   <-- Output the date to a text file

------------------------
Practical Considerations
------------------------

The examples shown here assume that your input file contains only one kind
of data. In most cases, you will use the CHOP command to obtain complete
data records of fixed length, then use SET to extract portions thereof. If
you are reading a file with variable-length records, you can use CHOP 0
(manual reading) and the GET command.

============================================================================
                           MISCELLANEOUS COMMANDS
============================================================================

-----------------
The ERASE Command
-----------------

FORMAT:       ERASE value1

PURPOSE:      Deletes a file (if it exists).

PARAMETERS:   value1 is the name of the file to be deleted

SEE ALSO:     "Long File Names in Win95"

Here is an example of the ERASE command:

  ERASE "C:\XYZ.TXT"

This will delete the file C:\XYZ.TXT if it exists. If it does not exist,
nothing is done.

You can not delete the current input file, output file, trace file or
lookup file. If you attempt to do so, Parse-O-Matic will terminate with an
error.

You can not delete a device (e.g. ERASE "LPT1:"). The ERASE command simply
ignores such requests.

If value1 is preceded by a "+" character, the plus sign is ignored. See
"How Parse-O-Matic Opens an Output File" for an explanation of the
significance of the plus sign.
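
The ERASE rules listed above can be summarized in a short Python sketch.
This is an illustration only, not Parse-O-Matic's code: the name erase and
the open_files parameter are hypothetical, and recognizing a device by a
trailing colon is an assumption.

```python
import os
import tempfile


def erase(name, open_files=()):
    """Delete `name` if it exists; return True when a file was removed.

    `open_files` stands in for the current input, output, trace and lookup
    files, which may not be deleted.
    """
    if name.startswith("+"):
        name = name[1:]                   # a leading plus sign is ignored
    if name.endswith(":"):
        return False                      # device names such as LPT1: are ignored
    if name in open_files:
        raise RuntimeError("cannot erase a file that is in use: " + name)
    if os.path.isfile(name):
        os.remove(name)
        return True
    return False                          # nothing is done if the file is absent


if __name__ == "__main__":
    handle, path = tempfile.mkstemp()
    os.close(handle)
    print(erase("+" + path))   # True
    print(erase(path))         # False
    print(erase("LPT1:"))      # False
```

The second call returns False without raising an error, matching the rule
that erasing a missing file does nothing.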
--------------------
The FILESIZE Command
--------------------

FORMAT:       FILESIZE var1 value1

PURPOSE:      Determines the size of a file, in bytes

PARAMETERS:   var1   is the variable being set with the file size
              value1 is the name of the file

SEE ALSO:     "Long File Names in Win95"

The FILESIZE command can be used to obtain the size of a file, or to see if
a file exists. If the file does not exist, FILESIZE sets var1 to a null
("") string. If the file exists, FILESIZE sets var1 to the number of bytes
in the file (from 0 to 2,147,483,647 bytes).

For those rare files that are larger than 2,147,483,647 bytes, FILESIZE
will produce an unpredictable result and may even cause Parse-O-Matic to
fail. Nevertheless, it is possible to process such large files -- all of
the other commands will work normally.

------------------
The GETENV Command
------------------

FORMAT:       GETENV var1 value1

PURPOSE:      GETENV obtains a system environment variable

PARAMETERS:   var1   is the variable being set
              value1 is the name of the system environment variable

NOTES:        System environment variables are sometimes referred to as
              "DOS Environment Variables" or "SET Variables".

SEE ALSO:     Explanations of the SET & PATH commands in your DOS manual,
              or "The Environment Area" in your Windows or OS/2 manual.

ALTERNATIVES: See "Command-line Parameters" in the "Command-Line
              Techniques" section of this manual.

GETENV enables you to access certain important settings that concern your
computer's operating system. To see what settings are available, enter the
following command at the DOS prompt:

  SET

This will display the contents of your computer's "environment area". Two
of the most important values in the environment area are COMSPEC and PATH.
These are briefly described later, but refer to your operating system
manual for full details.

GETENV removes all spaces, tabs and equals-signs ("=") from value1,
converts it to uppercase, then looks it up in the system environment area.
- If it finds it, var1 is set to the corresponding value.
- If it does not find it, var1 is set to an empty (null) string.

----------------------------------
Disappearing Environment Variables
----------------------------------

Sometimes an environment variable disappears for no apparent reason. There
are two likely reasons for this:

1) You ran out of environment space. There is only a limited amount of room
   in the system environment area (which is located in RAM memory). If you
   think this is the problem, type your DOS SET command to save a variable
   into the system environment, then type SET by itself to review the
   contents of the environment. If your variable does not appear, consult
   your operating system manual to find out how to expand your environment
   space.

2) It was set by a COPY of the operating system. If you are in Windows and
   you run DOS, then use the DOS SET command, it will only affect the
   environment area associated with the copy of DOS that you are running.
   When you exit this copy and start up another one, it will not contain
   the variable. You can address this problem by setting the variable in
   your AUTOEXEC.BAT file, or by running a batch file that sets the
   variable before running Parse-O-Matic.

--------
Examples
--------

The following command will determine which directories get searched when
you are looking for a program or a file:

  GETENV path "PATH"

To find out the name of your command interpreter (usually COMMAND.COM) and
where it is located, try this command:

  GETENV comspec "COMSPEC"

You can use GETENV as a simple "input routine" for Parse-O-Matic
applications. For details, see "Controlling a POM File from the Command
Line", in the section entitled "Effective Use of Batch Files".

---------------
The LOG Command
---------------

FORMAT:       LOG value1 [comparator] value2 value3 [value4 [value5]]

PURPOSE:      LOG places a message (value3) in the processing log file
              (POMLOG.TXT) if the comparison is true.
              Both value4 and value5 are optional; if they are present,
              they are added to the end of value3.

NOTES:        The processing log is described in the section "Logging".

ALTERNATIVES: The SHOWNOTE command

Here is an example of the LOG command:

  SET emplnumb = $FLINE[ 1  9]
  SET sales    = $FLINE[10 20]
  TRIM sales "B" " "
  LOG sales = "0" "WARNING! Zero sales for employee number:"
  LOG sales = "0" emplnumb

This adds two warning lines to the processing log if the sales figure is
zero. The logging feature lets you run Parse-O-Matic unattended, then come
back later to review (via the processing log) any exceptional conditions.
For some additional comments on logging, see "Unattended Operation".

The maximum length of a LOG string (value3, plus value4 and value5 if
present) is 245 characters.

-------------------
The MSGWAIT Command
-------------------

FORMAT:       MSGWAIT value1

PURPOSE:      MSGWAIT controls the amount of time that a processing error
              message appears on the screen before it times out. (Messages
              from the HALT command are treated as error messages.)

PARAMETERS:   value1 is the delay time in seconds

NUMERICS:     Tabs, spaces and commas are stripped from value1

DEFAULTS:     If the MSGWAIT command is not included in the POM file, and
              an error occurs, Parse-O-Matic will wait until you press a
              key; the message will not time out.

NOTES:        If value1 is "0", error messages will not time out. The
              maximum value for value1 is 60000 (about 16 hours). You can
              set value1 to "1", but one second is usually too short a
              delay; a value of "60" (one minute) is better.

SEE ALSO:     "The Halt Command", "Unattended Operation", "Quiet Mode"

The MSGWAIT command lets you control the behavior of error messages that
appear during the processing of an input file. This is helpful if you have
created POM applications that are run unattended. If Parse-O-Matic was
invoked by a batch file or application program, you may want error messages
to "time out", allowing Parse-O-Matic to terminate, and processing to
continue.
-----------------
Standard Behavior
-----------------

If Parse-O-Matic encounters an error while reading in a POM file (i.e.
during the "compile" step), it displays a message on the screen and waits
until you press a key. Parse-O-Matic then terminates.

When running the actual POM file (i.e. while processing the input file),
Parse-O-Matic will normally behave the same way: if an error occurs (or if
a HALT command is encountered), it will display a message on the screen and
wait for you to press a key before it terminates.

------------------------
Setting a Time-Out Delay
------------------------

You can use the MSGWAIT command to tell Parse-O-Matic to continue ("time
out") after a certain number of seconds. For example:

  MSGWAIT "60"

This tells Parse-O-Matic to wait about 60 seconds if an error is
encountered while processing the input file. Parse-O-Matic will then
terminate. (The actual delay depends on the type of computer you are using;
a delay of "60" will typically last between 55 and 65 seconds).

----------
Color Cues
----------

If you have a color monitor, you can tell if a message will "time out" by
the color of the "Press a key to continue" prompt:

- If it is magenta (sometimes called "purple") it will NOT time out
- If it is blue, it WILL time out

------------
Key Stacking
------------

To ensure that an error message is not inadvertently bypassed, "stacked"
keystrokes are ignored by Parse-O-Matic. That is to say, if you press
several keys before an error message is displayed, Parse-O-Matic gets rid
of them before displaying the message.

----------
Exceptions
----------

If Parse-O-Matic is processing an empty input file, it will display the
warning "Input file is empty", then continue processing the POM file when
you press a key, or after a delay of about 60 seconds.

The MSGWAIT command does not affect messages that report errors detected
during the compilation (initial read-in) of the POM file.
The MSGWAIT command does not affect the "Retry or Cancel" message that
appears if you are dealing with a device (see "Sending Output to a
Device").

-----------------
A Word of Caution
-----------------

A POM file should be thoroughly tested before setting the MSGWAIT time to a
value other than "0". Most error messages are serious enough to justify
waiting until the user acknowledges them.

If you call Parse-O-Matic from a batch file or application program, you can
check the success of the parsing job by checking the return code. (See
"Effective Use of Batch Files" and "Running Parse-O-Matic from Another
Program"). If there was a processing error and you did not check the
parsing job (either by testing the program return code, or by consulting
the processing log), the resulting oversight could be serious.

-----------------
The PAUSE Command
-----------------

FORMAT:       PAUSE value1

PURPOSE:      Delays the specified number of milliseconds

PARAMETERS:   value1 is the delay time (between 1 and 65500)

NUMERICS:     Tabs, spaces and commas are stripped from value1

NOTES:            1 millisecond  = One thousandth of a second
                100 milliseconds = One tenth of a second
               1000 milliseconds = One second
              60000 milliseconds = One minute

Here are some typical applications of the PAUSE command:

- Slow down Parse-O-Matic so you can watch the processing screen
- Give a slow laser printer extra time to eject a page after an OUTPAGE
- Give you time to remove a page from a dot-matrix printer after an OUTPAGE
- Give a communications device time to complete its current operation

Here is an example of the latter application:

  OFILE "COM1:"          <-- Direct output to the modem on COM1
  OUTEND |ATZ            <-- Send a modem initialization command
  PAUSE "1000"           <-- Wait one second for the command to complete
  OUTEND |ATDT555-1234   <-- Send a dialing command to the modem

If your PAUSE command is 200 milliseconds or longer, Parse-O-Matic displays
a "PAUSED" message in the lower right corner of the processing screen.
While this appears, you can press any key to end the pause. (We recommend
that you use the spacebar -- and avoid the Esc key. Parse-O-Matic
processing will be terminated if the PAUSE happens to end at the precise
moment your finger is coming down on the Esc key!)

------------------
The RANDOM Command
------------------

FORMAT:       RANDOM var1 value1 value2

PURPOSE:      Generates a random integer number

PARAMETERS:   var1   is the variable being set
              value1 is the minimum number allowed (between 0 and 65534)
              value2 is the maximum number allowed (between 0 and 65534)

NUMERICS:     Tabs, spaces and commas are stripped from value1

The RANDOM command generates a random number in the range specified by
value1 and value2. For example, to simulate the rolling of a six-sided die:

  RANDOM roll "1" "6"

The first time you call this, it might set the roll variable to "3". The
next time, it might set it to "1". The roll variable will never be lower
than "1" or higher than "6".

The minimum value must always be less than or equal to the maximum value.
Thus, the following statement will cause an error:

  RANDOM x "100" "1"    <-- This is incorrect

If value1 and value2 are the same, RANDOM will always generate the same
number.

  SET n = "50"
  RANDOM z n n

This will always set the z variable to 50.

--------------------
The SHOWNOTE Command
--------------------

FORMAT:       SHOWNOTE value1 [value2] [value3] [value4] [value5]

PURPOSE:      Displays a message on the processing screen

PARAMETERS:   value1 to value5 are values that will be displayed

ALTERNATIVES: The LOG command

When you are doing especially intense parsing, it is nice to see some kind
of indication of what is happening. The SHOWNOTE command displays a message
up to 40 characters long in the lower left corner of the processing screen.
For example:

  SHOWNOTE "Reading employee data"

This will display the message "Reading employee data" until the parsing job
ends, or until another SHOWNOTE command is encountered.
You can remove a note from the processing screen by providing a null value:

  SHOWNOTE ""

This clears a note that is already on the processing screen.

You can display up to 5 values in a SHOWNOTE message. Each value will be
separated from the others by a space. For example:

  SHOWNOTE "Employee:" empnum " -- Last name:" lastname

This would display a note like this:

  Employee: 314159  -- Last name: Smith

Bear in mind that you have only 40 characters for the note. If the lastname
variable in the preceding example was "Von Neumann", the display would be
truncated to forty characters, and display the note as follows:

  Employee: 314159  -- Last name: Von Neuma

-----------
Other Notes
-----------

There are three kinds of notes that are displayed in the lower right corner
of the processing screen:

  TYPE OF NOTE     COLOR        TEXT                    REASON
  -------------    ----------   --------------------    --------------------
  SHOWNOTE         Light Grey   Whatever you specify    The SHOWNOTE command
  PAUSE            Light Red    PAUSED                  The PAUSE command
  Error message    Red          TRYING TO PRINT         Printer is offline

In most cases, these messages will not interfere with each other.

------------
Slowing Down
------------

If you are running Parse-O-Matic on a fast machine, your SHOWNOTE messages
blaze by so fast that they are not particularly helpful. Here is a POM file
that demonstrates one solution:

  SET line_counter = line_counter+
  SET note_control = note_control+
  BEGIN note_control = "100"
    SET note_control = "0"
    SHOWNOTE "Processing line #" line_counter
  END
  OUTEND |{$FLINE}

This will perform the SHOWNOTE command on every hundredth input line.

-----------------
The SOUND Command
-----------------

FORMAT:       SOUND value

PURPOSE:      The SOUND command performs two functions:

              1) It makes a noise, or ...
              2) It sets the noise made when an error occurs

The SOUND command has a repertoire of nine distinctive noises:

  BEEP  BIP  BUZZ  EDGE  ERROR  HUH  PIP  TRILL  WHOOP

These sounds are useful for alerting you to unusual situations.
Let's say you wanted to be warned if one of the fields in a file comes up
blank. You could write the code this way:

  BEGIN lastname = ""
    SOUND "WHOOP"
    SET lastname = "?"
  END

Case is not important; the following commands are all equivalent:

  SOUND "WHOOP"
  SOUND "Whoop"
  SOUND "whoop"

------------------
The LISTEN Utility
------------------

You can listen to any given sound by using the LISTEN command at the DOS
prompt. To hear what TRILL sounds like, enter this command:

  LISTEN trill

By default, Parse-O-Matic error messages will alert you by playing the
ERROR sound. To hear this sound, enter the following command at the DOS
prompt:

  LISTEN error

--------------------------------
Changing the Error Message Sound
--------------------------------

If you find the error message sound annoying, you can replace it with one
of the other sounds by using the special ERRMSG specification of the SOUND
command. For example, to replace the ERROR sound with the BUZZ sound, place
this line at the top of your POM file:

  SOUND "ERRMSG BUZZ"

If you don't want any sound made when an error occurs, use this command:

  SOUND "ERRMSG QUIET"

The ERRMSG specification will only affect errors generated during the
actual running of the POM file. If an error is encountered while
Parse-O-Matic is compiling the POM file, it will use the ERROR sound when
it reports the problem.

-----------------
The TRACE Command
-----------------

FORMAT:       TRACE var1

PURPOSE:      The TRACE command is an alternative to standard tracing (see
              "Tracing", in the "Programming Techniques" section).

PARAMETERS:   var1 is the variable being traced.

When you include a TRACE command in your POM file, Parse-O-Matic will
create a text file, named POM.TRC, and use it to keep a detailed record of
POM's processing. Here is an example of the TRACE command:

  TRACE PRICE

This traces the variable named "PRICE". After processing, the file POM.TRC
will show everything that happened, and give the value of PRICE at the
TRACE line.
NOTE: Since trace files are so detailed, they can be very large. If you are
trying to debug a POM file using TRACE, it is a good idea to use a small
input file.

============================================================================
                                   TERMS
============================================================================

------
Values
------

A value can be specified in the following ways:

    FORMAT OR SAMPLE    DESCRIPTION
    ------------------  -----------
    "text"              A literal text string
    #number             An ASCII character, in decimal (e.g. #32 = Space)
    #number#number...   Several ASCII characters (e.g. #32#32 = 2 Spaces)
    $xx                 A byte, in hexadecimal (e.g. $2F = decimal 47)
    $xx$xx...           Several hex bytes ($ff$ff = binary 1111111111111111)
    VARNAME             The name of a variable
    VARNAME[start end]  A substring of a variable
    VARNAME[start]      A single character
    VARNAME+            Inline incremented variable (explained below)
    VARNAME-            Inline decremented variable (explained below)
    (x,@y,z)            Deduced variable name (see "Deduced Variables")
    x,y                 Explicitly specified deduced variable

---------
Variables
---------

A variable is a named spot in your computer's memory that holds some data.
Variable names can be up to 12 characters long. There is no distinction
between upper and lower case in the variable name. A POM file can contain
about 2000 variables and literals.

In general, variable names should:

- Use letters, numbers and the underscore character
- Start with a letter

It is possible to create variables that do not adhere to these rules, but
this may cause you problems later on (particularly in subsequent releases of
Parse-O-Matic which may use special characters to denote particular kinds of
variables, or operations performed thereon).

The # character is used to specify a literal text string of one or more
characters. Follow each # with the decimal value of the ASCII character you
want.
Here are some useful values:

    #10 = Line Feed
    #12 = Form Feed
    #13 = Carriage Return

-----------------
Predefined Values
-----------------

Parse-O-Matic predefines several variables. These are:

    VARIABLE      DESCRIPTION
    ------------  ------------------------------------------------------------
    $FLINE        The line just read from the file (max. length 255 chars) (*)
    $FLUPC        The line just read from the file, in uppercase (*)
    $LASTFLINE    The previous value of $FLINE (*)
    $SPLIT        The CHOP or SPLIT number you are currently processing (*)
    $LINECOUNTER  The page line to which output will go next (*)
    $COMMAND      The current POM command line (see "POM and Wildcards")
    $BRL          The { character (used in OUT and OUTEND)
    $BRR          The } character (used in OUT and OUTEND)
    $TAB          The tab character (Hex $09; ASCII 09)

    (*) This is discussed in more detail, below.

Although these predefined variables start with a dollar sign ($), it does
not mean they are in some way "hexadecimal" (as in the case of the hex
values mentioned earlier). In this case, the $ character indicates that the
variables are defined by Parse-O-Matic. In general, you should avoid
creating variables that start with anything but a letter.

------
$FLINE
------

CONTAINS:  The line just read from the input file

SEE ALSO:  The READNEXT, GET and GETTEXT commands

$FLINE contains the data just read from the input file (exception: see "DBF
Files"). The data can be intelligible (human-readable) text, or it can be
binary information -- Parse-O-Matic does not care what $FLINE contains.

The maximum length of $FLINE is 255 bytes. You can use the SETLEN command to
determine its actual length.

------
$FLUPC
------

CONTAINS:  The line just read from the input file, in uppercase

SEE ALSO:  The PROPER and CVTCASE commands

$FLUPC is the same as the $FLINE variable, but in uppercase. Any changes to
$FLINE will affect $FLUPC, since they are simply different "views" of the
same data.
$FLUPC is useful for checking for text inside the current input line if case
is not important, and for obtaining the uppercase version of the data. For
example:

    BEGIN $FLUPC ^ "PART#"          <-- Detect "Part#", "part#" or "PART#"
      SET partnum  = $FLUPC[10 15]  <-- Extract the part number, in uppercase
      SET partname = $FLINE[16 30]  <-- Extract the part name, as-is
    END

Since $FLUPC is really just a different view of $FLINE, a modification of
$FLUPC (for example, using the PEEL command) is transient and usually
pointless; the next reference to $FLUPC will simply return the uppercase
version of $FLINE.

----------
$LASTFLINE
----------

CONTAINS:  The previous value of $FLINE

SEE ALSO:  The discussion of $FLINE, above

When parsing data, we frequently need to refer back to the input line we
just saw. This information is contained in the $LASTFLINE variable.

The $LASTFLINE variable is affected by READNEXT, since that updates $FLINE.
DBase files do not set $LASTFLINE, since they do not use $FLINE. At the
start of parsing, $LASTFLINE is set to null ("").

------
$SPLIT
------

CONTAINS:  The CHOP or SPLIT number you are currently processing

SEE ALSO:  The GET and GETTEXT commands

Since $FLINE has a maximum length of 255 characters, you will have to use
the SPLIT or CHOP command if your input file is wider than that.

    NOTE: Use SPLIT for text files
          Use CHOP  for fixed-record-length files

The $SPLIT variable reports which segment of a CHOPped or SPLIT input line
you are currently processing. For example, if you use this command...

    CHOP 1 255, 256 380

then $SPLIT will be set to "1" when Parse-O-Matic processes columns 1 to
255, and it will be set to "2" when processing columns 256 to 380.

------------
$LINECOUNTER
------------

CONTAINS:  The page line to which output will go next

SEE ALSO:  "How Parse-O-Matic Opens an Output File"

The $LINECOUNTER variable lets you know the number of the next output line.
Consider this POM file:

    OUTEND |LINE# {$LINECOUNTER}
    OUTEND |LINE# {$LINECOUNTER}
    OUTEND |LINE# {$LINECOUNTER}
    DONE

This will produce the following output:

    LINE# 1
    LINE# 2
    LINE# 3

The OUTPAGE command will reset the $LINECOUNTER to "1", since the next
output line will be at the top of the page. OUTHDG commands also affect the
$LINECOUNTER variable.

The $LINECOUNTER variable is active even if you have not used the PAGELEN
command to set the page length. By default, the page length is "0" (meaning
"infinite"), which means that $LINECOUNTER will always be one greater than
the total number of output lines. However, if you specify an explicit page
length (e.g. PAGELEN "55"), $LINECOUNTER is reset to "1" after each page
eject.

Using the SET command on $LINECOUNTER has no effect. In some special cases,
however, you may wish to reset $LINECOUNTER to "1". You can do this -- even
though you are not at the top of the page -- with the PAGELEN "0" command.
(You can then follow this with another PAGELEN command if you want to
restore the actual page length.)

Here is a sample POM file that demonstrates why you might want to reset the
$LINECOUNTER variable. It will format a text file which contains a form feed
character (ASCII 12) at the end of each page, while placing page numbers at
the bottom of each page as a "running footing":

    PROLOGUE                        <-- Start of prologue
      MINLEN "0"                    <-- Allow null lines
      SET pagenum = "1"             <-- Set first page number
    END                             <-- End of prologue
    BEGIN $FLINE ^ #12              <-- Does this line have a form feed?
      BEGIN $LINECOUNTER #<= "54"   <-+
        OUTEND |                      | Put blank lines to the end of the page
      AGAIN                         <-+
      APPEND x pagenum #12          <-- Set up the footer (including form feed)
      PAD x "L" " " "80"            <-- Format the footer
      OUTEND |{x}                   <-- Output the footer
      SET pagenum = pagenum+        <-- Increment the page number
      PAGELEN "0"                   <-- Reset $LINECOUNTER to "1"
    ELSE
      OUTEND | {$FLINE}             <-- Output a regular line
    END

Note that as far as Parse-O-Matic is concerned, the "actual" output page
length is "0" (i.e. "infinite"), but this POM file will output a text file
that contains 55 lines per page. (A POM file similar to the one shown is
used, with several others, to format this manual.)

-------------------------------------------
Running Out of Variables, Literals or Lines
-------------------------------------------

A POM file can contain 750 lines (not counting comment lines), and about
2000 variables and literals. If you have a very large POM file, this may not
be enough, and you will be forced to do the processing with two or more POM
files. You can link these together with a batch file, as follows:

    @ECHO OFF
    POM POMFILE1.POM INPUT.TXT TEMP.TXT
    IF ERRORLEVEL 1 GOTO QUIT
    POM POMFILE2.POM TEMP.TXT FINAL.TXT
    :QUIT

----------
Delimiters
----------

If you need to specify a quotation mark, use "". For example:

    IGNORE $FLINE = "He said ""Hello"" to me."

This ignores any line that reads:

    He said "Hello" to me.

------------------
Illegal Characters
------------------

No POM command can contain these ASCII characters:

    HEX      DECIMAL  NAME
    -------  -------  --------------------
    $00      #00      NULL
    $0A      #10      LF (Linefeed)
    $0D      #13      CR (Carriage Return)

Of course, LF and CR do appear at the end of each line in the POM file,
since a POM file is a text file.

If you need to use one of the illegal characters in a POM command, use
either the $ or # character to denote hex or decimal literals (e.g. SET
linefeed = $0A).

-----------------
Using Comparators
-----------------

Several POM commands decide what to do by comparing two values.
For example:

    IF $FLINE[1 3] = "XYZ" THEN x = "3" ELSE "4"

In this example, if the first three characters of $FLINE are "XYZ", the
variable x is set to "3", otherwise it is set to "4".

The first equals sign ("=") is a "comparator", because it defines how two
values will be compared. The second equals sign is not a comparator; it is
simply padding, which makes the line easier to understand (see the section
"Padding for Clarity" for details).

Parse-O-Matic supports several types of comparators:

    TYPE OF
    COMPARATOR  WHAT IT DOES
    ----------  -------------------------
    Literal     Compares values character by character
    Numerical   Compares the arithmetic values of real or integer numbers
    Length      Compares the length of one value with a number

Whenever a comparator is required, but is omitted, it is assumed to be
"literally identical". Thus, the following lines are equivalent:

    IF x y z "3" "4"                  (This is very terse, but it works)
    IF x y THEN z = "3" ELSE "4"      (The "equals" comparator is omitted)
    IF x = y THEN z = "3" ELSE "4"    (This is a lot easier to read)

-------------------
Literal Comparators
-------------------

Here is a list of literal comparators:

    COMPARATOR  MEANING               COMMENTS
    ----------  --------------------  -----------
    =           Identical
    <>          Not identical
    >           Higher                See NOTE #1
    >=          Higher, or identical  See NOTE #1
    <           Lower                 See NOTE #1
    <=          Lower, or identical   See NOTE #1
    ^           Contains
    ~           Does not contain
    SAMEAS      Basically the same    See NOTE #2
    LONGER      Length is longer
    SHORTER     Length is shorter
    SAMELEN     Length is the same

NOTE #1: Depends on PC-ASCII sort order. Refer to the section "Literal
         Comparisons and Sort Order" for details.

NOTE #2: The two variables are considered the same if they contain the same
         text, regardless of upper or lower case, and any surrounding
         whitespace. Thus " CHESHIRE CAT " is the same as "Cheshire Cat".

With some restrictions (discussed later), literal comparators work on
numeric and alphabetic data.
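
The restriction mentioned above can be previewed with a short sketch. This
is Python, not POM code, and the function names are hypothetical. It assumes
that a literal comparison behaves like an ordinary character-by-character
string comparison (which agrees with PC-ASCII order for plain, unaccented
text) and that SAMEAS ignores case and surrounding whitespace, as NOTE #2
describes.

```python
# Illustrative model only; these helpers are not part of Parse-O-Matic.

def literal_less(a: str, b: str) -> bool:
    """Model of the literal "<" comparator: compare character by character."""
    return a < b

def numeric_less(a: str, b: str) -> bool:
    """Model of a numerical "less than": compare arithmetic values."""
    return float(a) < float(b)

def same_as(a: str, b: str) -> bool:
    """Model of SAMEAS: ignore case and surrounding whitespace."""
    return a.strip().upper() == b.strip().upper()

print(literal_less("ABC", "ABCD"))                 # True: a prefix sorts first
print(literal_less("10", "2"))                     # True: "1" sorts before "2"
print(numeric_less("10", "2"))                     # False: 10 is not below 2
print(same_as(" CHESHIRE CAT ", "Cheshire Cat"))   # True
```

The second and third lines show why a literal comparator can give a
surprising answer when both values are numbers.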
Here are some examples of literal comparisons that are "true":

    "ABC" <>      "ABCD"        "3"   <>      "4"
    "ABC" <=      "ABCD"        "3"   <=      "4"
    "ABC" <       "ABCD"        "3"   <       "4"
    "ABC" SHORTER "ABCD"        "3"   SAMELEN "4"
    "ABC" >=      "ABC"         "ABC" <>      "CDE"
    "ABC" <=      "ABC"         "ABC" <=      "CDE"
    "ABC" =       "ABC"         "ABC" <       "CDE"
    "ABC" ^       "ABC"         "ABC" SAMELEN "CDE"
    "ABC" SAMELEN "ABC"         "ABC" ~       "CDE"

---------------------
Numerical Comparators
---------------------

Here is a list of numerical comparators:

    COMPARATOR  MEANING
    ----------  --------------------
    #=          Equal
    #<>         Not equal
    #>          Greater
    #>=         Greater, or equal
    #<          Less than
    #<=         Less than, or equal

------------------
Length Comparators
------------------

Here is a list of length comparators:

    COMPARATOR  MEANING
    ----------  --------------------
    LEN=        Equal
    LEN<>       Not equal
    LEN>        Greater
    LEN>=       Greater, or equal
    LEN<        Less than
    LEN<=       Less than, or equal

The length of the value on the left side of the comparator is compared with
a number on the right side of the comparator. For example:

    IF $FLINE LEN= "0" THEN NullLine = "YES" ELSE "NO"

----------------------------------
Literal Comparisons and Sort Order
----------------------------------

Some of the literal comparators compare text according to "PC-ASCII sort
order". For plain English text, this works fine. However, if your text
contains diacritical (accented) characters, you should be aware that some
comparisons will not work correctly. For example, the "A-Umlaut" character
appears in the PC-ASCII character set AFTER the PC-ASCII value for "Z".

-------------------
Numeric Comparisons
-------------------

Some confusion can arise if you use literal comparators on numbers. For
example, this doesn't work as you might expect at first glance:

    SET count = count+
    BEGIN count >= "2"
      OUTEND x = x |{count}
    END

You might expect this POM file to output any number greater than or equal to
"2", but in fact, you will get a different result, because the comparison is
a literal (text) comparison.
In the example above, "2" to "9" are greater or equal to "2", but "10"
(which starts with "1") is less, as is evident when you sort several numbers
alphabetically:

    1
    10
    100
    11
    15
    2
    20
    200
    3
    30

As you can see, the values 1, 10, 100, 11 and 15 come before "2" when sorted
alphabetically.

To compare numbers, you should use the numeric comparators. The correct way
to code the previous example is as follows:

    SET count = count+
    BEGIN Count #>= "2"         <-- Note the #>= comparator
      OUTEND x = x |{count}
    END

Written in this way, numbers greater than or equal to two will be output.

Here are some examples of numeric comparisons that are "true":

    "345" #<> "567"        "1.23" #<> "9.87"
    "345" #<= "567"        "1.23" #<= "9.87"
    "567" #>  "345"        "9.87" #>  "1.23"
    "3"   #<  "6.2"

The last example compares an integer ("3") with a real number ("6.2"). The
numeric comparators automatically check if one of the numbers contains a
decimal point. In such a case, the comparison is performed in "real number"
mode, which imposes the accuracy restrictions described in the section "The
CalcReal Command". This might create a problem if you are comparing a
decimal number with a large integer, but this is rarely a cause for worry,
since most parsing jobs tend to compare similar types of numbers.

-------------------------------
Upgrading from Earlier Versions
-------------------------------

IF YOU USED PARSE-O-MATIC PRIOR TO VERSION 3.00:

Because the comparator defaults to "literally identical" if it is omitted,
POM files created before version 3.00 will continue to function normally --
with two notable exceptions. In older versions, the IGNORE and ACCEPT
commands defaulted to "contains". If you have POM files that were created
for older versions, you should check your IGNORE and ACCEPT commands to
ensure that they are doing what you want them to.

---------------------
Predefined Data Types
---------------------

For certain commands (e.g.
MAKEDATA, MAKETEXT, GET and GETTEXT), Parse-O-Matic has internal definitions
of certain data representations; these are known as Parse-O-Matic's
"predefined data types":

    DATA TYPE  BYTES  MINIMUM VALUE  MAXIMUM VALUE  COMMENTS
    ---------  -----  -------------  -------------  -----------
    BYTE         1                0            255
    INTEGER      2           -32768          32767
    LONGINT      4      -2147483648     2147483647
    REAL         6    -9999999999.9   9999999999.9  See NOTE #1
    SHORTINT     1             -128            127
    WORD         2                0          65535
    DATE         -                -              -  See NOTE #2
    TRIMMED      -          0 chars      255 chars  See NOTE #3

NOTE #1: The minimum and maximum values depend on the number of digits of
         precision. See "The CalcReal Command" for details.

NOTE #2: The DATE type does not have a specific length. In some input files,
         a date serial number might be represented by a numeric format such
         as INTEGER or LONGINT. For more information, see the discussions of
         the MAKETEXT, MAKEDATA and GETTEXT commands.

NOTE #3: The TRIMMED type does not have a specific length. You can use it
         with MAKETEXT and GETTEXT commands to remove the spaces, tabs and
         nulls on either side of a string. It can also be used with
         MAKEDATA, but since this can produce a field of indeterminate
         length, it is rarely useful in such a role.

Certain predefined data types can have a qualifier, which provides
additional information. All commands that use predefined data types will
accept the qualifier, but only the MAKETEXT command makes use of it.

    DATA TYPE  QUALIFIER DESCRIPTION     EXAMPLES
    ---------  ------------------------  -------------------------------------
    REAL       Number of decimal places  "REAL 2" -> 3.14   "REAL 4" -> 3.1415
    DATE       Date format               "DATE ?y/?n/?d" -> "96/12/01"

-----------------------------------
Interpreting Data Formats in a File
-----------------------------------

When inspecting a hex dump of a binary file, bear in mind that on
PC-compatible computers, the bytes that comprise a number are often
reversed.
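
A short sketch can make this reversal concrete. It is written in Python
rather than POM, and it only illustrates the byte order; it says nothing
about how Parse-O-Matic itself is implemented. It uses Python's standard
struct module to pack the decimal value 5099 (hex 13EB) as a two-byte
unsigned number in both byte orders. The helper name hex_dump is
hypothetical.

```python
import struct

def hex_dump(data: bytes) -> str:
    """Format bytes the way a hex dump shows them, e.g. 'EB 13'."""
    return " ".join(f"{byte:02X}" for byte in data)

value = 5099  # decimal 5099 equals hex 13EB

low_byte_first = struct.pack("<H", value)   # usual order on PC-compatibles
high_byte_first = struct.pack(">H", value)  # most significant byte first

print(hex_dump(low_byte_first))    # EB 13
print(hex_dump(high_byte_first))   # 13 EB

# Reversing the two bytes converts one order into the other.
print(low_byte_first[::-1] == high_byte_first)  # True
```

The last line shows that swapping the two bytes is all it takes to move
between the two layouts.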
For example, for the INTEGER and WORD data types, the eight most significant
bits of numeric values are usually placed AFTER the eight least significant
bits. Thus, the decimal value 5099 will appear as EB 13 in the file, not
13 EB, despite the fact that decimal 5099 equals hex 13EB.

If you are dealing with data that treats numbers differently, you can
sometimes work around the problem by reversing the order of the bytes before
performing the conversion. For example, if the file contains a WORD data
type, but has the most significant byte FIRST, you can switch things around,
as demonstrated by this POM file:

    CHOP 0                  <-- Read the file manually
    GET x "WORD"            <-- Get two bytes from the file
    APPEND y = x[2] x[1]    <-- Flip the bytes around
    MAKETEXT z y "WORD"     <-- Convert the number
    OUTEND |{z}             <-- Output the result

============================================================================
                             DEDUCED VARIABLES
============================================================================

-----------------
Deduced Variables
-----------------

----------
Definition
----------

"Deduced Variable Names" (generally referred to as "Deduced Variables")
provide you with powerful means for organizing and processing data. This
particular approach to handling variable data is not available in most
high-level languages (e.g. C, Pascal, Basic) but is extremely helpful for a
content-oriented language such as Parse-O-Matic.

A deduced variable is a variable for which the actual name is not known when
the program is being written. The name itself is deduced when the actual POM
line is run. It is almost as if you could write a line like this:

    SET whatever-variable-I-should-be-using-now = "3"

There are several rules by which the name is deduced. By creating similar
circumstances later on in the program, you can deduce the same variable name
in another spot.

Deduced variables are always surrounded by parentheses.
Here are some sample deduced variable names:

    SAMPLE       PARTS*  DEDUCTIONS  EQUIVALENT VARIABLE NAME
    -----------  ------  ----------  ------------------------
    (X)            1     None        X
    (X,Y)          2     None        X,Y
    (X,Y,Z)        3     None        X,Y,Z
    (X,@Y)         2     1           X,something
    (X,@Y,@Z)      3     2           X,something,something
    (@X,YYY,@Z)    3     2           something,YYY,something

    (*) A deduced variable has a maximum of three parts.

As you can see:

- When part of the deduced variable is NOT preceded by the "@" character, it
  is taken "as-is".

- When part of the deduced variable **IS** preceded by the "@" character, it
  must be looked up.

-------------------
The Look-Up Process
-------------------

The look-up process is simple: Parse-O-Matic looks up the variable following
the "@" character and inserts it into the deduced variable name. For
example:

    SET Y = "BBB"
    SET Z = "CCC"
    SET (AAA,@Y,@Z) = "Quick Brown Fox"
    SET line = (AAA,BBB,CCC)
    OUTEND |{line}

Here (AAA,@Y,@Z) and (AAA,BBB,CCC) both resolve to the variable AAA,BBB,CCC,
so the line variable receives the value "Quick Brown Fox".

------------
Restrictions
------------

There are certain limitations that affect the use of deduced variables.

- You can not use them in the output picture of an OUT or OUTEND command.
  For example, the following line will result in an error:

      OUTEND |We have {(PRODSTOCK,@PRODNUM)} {(PRODNAMES,@PRODNUM)} in stock.

  The correct way to do this is as follows:

      SET prodstock = (PRODSTOCK,@PRODNUM)
      SET prodname = (PRODNAMES,@PRODNUM)
      OUTEND |We have {prodstock} {prodname} in stock.

- The length of the deduced variable name can not exceed 12 characters. The
  following lines will result in an error:

      SET Part2 = "YYYYYYYYYY"
      SET Part3 = "ZZZZZZZZZZ"
      SET (XX,@Part2,@Part3) = "99"

  This causes an error because Parse-O-Matic tries to create a variable
  named XX,YYYYYYYYYY,ZZZZZZZZZZ -- which is 24 characters long (including
  the commas). The maximum length for a Parse-O-Matic variable (of any kind)
  is 12 characters. The following lines WOULD work:

      SET Part2 = "YYYY"
      SET Part3 = "ZZZZ"
      SET (XX,@Part2,@Part3) = "99"

  This would set the variable XX,YYYY,ZZZZ (which is 12 characters long,
  counting the commas) to "99".
- Deduced variables can not be used for variable CALL commands. The
  following line would not work:

      CALL (ROUTINE,@RTYPE)

  You would have to do this as follows:

      SET CallTemp = (ROUTINE,@RTYPE)
      CALL CallTemp

----------------
Usage Guidelines
----------------

If you know the precise name of the deduced variable you are setting, you
can specify it directly, without the parentheses. Thus, the following two
commands are the same:

    SET (CUST,NAME,1) = "Fred Jones"
    SET CUST,NAME,1 = "Fred Jones"

In other words, if a deduced variable name does not include a look-up (i.e.
the "@" character), it is not actually being deduced.

If you use substrings with deduced variables, you must specify the indexes
immediately after the closing parenthesis. Thus:

    (Data,@Y,@Z)[10 80]          <-- This is okay
    (Data,@Y,@Z) [10 80]         <-- This is wrong
    ( Data, @Y, @Z )[ 10 80 ]    <-- This is okay

The third example shows that you can use spaces within the parentheses and
the brackets to improve readability.

Variables are "persistent" (see "Uninitialized and Persistent Variables").
This can be a problem if you are creating variables (particularly arrays)
while processing multiple input files: all variables created for the
previous input file are still around. For this reason, you may find it
necessary to create an initializing section (in the PROLOGUE) which ensures
that all working variables are "set to zero", so to speak.

---------------
Array Variables
---------------

NOTE: For background information, please see "Deduced Variables -
      Definition".

Array variables are deduced variables that use something distinctive about
each input item (e.g. a product number) to allow data for similar input
items to be treated in the same way. Here is an example of array handling:

------------------------
Example: Product Numbers
------------------------

A common requirement in data processing is to assign a number to items that
are different instances of the same thing.
For example, let's say your input file contained a summary of the number of
items sold at your store at various times of the day. You could assign a
"Product Number" to each item, as in the following four-line input file:

    10:33 AM  101  Oranges    15
    11:04 AM  102  Lemons      4
    03:15 PM  104  Bananas    20
    04:41 PM  102  Lemons     25
       :       :      :        :
       :       :      :        :
     Time   Product Product Quantity
     Sold    Code    Name     Sold

You could add up the total sales for each item, as follows:

    SET ProdCode = $FLINE[11 13]
    SET Quantity = $FLINE[27 28]
    SET (Name,@ProdCode) = $FLINE[16 22]
    CALC (Sold,@ProdCode) = (Sold,@ProdCode) "+" Quantity

The first three lines obtain the product code (e.g. 101 for Oranges), the
quantity sold, and the product name. The fourth line adds the quantity sold
to the appropriate product. Thus, after all four input lines have been
processed, you will have created six deduced variables:

    VARIABLE         VALUE
    ---------------  -----
    Name,101         "Oranges"
    Name,102         "Lemons "
    Name,104         "Bananas"
    Sold,101         "15"
    Sold,102         "29"      <-- "29" is the total of 2nd and 4th input lines
    Sold,104         "20"

There are no variables created for product numbers that did not appear in
the input file, so if (for example) you test the value of Sold,103 it will
be reported as a null (empty) value. That is because it was never assigned a
value.

Now that you have added up the totals, you can output them in the EPILOGUE
section of the POM file. (The EPILOGUE is the last section of POM code run,
after all the input lines have been read.)
    EPILOGUE
      SET MaxProdCode = "104"
      SET ProdCode = "100"
      BEGIN
        IF (Sold,@ProdCode) <> "" THEN NumSold = (Sold,@ProdCode) ELSE "NONE"
        PAD (Name,@ProdCode) "R" " " "7"
        SET ProdName = (Name,@ProdCode)
        OUTEND |Product # {ProdCode} ({ProdName}): Sold {NumSold}
        SET ProdCode = ProdCode+
      AGAIN ProdCode #<= MaxProdCode
    END

The output from the POM file will look like this:

    Product # 100 (       ): Sold NONE
    Product # 101 (Oranges): Sold 15
    Product # 102 (Lemons ): Sold 29
    Product # 103 (       ): Sold NONE
    Product # 104 (Bananas): Sold 20

There is no data for products 100 and 103, since they did not appear in the
input file, so they do not show any data in the output.

-----------------------
Multidimensional Arrays
-----------------------

The example given above used "two dimensional arrays", which is to say that
each array has two parts (e.g. Sold,101 has the parts Sold and 101). Since
deduced variables can have up to three parts, you can create arrays of up to
three dimensions.

-------------------
Eponymous Variables
-------------------

NOTE: For background information, please see "Deduced Variables -
      Definition".

Sometimes you do not have a convenient number (such as a product number) to
organize your arrays. In such cases, you can use the input item itself as
the name of the variable. This technique is known as "eponymous variables".
(Eponymous means "self-naming".)

Let us say your store sells only two items -- dogs and cats -- and you do
not have a product number for each.
Your input file may look like this:

    10:00  Cat    1
    10:30  Dog    2
    11:00  Cat   10
    11:30  Cat    1
    12:00  Dog    1

You can add up the items with the following POM file:

    SET type = $FLINE[ 8 10]
    SET quant = $FLINE[14 15]
    CALC (@type) = (@type) "+" quant

After all of the input lines are read, you will have created two eponymous
variables, as follows:

    VARIABLE         VALUE
    ---------------  -----
    Cat              "12"     <-- Total of input lines 1, 3 and 4
    Dog              "3"      <-- Total of input lines 2 and 5

The totals can then be output in the EPILOGUE, using the following POM code:

    EPILOGUE
      OUTEND |Cats sold: {Cat}
      OUTEND |Dogs sold: {Dog}
    END

If you decide to add cows to your product line-up, you need only add a
single line to your EPILOGUE:

    OUTEND |Cows sold: {Cow}

Eponymous techniques can be more complicated than the example shown. For
example, you could create sub-categories such as Dog,Beagle,Large.

------------------------
Drawbacks and Advantages
------------------------

The main drawback of eponymous variables is that the POM file must "know"
all the possible variable names in order to output them. If you have a small
number of items (e.g. Cat, Dog, Cow) this is not a serious problem.

If you DO have a large number of items, you can build an array -- "on the
fly" as it were -- that adds each new eponymous variable name as it is
created. You could then loop through this array, during output, to obtain
the names.

This may sound complex, but it is enormously powerful. It means that you can
create a parsing application that can handle input files containing items
whose names are unknown to you. A properly constructed POM file would handle
new items without any modifications to the POM code.

============================================================================
                              VALUE TECHNIQUES
============================================================================

This section describes useful techniques you can use when dealing with
variables and literals.
--------------------------------------
Uninitialized and Persistent Variables
--------------------------------------

Even before a variable is assigned a value (using the SET command, for
example), you can use it in a POM command. An uninitialized variable has a
null value ("") and is treated normally by all commands.

EXCEPTION: To help you catch coding errors, the OUT and OUTEND commands do
not allow you to output an uninitialized variable. If you attempt to do so,
Parse-O-Matic issues a warning, and processing is terminated.

Variables are "persistent": once you have assigned a value to a variable, it
retains that value until it is changed. Even if you open a new input file
(see "POM and Wildcards") or a new output file (see "The OFile Command"),
all variables will retain their values; they will not be "reset" back to
null. (Of course, when Parse-O-Matic ends, all variables disappear; they are
not retained between separate runs of POM.)

-------
Example
-------

Here is an example which illustrates why persistent variables are useful:

    PAGELEN "55"                       <-- Set page length
    SET partnum = $FLINE[ 1 10]        <-- Extract the part number
    SET descrip = $FLINE[12 60]        <-- Extract the description
    BEGIN lastpart <> partnum          <-- Is this a new part number?
      OUTPAGE                          <-- Generate a page eject
      OUTHDG |PartNumber Description   <-- Output a heading
      OUTHDG |---------- -----------   <-- Output a heading
      SET lastpart = partnum           <-- Remember the current part number
    END                                <-- End of BEGIN block
    OUTEND |{partnum} {descrip}        <-- Output the part number

The first time a line is read from the input file, the lastpart variable
will be null ("") because it has not yet been initialized. As a result, the
BEGIN block will be executed. (The OUTPAGE command will be ignored in this
first instance, since no data has been sent to the output file.) The BEGIN
block also sets the lastpart variable, which will retain that value until it
is changed.
When the second input line is read (and the POM code is run again from the
top), the BEGIN block will be skipped if the current part number is the same
as the previous one (which we saved in the lastpart variable). However, if
the partnum variable is different, the BEGIN block will be run, outputting
the page eject and headings, and once again saving the partnum in the
lastpart variable, so we can check it during the third input line -- and so
on.

------------------------------------
Inline Incrementing and Decrementing
------------------------------------

You can add "1" to a variable in a command. For example:

    SET x = "3"
    SET x = x+

After the second statement, x would have the value "4". Here are some
additional examples:

- Incrementing "1" gives you "2"
- Incrementing "9" gives you "10"
- Incrementing "99" gives you "100"

The first time a variable is referenced, it has a null value (unless you SET
it yourself). If you increment a null variable, it will be changed from ""
(i.e. null) to "1".

You can also subtract "1" from a variable in a command:

    SET x = "3"
    SET x = x-

After the second statement, x would have the value "2". Here are some
additional examples:

- Decrementing "0" gives you "-1"
- Decrementing "1" gives you "0"
- Decrementing "99" gives you "98"

When you do an inline increment or decrement, the variable itself is not
changed. (C programmers take note!) For example:

    SET y = "3"
    SET x = y-

After the second line, the x variable will equal "2", while the y variable
will still equal "3".

You can use inline incrementing or decrementing with substrings:

    SET y = "X23X"
    SET x = y[2 3]+

After the second line, the x variable will equal "24", while the y variable
will still equal "X23X".

Only integer numeric values can be incremented or decremented. If you
attempt to increment or decrement another type of variable (e.g. text or a
decimal number), Parse-O-Matic will halt, and report an error.
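
The rules above can be modeled in a few lines of Python. This is a sketch,
not POM code, and the helper name inline_step is hypothetical. It assumes
that values are held as text, that a null value counts as zero (the manual
states this only for incrementing), and that anything other than an integer
is rejected.

```python
def inline_step(value: str, delta: int) -> str:
    """Return value+delta as text; the caller's own variable is untouched."""
    if value == "":
        value = "0"  # assumption: null counts as zero (documented for "+")
    try:
        number = int(value)
    except ValueError:
        # Text and decimal numbers cannot be incremented or decremented.
        raise ValueError(f"not an integer: {value!r}") from None
    return str(number + delta)

y = "3"
x = inline_step(y, -1)                  # like: SET x = y-
print(x, y)                             # 2 3   (y itself is unchanged)
print(inline_step("", +1))              # 1
print(inline_step("99", +1))            # 100
print(inline_step("0", -1))             # -1
print(inline_step("X23X"[1:3], +1))     # 24    (like y[2 3]+ on "X23X")
```

Because the function returns a new value instead of modifying its argument,
it mirrors the point made for C programmers: y- yields "2" but leaves y at
"3".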
-------------
Line Counters
-------------

If your input record is divided over several lines (due to its original
format or perhaps because you used the SPLIT or CHOP command), it is helpful
to set up a line counter. The following example extracts the first six
characters of the second line of input records that span three lines
(designated lines 0, 1 & 2):

    IF LineCntr = "1" THEN MyField = $FLINE[1 6]
    OUTEND LineCntr = "1" |{MyField}
    IF LineCntr = "2" THEN LineCntr = "" ELSE LineCntr+

For an alternative to line counters, see "The ReadNext Command".

-------------------
The SHOWNUM Utility
-------------------

The ShowNum program (SHOWNUM.EXE in the standard Parse-O-Matic package) is a
small utility which converts a hex number to decimal and vice-versa. It can
also convert a character to its ASCII equivalent value in decimal or hex.

ShowNum is helpful when you are working on a task that forces you to deal
with data in hex or decimal form. It is a handy tool if you are working with
binary input or output files.

You can probably find fancier conversion-reference utilities than SHOWNUM,
but this one comes ready-to-go with Parse-O-Matic -- and it is free. You may
give unaltered copies of SHOWNUM to anybody without paying royalties.

----------------------
Quick Reference Screen
----------------------

To display a quick-reference help screen, enter the following command at the
DOS prompt:

    SHOWNUM /?

If you are calling SHOWNUM from Windows, you can add the /P (Pause) option
to make it wait for a keypress before terminating:

    SHOWNUM /? /P

-------------------
Converting a Number
-------------------

To find out what the decimal number 123 is in hexadecimal, enter the
following command at the DOS prompt:

    SHOWNUM #123

The # character tells ShowNum that the number is in decimal.
The program will display:

  #123 = $7B

To find out what hex 400F is in decimal, enter the following command at
the DOS prompt:

  SHOWNUM $400F

The $ character tells ShowNum that the number is in hexadecimal. The
program will display:

  $400F = #16399

ShowNum can handle numbers between -2,147,483,648 (hex $80000000) and
2,147,483,647 (hex $7FFFFFFF).

----------------------
Converting a Character
----------------------

To find out a character's ASCII equivalent value in decimal and hex, use
the SHOWNUM utility this way:

  SHOWNUM A

This will display:

  A = #65 $41

In other words, the letter "A" is ASCII decimal 65, which is hex 41.

You can enclose the character in quotes if you wish. This is strictly
necessary only if you are trying to remember the values for "space":

  SHOWNUM " "

This will display:

  " " = #32 $20

Character conversion works only one character at a time:

  SHOWNUM "ABCD"    <-- This is incorrect

This would display:

  "ABCD" = #65 $41

As you can see, only the first character ("A") is being converted.

----------------------
Windows Considerations
----------------------

If you call SHOWNUM.EXE from Windows, your answer will vanish when the
program ends and the window closes. For this reason, the file SHOWNUM.BAT
has been included in the standard Parse-O-Matic package. You can use this
exactly like SHOWNUM.EXE, but it pauses after the answer is displayed.

For additional details, see "Installing the SHOWNUM Utility" in the
"Running Under Windows" section of this manual.

============================================================================
                           PROGRAMMING TECHNIQUES
============================================================================

This section describes techniques that will help you create and debug POM
files.

-------
Tracing
-------

By setting the DOS variable POM to ALL, you can generate a trace file,
named POM.TRC. This is helpful if you have trouble understanding why your
file isn't being parsed properly.
But be sure to test it with a SMALL input file; the trace is quite
detailed, and it can easily generate a huge output file.

To save space, you can specify a particular list of variables to be traced,
rather than tracing everything. For example, to trace only the variable
PRICE, enter this DOS command:

  SET POM=PRICE

To trace several variables, separate the variable names by slashes, as in
this example:

  SET POM=PRICE/BONUS/NAME

This traces the three variables PRICE, BONUS and NAME.

-------
Logging
-------

Every time Parse-O-Matic runs, it creates a "processing log". This is a
text file named POMLOG.TXT, which is placed in Parse-O-Matic's home
directory. (For example, if POM.EXE is located in C:\POM, the file will be
C:\POM\POMLOG.TXT even if you run POM from another directory.) If the file
POMLOG.TXT already exists, it is renamed to POMLOG.BAK.

The processing log file POMLOG.TXT contains a report of what happened
during the last run of Parse-O-Matic. Usually, the file will be quite
short and look something like this:

  COMMAND: POM TEST.POM TEST.TXT TEMP.TXT
  DATE: AUG 01 1997
  17:50:10 TEST.TXT opened for processing
  17:50:14 TEST.TXT processing completed

The first line gives the DOS command line, while the second gives the date.
Subsequent lines give the time (Hours:Minutes:Seconds) and a progress or
error message.

If you encounter an error during processing, the text of the warning
message is saved in the processing log.
It might look something like this:

  COMMAND: POM TEST.POM TEST.TXT TEMP.TXT
  DATE: AUG 01 1997
  17:50:10 TEST.TXT opened for processing
  17:50:10 Execution error in line number 3 of POM file TEST.POM
  17:50:11 Required parameter is missing in OUT

If you process multiple input files, POMLOG.TXT might look something like
this:

  COMMAND: POM EXAMPL15.POM DATA*.TXT TEMP.TXT
  DATE: AUG 01 1997
  14:21:27 DATA01.TXT opened for processing
  14:21:28 DATA01.TXT processing completed
  14:21:28 DATA02.TXT opened for processing
  14:21:28 DATA02.TXT processing completed
  14:21:28 DATA03.TXT opened for processing
  14:21:28 DATA03.TXT processing completed
  14:21:28 3 files processed

If for some reason the processing log can not be created, Parse-O-Matic
will continue to run; it will not terminate.

For some additional comments on logging, see "Unattended Operation".

============================================================================
                          COMMAND-LINE TECHNIQUES
============================================================================

This section describes the various options available at the command line.

----------
Quiet Mode
----------

Sometimes you don't want the user to see the Parse-O-Matic processing
screen. In such cases, you can use the "Quiet Mode" switch (/Q) on the
command line. For example:

  POM XYZ.POM MYFILE.TXT TEMP.TXT /Q

The /Q switch suppresses the display of the processing screen. The only
time a user will see anything is if there is a problem (for example: the
input file was not found). In such case, Parse-O-Matic will make a noise
via the PC speaker, then display a message (see "Unattended Operation" and
"The MSGWAIT Command" for some background information).

--------------------------------------
User-Specified Command-Line Parameters
--------------------------------------

Some POM applications are easier to use if you can pass a value on the
command line.
For example, let us say you have created a POM file that will search a
text file for a particular word, and output all the lines that contain
that word. You could manually change the POM file each time, but this is
not a practical solution if many people will be using the application.

The alternative is to use user-defined command line parameters to pass the
value. Consider this POM file, which we will name FINDWORD.POM:

  ACCEPT $FLINE ^ $CMDLINEX
  OUTEND |{$FLINE}

This will output any lines that contain (^) the value of $CMDLINEX. This
variable is set if you use the /X command line parameter. Thus, you would
call the POM file as follows:

  POM FINDWORD.POM INPUT.TXT OUTPUT.TXT /Xadministration

This would read the file INPUT.TXT and send any lines containing the word
"administration" (in lowercase) to the file OUTPUT.TXT.

There are three user-specified command line parameters:

  PARAMETER   SETS VARIABLE
  ---------   -------------
  /X          $CMDLINEX
  /Y          $CMDLINEY
  /Z          $CMDLINEZ

Like all variables, the $CMDLINEx variables can be in uppercase, lowercase
or mixed case (e.g. $CmdLineX).

-------------------
Case Considerations
-------------------

The parameter letter (e.g. /X) can be in uppercase or lowercase. The
following two command lines are equivalent:

  POM FINDWORD.POM INPUT.TXT OUTPUT.TXT /Xadministration
  POM FINDWORD.POM INPUT.TXT OUTPUT.TXT /xadministration

The value following the parameter letter is passed on "as-is"; it is not
converted to uppercase or changed in any way. You could modify the
FINDWORD file (described earlier) to ignore case, as follows:

  CVTCASE $CMDLINEX $CMDLINEX  <-- Convert the /X value to uppercase
  ACCEPT $FLUPC ^ $CMDLINEX    <-- Compare to the uppercase version of $FLINE
  OUTEND |{$FLINE}             <-- Output the line if the value was found

----------------
Spaces in Values
----------------

Since command-line parameters are separated by spaces, you must enclose
your parameter value in quotes if it contains one or more spaces.
For example:

  POM FINDWORD.POM INPUT.TXT OUTPUT.TXT /X"Mr. Jones"

If the parameter value contains quotes, they must be doubled-up. Thus, if
you were using the FINDWORD application to find

  The "King" of American Jazz

the command would look like this:

  POM FINDWORD.POM INPUT.TXT OUTPUT.TXT /X"The ""King"" of American Jazz"

---------------------
Command-Line Switches
---------------------

Sometimes you want a command-line parameter to simply mean "do something".
If it is missing from the command line, it would mean "don't do it". You
could specify /ZY (for "Yes, do it") and /ZN (for "No, don't do it"), but
there is an easier way.

If a command-line parameter is missing, Parse-O-Matic sets its
corresponding $CMDLINEx variable to "N" (meaning, "No, it's not there").
If the parameter is present -- with no value following it -- Parse-O-Matic
sets the $CMDLINEx variable to "Y" (meaning "Yes, it's there").

We can use this method to refine the FINDWORD application described
earlier...

  BEGIN $CMDLINEZ = "Y"
    CVTCASE $CMDLINEX $CMDLINEX
    ACCEPT $FLUPC ^ $CMDLINEX
  ELSE
    ACCEPT $FLINE ^ $CMDLINEX
  END
  OUTEND |{$FLINE}

In this case, the /X value (the word we are looking for) will be tested
without regard to case (i.e. uppercase or lowercase letters) if /Z is
present on the command line. If /Z is missing, we will look for an exact
match. Here are two sample command lines:

  POM FINDWORD.POM INPUT.TXT OUTPUT.TXT /X"Mr. Jones"
  POM FINDWORD.POM INPUT.TXT OUTPUT.TXT /X"Mr. Jones" /Z

The first command would not match "Mr. Jones" with "mr. jones". The second
command would consider the two strings equivalent.

----------------------------
Hex and Decimal Code Strings
----------------------------

You can specify precise values either as a decimal ("ASCII") value, or as
hexadecimal.
For example, the following three commands are equivalent:

  POM FINDWORD.POM INPUT.TXT OUTPUT.TXT /X" 01"
  POM FINDWORD.POM INPUT.TXT OUTPUT.TXT /X$20$30$31
  POM FINDWORD.POM INPUT.TXT OUTPUT.TXT /X#32#48#49

All three specify the search string " 01". The second one uses hexadecimal
codes for the characters, while the third one uses decimal notation for
the characters.

You can not mix the decimal, hex and literal representations. Here are two
examples:

  /X#32"22"#32     is not permitted
  /X" 22 "         could be used instead
  /X#32#50#50#32   could also be used

  /X$30#50         is not permitted
  /X02             could be used instead
  /X"02"           could also be used

-------
Summary
-------

  SAMPLE COMMAND LINE                  VALUE OF $CMDLINEX
  ----------------------------------   ------------------
  POM A.POM B.TXT C.TXT /XHarry        Harry
  POM A.POM B.TXT C.TXT /X"The King"   The King
  POM A.POM B.TXT C.TXT /X             Y
  POM A.POM B.TXT C.TXT /XY            Y
  POM A.POM B.TXT C.TXT /XN            N
  POM A.POM B.TXT C.TXT                N
  POM A.POM B.TXT C.TXT /X$30$31       01
  POM A.POM B.TXT C.TXT /X#48#50       02

============================================================================
                               FILE HANDLING
============================================================================

-------------------------------------
How Parse-O-Matic Searches for a File
-------------------------------------

When Parse-O-Matic needs to read a file, it follows this procedure:

1) Parse-O-Matic tidies up the file name in the following ways:

   - It removes spaces and tabs
   - It converts the file name to uppercase
   - As per DOS convention, slashes (/) are converted to backslashes (\)
   - If this type of file has a default extension, and if the file name
     does not have a period (i.e. dot) in the name, the extension is added.

2) If the file name is fully qualified (i.e. it includes a drive or path,
   or both), Parse-O-Matic tries to open that file. If it can not, it
   terminates with an error message.

3) If the file name is not fully qualified, Parse-O-Matic follows this
   procedure:

   - It first looks for the file in the current directory.
   - It then looks in the directory where the Parse-O-Matic program
     (POM.EXE) is located.
   - It then searches the DOS PATH for the file. (For information about
     the PATH command, refer to your operating system manual.)
   - If none of these steps locate the file, Parse-O-Matic terminates with
     an error message.

The following types of files are affected...

  TYPE OF FILE             DEFAULT EXTENSION  REFER TO MANUAL SECTION
  -----------------------  -----------------  -------------------------------
  POM (Control) File       .POM               "The POM File"
  POJ (Job) File           .POJ               "Parse-O-Matic Job (POJ) Files"
  Date Information File    See NOTE #1        "The POMDATE.CFG File"
  Lookup File              See NOTE #2        "The LookFile Command"
  Properization Exception  See NOTE #2        "The Proper Command"
  Map File                 .MPF               "The MAPFILE Command"

NOTE #1: The Date Information File is always called POMDATE.CFG. You can
         put your standard version in the Parse-O-Matic directory. If you
         wish to override it, you should place the modified copy in your
         current (logged) directory.

NOTE #2: This type of file does not have a default extension. However, we
         recommend "TBL" (i.e. "Table") for Lookup files and "PEF" for
         Properization Exception Files.

Parse-O-Matic does NOT search for input and output files. They must be in
the current directory, or must have a fully qualified path. If the input
file is missing an extension, it is assumed to be TXT. If the output file
is not specified in the POM command, it is assumed to be POMOUT.TXT (see
"How Parse-O-Matic Opens an Output File").

Since Parse-O-Matic searches for files, you can place frequently-used
Lookup and POM files in a directory in your DOS path.

--------------------------------------
How Parse-O-Matic Opens an Output File
--------------------------------------

Parse-O-Matic opens an output file the first time one of the output
commands (e.g. OUT, OUTEND, OUTHDG) has something to send to the file.
When opening an output file, Parse-O-Matic follows this procedure:

1) Normally, the name of the output file is specified on the POM command
   line or (if it is omitted there), it is specified in an OFILE command
   within the POM file. If no output file name was given using either
   method, the name is set to POMOUT.TXT (in the current directory).

2) Parse-O-Matic tidies up the file name in the following ways:

   - It removes spaces and tabs
   - It converts the file name to uppercase
   - As per DOS convention, slashes (/) are converted to backslashes (\)
   - If the file name does not have an extension, and it does not end in a
     period or a colon, the extension TXT is added. Thus:

       C:\XYZ       becomes C:\XYZ.TXT
       C:\XYZ.      stays the same
       C:\XYZ.DAT   stays the same
       LPT2:        stays the same (see "Sending Output to a Device")

3) The output file name is compared to the input file name. If they are
   the same, Parse-O-Matic terminates with an error. You can not send
   output to the input file, nor can you read input from the output file.

4A) If the file name is preceded by a plus sign ("+"), Parse-O-Matic will
    append output to the file. Here are some examples:

      +C:\XYZ.TXT   output will be appended to the file
      +LPT1:        this refers to a device, so the "+" is ignored

    If the file to which you are appending does not already exist, it is
    first created, as an empty file.

4B) If the file name is NOT preceded by a plus sign, the following
    procedure takes place:

    - If a file with the specified name already exists, it is renamed with
      a .BAK extension (replacing any previous file with that name).
    - The file is created, as an empty file

    For example, if you run Parse-O-Matic as follows:

      POM MYPOM.POM INPUT.TXT C:\XYZ.TXT

    then if C:\XYZ.TXT already exists, it is renamed to C:\XYZ.BAK.

5) Output is directed to the output file until Parse-O-Matic ends or a new
   output file name is specified by the OFILE command.
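
The name-handling rules in steps 1, 2 and 4A can be sketched in Python as
follows. This is an illustrative model only, not Parse-O-Matic's own code:
the function name tidy_output_name is hypothetical, and handling the plus
sign after the tidy-up step is an assumption about ordering. Steps 3 and 4B
(the comparison with the input name and the rename to .BAK) are not
modelled.

```python
def tidy_output_name(raw):
    """Return (tidied_name, append_flag) for an output file specification."""
    # Step 2: remove spaces and tabs, uppercase, convert / to \
    name = raw.replace(" ", "").replace("\t", "").upper().replace("/", "\\")

    # Step 4A: a leading plus sign requests append mode.
    append = name.startswith("+")
    if append:
        name = name[1:]

    # Step 1: fall back to the default name if none was given.
    if not name:
        name = "POMOUT.TXT"

    # A trailing colon marks a device such as LPT2:, which is never changed.
    is_device = name.endswith(":")

    # Step 2 (continued): add TXT only when the last path component has no
    # period at all (a trailing period counts as "has an extension").
    last_part = name.rsplit("\\", 1)[-1]
    if not is_device and "." not in last_part:
        name += ".TXT"

    # The "+" is ignored for devices.
    if is_device:
        append = False
    return name, append


for spec in (r"C:\XYZ", "C:\\XYZ.", r"C:\XYZ.DAT", "LPT2:", "+LPT1:"):
    print(spec, "->", tidy_output_name(spec))
```

The loop at the end runs the examples given in steps 2 and 4A, so the
results can be compared with the table above.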
REMINDER: Parse-O-Matic does not open the output file until it is time to
send it some data from the output commands (OUT, OUTEND etc.). If no data
is sent to the output file, it will contain its original data (assuming it
already existed). If this is a problem, you can either delete the output
file before running Parse-O-Matic, or place the following commands in the
PROLOGUE:

  ERASE "OUTPUT.TXT"   <-- Delete the output file
  OFILE "OUTPUT.TXT"   <-- Specify the output file

If you do this, and no data is sent to the output file, the file will not
exist. You can check if POM failed by consulting the DOS ERRORLEVEL. (See
your operating system manual for an explanation of ERRORLEVEL.)

- If the ERRORLEVEL is 0 and the file does not exist, it means that POM
  ran successfully, but no output was sent to the file.
- If the ERRORLEVEL is 1 or higher, and the file does not exist, it means
  that POM failed, or you used the HALT command before any output was sent
  to the file.

If you are calling Parse-O-Matic from a program (rather than a batch
file), you can check the error level using the facilities built in to the
language in which the program was written. For example, Turbo Pascal lets
you run another program with the EXEC command, after which you can extract
the ERRORLEVEL from the low byte of the value returned by the DosExitCode
function.

---------------------------
Appending to an Output File
---------------------------

If you want to add data to the end of the output file, you have three
alternatives:

1) Use wildcards, as explained in "POM and Wildcards". In such case, the
   output file is empty when the first output line is generated (although
   see method #2 for an exception). When processing with wildcards, all
   output is sent to the same file, unless you change the file with the
   OFILE command (see "The OFile Command").

2) Prefix the output file name with a plus sign. This tells Parse-O-Matic
   that you want to add data to the end of the file, rather than starting
   with an empty file.
   You can use this method on the command line:

     POM MYPOM.POM INPUT.TXT +C:\MYFILES\OUTPUT.TXT

   You can also use this method in the OFILE command:

     OFILE "+C:\MYFILES\OUTPUT.TXT"

   In these examples, we provided the full path name to the output file.
   If you do not specify a path name (e.g. OFILE "+OUTPUT.TXT"), the
   output file is placed in the current directory.

3) Use a batch file and the DOS COPY command to control the concatenation
   of output files. This method is less convenient, but it allows you to
   bypass the addition of the new output if there is a processing error.
   Here is a sample batch file (comments appear after the arrows):

     @ECHO OFF                               <-- Turn batch echoing off
     IF EXIST OUTPUT.TXT DEL OUTPUT.TXT      <-- Get rid of old output file
     POM MYPOM.POM INPUT.TXT OUTPUT.TXT      <-- Parse the input file
     IF ERRORLEVEL 1 GOTO QUIT               <-- Quit if there was an error
     IF NOT EXIST OUTPUT.TXT GOTO QUIT       <-- Quit if no output generated
     IF EXIST SAFETY.TXT DEL SAFETY.TXT      <-- Get rid of old safety file
     RENAME MAINFILE.TXT SAFETY.TXT          <-- Backup the original file
     COPY SAFETY.TXT+OUTPUT.TXT MAINFILE.TXT <-- Add the new output
     :QUIT                                   <-- Batch file label for GOTO

   This method has the added advantage of creating a backup copy of the
   original output file. If the data in the file is particularly
   important, you could place the file SAFETY.TXT on another hard drive.

--------------------------
Sending Output to a Device
--------------------------

Parse-O-Matic recognizes that an "output file" is actually a device if it
has a colon (":") at the end of the name. You can direct Parse-O-Matic's
output to a standard device such as COM1: or LPT2: by specifying the
device name accordingly. For example:

  POM XYZ.POM INPUT.TXT LPT1:

This directs the output to the LPT1 printer.

Parse-O-Matic can detect a "Not Ready" condition in most cases. A printer
is Not Ready when it is offline, out of paper, or its print buffer is full.
If a Not Ready condition occurs, the following happens:

- If you are running in Quiet Mode (/Q on the POM command line), a Not
  Ready condition terminates Parse-O-Matic with a DOS ERRORLEVEL of 243.
- If you are not running in Quiet Mode, a message box gives you the option
  of trying again, or canceling processing. If you cancel, Parse-O-Matic
  terminates with a DOS ERRORLEVEL of 244.

---------
COM Ports
---------

If you are sending output to a COM port (e.g. COM1:) you should first set
the baud rate with the DOS MODE command, or Pinnacle Software's MODEM
program (available on our BBS and Web site).

The MODEM program is particularly useful if your COM port is driving a
modem. Parse-O-Matic talks to the operating system's COM device driver
rather than the modem itself, so before you send data to a modem, it is a
good idea to use the MODEM program to check that the modem is online and
functioning properly.

If you are using a high-speed modem (9600 bps or higher) and you find that
you sometimes lose some characters, the operating system or the modem may
not be handling a "Not Ready" condition properly during handshaking. In
such case, you may find it necessary to turn off buffering (locked DTE
speed) and run at a maximum speed of 9600 bps. For a quick course in
high-speed modems and buffering, see the Trouble-Shooting Guide included
with Pinnacle Software's Sapphire Bulletin Board System (also available on
our BBS and Web site).

For an example of sending output to a COM port, see "The Pause Command".

---------
DBF Files
---------

If Parse-O-Matic notices that the input file is a "DBase" file (i.e. it
has a DBF extension -- for example: MYFILE.DBF), it will change the way it
processes the data. For instance, the variable $FLINE is not defined.
Rather, each of the fields in the database is pre-parsed.
Thus, if you have a DBF file containing three fields (EMPNUM, NAME,
PHONE), your entire POM file might look like this:

  IGNORE DELETED = "Y"
  OUTEND |{EMPNUM} {NAME} {PHONE}

The DELETED variable is created automatically for each record. If it is
set to "Y", it means the record has been deleted from the database and is
probably not valid. In most cases, you will want to ignore such records.

If you do not know what the field names are, you can obtain the list with
the following POM file:

  TRACE DELETED

Afterwards, when you inspect the trace file (POM.TRC), you will see a
summary of all the fields. Since there are no output commands (e.g. OUTEND
and OUTHDG), the output file will be empty.

NOTE: Parse-O-Matic does not currently support DBF "Memo" fields.

-----------------
POM and Wildcards
-----------------

You can process multiple input files with the same POM file by specifying
a DOS "wildcard" at the DOS command prompt. All output is then directed to
the same output file. For example:

  POM XYZ.POM *.TXT OUTPUT.TXT

This runs the XYZ POM file on each file in the current directory with a
TXT extension and sends all output to the file OUTPUT.TXT.

The POM file can determine which file it is reading by using the
predefined variable $COMMAND, which contains the current POM command line.

Consider the following scenario:

- You have installed POM.EXE in the directory path C:\UTILITY\POM
- The current directory contains ABC.POM, MARK.TXT, MARY.TXT and JOHN.TXT
- You enter the command POM ABC *.TXT OUT.TXT

Parse-O-Matic runs ABC.POM against the three TXT files. On the first input
file, $COMMAND will look like this:

  C:\UTILITY\POM\POM.EXE ABC.POM MARK.TXT OUT.TXT

On the next two input files, it looks like this:

  C:\UTILITY\POM\POM.EXE ABC.POM MARY.TXT OUT.TXT
  C:\UTILITY\POM\POM.EXE ABC.POM JOHN.TXT OUT.TXT

Note that the file OUT.TXT is NOT processed, even though it has a TXT
extension. POM will always avoid processing the output file.
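
The selection rule in this scenario can be illustrated with a short Python
sketch. It is a model under stated assumptions, not a description of how
Parse-O-Matic is written: the function name select_inputs is hypothetical,
the directory listing is supplied by hand, and Python's fnmatch patterns
only approximate DOS wildcard matching.

```python
from fnmatch import fnmatchcase


def select_inputs(mask, output_name, directory_listing):
    """Return the files matching the mask, always skipping the output file."""
    mask = mask.upper()
    output_name = output_name.upper()
    selected = []
    for name in directory_listing:
        candidate = name.upper()          # DOS file names are case-insensitive
        if candidate == output_name:
            continue                      # never process the output file
        if fnmatchcase(candidate, mask):
            selected.append(candidate)
    return selected


listing = ["ABC.POM", "MARK.TXT", "MARY.TXT", "JOHN.TXT", "OUT.TXT"]
for input_name in select_inputs("*.TXT", "OUT.TXT", listing):
    # Each pass sees its own command line, as $COMMAND does in a POM file.
    print("ABC.POM", input_name, "OUT.TXT")
```

OUT.TXT matches the mask but is excluded, which is why only three command
lines are printed.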
Let's say you wanted to concatenate both MARK.TXT and MARY.TXT, and put
the file name at the top. You could do it with this POM file, named
ABC.POM:

  SET cmd = $COMMAND             <-- Get the command line
  BEGIN cmd <> lastcmd           <-- Has it changed?
    PARSE fname cmd "2* " "3* "  <-- Extract the input file name
    SETLEN flen fname            <-- Get length of input file name
    SET uline = ""               <-- Initialize underline
    PAD uline "L" "-" flen       <-- Set underline
    OUTEND lastcmd <> "" |       <-- Output a linefeed unless
    OUTEND lastcmd <> "" |       <-- this is the first file
    OUTEND |{fname}              <-- Output the file name
    OUTEND |{uline}              <-- Output the underline
    OUTEND |                     <-- Output a linefeed
    SET lastcmd = $COMMAND       <-- Remember this command line
  END                            <-- End of code block
  OUTEND |{$FLINE}               <-- Output a line from the input

You could then process MARK.TXT and MARY.TXT with this command line:

  POM ABC M*.TXT OUT.TXT

This processes any file starting with an "M" that has a TXT extension.
Another way to run the command is as follows:

  POM ABC M???.TXT OUT.TXT

This processes any four-letter TXT file that starts with "M". For details
about DOS wildcards, consult your operating system manual.

============================================================================
                           OPERATIONAL TECHNIQUES
============================================================================

-----------------------------
Parse-O-Matic Job (POJ) Files
-----------------------------

As explained earlier, a standard Parse-O-Matic command line looks like
this:

  Format:  POM pom-file input-file output-file [optional parameters]
  Example: POM POMFILE.POM REPORT.TXT OUTPUT.TXT

You can save the command specifications in a Parse-O-Matic job (.POJ) file
so that you do not have to type them over and over again. A POJ file is
essentially a single-line text file that contains the specifications.
------------
Simple Usage
------------

Let's say you create a text file named MYJOB.POJ which contains the
following line:

  POMFILE.POM REPORT.TXT OUTPUT.TXT

You can then run the parsing job with the following command:

  POM MYJOB.POJ

This would be the same as typing:

  POM POMFILE.POM REPORT.TXT OUTPUT.TXT

NOTE: Do NOT include the POM command in the job file. For example:

  XYZ.POM ABC.TXT DEF.TXT       <-- Correct
  POM XYZ.POM ABC.TXT DEF.TXT   <-- Incorrect; the initial POM is not needed

---------------------
Commenting a Job File
---------------------

The POJ file can contain explanatory comments so you can document its
purpose: null lines and lines that start with a semi-colon are ignored.
Thus, the MYJOB.POJ example (described above) could also have been written
this way:

  ; Sample JOB file
  POMFILE.POM REPORT.TXT OUTPUT.TXT

The first line is ignored, so as far as Parse-O-Matic is concerned, the
job file contains only one line -- the command specifications. (The sample
POJ files are indented here to offset them from the descriptive text, but
a semi-colon must appear in column one to be recognized as a comment.)

------------------------
Prompting for File Names
------------------------

Job files are especially helpful when you don't know in advance which
files will be processed, and you would like to prompt for a file name.
This lets you use one POM command-line to handle different situations.

Let's say you create a text file named MYJOB2.POJ, which contains the
following lines:

  ; Sample prompted job file
  POMFILE.POM % OUTPUT.TXT

You can then run this job with the following DOS command:

  POM MYJOB2.POJ

or simply:

  POM MYJOB2

since a single command-line parameter is assumed to refer to a POJ file.

The % character means "Ask the user for this file". Since, in this
example, the % is in the second position, Parse-O-Matic will prompt you
for the name of the input file or mask. (If you press Esc or enter a null
name, processing is terminated with a warning message.)
Here is another example:

  ; Sample prompted job file
  POMFILE.POM % %

This will prompt for both the input file and output file names. Now
consider this example:

  ; Sample prompted job file
  % % %

This will prompt for the POM, input and output file names -- which is not
particularly helpful!

---------------------
Suggesting File Names
---------------------

A % prompt will open a box on the screen to ask you for a file name (with
an appropriate explanation of what is required) but it will not suggest a
name. You can "feed" Parse-O-Matic a suggested name, which you can then
accept by hitting Enter, or modify by typing a new one.

To provide this default value, place the suggested file name after the %
character. For example:

  ; Sample prompted job file, with recommended file names
  POMFILE.POM %INPUT.TXT %OUTPUT.TXT

When you are asked to specify the input file name, the name INPUT.TXT will
appear in the prompt box. It can be accepted as-is, or changed.

Incidentally, if you do not provide a suggested file name for the output
file, Parse-O-Matic will automatically suggest the name POMOUT.TXT.

-------------------
Optional Parameters
-------------------

You can include optional parameters in your job file line. For example:

  POMFILE.POM ABC.TXT XYZ.TXT /Q

Because of the /Q parameter, this will parse ABC.TXT in "Quiet mode" (i.e.
the processing screen will not be displayed).

You can not prompt for optional parameters.
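
The job-file rules described so far (comments, % prompts and suggested
names) can be summarized in a small Python sketch. It is only a model for
illustration: resolve_job and the ask callback are hypothetical names, the
prompt box is replaced by a function you supply, and quoted parameter
values containing spaces are not handled.

```python
def resolve_job(job_text, ask):
    """Turn the text of a job file into a list of command-line items.

    ask(label, suggestion) stands in for the on-screen prompt box and
    must return the file name chosen by the user.
    """
    spec = None
    for line in job_text.splitlines():
        # Null lines and lines with a semi-colon in column one are ignored.
        if line.strip() == "" or line.startswith(";"):
            continue
        spec = line.split()
        break
    if spec is None:
        raise ValueError("job file contains no command specification")

    labels = ["POM file", "input file", "output file"]
    resolved = []
    for position, item in enumerate(spec):
        if item.startswith("%") and position < len(labels):
            suggestion = item[1:]
            if position == 2 and suggestion == "":
                suggestion = "POMOUT.TXT"   # default suggestion for output
            answer = ask(labels[position], suggestion)
            if answer == "":
                raise ValueError("no file name given; processing terminated")
            resolved.append(answer)
        else:
            resolved.append(item)           # fixed names and /Q-style options
    return resolved


job = "; Sample prompted job file\nPOMFILE.POM %INPUT.TXT % /Q\n"
# Accept every suggestion, as if the user pressed Enter at each prompt.
print(resolve_job(job, lambda label, suggestion: suggestion))
```

Passing a different ask function lets you simulate a user who types a new
name or cancels with a null name.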
Because the output file name is optional in the POM command (defaulting to
POMOUT.TXT if it is omitted, which can then be altered in the POM file
itself with the OFILE command), it is possible to leave out the output
file name (or prompt) in the job file:

  POMFILE.POM ABC.TXT /Q

This assumes that the output file name is POMOUT.TXT, just as if you had
typed the following command at the DOS prompt:

  POM POMFILE.POM ABC.TXT /Q

--------
Examples
--------

The standard Parse-O-Matic package contains two sample job files, named
SAMPJOB1.POJ and SAMPJOB2.POJ, which demonstrate simple and prompted usage
of job files. Both files are self-documenting, so view or print them out
to find out how to use them.

-------------------------------
Encrypted (Scrambled) POM Files
-------------------------------

Parse-O-Matic comes with a simple encryption program (named SCRAMBLE)
which renders a POM file unreadable to humans yet still usable by
Parse-O-Matic.

--------------------
The SCRAMBLE Utility
--------------------

The format of the SCRAMBLE command (entered at the DOS prompt) is as
follows:

  SCRAMBLE filename password

The filename is the name of your POM file, which will be "scrambled" into
a seemingly meaningless jumble of characters. The original POM file is
renamed with a BAK extension, so if you scramble MYFILE.POM, as in this
command:

  SCRAMBLE MYFILE.POM PRIVATE

then two files will be created:

  MYFILE.POM   The scrambled POM file
  MYFILE.BAK   The original POM file

The password is a code word of six letters or more that you can use to
unscramble the file. To do this, simply enter the SCRAMBLE command again,
using the same password. SCRAMBLE ignores case, so the password "SECRET"
is equivalent to "Secret".

If the wrong password is entered, SCRAMBLE issues a warning message,
pauses briefly, then terminates without doing anything else.

For a brief summary of SCRAMBLE, enter the following command at the DOS
prompt:

  SCRAMBLE /?

------------------------
Why Scramble a POM File?
------------------------

If you are distributing a POM application, you may not want the end-users
to be able to tamper with the POM file. If they alter a scrambled POM
file, they will almost certainly render it inoperable.

The encryption scheme used by SCRAMBLE (and Parse-O-Matic) is not highly
sophisticated, so there are no export restrictions on the technology. The
encryption could probably be "cracked" by a professional cryptologist
within a day. However, it should confound regular users sufficiently that
they will not alter, excerpt or modify your POM code.

When Parse-O-Matic notices that the POM file is encrypted, it disables
tracing. This prevents the user from using the trace facility to get a
listing of the POM file.

----------------------
Support Considerations
----------------------

The SCRAMBLE program does NOT have to be present in the Parse-O-Matic
directory for Parse-O-Matic to read scrambled files.

If you are shipping a scrambled POM file to a user, you might consider
sending along a copy of SCRAMBLE if you think the user might have to make
some on-site changes to the POM file. However, in order to do this, the
user would have to know your POM file's scrambling password. In most cases
it is probably better to provide the user with a fresh file (direct from
you) when changes are needed.

Scrambled POM files can be used with unregistered evaluation copies of
Parse-O-Matic, although the legal requirement remains: continued use of
Parse-O-Matic beyond 90 days requires a license (see "Licensing"). This
lets your client try out your Parse-O-Matic solution without paying for
Parse-O-Matic OR seeing your POM code.

-----------
Batch Files
-----------

The built-in batch (BAT) capability of DOS and Windows is often
overlooked, even by seasoned computer professionals. You can use batch
files to make Parse-O-Matic easier to use. Batch files are created with a
text editor (such as DOS EDIT, or Windows Notepad).
--------------------------------------
Example #1:  Save Yourself Some Typing
--------------------------------------

Here is a simple batch file (comments appear after the arrows):

  @ECHO OFF                           <-- Turn off command-line echoing
  POM MYPOM.POM INPUT.TXT OUTPUT.TXT  <-- Run Parse-O-Matic
  IF ERRORLEVEL 1 GOTO QUIT           <-- Quit if an error occurred
  SEE OUTPUT.TXT                      <-- View the output file
  :QUIT                               <-- Batch file label

The advantage of this batch file is that it saves you the trouble of
typing in the entire POM command line each time you want to parse the
file.

----------------------------------------
Example #2:  Streamline Your Development
----------------------------------------

Here is a batch file which is useful during the development of a POM file.

  @ECHO OFF
  DEVELOP 50 MYPOM.POM IN.TXT C:\MYFILES\OUT.TXT

This batch file calls DEVELOP.BAT (included with Parse-O-Matic), which
displays a menu with the following options:

  INPUT ------ View input file
  EDIT ------- Edit POM file
  PARSE ------ Run parsing job
  OUTPUT ----- View output file
  QUIT ------- Finished

This lets you do the parsing, view the result, make changes to the POM
file if necessary, then parse again. You will find that this technique
makes development proceed quickly.

Here is an explanation of the second line of the batch file:

  DEVELOP 50 MYPOM.POM IN.TXT C:\MYFILES\OUT.TXT
     :    :      :       :             :
     :    :      :       :             Name of output file <--+
     :    :      :       Name of input file <-----------------+-- See NOTE #1
     :    :      Name of POM file <---------------------------+
     :    Save position for menu <------------------------------- See NOTE #2
     Invokes the batch file DEVELOP.BAT

NOTE #1: You must provide the full path to the files (unless they are in
         the current directory) and the extension.

NOTE #2: The "save position" remembers where you were in the menu. You may
         use values 50 to 99 to provide a "memory" for 50 different batch
         files that call DEVELOP.BAT. (The other values are reserved for
         the Parse-O-Matic installation and tutorial procedures.)
If 50 is not enough, you can place additional batch files in another directory; the menu save file (POM.MSV) is always placed in the current directory. In order for DEVELOP.BAT to work correctly when you are in a directory other than the Parse-O-Matic directory, you must place the Parse-O-Matic directory in your DOS PATH (see your operating system manual for details). Your PATH must also include the directory to a text editor. (In the original Parse-O-Matic package, DEVELOP.BAT calls up DOS EDIT.) You may find it instructive to study the file DEVELOP.BAT by loading it into a text editor. The batch file contains some comments which explain how it works. As mentioned in one of the comments, you may wish to change the text editor that the batch file calls for editing the POM file. You may also find the program PSMENU.EXE useful. For a brief description, type PSMENU /? at the DOS prompt. To study a typical menu definition file, enter the command SEE POM.MNU at the DOS prompt. ---------------------------------- Example #3: Automatic Batch Files ---------------------------------- Let's say that each day you have a text file, named DELLIST.TXT, which lists the names of the files that need to be deleted: FRED.TXT MARY.TXT JOHN.TXT HARRY.TXT You could write a POM file (we'll call it MAKEDEL.POM) to write a batch file to delete the files. It would look like this: 190 PROLOGUE OUTEND |@ECHO OFF END IGNORE $FLINE = "COMMAND.COM" <-- An example of a safety feature! OUTEND $FLINE <> "" |DEL {$FLINE} You could automate the entire procedure with the following batch file (which we'll call DAILYDEL.BAT): @ECHO OFF <-- Turn off command-line echoing POM MAKEDEL.POM DELLIST.TXT TEMP.BAT <-- Create the batch file TEMP.BAT IF ERRORLEVEL 1 GOTO QUIT <-- Quit if an error occurred TEMP.BAT <-- Run the batch file DEL TEMP.BAT <-- Delete it :QUIT <-- Batch file label The second line of DAILYDEL.BAT runs Parse-O-Matic to create a batch file named TEMP.BAT. 
Given the input file shown earlier, TEMP.BAT would look like this:

  @ECHO OFF
  DEL FRED.TXT
  DEL MARY.TXT
  DEL JOHN.TXT
  DEL HARRY.TXT

After TEMP.BAT is created, DAILYDEL.BAT runs TEMP.BAT (thus deleting all
the files listed in DELLIST.TXT).

This is only a simple example. Parse-O-Matic's ability to create batch
files based on input data provides you with a very powerful tool for
automating daily administrative tasks.

When you write automatic applications, you should be careful to include
routines in both the batch files and the POM files to handle any unusual
conditions. In MAKEDEL.POM, we checked the file to be sure that it wasn't
"COMMAND.COM", because if that file is deleted, your system will probably
stop working!

---------------------------------------------------------
Example #4: Controlling a POM File from the Command Line
---------------------------------------------------------

Consider the following batch file, which we will call SELECT.BAT:

  @ECHO OFF                         <-- Turn off command-line echoing
  IF (%1) == () GOTO ERROR          <-- Make sure we have a parameter
  SET XYZ=%1                        <-- Set the environment variable XYZ
  POM SELECT.POM INPUT.TXT OUT.TXT  <-- Run the POM file SELECT.POM
  GOTO QUIT                         <-- Jump to the QUIT label
  :ERROR                            <-+
  ECHO Missing parameter              | Error-handling routine
  PAUSE                             <-+
  :QUIT                             <-- Batch file label
  SET XYZ=                          <-- Get rid of the environment variable

SELECT.BAT can be used with this POM file, which we will name SELECT.POM:

  PROLOGUE
    GETENV xyz "XYZ"
  END
  OUTEND $FLINE ^ xyz |{$FLINE}

You can use SELECT.BAT to output only those lines that contain the text
that you specify. For example, you can enter the following command at the
DOS prompt:

  SELECT MARY

This will output only those lines (from INPUT.TXT) that contain "MARY".
If you wish to ignore distinctions between uppercase and lowercase, change
the last line of SELECT.POM accordingly:

  OUTEND $FLUPC ^ xyz |{$FLINE}

Batch file parameters are separated by spaces on the command line, so the
following command would not work as you might expect:

  SELECT MARY FRED JOHN

This would set the batch variable %1 to MARY, %2 to FRED and %3 to JOHN.
One way to deal with this is to eliminate the spaces when you run the batch
file:

  SELECT MARY/FRED/JOHN

You can then replace the OUTEND command in SELECT.POM with these lines:

  APPEND x xyz "/"               <-- Set the x variable to "MARY/FRED/JOHN/"
  BEGIN x <> ""                  <-- We will loop through all of the names
    PEEL y x "" "/"              <-- Move a name to the y variable
    OUTEND $FLUPC ^ y |{$FLINE}  <-- Output a line if it contains the name
  AGAIN                          <-- Go back to the BEGIN

Bear in mind that the system environment space is limited. If you have
problems with an application like this one, refer to "The GETENV Command",
in the section entitled "Disappearing Environment Variables".

--------------------
Unattended Operation
--------------------

You can design applications that run themselves while you are not there.
There are two reasons why you might want to do this:

- You can run long processing jobs just before leaving work at night

- Parse-O-Matic is useful, but it isn't very interesting to watch!

Several features of Parse-O-Matic facilitate "unattended operation".

- The SOUND command can alert you if something unusual happens; you don't
  have to stare at the screen to make sure that everything is working.

- All error messages (which say "Press a key to continue") make a noise
  via the PC speaker (see "The Sound Command").

- You can use the MSGWAIT command to let processing continue if there is
  an error (see "The MSGWAIT Command").

- The processing log (see "Logging") can be used to check processing.

Let's say you wanted to concatenate (add together) several enormous text
files.
You could start with the following POM file (named ADD.POM):

  SET cmd = $COMMAND
  BEGIN cmd <> lastcmd
    SOUND "BEEP"
    SET lastcmd = cmd
  END
  OUTEND |{$FLINE}

You could then enter the command POM ADD.POM *.TXT ALL.TXT and walk away.
Whenever a new file is started, you will hear a beep. When you come back,
you can check the file POMLOG.TXT (which will be located in the same
directory as POM.EXE). It might look something like this:

  COMMAND: POM ADD.POM *.TXT ALL.TXT
  DATE:    AUG 01 1997
  16:39:12 JOHN.TXT opened for processing
  16:45:28 JOHN.TXT processing completed
  16:45:29 MARY.TXT opened for processing
  16:52:10 MARY.TXT processing completed
  16:52:11 FRED.TXT opened for processing
  17:03:33 FRED.TXT processing completed

If you are processing multiple files, and each one uses a different POM
file (and hence requires a separate run of Parse-O-Matic) you can write
your batch file so that it renames the log files. This lets you review each
log file later. For example:

  @ECHO OFF
  POM JOHN.POM JOHN.TXT JOHN.LST
  RENAME C:\POM\POMLOG.TXT JOHN.LOG
  POM MARY.POM MARY.TXT MARY.LST
  RENAME C:\POM\POMLOG.TXT MARY.LOG
  POM FRED.POM FRED.TXT FRED.LST
  RENAME C:\POM\POMLOG.TXT FRED.LOG

When processing is complete, the files JOHN.LOG, MARY.LOG and FRED.LOG will
be available in the directory C:\POM for your inspection.

Here is a slightly more sophisticated version of the batch file:

  @ECHO OFF
  POM JOHN.POM JOHN.TXT JOHN.LST
  IF ERRORLEVEL 1 GOTO QUIT
  RENAME C:\POM\POMLOG.TXT JOHN.LOG
  POM MARY.POM MARY.TXT MARY.LST
  IF ERRORLEVEL 1 GOTO QUIT
  RENAME C:\POM\POMLOG.TXT MARY.LOG
  POM FRED.POM FRED.TXT FRED.LST
  IF ERRORLEVEL 1 GOTO QUIT
  RENAME C:\POM\POMLOG.TXT FRED.LOG
  :QUIT

The IF ERRORLEVEL lines jump to the end of the batch file if Parse-O-Matic
generates an error of 1 or higher.

When coding batch files, remember that the IF ERRORLEVEL command is
considered "True" if the error is the specified value or higher. This means
you should always test the higher value first.
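The "specified value or higher" rule can be pictured with a small model.
The following Python sketch is only an illustration of the comparison that
IF ERRORLEVEL performs; the thresholds 1 and 2 and the labels "minor" and
"serious" are hypothetical, and are not taken from Parse-O-Matic's actual
return codes.

```python
def errorlevel_at_least(exit_code, threshold):
    """Model of the batch test IF ERRORLEVEL threshold (true if >=)."""
    return exit_code >= threshold


def classify(exit_code):
    # Highest threshold first, as recommended in the text.
    if errorlevel_at_least(exit_code, 2):
        return "serious"
    if errorlevel_at_least(exit_code, 1):
        return "minor"
    return "ok"


def classify_wrong_order(exit_code):
    # Testing 1 first matches every non-zero code,
    # so the test for 2 can never be reached.
    if errorlevel_at_least(exit_code, 1):
        return "minor"
    if errorlevel_at_least(exit_code, 2):
        return "serious"
    return "ok"


for code in (0, 1, 2):
    print(code, classify(code), classify_wrong_order(code))
```

With the tests in the wrong order, an exit code of 2 is reported as "minor",
which is the mistake that testing the higher value first avoids.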
See your operating system manual for details.

--------
Examples
--------

Many of the techniques described in this manual are demonstrated by the
examples provided with the standard Parse-O-Matic package. To see these
examples, switch to your Parse-O-Matic directory, type INFO at the DOS
prompt, or run INFO.BAT from Windows or OS/2, then select TUTORIAL.

============================================================================
                         OPERATIONAL CONSIDERATIONS
============================================================================

-----------------------------------------------
Running Parse-O-Matic on 8088 and 8086 Machines
-----------------------------------------------

Parse-O-Matic is designed to run on 80286 machines and higher, in
"protected mode". If you have a pressing need to run it on an old 8088 or
8086-class "real mode" machine (e.g. PC or XT), we may be able to prepare a
custom copy for you. Contact us for details.

------------------------------------------
Running Parse-O-Matic from Another Program
------------------------------------------

If you are calling Parse-O-Matic from a program written in a high-level
language (such as Pascal, Delphi, C or Basic), you can check its success or
failure by consulting the "DOS Error Level". Most languages have built-in
facilities to test this value.

For example, Turbo Pascal lets you run another program with the Exec
procedure, after which you can extract the ERRORLEVEL from the low byte of
the value returned by the DosExitCode function. You can also check the
DosError variable for invocation errors. Some typical errors include:
Program not found, Path not found, Access denied, Not enough memory.

On long parsing jobs (taking 3 seconds or more on your slowest machine), it
is perhaps best to let the user see the processing screen rather than
running in Quiet Mode (see "Quiet Mode"). If nothing else, it gives him or
her something to look at, and provides assurance that the machine has not
stopped working.
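The same two checks (did the program start, and what error level did it
return) can be sketched in other languages. The following Python example is
a minimal sketch, not part of the Parse-O-Matic package: the helper name
run_job is hypothetical, and the demonstration launches a stand-in child
process rather than POM itself, so that it can run anywhere. A real call
would pass something like ["POM", "MYPOM.POM", "INPUT.TXT", "OUTPUT.TXT"],
assuming POM is on the PATH and the system can run it.

```python
import subprocess
import sys


def run_job(command):
    """Run an external command and return (started, exit_code).

    started is False when the program could not be launched at all,
    which is comparable to an invocation error such as
    "Program not found" or "Access denied".
    """
    try:
        completed = subprocess.run(command)
    except (FileNotFoundError, PermissionError) as exc:
        print(f"Could not start {command[0]}: {exc}")
        return False, None
    return True, completed.returncode


# Stand-in child process that simply exits with error level 3.
started, code = run_job([sys.executable, "-c", "import sys; sys.exit(3)"])

if started and code == 0:
    print("Parsing job succeeded")
elif started:
    print(f"Parsing job failed with error level {code}")
```

Keeping the "could not start" case separate from a non-zero error level
mirrors the distinction drawn above between invocation errors and the
program's own return code.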
-----------------------
Solving Memory Problems
-----------------------

Parse-O-Matic does all of its work in standard memory; it does not use
Extended or Expanded memory. This is rarely a problem, but if you do
somehow run out of memory, there are some steps you can take...

You can often free up some extra memory by unloading unused device drivers
and DOS TSR ("Terminate and Stay Resident") programs. (TSR's are sometimes
called "DOS Pop-Ups")

Alternatively, most drivers and TSR's can be safely moved into high memory,
using the LOADHIGH function in your AUTOEXEC.BAT, or the DEVICEHIGH
function in CONFIG.SYS. Some older drivers and TSR's will not tolerate this
kind of relocation.

============================================================================
                           RUNNING UNDER WINDOWS
============================================================================

-------------
Compatibility
-------------

Parse-O-Matic is a DOS program, which has a few advantages and a few minor
disadvantages for Windows users.

The primary advantage is that a Parse-O-Matic application can run on any
PC-compatible machine, whether it is running DOS, Windows, or OS/2.
Emulators are also available which will let you run Parse-O-Matic (and
other DOS software) on Macintosh, Unix, Linux and other computers.

Since Parse-O-Matic has no user interface to speak of, Windows' wonderful
graphical environment is not particularly important. The only operational
difference is that to interrupt Parse-O-Matic processing, you press the Esc
key instead of clicking on a Cancel button.

Performance is a consideration if you are running Parse-O-Matic at the same
time as 32-bit applications under Windows; it will slow them down slightly.
However, unless you are multi-tasking heavily, performance is not an issue
because the usual bottleneck is the responsiveness and transfer speed of
the hard disk, not the speed at which the Parse-O-Matic program runs.
-------------------------
Setting Up for Windows 95
-------------------------

To use Parse-O-Matic under Windows, you need the following items:

1) The POM file (icon file POM_FILE.ICO)
2) A batch file (icon file BAT.ICO)

These two icon files are included in the standard Parse-O-Matic package.
You may find it helpful to copy them to your main Windows directory so that
the associations you set for them are not lost if you install a new version
of Parse-O-Matic and then delete the original POM directory.

------------------------------------------
Setting Up an Association for the POM File
------------------------------------------

When you click on a POM file, it should call up a text editor. To configure
this, follow these steps:

1) Double-click on "My Computer"

2) From the pull-down menu, select View/Options

3) Dialog Box: Options
   Click on the File Types tab
   Click on the New Type button

4) Dialog Box: Add New File Type
   Description: Parse-O-Matic Control File
   Associated extension: POM
   Click on the New button

5) Dialog Box: New Action
   Action: &Edit
   Application used: NOTEPAD.EXE (or the path to your favorite editor)
   Click on the OK button

6) Dialog Box: Add New File Type
   Click on the Change Icon button
   Click on the Browse button
   File name: The full path to POM_FILE.ICO (e.g. C:\POM\POM_FILE.ICO)
   Press Enter

7) Dialog Box: Change Icon
   Click the OK button

8) Dialog Box: Add New File Type
   Click the Close button

9) Dialog Box: Options
   Click the Close button

Once you have followed these steps, you can double-click on the POM file
icon when you are in Windows Explorer or a file folder, and it will be
opened with the file editor you specified in step 5.

----------------------------------------------
Setting Up an Association for a POJ (Job) File
----------------------------------------------

When you click on a POJ file, it should call up Parse-O-Matic.
To configure this, follow these steps:

1) Double-click on "My Computer"

2) From the pull-down menu, select View/Options

3) Dialog Box: Options
   Click on the File Types tab
   Click on the New Type button

4) Dialog Box: Add New File Type
   Description: Parse-O-Matic Job File
   Associated extension: POJ
   Click on the New button

5) Dialog Box: New Action
   Action: &Open
   Application used: The path to POM.EXE (e.g. C:\POM\POM.EXE)
   Click on the OK button

6) Dialog Box: Add New File Type
   Click on the Change Icon button
   Click on the Browse button
   File name: The path to POJ_FILE.ICO (e.g. C:\POM\POJ_FILE.ICO)
   Press Enter

7) Dialog Box: Change Icon
   Click the OK button

8) Dialog Box: Add New File Type
   Click the Close button

9) Dialog Box: Options
   Click the Close button

Once you have followed these steps, you can double-click on the POJ file
icon when you are in Windows Explorer or a file folder, and it will start
up Parse-O-Matic and run the specified job (POJ) file.

-----------------------------------------------------
Setting Up an Association for the BAT File (Optional)
-----------------------------------------------------

Windows 95 is already set up to process batch (BAT) files. However,
Parse-O-Matic comes with an alternative icon which is more distinctive than
the one supplied with Windows. (The Parse-O-Matic icon looks like a bat --
a sonar-equipped flying critter with an undeserved bad reputation). To
change the icon, follow these steps:

1) Double-click on "My Computer"

2) From the pull-down menu, select View/Options

3) Dialog Box: Options
   On the list box, find and double-click on MS-DOS Batch File

4) Dialog Box: Edit File Type
   Click on the Change Icon button

5) Dialog Box: Change Icon
   Click the Browse button
   File name: The full path to BAT.ICO (e.g. C:\POM\BAT.ICO)
   Press Enter

6) Dialog Box: Change Icon
   Click the OK button

7) Dialog Box: Edit File Type
   Click the Close button

8) Dialog Box: Options
   Click the Close button

After following this procedure, your batch (BAT) file icons will be much
more noticeable when they appear in Windows Explorer or a file folder. To
edit the batch file, right-click on the icon and select Edit. To run the
batch file, simply double-click the icon. For a discussion of batch files,
see "Batch Files".

-----------------------------------
Setting Up for Windows 98, Me, 2000
-----------------------------------

Under Windows, when you click on file types that are not recognized (which
will initially be the case for .POM and .POJ files), Windows offers to
assign them to a particular program. You need to associate .POM files with
your favorite text editor, and .POJ files with POM.EXE.

When you double-click an unrecognized file type, Windows will ask you what
program you want to use to open that file. You should start by typing in a
description for the file type (e.g. "POM Job File" for .POJ files and "POM
Script File" for .POM files). You then point to the appropriate program. In
the case of .POJ files, you will click on the "Other..." button to point to
the program POM.EXE.

----------------------------------
Additional Associations (Optional)
----------------------------------

After associating .POJ files with POM.EXE and .POM files with your favorite
text editor, you can provide additional, right-clickable options (such as
opening a .POJ file with a text editor).
Under WinMe, you do this as follows:

- Double-click "My Computer" (located on your desktop)
- Select "Tools", then "Folder Options"
- Click on the "File Types" tab
- Click on the file type you want to enhance (POM or POJ)
- Click on "Advanced"

------------------------------
Installing the ShowNum Utility
------------------------------

The ShowNum program is a small utility which converts a hex number to
decimal and vice-versa, or displays the ASCII value of a character (see
"The ShowNum Utility"). To install ShowNum as a Windows shortcut:

1) Select "File/New/Shortcut" from the pull-down menu of any folder.

2) Specify the path name to SHOWNUM.BAT, followed by a question mark.
   For example: C:\POM\SHOWNUM.BAT ?
   The ? means "Prompt for input before calling the batch file".

3) After you have finished defining the shortcut, right-click on the icon,
   select "Properties", then the "Program" tab, and make sure the "Close
   on exit" box is checked.

You can then use ShowNum by double-clicking its icon. You will be prompted
to enter a number, and the answer will be displayed.

--------------------------
Long File Names in Windows
--------------------------

Although Parse-O-Matic runs under Windows, it will only recognize standard
"8.3" DOS file names; it does not use the long file names supported by
Windows. You can determine the underlying DOS name of a file by checking
its "Properties" in Windows Explorer, or by using the DIR command while in
DOS mode.

-------------------
The WINUTIL Utility
-------------------

The WinUtil program (WINUTIL.EXE in the standard Parse-O-Matic package) is
a small utility which provides some Windows functions if you are running in
a DOS window. WinUtil can detect if Windows is running, and lets you copy a
file to the Windows clipboard, or create a file from the contents of the
clipboard.

-----------
Limitations
-----------

WinUtil does not work with Windows 3.1 or earlier. It has been tested under
Windows 95, 98 and Me.
WinUtil requires the use of 8.3 style file names. That is to say, it cannot
handle the long file names that were introduced by Windows 95.

WinUtil can handle clipboard data up to 65535 bytes. Attempts to process a
larger amount of data will cause an error.

----------------------
Quick Reference Screen
----------------------

To display a quick-reference help screen, enter the following command at
the DOS prompt:

  WINUTIL /?

-----------------
Sample Batch File
-----------------

You will usually use WinUtil in a batch file. For example:

  @ECHO OFF                        <-- Turn off echo
  WINUTIL CHECK                    <-- See if Windows is running
  IF ERRORLEVEL 1 GOTO OKWIN       <-- Skip ahead if Windows is running
  ECHO Windows is not running!     <-- Display warning
  GOTO QUIT                        <-- Skip ahead to end of batch file
  :OKWIN                           <-- Batch file label
  WINUTIL CLIP WRITE in.txt        <-- Copy the clipboard to a text file
  POM editwork.pom in.txt out.txt  <-- Modify the text file with POM
  IF ERRORLEVEL 1 GOTO OOPS        <-- Stop if there's an error
  WINUTIL CLIP READ out.txt        <-- Copy the text file to the clipboard
  DEL in.txt                       <-- Erase work file
  DEL out.txt                      <-- Erase work file
  GOTO QUIT                        <-- Skip ahead to end of batch file
  :OOPS                            <-- Batch file label
  ECHO Parsing job failed          <-- Display warning
  PAUSE                            <-- Wait for a key to be pressed
  :QUIT                            <-- Batch file label

-------------------------------
Detecting if Windows is Running
-------------------------------

To check if Windows is running, use the following command format on the DOS
command line, or in a batch file:

  FORMAT:   WINUTIL CHECK [V]
  EXAMPLE:  WINUTIL CHECK

The V parameter is optional. If it is included, WinUtil will operate in
"verbose" mode. That is to say, it will display one of the following
messages:

  Windows is running
  Windows is NOT running

If the V parameter is omitted, WinUtil does not display a message.

If Windows is running, WinUtil sets the DOS return code to 1. If Windows is
not running, WinUtil sets the DOS return code to 0.
You can test the return code using the IF ERRORLEVEL batch file function.
(See "Sample Batch File", above, for an example.)

------------------------------------
Copying the Clipboard to a Text File
------------------------------------

To copy the Windows clipboard to a text file, use the following command
format on the DOS command line, or in a batch file:

  FORMAT:   WINUTIL CLIP WRITE filename
  EXAMPLE:  WINUTIL CLIP WRITE clip.txt

The DOS return code is set as follows:

    0   The operation was successfully carried out
    1   The clipboard did not contain any text
  255   A processing error occurred

If the file you are writing to already exists, it is renamed with a .BAK
extension.

------------------------------------
Copying a Text File to the Clipboard
------------------------------------

To copy a text file to the Windows clipboard, use the following command
format on the DOS command line, or in a batch file:

  FORMAT:   WINUTIL CLIP READ filename
  EXAMPLE:  WINUTIL CLIP READ clip.txt

The DOS return code is set as follows:

    0   The operation was successfully carried out
    1   The text file did not contain any text
  255   A processing error occurred

---------------
The SEE Utility
---------------

Parse-O-Matic comes with a file view/print/extract utility known as SEE
(SEE.EXE). The copy included in the standard package is for single-user
operation with an evaluation or registered copy of Parse-O-Matic. It may
not be distributed (even if you have purchased a Distribution License for
Parse-O-Matic).

You may purchase a registered copy of SEE (including the full manual and
walk-through). Multi-copy and Distribution Licenses are available. For more
information, please phone us at +1-416-287-8892.