home *** CD-ROM | disk | FTP | other *** search
- WORKSHEET FILE FORMAT
- FROM LOTUS
-
- APPENDIX B - THE FORMULA COMPILER
-
- Copyright(c) 1984, Lotus Development Corporation
- 161 First Street
- Cambridge, Massachusetts 02142
- (617) 492-7171
- Electronic Edition, December, 1984
- All Rights Reserved
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- APPENDIX B: The Formula Compiler
-
- This appendix describes the internal workings of the formula compiler. The
- compiler transforms an ASCII string of characters representing a formula to
- its Reverse Polish code. The basic algorithm utilizes and SR parser (SR =
- shift and reduce). The aim of the parser is to apply a set of reduction
- rules which embody the syntax of the compiler to an input string. Formula
- code is compiled to a temporary buffer.
-
- Lexicon Analysis
-
- A lexical analyzer breaks up the input string into lexical units called
- tokens. A token is a substring of the original input string operand,
- operator, or special symbol (such as comma, parentheses, etc.) In addition,
- the lexical analyser supplies two special tokens, "beginning of formula"
- (boform) and "end of formula" (eoform), to facilitate the compilation
- process. The lexical analyzer identifies and processes literals (both
- number and string), cell and range references, operators, and function
- calls. It assigns a unique code to each distinct operator, function, or
- type of operand.
-
- A function with no arguments is treated like a number.
-
- Syntax Analysis
-
- The syntactical analysis of a formula is accomplished by processing a list
- of tokens in left-to-right order. A stack called the syntax is also used
- during the syntactical scan. The basic algorithm is as follows:
-
- Repeat the following steps:
-
- 1) Get the next token
-
- 2) If the token is a literal or cell reference:
- a) Push the number code on the syntax stack
- b) Push the number code on the syntax stack
-
- 3) If the token is a range reference:
- a) Compile code to push the range reference
- b) Push the range code on the syntax stack
-
- 4) Otherwise push the token code for the token on the syntax stack.
-
- For each syntax rule, if the pattern on the top of the syntax matches the
- rule pattern take the action associated with the rule and start scanning
- from the beginning for any additional rules which may apply.
-
- When a token code is pushed on the syntax stack, an additional word of
- zeros is also pushed on the stack. This is used when compiling function
- calls to hold the function's argument count.
-
-
-
-
-
- Rule Matching
-
- A relatively small number of rules are used to process formulas of arbitrary
- complexity. If a rule matches the top of the syntax stack, then the
- compiler takes a specific action and rule scanning starts again with the
- first rule. Each rule matches certain patterns on the syntax stack. A
- typical rule might be: if the top of the stack is the token for right
- parenthesis, and the next-to-top is a number, and the second form the top
- is a left parenthesis, then pop the top three items from the syntax stack
- and push the number on the syntax stack.
-
- This rule can be more succinctly represented as:
-
- Stack
-
- Before After Action
- )
- number
- ( number none
-
-
-
- The Rules
-
-
- The following are the syntax rules used to process formulas. Note that the
- order of the rules is important. The rules for compilation of operators
- used additional tables which assign a precedence number and opcode to each
- legal unary and binary operator. Thus, for example, there is a single
- token code for minus sign (-), but there are two opcodes one for unary
- minus and one for binary minus. In addition, these two operators, while
- lexically identical, also have different precedence. In general, operators
- of higher precedence will be performed before operators of lower precedence
- are performed left-to-right. All special operators (boform, eoform,
- parentheses, comma, etc.) are implicitly assigned a precedence of zero.
-
- Rule 1 Termination test
-
- Stack
-
- Before After Action
- eoform Output a return code to compile buffer
- number Return, indicating successful compile
- boform
-
- Rule 2 Function argument processing
-
- Stack
- Before After Action
- ' Error if range argument illegal for
- number or range function.
- ( ( Increment argument count on stack
- function function
-
- Rule 3 Process final function argument
-
- Stack
- Before After Action
- ) Error if range argument illegal for
- number or range function.
- ( Increment argument count on stack
- function number Compile function opcode
- If list function, compile argument
- count; otherwise error is wrong
- argument count.
-
-
-
-
- Rule 4 Parenthesis removal
-
- Stack
- Before After Action
- ) Compile parenthesis opcode
- number
- ( number
- operator operator
-
-
-
- Rule 5 Binary operators
-
- Stack
- Before After Action
- op2 If binary op<binary op, rule does
- number not match. Otherwise, compile opcode
- op1 op2 for operator op1.
-
-
- Rule 6 Unary operators
-
- Stack
- Before After Action
- op2 I unary op<binary op, rule does
- number op2 not match. Otherwise, compile opcode.
- op1 number for operator op 1.
-
-
- Rule 7 Error detection
-
- Stack
- Before After Action
- eoform Return indicating unsuccessful compile
-
-
-
-
-
- Table 9 Operator Precedence Table
-
- Operator Unary Precedence Binary Precedence
- + 6 4
- - 6 4
- * na 5
- / na 7
- ^ na 3
- = na 3
- < > na 3
- < = na 3
- > = na 3
- < na 3
- > na 3
- #and# na 1
- #or# na 1
- #not# 2 na
-
-
-
-
-
-
-
-
-
-
-
- Example:
-
- Using the above rules, we can now see how a particular formula is
- compiled. Let us consider the following formula:
-
- 3+5*6
-
- This is broken up by the lexical analyzer into seven tokens.
-
- boform
- 3
- +
- 5
- *
- 6
- eoform
-
- The syntax scans proceed as follows until a matching rule is found:
-
- Stack
-
- boform number + number
- boform number +
- boform number
- boform
-
- Compile buffer
-
- push 3 push 3 push 3
- push 5
-
- At this point, rule 5 is invoked, but since the precedence of boform is
- zero, no action is taken.
-
- Stack
-
- * number
- number *
- + number
- number +
- boform number
- boform
-
- Compile buffer
-
- push 3 push 3
- push 5 push 5
- push 6
-
-
-
-
-
-
-
- At this point, since the binary precedence of + is lower than the binary
- precedence of *, rule 5 does apply, and the opcode for * is compiled. The
- stack is reduced by replacing number * number by number and scan is made,
- but no further rule applies.
-
-
- Stack
-
- number eoform
- + number
- number +
- boform number
- boform
-
- Compile buffer
-
- push 3 push 3
- push 5 push 5
- push 6 push 6
-
-
-
- Rule 5 applies again, and the opcode for + is compiled, reducing the stack
- to boform, number, eoform. Rescanning finds a match on rule 1 which
- compiles a return opcode and terminates. The final compiled code is thus:
-
- push 3
- push 5
- push 6
- *
- +
- return
-
- A Note on the Decompiler
-
- The algorithm for the formula decompiler was taken verbatim from:
-
- Writing Interactive Compilers and Interpreters, P.J. Brown, John Wiley and
- Sons, 1979. See chapter 6.2. The algorithm itself is described on pages
- 216 and 217.
-
- This algorithm is also described in the following article.
-
- More on the Re-creation of Source Code from Reverse Polish, P.J. Brown,
- Software Practice and Experience, Vol 7, 545-551 (1977).
-
-
-
-
-
-
-
-
-
-
- WORKSHEET COLUMN DESIGNATORS
-
- Most records within the 1-2-3 Condensed Worksheet format are specified
- with column/row designators (for example, column 0, row 0 equals A1). When
- determining the column designator, the table below will help make
- conversion easier.
-
-
- Column Hex Dec Column Hex Dec Column Hex Dec
- A 0 1 BA 34 52 DA 68 104
- B 1 1 BB 35 53 DB 69 105
- C 2 2 BC 36 54 DC 6A 106
- D 3 3 BD 37 55 DD 6B 107
- E 4 4 BE 38 56 DE 6C 108
- F 5 5 BF 39 57 DF 6D 109
- G 6 6 BG 3A 58 DG 6E 110
- H 7 7 BH 3B 59 DH 6F 111
- I 8 8 BI 3C 60 DI 70 112
- J 9 9 BJ 3D 61 DJ 71 113
- K A 10 BK 3E 62 DK 72 114
- L B 11 BL 3F 63 DL 73 115
- M C 12 BM 40 64 DM 74 116
- N D 13 BN 41 65 DN 75 117
- O E 14 BO 42 66 DO 76 118
- P F 15 BP 43 67 DP 77 119
- Q 10 16 BQ 44 68 DQ 78 120
- R 11 17 BR 45 69 DR 79 121
- S 12 18 BS 46 70 DS 7A 122
- T 13 19 BT 47 71 DT 7B 123
- U 14 20 BU 48 72 DU 7C 124
- V 15 21 BV 49 73 DV 7D 125
- W 16 22 BW 4A 74 DW 7E 126
- X 17 23 BX 4B 75 DX 7F 127
- Y 18 24 BY 4C 76 DY 80 128
- Z 19 25 BZ 4D 77 DZ 81 129
- AA 1A 26 CA 4E 78 EA 82 130
- AB 1B 27 CB 4F 79 EB 83 131
- AC 1C 28 CC 50 80 EC 84 132
- AD 1D 29 CD 51 81 ED 85 133
- AE 1E 30 CE 52 82 EE 86 134
- AF 1F 31 CF 53 83 EF 87 135
- AG 20 32 CG 54 84 EG 88 136
- AH 21 33 CH 55 85 EH 89 137
- AI 22 34 CI 56 86 EI 8A 138
- AJ 23 35 CJ 57 87 EJ 8B 139
- AK 24 36 CK 58 88 EK 8C 140
- AL 25 37 CL 59 89 EL 8D 141
- AM 26 38 CM 5A 90 EM 8E 142
- AN 27 39 CN 5B 91 EN 8F 143
- AO 28 40 CO 5C 92 EO 90 144
- AP 29 41 CP 5D 93 EP 91 145
- AQ 2A 42 CQ 5E 94 EQ 92 146
- AR 2B 43 CR 5F 95 ER 93 147
- AS 2C 44 CS 60 96 ES 94 148
- AT 2D 45 CT 61 97 ET 95 149
- AU 2E 46 CU 62 98 EU 96 150
- AV 2F 47 CV 63 99 EV 97 151
- AW 30 48 CW 64 100 EW 98 152
- AX 31 49 CX 65 101 EX 99 153
- AY 32 50 CY 66 102 EY 9A 154
- AZ 33 51 CZ 67 103 EZ 9B 155
-
-
-
-
-
-
-
-
- (CONTINUED)
-
-
-
-
- Column Hex Dec Column Hex Dec
-
- FA 9C 156 HA DO 208
- FB 9D 157 HB D1 209
- FC 9E 158 HC D2 210
- FD 9F 159 HD D3 211
- FE AO 160 HE D4 212
- FF A1 161 HF D5 213
- FG A2 162 HG D6 214
- FH A3 163 HH D7 215
- FI A4 164 HI D8 216
- FJ A5 165 HJ D9 217
- FK A6 166 HK DA 218
- FL A7 167 HL DB 219
- FM A8 168 HM DC 220
- FN A9 169 HN DD 221
- FO AA 170 HO DE 222
- FP AB 171 HP DF 223
- FQ AC 172 HQ EO 224
- FR AD 173 HR E1 225
- FS AE 174 HS E2 226
- FT AF 175 HT E3 227
- FU BO 176 HU E4 228
- FV B1 177 HV E5 229
- FW B2 178 HW E6 230
- FX B3 179 HX E7 231
- FY B4 180 HY E8 232
- FZ B5 181 HZ E9 233
- GA B6 182 IA EA 234
- GB B7 183 IB EB 235
- GC B8 184 IC EC 236
- GD B9 185 ID ED 237
- GE BA 186 IE EE 238
- GF BB 187 IF EF 239
- GG BC 188 IG FO 240
- GH BD 189 IH F1 241
- GI BE 190 II F2 242
- GJ BF 191 IJ F3 243
- GK CO 192 IK F4 244
- GL C1 193 IL F5 245
- GM C2 195 IM F6 246
- GN C3 195 IN F7 247
- GO C4 196 IO F8 248
- GP C5 197 IP F9 249
- GQ C6 198 IQ FA 250
- GR C7 199 IR FB 251
- GS C8 200 IS FC 252
- GT C9 201 IT FD 253
- GU CA 202 IU FE 254
- GV CB 203 IV FF 255
- GW CC 204
- GX CD 205
- GY CE 206
- GZ CF 207
-
-
-
-
- ANALYSIS OF 1-2-3 WORKSHEET FILE
-
- The worksheet shown below was created in 1-2-3 and saved to disk.
-
-
-
- Key:
-
- A2..A5 Named Range (code 11)
- EXAMPLE A2: Label (code 15)
- 100 A3: Integer (code 13)
- 12.5 A4: Number (code 14)
- 87.5 A5: Formula (+A3-A4)
- (code 16)
-
-
- The example shown below is a partial hex dump of this worksheet file. By
- reading each record header, you can determine the type of record you are
- encountering. The record header will also tell you the length of that
- follows the header. By analyzing the record header, you can read the
- records you want and skip unrelated records.
-
-
- 362B:0100 06 00 08 00 00 00 00 00 00 00
- 362B:0110 04 00 2F 00 01 00 01 02 00 01 00 FF 03 00 01 00
- 362B:0120 00 04 00 01 00 00 05 00 01 00 FF 07 00 1F 00 00
- 362B:0130 00 01 00 71 00 09 00 08 00 14 00 00 00 00 00 00
- 362B:0140 00 00 00 00 00 00 00 04 00 04 00 48 00 00 0B 00
- 362B:0150 18 00 54 45 53 54 00 00 00 00 00 00 00 00 00 00
- 362B:0160 00 00 00 00 01 00 00 00 04 00 18 00 19 00 00 FF
- 362B:0170 FF 00 00 FF FF 00 00 FF FF 00 00 FF FF 00 00 FF
- 362B:0180
-
-
- 362B:05C0
- 362B:05D0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
- 362B:05E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
- 362B:05F0 00 00 00 00 71 71 01 00 0F 00 0E 00 FF 00 00 01
- 362B:0600 00 27 45 58 41 4D 50 4C 45 00 0D 00 07 00 FF 00
- 362B:0610 00 02 00 64 00
- 362B:0620 10 00 1B 00 FF 00 00 04 00 00
- 362B:0630 00 00 00 00 E0 55 40 0C 00 01 00 80 FE BF 01 00
- 362B:0640 80 FF BF 0A 03