-
-
-
- FLEX(1) FLEX(1)
-
-
- NAME
- flex - fast lexical analyzer generator
-
- SYNOPSIS
- flex [-bcdfhilnpstvwBFILTV78+ -C[aefFmr] -Pprefix -Sskeleton]
- [filename ...]
-
- DESCRIPTION
- flex is a tool for generating scanners: programs which
- recognize lexical patterns in text. flex reads the given
- input files, or its standard input if no file names are
- given, for a description of a scanner to generate. The
- description is in the form of pairs of regular expressions
- and C code, called rules. flex generates as output a C
- source file, lex.yy.c, which defines a routine yylex().
- This file is compiled and linked with the -lfl library to
- produce an executable. When the executable is run, it
- analyzes its input for occurrences of the regular
- expressions. Whenever it finds one, it executes the
- corresponding C code.
-
- For full documentation, see flexdoc(1). This manual entry
- is intended for use as a quick reference.
-
- OPTIONS
- flex has the following options:
-
- -b generates backing-up information to lex.backup.
- This is a list of scanner states which require
- backing up and the input characters on which they
- do so. By adding rules one can remove backing-up
- states. If all backing-up states are eliminated
- and -Cf or -CF is used, the generated scanner will
- run faster.
-
- -c is a do-nothing, deprecated option included for
- POSIX compliance.
-
- NOTE: in previous releases of flex, -c specified
- table-compression options. This functionality is
- now given by the -C flag. To ease the impact
- of this change, when flex encounters -c, it
- currently issues a warning message and assumes that -C
- was desired instead. In the future this "promotion"
- of -c to -C will go away in the name of full
- POSIX compliance (unless the POSIX meaning is
- removed first).
-
- -d makes the generated scanner run in debug mode.
- Whenever a pattern is recognized and the global
- yy_flex_debug is non-zero (which is the default),
- the scanner will write to stderr a line of the
- form:
-
-
-
-
- Version 2.4 November 1993 1
- --accepting rule at line 53 ("the matched text")
-
- The line number refers to the location of the rule
- in the file defining the scanner (i.e., the file
- that was fed to flex). Messages are also generated
- when the scanner backs up, accepts the default
- rule, reaches the end of its input buffer (or
- encounters a NUL; the two look the same as far as
- the scanner's concerned), or reaches an end-of-
- file.
-
- -f specifies fast scanner. No table compression is
- done and stdio is bypassed. The result is large
- but fast. This option is equivalent to -Cfr (see
- below).
-
- -h generates a "help" summary of flex's options to
- stderr and then exits.
-
- -i instructs flex to generate a case-insensitive
- scanner. The case of letters given in the flex input
- patterns will be ignored, and tokens in the input
- will be matched regardless of case. The matched
- text given in yytext will have its case preserved
- (i.e., it will not be folded).
-
- -l turns on maximum compatibility with the original
- AT&T lex implementation, at a considerable
- performance cost. This option is incompatible with -+,
- -f, -F, -Cf, or -CF. See flexdoc(1) for details.
-
- -n is another do-nothing, deprecated option included
- only for POSIX compliance.
-
- -p generates a performance report to stderr. The
- report consists of comments regarding features of
- the flex input file which will cause a loss of
- performance in the resulting scanner. If you give the
- flag twice, you will also get comments regarding
- features that lead to minor performance losses.
-
- -s causes the default rule (that unmatched scanner
- input is echoed to stdout) to be suppressed. If
- the scanner encounters input that does not match
- any of its rules, it aborts with an error.
-
- -t instructs flex to write the scanner it generates to
- standard output instead of lex.yy.c.
-
- -v specifies that flex should write to stderr a
- summary of statistics regarding the scanner it
- generates.
-
- -w suppresses warning messages.
-
- -B instructs flex to generate a batch scanner instead
- of an interactive scanner (see -I below). See
- flexdoc(1) for details. Scanners using -Cf or -CF
- compression options automatically specify this
- option, too.
-
- -F specifies that the fast scanner table
- representation should be used (and stdio bypassed). This
- representation is about as fast as the full table
- representation (-f), and for some sets of patterns
- will be considerably smaller (and for others,
- larger). It cannot be used with the -+ option.
- See flexdoc(1) for more details.
-
- This option is equivalent to -CFr (see below).
-
- -I instructs flex to generate an interactive scanner,
- that is, a scanner which stops immediately rather
- than looking ahead if it knows that the currently
- scanned text cannot be part of a longer rule's
- match. This is the opposite of batch scanners (see
- -B above). See flexdoc(1) for details.
-
- Note that -I cannot be used in conjunction with full or
- fast tables, i.e., the -f, -F, -Cf, or -CF flags.
- For other table compression options, -I is the
- default.
-
- -L instructs flex not to generate #line directives in
- lex.yy.c. The default is to generate such
- directives so error messages in the actions will be
- correctly located with respect to the original flex
- input file, and not to the fairly meaningless line
- numbers of lex.yy.c.
-
- -T makes flex run in trace mode. It will generate a
- lot of messages to stderr concerning the form of
- the input and the resultant non-deterministic and
- deterministic finite automata. This option is
- mostly for use in maintaining flex.
-
- -V prints the version number to stderr and exits.
-
- -7 instructs flex to generate a 7-bit scanner, which
- can save considerable table space, especially when
- using -Cf or -CF. At most sites, -7 is on by
- default for these options; to see if this is the
- case, use the -v verbose flag and check the flag
- summary it reports.
-
- -8 instructs flex to generate an 8-bit scanner. This
- is the default except for the -Cf and -CF
- compression options, for which the default is
- site-dependent, and can be checked by inspecting the
- flag summary generated by the -v option.
-
- -+ specifies that you want flex to generate a C++
- scanner class. See the section on Generating C++
- Scanners in flexdoc(1) for details.
-
- -C[aefFmr]
- controls the degree of table compression and
- scanner optimization.
-
- -Ca trades off larger tables in the generated
- scanner for faster performance because the elements of
- the tables are better aligned for memory access and
- computation. This option can double the size of
- the tables used by your scanner.
-
- -Ce directs flex to construct equivalence classes,
- i.e., sets of characters which have identical
- lexical properties. Equivalence classes usually give
- dramatic reductions in the final table/object file
- sizes (typically a factor of 2-5) and are pretty
- cheap performance-wise (one array look-up per
- character scanned).
-
- -Cf specifies that the full scanner tables should
- be generated - flex should not compress the tables
- by taking advantage of similar transition
- functions for different states.
-
- -CF specifies that the alternate fast scanner
- representation (described in flexdoc(1)) should be
- used. This option cannot be used with -+.
-
- -Cm directs flex to construct meta-equivalence
- classes, which are sets of equivalence classes (or
- characters, if equivalence classes are not being
- used) that are commonly used together.
- Meta-equivalence classes are often a big win when using
- compressed tables, but they have a moderate
- performance impact (one or two "if" tests and one array
- look-up per character scanned).
-
- -Cr causes the generated scanner to bypass using
- stdio for input. In general this option results in
- a minor performance gain only worthwhile if used in
- conjunction with -Cf or -CF. It can cause
- surprising behavior if you use stdio yourself to read from
- yyin prior to calling the scanner.
-
- A lone -C specifies that the scanner tables should
- be compressed but neither equivalence classes nor
- meta-equivalence classes should be used.
-
- The options -Cf or -CF and -Cm do not make sense
- together - there is no opportunity for
- meta-equivalence classes if the table is not being
- compressed. Otherwise the options may be freely
- mixed.
-
- The default setting is -Cem, which specifies that
- flex should generate equivalence classes and
- meta-equivalence classes. This setting provides the
- highest degree of table compression. You can trade
- off faster-executing scanners at the cost of larger
- tables with the following generally being true:
-
- slowest & smallest
- -Cem
- -Cm
- -Ce
- -C
- -C{f,F}e
- -C{f,F}
- -C{f,F}a
- fastest & largest
-
-
- -C options are cumulative.
-
- -Pprefix
- changes the default yy prefix used by flex to be
- prefix instead. See flexdoc(1) for a description
- of all the global variables and file names that
- this affects.
-
- -Sskeleton_file
- overrides the default skeleton file from which flex
- constructs its scanners. You'll never need this
- option unless you are doing flex maintenance or
- development.
-
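The equivalence classes built by -Ce can be pictured with a small C sketch. It is not flex's own algorithm or table layout; build_equiv_classes and its arguments are hypothetical names chosen for illustration. Each character class used by the patterns is given as a string of its members, and two byte values share a class when they belong to exactly the same sets. A scanner can then index its tables by ec[c], which is the "one array look-up per character" mentioned above.

```c
#include <assert.h>
#include <string.h>

#define NCHARS 256

/*
 * Partition the byte values 0..255 into equivalence classes.
 * sets[i] lists the members of the i-th character class (hypothetical
 * input format). ec[c] receives the class number of byte c, starting
 * at 1. Returns the number of classes found.
 */
int build_equiv_classes(const char *const sets[], int nsets, int ec[NCHARS])
{
    unsigned long sig[NCHARS];   /* bit i set: byte belongs to sets[i] */
    int nclasses = 0;
    int c, d, i;

    assert(nsets >= 0 && nsets <= 32);   /* one signature bit per set */

    for (c = 0; c < NCHARS; c++) {
        sig[c] = 0;
        for (i = 0; i < nsets; i++)
            /* skip c == 0: strchr would match the string terminator */
            if (c != 0 && strchr(sets[i], c) != NULL)
                sig[c] |= 1UL << i;
    }

    for (c = 0; c < NCHARS; c++) {
        ec[c] = 0;
        for (d = 0; d < c; d++) {
            if (sig[d] == sig[c]) {      /* same memberships: same class */
                ec[c] = ec[d];
                break;
            }
        }
        if (ec[c] == 0)
            ec[c] = ++nclasses;
    }
    return nclasses;
}
```

With one set of digits and one set of lowercase letters, the 256 byte values collapse into three classes (digits, letters, everything else), so a transition table needs three columns instead of 256.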
- SUMMARY OF FLEX REGULAR EXPRESSIONS
- The patterns in the input are written using an extended
- set of regular expressions. These are:
-
- x match the character 'x'
- . any character except newline
- [xyz] a "character class"; in this case, the pattern
- matches either an 'x', a 'y', or a 'z'
- [abj-oZ] a "character class" with a range in it; matches
- an 'a', a 'b', any letter from 'j' through 'o',
- or a 'Z'
- [^A-Z] a "negated character class", i.e., any character
- but those in the class. In this case, any
- character EXCEPT an uppercase letter.
- [^A-Z\n] any character EXCEPT an uppercase letter or
- a newline
- r* zero or more r's, where r is any regular expression
- r+ one or more r's
- r? zero or one r's (that is, "an optional r")
- r{2,5} anywhere from two to five r's
- r{2,} two or more r's
- r{4} exactly 4 r's
- {name} the expansion of the "name" definition
- (see above)
- "[xyz]\"foo"
- the literal string: [xyz]"foo
- \X if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
- then the ANSI-C interpretation of \x.
- Otherwise, a literal 'X' (used to escape
- operators such as '*')
- \123 the character with octal value 123
- \x2a the character with hexadecimal value 2a
- (r) match an r; parentheses are used to override
- precedence (see below)
-
-
- rs the regular expression r followed by the
- regular expression s; called "concatenation"
-
-
- r|s either an r or an s
-
-
- r/s an r but only if it is followed by an s. The
- s is not part of the matched text. This type
- of pattern is called "trailing context".
- ^r an r, but only at the beginning of a line
- r$ an r, but only at the end of a line. Equivalent
- to "r/\n".
-
-
- <s>r an r, but only in start condition s (see
- below for discussion of start conditions)
- <s1,s2,s3>r
- same, but in any of start conditions s1,
- s2, or s3
- <*>r an r in any start condition, even an exclusive one.
-
-
- <<EOF>> an end-of-file
- <s1,s2><<EOF>>
- an end-of-file when in start condition s1 or s2
-
- The regular expressions listed above are grouped according
- to precedence, from highest precedence at the top to low-
- est at the bottom. Those grouped together have equal
- precedence.
-
- Some notes on patterns:
-
- - Negated character classes match newlines unless
- "\n" (or an equivalent escape sequence) is one of
- the characters explicitly present in the negated
- character class (e.g., "[^A-Z\n]").
-
- - A rule can have at most one instance of trailing
- context (the '/' operator or the '$' operator).
- The start condition, '^', and "<<EOF>>" patterns
- can only occur at the beginning of a pattern and,
- like '/' and '$', cannot be grouped inside
- parentheses. The following are all illegal:
-
- foo/bar$
- foo|(bar$)
- foo|^bar
- <sc1>foo<sc2>bar
-
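Several of the operators summarized above (character classes, negated classes, and the r{2,5} style of repetition) are written the same way in POSIX extended regular expressions, so they can be tried out from C with regcomp(3). This is only an approximation under that assumption: POSIX regex is a separate engine, it knows nothing of flex-only notation such as {name}, r/s, <s>r, or <<EOF>>, and full_match is a hypothetical helper, not part of flex.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Return 1 if the whole of text matches pattern (POSIX extended
 * syntax), 0 if it does not, and -1 if the pattern fails to compile.
 */
int full_match(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m;
    int ok;

    assert(pattern != NULL && text != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* regexec reports the leftmost, longest match; accept it only
       when it starts at offset 0 and runs to the end of the string. */
    ok = regexec(&re, text, 1, &m, 0) == 0
         && m.rm_so == 0
         && text[m.rm_eo] == '\0';

    regfree(&re);
    return ok;
}
```

For instance, full_match("[abj-oZ]", "k") succeeds while full_match("[abj-oZ]", "c") does not, and full_match("[^A-Z]", "\n") succeeds because a negated class matches a newline unless the newline is listed in it.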
-
- SUMMARY OF SPECIAL ACTIONS
- In addition to arbitrary C code, the following can appear
- in actions:
-
- - ECHO copies yytext to the scanner's output.
-
- - BEGIN followed by the name of a start condition
- places the scanner in the corresponding start
- condition.
-
- - REJECT directs the scanner to proceed on to the
- "second best" rule which matched the input (or a
- prefix of the input). yytext and yyleng are set up
- appropriately. Note that REJECT is a particularly
- expensive feature in terms of scanner performance; if
- it is used in any of the scanner's actions it will
- slow down all of the scanner's matching.
- Furthermore, REJECT cannot be used with the -f or -F
- options.
-
- Note also that unlike the other special actions,
- REJECT is a branch; code immediately following it
- in the action will not be executed.
-
- - yymore() tells the scanner that the next time it
- matches a rule, the corresponding token should be
- appended onto the current value of yytext rather
- than replacing it.
-
- - yyless(n) returns all but the first n characters of
- the current token back to the input stream, where
- they will be rescanned when the scanner looks for
- the next match. yytext and yyleng are adjusted
- appropriately (e.g., yyleng will now be equal to n).
-
- - unput(c) puts the character c back onto the input
- stream. It will be the next character scanned.
-
- - input() reads the next character from the input
- stream (this routine is called yyinput() if the
- scanner is compiled using C++).
-
- - yyterminate() can be used in lieu of a return
- statement in an action. It terminates the scanner
- and returns a 0 to the scanner's caller, indicating
- "all done".
-
- By default, yyterminate() is also called when an
- end-of-file is encountered. It is a macro and may
- be redefined.
-
- - YY_NEW_FILE is an action available only in <<EOF>>
- rules. It means "Okay, I've set up a new input
- file, continue scanning". It is no longer
- required; you can just assign yyin to point to a
- new file in the <<EOF>> action.
-
- - yy_create_buffer( file, size ) takes a FILE pointer
- and an integer size. It returns a YY_BUFFER_STATE
- handle to a new input buffer large enough to
- accommodate size characters and associated with the
- given file. When in doubt, use YY_BUF_SIZE for the
- size.
-
- - yy_switch_to_buffer( new_buffer ) switches the
- scanner's processing to scan for tokens from the
- given buffer, which must be a YY_BUFFER_STATE.
-
- - yy_delete_buffer( buffer ) deletes the given
- buffer.
-
- VALUES AVAILABLE TO THE USER
- - char *yytext holds the text of the current token.
- It may be modified but not lengthened (you cannot
- append characters to the end). Modifying the last
- character may affect the activity of rules anchored
- using '^' during the next scan; see flexdoc(1) for
- details.
-
- If the special directive %array appears in the
- first section of the scanner description, then
- yytext is instead declared char yytext[YYLMAX],
- where YYLMAX is a macro definition that you can
- redefine in the first section if you don't like the
- default value (generally 8KB). Using %array
- results in somewhat slower scanners, but the value
- of yytext becomes immune to calls to input() and
- unput(), which potentially destroy its value when
- yytext is a character pointer. The opposite of
- %array is %pointer, which is the default.
-
- You cannot use %array when generating C++ scanner
- classes (the -+ flag).
-
- - int yyleng holds the length of the current token.
-
- - FILE *yyin is the file which by default flex reads
- from. It may be redefined but doing so only makes
- sense before scanning begins or after an EOF has
- been encountered. Changing it in the midst of
- scanning will have unexpected results since flex
- buffers its input; use yyrestart() instead. Once
- scanning terminates because an end-of-file has been
- seen, you can point yyin at the new input file and
- then call the scanner again to continue scanning.
-
- - void yyrestart( FILE *new_file ) may be called to
- point yyin at the new input file. The switch-over
- to the new file is immediate (any previously
- buffered-up input is lost). Note that calling
- yyrestart() with yyin as an argument thus throws
- away the current input buffer and continues
- scanning the same input file.
-
- - FILE *yyout is the file to which ECHO actions are
- done. It can be reassigned by the user.
-
- - YY_CURRENT_BUFFER returns a YY_BUFFER_STATE handle
- to the current buffer.
-
- - YY_START returns an integer value corresponding to
- the current start condition. You can subsequently
- use this value with BEGIN to return to that start
- condition.
-
- MACROS AND FUNCTIONS YOU CAN REDEFINE
- - YY_DECL controls how the scanning routine is
- declared. By default, it is "int yylex()", or, if
- prototypes are being used, "int yylex(void)". This
- definition may be changed by redefining the
- "YY_DECL" macro. Note that if you give arguments
- to the scanning routine using a K&R-style/non-prototyped
- function declaration, you must terminate
- the definition with a semi-colon (;).
-
- - The nature of how the scanner gets its input can be
- controlled by redefining the YY_INPUT macro.
- YY_INPUT's calling sequence is
- "YY_INPUT(buf,result,max_size)". Its action is to
- place up to max_size characters in the character
- array buf and return in the integer variable result
- either the number of characters read or the
- constant YY_NULL (0 on Unix systems) to indicate EOF.
- The default YY_INPUT reads from the global
- file-pointer "yyin". A sample redefinition of YY_INPUT
- (in the definitions section of the input file):
-
- %{
- #undef YY_INPUT
- #define YY_INPUT(buf,result,max_size) \
- { \
- int c = getchar(); \
- result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
- }
- %}
-
-
- - When the scanner receives an end-of-file indication
- from YY_INPUT, it then checks the yywrap()
- function. If yywrap() returns false (zero), then
- it is assumed that the function has gone ahead and
- set up yyin to point to another input file, and
- scanning continues. If it returns true (non-zero),
- then the scanner terminates, returning 0 to its
- caller.
-
- The default yywrap() always returns 1.
-
- - YY_USER_ACTION can be redefined to provide an
- action which is always executed prior to the
- matched rule's action.
-
- - The macro YY_USER_INIT may be redefined to provide
- an action which is always executed before the first
- scan.
-
- - In the generated scanner, the actions are all
- gathered in one large switch statement and separated
- using YY_BREAK, which may be redefined. By
- default, it is simply a "break", to separate each
- rule's action from the following rule's.
-
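The yywrap() protocol described in this section (return zero after pointing yyin at more input, non-zero when there is none) can be sketched as a function that walks a list of file names. This is an assumption-laden example rather than the library's implementation: set_input_files is a hypothetical helper, and the yyin defined here is only a stand-in so the fragment compiles on its own. In a real scanner that definition would be dropped, since lex.yy.c supplies yyin.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for the scanner's global; remove when linking with lex.yy.c. */
FILE *yyin = NULL;

/* Hypothetical queue of files still to be scanned. */
static const char *const *pending = NULL;
static int pending_count = 0;

void set_input_files(const char *const *names, int count)
{
    assert(count >= 0);
    pending = names;
    pending_count = count;
}

/*
 * Called at end-of-file. Returns 0 after opening the next readable
 * file on yyin (scanning continues), or 1 when no files remain
 * (the scanner terminates).
 */
int yywrap(void)
{
    if (yyin != NULL && yyin != stdin) {
        fclose(yyin);
        yyin = NULL;
    }
    while (pending_count > 0) {
        const char *name = *pending++;
        pending_count--;
        yyin = fopen(name, "r");
        if (yyin != NULL)
            return 0;
    }
    return 1;
}
```

Files that cannot be opened are skipped silently in this sketch; a real program would probably want to report them.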
- FILES
- -lfl library with which to link scanners to obtain the
- default versions of yywrap() and/or main().
-
- lex.yy.c
- generated scanner (called lexyy.c on some systems).
-
- lex.yy.cc
- generated C++ scanner class, when using -+.
-
- <FlexLexer.h>
- header file defining the C++ scanner base class,
- FlexLexer, and its derived class, yyFlexLexer.
-
- flex.skl
- skeleton scanner. This file is only used when
- building flex, not when flex executes.
-
- lex.backup
- backing-up information for -b flag (called lex.bck
- on some systems).
-
- SEE ALSO
- flexdoc(1), lex(1), yacc(1), sed(1), awk(1).
-
- M. E. Lesk and E. Schmidt, LEX - Lexical Analyzer Generator
-
- DIAGNOSTICS
- reject_used_but_not_detected undefined or
-
- yymore_used_but_not_detected undefined - These errors can
- occur at compile time. They indicate that the scanner
- uses REJECT or yymore() but that flex failed to notice the
- fact, meaning that flex scanned the first two sections
- looking for occurrences of these actions and failed to
- find any, but somehow you snuck some in (via a #include
- file, for example). Make an explicit reference to the
- action in your flex input file. (Note that previously
- flex supported a %used/%unused mechanism for dealing with
- this problem; this feature is still supported but now
- deprecated, and will go away soon unless the author hears
- from people who can argue compellingly that they need it.)
-
- flex scanner jammed - a scanner compiled with -s has
- encountered an input string which wasn't matched by any of
- its rules.
-
- warning, rule cannot be matched indicates that the given
- rule cannot be matched because it follows other rules that
- will always match the same text as it. See flexdoc(1) for
- an example.
-
- warning, -s option given but default rule can be matched
- means that it is possible (perhaps only in a particular
- start condition) that the default rule (match any single
- character) is the only one that will match a particular
- input. Since -s suppresses the default rule, a scanner
- that encounters such input aborts with an error.
-
- scanner input buffer overflowed - a scanner rule matched
- more text than the available dynamic memory.
-
- token too large, exceeds YYLMAX - your scanner uses %array
- and one of its rules matched a string longer than the
- YYLMAX constant (8K bytes by default). You can increase the
- value by #define'ing YYLMAX in the definitions section of
- your flex input.
-
- scanner requires -8 flag to use the character 'x' - Your
- scanner specification includes recognizing the 8-bit
- character 'x' and you did not specify the -8 flag, and your
- scanner defaulted to 7-bit because you used the -Cf or -CF
- table compression options.
-
- flex scanner push-back overflow - you used unput() to push
- back so much text that the scanner's buffer could not hold
- both the pushed-back text and the current token in yytext.
- Ideally the scanner should dynamically resize the buffer
- in this case, but at present it does not.
-
- input buffer overflow, can't enlarge buffer because
- scanner uses REJECT - the scanner was working on matching an
- extremely large token and needed to expand the input
- buffer. This doesn't work with scanners that use REJECT.
-
- fatal flex scanner internal error--end of buffer missed -
- This can occur in a scanner which is reentered after a
- long-jump has jumped out (or over) the scanner's
- activation frame. Before reentering the scanner, use:
-
- yyrestart( yyin );
-
- or use C++ scanner classes (the -+ option), which are
- fully reentrant.
-
- AUTHOR
- Vern Paxson, with the help of many ideas and much
- inspiration from Van Jacobson. Original version by Jef
- Poskanzer.
-
- See flexdoc(1) for additional credits and the address to
- send comments to.
-
- DEFICIENCIES / BUGS
- Some trailing context patterns cannot be properly matched
- and generate warning messages ("dangerous trailing con-
- text"). These are patterns where the ending of the first
- part of the rule matches the beginning of the second part,
- such as "zx*/xy*", where the 'x*' matches the 'x' at the
- beginning of the trailing context. (Note that the POSIX
- draft states that the text matched by such patterns is
- undefined.)
-
- For some trailing context rules, parts which are actually
- fixed-length are not recognized as such, leading to the
- abovementioned performance loss. In particular, parts
- using '|' or {n} (such as "foo{3}") are always considered
- variable-length.
-
- Combining trailing context with the special '|' action can
- result in fixed trailing context being turned into the
- more expensive variable trailing context. For example, in
- the following:
-
- %%
- abc |
- xyz/def
-
-
- Use of unput() or input() invalidates yytext and yyleng,
- unless the %array directive or the -l option has been
- used.
-
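Because unput() and input() can invalidate yytext in the default %pointer mode, an action that still needs the token text afterwards can copy it first. The helper below is a minimal sketch of that idea; save_token is a hypothetical name, not something flex provides, and in an action it would be called as save_token(yytext, yyleng) before the first unput() or input().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a NUL-terminated heap copy of the first leng characters of
 * text, or NULL if memory cannot be allocated. The caller frees it.
 */
char *save_token(const char *text, int leng)
{
    char *copy;

    assert(text != NULL && leng >= 0);

    copy = malloc((size_t)leng + 1);
    if (copy == NULL)
        return NULL;

    memcpy(copy, text, (size_t)leng);
    copy[leng] = '\0';
    return copy;
}
```

The copy stays valid however the scanner's buffer is rearranged afterwards, at the cost of one allocation per saved token.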
- Use of unput() to push back more text than was matched can
- result in the pushed-back text matching a beginning-of-
- line ('^') rule even though it didn't come at the begin-
- ning of the line (though this is rare!).
-
- Pattern-matching of NUL's is substantially slower than
- matching other characters.
-
- Dynamic resizing of the input buffer is slow, as it
- entails rescanning all the text matched so far by the cur-
- rent (generally huge) token.
-
- flex does not generate correct #line directives for code
- internal to the scanner; thus, bugs in flex.skl yield
- bogus line numbers.
-
- Due to both buffering of input and read-ahead, you cannot
- intermix calls to <stdio.h> routines, such as, for
- example, getchar(), with flex rules and expect it to work.
- Call input() instead.
-
- The total table entries listed by the -v flag excludes the
- number of table entries needed to determine what rule has
- been matched. The number of entries is equal to the
- number of DFA states if the scanner does not use REJECT, and
- somewhat greater than the number of states if it does.
-
- REJECT cannot be used with the -f or -F options.
-
- The flex internal algorithms need documentation.