The name format string is used to format the Name, Group name and Collapsed text icon of a text range.
In the format string, every entry of the form %sINT, %eINT, %psINT or %peINT is replaced with a token whose index is calculated from the format expression.
The letters specify the reference index:
%s - start of current text range
%e - end of current text range
%ps - start of parent text range
%pe - end of parent text range
The "p" metacharacter may be repeated. For example, %pps means the start of the parent of the parent text range.
After these specifiers you may also use "L" or "Z":
L - defines string from line start to the token (including this token);
Z - defines string from the token to the end of line (including this token);
For example:
%SL2 - defines the string from the line start to the second token from the text range start.
The token index is calculated as the reference index minus the index (INT).
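The substitution rule above can be sketched in a few lines of Python. This is only an illustration, not the editor's implementation: the function name expand, the token list and the end_ref value are hypothetical, parent ("p") ranges and the L/Z options are not modelled, and returning an empty string for an out-of-range index is an assumption.

```python
import re
from typing import List

# Matches %sINT / %eINT with optional leading "p" letters; INT may be negative.
SPEC_RE = re.compile(r"%(p*)([se])(-?\d+)", re.IGNORECASE)

def expand(fmt: str, tokens: List[str], start_ref: int, end_ref: int) -> str:
    """Replace each %sINT / %eINT entry with the token at (reference index - INT)."""
    def repl(m: re.Match) -> str:
        parents, anchor, num = m.groups()
        if parents:
            # Parent ranges would need a range tree, which this sketch does not have.
            raise NotImplementedError("parent ranges are not modelled here")
        ref = start_ref if anchor.lower() == "s" else end_ref
        index = ref - int(num)
        if not 0 <= index < len(tokens):
            return ""  # assumption: out-of-range entries expand to nothing
        return tokens[index]
    return SPEC_RE.sub(repl, fmt)

# Hypothetical token stream; the range start reference points at "." (index 2).
tokens = ["function", "TMyClass", ".", "MyFunction"]
print(expand("%s2 => %s1.%s-1", tokens, start_ref=2, end_ref=3))
```

With the reference index at ".", %s2 and %s1 reach two and one tokens back, while %s-1 reaches one token forward.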
Extension of name formatting syntax
Syntax
%(S|E)P*(L|Z)?[0-9]+
is expanded to
%(S|E)P*([\[]<token>[\]]<offset>?)?
where <token> is a specific token that is searched from the specified starting point (S for the first token in the range, or E for the last token) towards the respective range end (upwards or downwards). The search direction is kept in the variable "rngdir", which is set in the "S"/"E" decision.
For example, with
range-start = "for",
range-end = "end"
the entry "...%s[to] ..." skips forward to the token "to" (with index 4).
The token values are matched as-is; there is no case-insensitive option yet.
A number following the token value defines an <offset> relative to the found token.
For this clause, the variable "idx" is not set from the static numeric value as in "...%s2 ..."; instead, the index of the found token is kept.
For "%S..." the search starts at idx=0 up to max 28. ---> rngdir = +1;
For "%E..." the search starts at idx=28 downto min 0. ---> rngdir = -1;
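The directional search can be sketched as follows. This is a minimal Python illustration under stated assumptions: find_token is a hypothetical name, the token list is made up from the "for ... end" example, and the upper bound of 28 mentioned above is not modelled.

```python
from typing import List, Optional

def find_token(tokens: List[str], value: str, rngdir: int) -> Optional[int]:
    """Return the index of the first token equal to value, or None if absent."""
    # "%S..." starts at the first token and walks up (rngdir = +1);
    # "%E..." starts at the last token and walks down (rngdir = -1).
    idx = 0 if rngdir > 0 else len(tokens) - 1
    while 0 <= idx < len(tokens):
        if tokens[idx] == value:  # compared as-is, so the match is case-sensitive
            return idx
        idx += rngdir
    return None

# Hypothetical token stream for a "for ... end" range.
tokens = ["for", "x", "=", "1", "to", "12", "do", "end"]
print(find_token(tokens, "to", +1))
print(find_token(tokens, "TO", +1))
```

A token that occurs more than once gives different results for the two directions, because each search stops at the first hit it meets.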
The options L and Z, introduced in V2.35, will not combine with the new (range) specifying options --> somebody else may find a use for such extended ranges.
Note: Avoid searching for tokens that can occur in multiple places (for example, a ";" between statements).
The above syntax is simple, as it allows identifying the
block-start-tokens "for x = 1 to 12 do"
block-body anything after block-start tokens up to
block-end-tokens "end ;"
but many syntax formats do not trivially support this separation.
The current implementation does not provide information about where the "block-start", "block-body" and "block-end" begin and end.
A "%B0..." for the "block-body" portion and an "ignore block-body tokens" option may be nice!?
b) any such clause (either absolute or given by token value) can "start a token range" by additionally specifying:
The first form uses the static index specification to define the end-range:
"%s0~s3" results in "for x = 1" (tokens 0, 1, ... 3)
The 2nd form uses the new syntax to search for an end token, beginning at the starting range index (idx) and going upwards or downwards.
"%s0~s[do]" results in "for x = 1 to 12 do" (tokens 0, 1, ... 6). If the search is not satisfied, the complete range up to "e0" is taken. Because of the same "S", the search starts with "TagStr[idx]" ...
"%s0~e[do]" results in the same string, but the search starts at the final "end" of the block and scans downwards.
Caution: This may produce WRONG results if nested loops are scanned!
I could not find a valid representation of "range-start" token-streams, the range-body alone and/or the range-end token-stream.
Such information may be helpful to better display blocks and/or collapse display of the "block-body" alone.
The 3rd form is an abbreviation in which the S/E indicator is taken to be the same as that of the starting point:
"%s1~[do]1" results in "x = 1 to 12" (tokens 1, 2, ... 5)
The <offset> "1" here skips back by 1 from the found token "do". The range end is kept in the variable "to_idx".
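The three range forms can be sketched in Python as shown here. The helper names end_index and range_text are hypothetical, the token list is made up from the "for ... end" example, and the handling of the offset (stepping back against the search direction) is an assumption based on the "skip back by 1" remark, not a documented rule.

```python
from typing import List

def end_index(tokens: List[str], value: str, start: int, rngdir: int,
              offset: int = 0) -> int:
    """Search for value from start in direction rngdir and return the range end."""
    i = start
    while 0 <= i < len(tokens):
        if tokens[i] == value:
            # Assumption: the offset steps back against the search direction.
            return i - offset * rngdir
        i += rngdir
    # Search not satisfied: take the complete range up to the last token ("e0").
    return len(tokens) - 1

def range_text(tokens: List[str], idx: int, to_idx: int) -> str:
    """Join the tokens idx..to_idx (inclusive) with one blank between them."""
    lo, hi = min(idx, to_idx), max(idx, to_idx)
    return " ".join(tokens[lo:hi + 1])

# Hypothetical token stream for a "for ... end" range.
tokens = ["for", "x", "=", "1", "to", "12", "do", "end"]
last = len(tokens) - 1

form1 = range_text(tokens, 0, 3)                                   # static end index
form2 = range_text(tokens, 0, end_index(tokens, "do", 0, +1))      # search upwards from idx
form2e = range_text(tokens, 0, end_index(tokens, "do", last, -1))  # search downwards from the end
form3 = range_text(tokens, 1, end_index(tokens, "do", 1, +1, 1))   # abbreviated form with offset 1
print(form1, form2, form2e, form3, sep="\n")
```

With nested blocks, the downward search would meet the innermost "do" first, which is why the caution above applies.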
The "token-value" to search for cannot be whitespace (#00..#20). Leading and trailing whitespace within the "...[vvvvvvv] ..." sequence enclosed by the [ and ] characters is removed before searching. The "vvvvvv" can contain escaped characters like "... [\]] ..." to allow "[" and/or "]" to be part of the value. The \r, \n, \f ... escapes are not supported here.
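Reading the bracketed value might look like the sketch below. It is an illustration only: parse_bracket_value is a hypothetical name, and it assumes that a backslash simply makes the next character literal (so "\n" yields the letter "n", in line with the remark that such escapes are not supported).

```python
from typing import Tuple

# Characters #00..#20, which are trimmed from both ends of the value.
WHITESPACE = "".join(chr(c) for c in range(0x21))

def parse_bracket_value(spec: str, pos: int) -> Tuple[str, int]:
    """spec[pos] must be "["; return (value, index just after the closing "]")."""
    if pos >= len(spec) or spec[pos] != "[":
        raise ValueError("expected '['")
    chars = []
    i = pos + 1
    while i < len(spec):
        ch = spec[i]
        if ch == "\\" and i + 1 < len(spec):
            chars.append(spec[i + 1])  # escaped character is taken literally
            i += 2
            continue
        if ch == "]":
            value = "".join(chars).strip(WHITESPACE)
            if not value:
                raise ValueError("token value cannot be empty or whitespace")
            return value, i + 1
        chars.append(ch)
        i += 1
    raise ValueError("missing closing ']'")

print(parse_bracket_value("%s[ do ]1", 2))
print(parse_bracket_value(r"[\]]", 0))
```

The returned position points at whatever follows the closing bracket, for example the offset digits.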
The token accumulation simply (?) takes all tokens from "idx" ... "to_idx" and builds a string by appending all tokens with ONE " " (blank) as the separating delimiter. There is no process to keep the original token positions within the source line(s), or any whitespace including CR/LFs there. This may be an addition, but I currently do not see a need for it.
c) Ranges as specified above may accumulate many tokens, and it may be desirable to "limit" the result string.
This can be done by using another operand syntax
In all three forms the "~" is immediately followed by a numeric value, which is interpreted as the "maximum number of tokens in the substituted string" and applies if the range takes MORE than this maximum. The value is internally kept in the variable "rngmax" below. When the result string is accumulated (taking all tokens from "idx" up or down to "to_idx"), the number of appended tokens cannot go beyond "rngmax". If this happens, the result is created in the form "t-1 t-2 -- t-max ..." with the ellipsis string " ..." appended.
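The limit can be sketched as follows. This is a hedged Python illustration, not the original code: accumulate is a hypothetical name, the token list is made up, and the sketch assumes idx <= to_idx so that tokens are taken in source order.

```python
from typing import List, Optional

ELLIPSIS = " ..."

def accumulate(tokens: List[str], idx: int, to_idx: int,
               rngmax: Optional[int] = None) -> str:
    """Join tokens idx..to_idx with one blank, keeping at most rngmax tokens."""
    picked = tokens[idx:to_idx + 1]  # assumes idx <= to_idx
    if rngmax is not None and len(picked) > rngmax:
        # Too many tokens: keep the first rngmax and append the ellipsis string.
        return " ".join(picked[:rngmax]) + ELLIPSIS
    return " ".join(picked)

# Hypothetical token stream for a "for ... end" range.
tokens = ["for", "x", "=", "1", "to", "12", "do", "end"]
print(accumulate(tokens, 0, 6))
print(accumulate(tokens, 0, 6, rngmax=3))
```

The ellipsis is only appended when tokens were actually dropped, so a range that fits within the limit is returned unchanged.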
Hans L. Werschner, Oct '07
Example:
For the source text
function TMyClass.MyFunction ...
we have the rule:
0| .
1| [any Identifier]
2| function
The reference start of the text range points to ".", so the format string
"%s2 => %s1.%s-1"
results in
"function => TMyClass.MyFunction"