home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
OS/2 Shareware BBS: 10 Tools
/
10-Tools.zip
/
lifeos2.zip
/
LIFE-1.02
/
EXAMPLES
/
SUPERLIN
/
SUPERLIN.DOC
< prev
next >
Wrap
Text File
|
1996-06-04
|
55KB
|
1,958 lines
____________________________________________________________________________
SuperLint
____________________________________________________________________________
A generator of C source analyzers
____________________________________________________________________________
Author : Arnaud Venet
Date : Feb 19th 1994
Modified : Mar 4th 1994
Version : 0.3
____________________________________________________________________________
(C) Digital Equipment Corporation 1993 - 1994
____________________________________________________________________________
This document gives a rapid description of the SuperLint package.
SuperLint is a real application written in LIFE allowing the user to build
lint-like tools for K&R and ANSI C programs. The user can enter rules
written in a simple C-like language which will be automatically applied or,
if he is ambitious, write a real program in LIFE, using directly the
syntactic structures provided by the integrated C parser.
This tool is a prototype. All comments, suggestions and bug reports are
welcome.
____________________________________________________________________________
An extended example of use is in the file 'rules.lf' of the package
____________________________________________________________________________
Overview
========
SuperLint is built on top of a parser which produces a rich representation
of C code (a fully annotated syntax tree). Building an analyzer consists
in writing a set of rules that use the syntactic structures produced by the
parser.
In the most general case, the difficulty is that the user must write his
own procedure exploring the syntactic tree and collecting the needed
information, which may be very tedious.
It turns out that in most cases the verifications are local in the sense
that the user only needs the information reachable from one or more
particular nodes. A characteristic example is an analyzer that checks a C
program with regard to a set of style rules. In this case a rule only needs
the lexical information contained in the node to which it applies.
Therefore we provided SuperLint with a C-like language which allows the
user to write easily a set of verification rules, each rule being only
applicable to some syntactic structures. Then a procedure can be called
which automatically explores all the nodes of the tree and for each one
invokes the appropriate rules.
But actually SuperLint is much more than a simple C-code verifier. It is
a real programming environment entirely dedicated to the manipulation of
C-programs, ranging from stylistic verifications to code rewriting or static
analysis. The C-like rules specification language is expanded into LIFE
code, and it is possible to mix in a same file parts of code written in
this language and real LIFE code. Thus one can take advantage of all the
functionalities and tools offered by Wild_LIFE, like the X Window interface
for example. A typical use consists in writing libraries of functions and
procedures which involve complex manipulations in LIFE and the analyzers
specifications in C-like code. We illustrate this with a small toolkit that
is provided with SuperLint and which is systematically imported by each
analyzer. Moreover modules are fully supported. It is thus possible to
construct complete encapsulated programming libraries devoted to the purpose
of the SuperLint user.
We will first give a brief description of the syntactic structures produced
by the parser, then we will explain how to write analyzers.
I. Syntactic structures
=========================
0 - Notations
-------------
The syntactic and lexical structures are represented by psi-terms,
that is, by flexible records. The record type is called the "root sort"
of the psi-term. The field names are called "features". We use the
following notation to describe the psi-term with root sort PsiTerm and
the features Feature1, ..., FeatureN :
PsiTerm
-------
o Feature1 :
description of Feature1
.
.
.
o FeatureN :
description of FeatureN
Several symbols separated by commas may sometimes be written in place of
'PsiTerm' above. This simply means that several psi-terms with different
root sorts share the same features with identical meanings.
In the following we will use the term 'structure' to denote psi-terms.
The parser was built upon the grammar specified in Kernighan & Ritchie :
"The C programming language", 2nd edition.
1 - Tokens
----------
All tokens have the common features :
o file :
the name of the file in which the token was found
o line :
the line number of the token in its file
o column :
the column number of the token in its file
o white_spaces :
a string containing all the white spaces (blank, tab, newline)
preceding the current token
o previous :
the token preceding the current one
o next :
the token following to the current one
o first :
the first token in the file
o last :
the last token in the file
The tokens are :
nothing
-------
The empty token. Actually, the value of the 'previous' feature of the
first token and the 'next' feature of the last token in a file.
identifier
----------
o id :
the string representing the identifier
characters_string
-----------------
o string :
the string itself, all escape characters being processed
o extended :
a boolean telling whether the string is extended (i.e. preceded
by an uppercase 'L' letter)
character
---------
o char :
the string representing the character, all escape characters
being processed
o extended :
a boolean telling whether the character is extended (i.e. preceded
by an uppercase 'L' letter)
number
------
o num :
the description of the number. It may be :
- int
---
o value :
the string representing the integer
o base :
the base of the integer. It may be
- octal
- decimal
- hexadecimal
o signed :
a boolean telling whether a 'S' suffix was present
o long :
a boolean telling whether a 'L' suffix was present
- float
-----
o integer_part :
the string representing the integer part of the float
o decimal_part :
the string representing the decimal part of the float
("0" if none)
o exponential_part :
It is
exponential_part
----------------
o value :
the string representing the exponential part of the float
("0" if none)
o sign :
the sign of the exponential part. It may be :
- positive
- negative
o type :
depending on the eventual suffix. It may be :
- float
- double
- long_double
The vararg token : '...'
------------------------
An operator symbol
------------------
It may be : '+', '-', '*', '/', '<', '>', '&', '|', '!', '%', '^',
'~', '<<', '>>', '=', '+=', '-=', '*=', '/=', '%=', '&=',
'^=', '|=', '<<=', '>>=', '!=', '?', '.', '->'
A separator symbol
------------------
It may be : ',', ';', ':', '(', ')', '{', '}', '[', ']'
A keyword
---------
It may be :
auto break case char
const continue default do
double else enum extern
float for goto if
int long register return
short signed sizeof static
struct switch typedef union
unsigned void volatile while
NOTE : When you access a token name from within an analyzer the root sort
---- of a token is always converted into a string because some
symbols like '(', '{', ...etc are difficult to handle explicitly
in LIFE code (parse problems). In all cases you'd better use the
'string2id' function (see section 3, part II).
2 - Types
---------
We note simple types :
- void, char
- the numerical types
- the structures and unions
- the enumerations
The complex types are the pointers, arrays, tags and types between
parentheses.
All simple types have these common features :
o qualification :
It may be
- const
- volatile
- nothing
The case nothing occurs when no qualifier was specified.
o store_class :
It may be
- auto
- register
- static
- extern
- nothing
The case nothing occurs when no store class was specified.
o specifiers :
it is the list of the lexical type specifiers. They are keyword tokens
like 'int', 'const', ...etc and also :
- type_alias
----------
It represents an identifier previously aliased to a type in a 'typedef'
statement. Its only feature is :
o name : the corresponding identifier token
- struct_name, union_name, enum_name
----------------------------------
It represents an identifier designing a structured or enumerated type.
Its features are :
o name :
the identifier token representing the name of the complex type
o keyword :
the keyword coming with the identifier. It is one of the tokens :
struct, union, enum
- struct, union, enum
-------------------
It is the case of an anonymous struct definition. See below for further
details.
2.1 Simple types
----------------
void
----
char, int, short_int, long_int, float, double, long_double
----------------------------------------------------------
o signed :
a boolean which is true when type is signed
o inserted :
this feature is optional. It's present for a type 'int' which has been
inserted by the parser when he encounters a function declaration without
a return type (i.e. for external and local declarations and for function
definitions).
struct, union
-------------
o keyword :
one of the tokens : struct, union
o body :
the body of the struct or union type specification. It is
struct_body, union_body
-----------------------
o left_brace : the token "{" beginning the body definition
o fields :
It is a structure of root sort 'fields' containing as features
the names of all the fields of the C structure or union. These names
are strings. The name of an anonymous tag is "tag*N" where N is an
integer (it tells that it's the Nth anonymous tag encountered in the
structure).
It has the extra feature 'first_member' which gives the first field
in the C structure or union.
The structure of a field is :
field
-----
o name :
the identifier token of the field name or the structure :
anonymous_tag
-------------
o id : the tag name generated by the parser
in the case of an anonymous tag.
o type :
the type of the field
o initialization :
an initialization expression (see section 6.7 for further
details)
o separator :
the separator token at the end of the field specification
(namely a semicolon)
o first :
the first field in the structure
o last :
the last field in the structure
o previous :
the previous field (if any, nothing otherwise)
o next :
the next field (if any, nothing otherwise)
o right_brace : the token "}" ending the body definition
enum
----
o keyword :
the token enum
o body :
the body of the enum type specification. It is
enum_body
---------
o left_brace :
the token "{" beginning the body definition
o enumerators :
It is a structure of root sort 'enumerators' containing as features
the names of all the enumerators defined in the enum statement.
These names are strings. It also has the feature 'first_member'
which gives the first enumerator specified.
The structure of an enumerator is :
enumerator
----------
o name :
the identifier token of the enumerator
o initialization :
an initialization expression (see section 6.7 for further
details)
o separator :
the separator token at the end of the field specification
(namely a comma) or nothing
o initialization :
an initialization expression (see section 6.7 for further
details) setting the value of the enumerator
o first :
the first enumerator in the enumeration
o last :
the last enumerator in the enumeration
o previous :
the previous enumerator (if any, nothing otherwise)
o next :
the next enumerator (if any, nothing otherwise)
o right_brace : the token "}" ending the body definition
2.2 Complex types
-----------------
protected_type
--------------
Case of a type between parentheses
o left_parenthesis :
the token "("
o type :
the type between parentheses
o right_parenthesis :
the token ")"
pointer
-------
o to :
the type the pointer points to
o star :
the token "*" appearing in the type specification
o qualification :
the pointer's qualification, the same as for simple types.
array
-----
o of :
the type of the elements of the array
o dimensions :
the dimensions of the array. It is
dimensions
----------
o dimensions_number :
the number of dimensions of the array
o features 1..dimensions_number :
they are the dimensions of the array.
A dimension is
dimension
---------
o left_bracket :
the token "["
o size :
the constant expression defining the size of the dimension
or 'nothing' if unspecified.
o right_bracket :
then token "]"
tag
---
o type :
the type of the tag
You must note that in this case the size of the tag is put in the
initialization feature of the associated declarations (see section 6.7).
This enables an homogeneous presentation of the declarations.
function
--------
o return_type :
the return type of the function
o left_parenthesis :
the token "(" preceding the parameters specification
o parameters :
It is 'nothing' (if no parameter is specified) or
parameters
----------
o parameters_number :
number of parameters
o vararg :
this boolean is true when the function accepts a variable
number of arguments (in this case, 'parameters_number' is the number
of parameters explicitly declared).
o suspension_points :
this feature exists only if 'vararg' is true. It contains the token
'...' which appears at the end of the parameters specification.
o the features 1..parameters_number :
Each feature contains a parameter specification, namely
parameter
---------
o name :
the corresponding identifier token if any, or an 'anonymous'
structure :
anonymous
---------
o file :
the name of the file in which the parameter's specification
occurs
o line :
the corresponding line number in the previous file
o white_spaces :
the empty string : ""
This case is related to ANSI function prototypes where the
types of the arguments appear without any name.
o type :
the parameter type
or
label_parameter
---------------
It only occurs in K&R C function definition : only the names of
parameters appear in the function head. Their types are specified
at the beginning of the function body.
o label :
the corresponding identifier token
Both kinds of parameters have the common features :
o separator :
the comma following the parameter, if not the last one,
'nothing' otherwise (for an explication of the semicolon
case see 'parameters_declarations' in section 3.2).
o previous :
the previous parameter, if any, 'nothing' otherwise
o next :
the next parameter, if any, 'nothing' otherwise
o first :
the first parameter in the specification
o last :
the last parameter in the specification
o right_parenthesis :
the token ")" ending the parameters declaration
3 - Type and Function Definitions
---------------------------------
3.1 Type definitions
--------------------
All type definitions have the common features :
o previous :
the previous statement, i.e., type definition, declaration or
function definition depending on the location of the current type
definition (external, local declaration)
o next :
the next statement
o first :
the first statement in the type definition's scope
o last :
the last statement in the type definition's scope
type_definition
---------------
o keyword :
the token typedef
o type :
the type being defined
o name :
the corresponding identifier
o separator :
the comma or semicolon token ending the type definition
struct_definition, union_definition, enum_definition
----------------------------------------------------
o keyword :
a keyword token among struct, union, enum
o name :
the corresponding identifier
o body :
the body of the compound type (see previous section)
3.2 Function definitions
------------------------
function_definition
-------------------
o type :
a 'function' type describing the function head
o name :
the corresponding identifier token
o style :
it is 'old' for a K&R's style function definition, 'modern' for
an ANSI definition
o parameters_declarations :
it is
parameters_declarations
-----------------------
o ParameterDeclaration1
.
.
.
o ParameterDeclarationN
o parent_declarations :
the 'toplevel' declarations (see section 4.1)
o first_declaration
The features names ParameterDeclarationX are string corresponding to the
parameters names (if any). They contain :
parameter_declaration
---------------------
o parameter :
a 'parameter' structure (see 'function' type, section 2).
For K&R function definitions this structure refers to the
corresponding declaration between the function's head and body.
o previous :
the previous parameter declaration if any, 'nothing' otherwise
o next :
the next parameter declaration if any, nothing otherwise
o first :
the first parameter declaration
o last :
the last parameter declaration
o scope :
it is
function_head
-------------
o function_name :
a string representing the function's name where the parameter
declaration appears
The feature first_declaration contains the first parameter declaration
if any, 'nothing' otherwise.
Note that a 'parameter_declaration' structure can be inserted by the
parser when a parameter figures in the head of a K&R function definition
but is not declared thereafter. In this case the type of the parameter
is by default set to 'int'.
o body :
the function body. It's a 'block' instruction (see section 5)
4 - Declarations
----------------
4.1 The toplevel declarations
-----------------------------
This structure is the root of the syntactic tree :
toplevel
--------
o parent_declarations :
nothing
o type_definitions :
a term of root sort 'type_definitions'. The features are strings
representing the names of the type aliases. Each feature contains a
'type_definition' term (see section 3.1)
o struct_definitions :
a term of root sort 'struct_definitions'. The features are strings
representing the names of the structures defined. Each feature contains
a 'struct_definition' term (see section 3.1)
o union_definitions :
on the same model as struct_definitions
o enum_definitions :
on the same model as struct_definitions
o struct_declarations :
a term of root sort 'struct_declarations'. The features are strings
representing the names of the structures declared. Each feature contains
a 'struct_declaration' term (see section 4.3)
o union_declarations :
on the same model as struct_declarations
o enum_declarations :
on the same model as struct_declarations
o external_declarations :
a term of root sort 'external_declarations'. The features are strings
representing the names of the identifiers declared. Each feature contains
an 'external_declaration' (see section 4.3)
o function_definitions :
a term of root sort 'function_definitions'. The features are strings
representing the names of the functions defined. Each feature contains
a 'function_definition' term (see section 3.2)
o files :
it is
files
-----
o files_number :
the number of files that have been parsed when SuperLint was called
o 1 .. file_number :
each feature contains a 'file_info' structure, namely :
file_info
---------
o name :
a string representing the name of the file as he was typed on
the command line when SuperLint was called
o first_token :
the first token in the file
4.2 Block declarations
----------------------
These are the declarations appearing at the beginning of a block { ... } :
block_declarations
------------------
o parent_declarations :
it may be a structure
- 'parameters_declarations' for a function body (see section 3.2)
- 'block_declarations' for a nested block
o local_declarations :
a term of root sort 'local_declarations'. The features are strings
representing the names of the identifiers declared in the block.
Each feature contains a 'local_declaration' (see section 4.3)
o type_definitions :
the same as for toplevel
o struct_definitions :
the same as for toplevel
o enum_definitions :
the same as for toplevel
o union_definitions :
the same as for toplevel
o struct_declarations :
the same as for toplevel
o enum_declarations :
the same as for toplevel
o union_declarations :
the same as for toplevel
4.3 Single declarations
-----------------------
All single declarations have the common features :
o scope :
the declarations which the declaration belongs to. It may be
'toplevel' or 'block_declarations'
o previous :
the previous statement (declaration, function definition, type definition,
according to the context) in the declaration's scope
o next :
the next statement in the declaration's scope
o first :
the first statement in the declaration's scope
o last :
the last statement in the declaration's scope
Logically the parameter_declaration structure should figure here.
However for clarity we described it with the function_definition structure.
The single declarations are :
external_declaration, local_declaration
---------------------------------------
o type :
the type of the declared identifier
o name :
the corresponding identifier token
o initialization :
the initialization expression if any (see section 6.7), 'nothing'
otherwise
o separator :
a comma or semicolon token ending the declaration
struct_declaration, union_declaration, enum_declaration
-------------------------------------------------------
o keyword :
a token among struct, union, enum
o name :
the corresponding identifier token
o semi_colon :
the semicolon token ending the declaration
Several verifications are performed at parse time :
. an incomplete struct or union declaration without a corresponding
struct or union definition in the same scope is stated as an error
. an enum declaration which does not correspond to a visible definition
is also stated as an error
5 - Instructions
----------------
All instructions have the common features :
o scope :
the scope of the current instruction. It may be
- function_body(FunctionName) where FunctionName is a string.
This form occurs for a block instruction which is the body of the
function FunctionName (see section 3.2)
- block : the block which the current instruction belongs to
- then_body : it is
then_body
---------
o instruction :
the 'if' instruction which executes the current instruction
when the expression in the condition is not zero
- else_body : it is
else_body
---------
o instruction :
the 'if' instruction which executes the current instruction
when the expression in the condition is zero
- labeled_instruction, case, default, switch, while, for, do_while :
the instruction which embodies the current one
o next :
the next instruction in the current block if any, 'nothing' otherwise
o previous :
the previous instruction in the current block if any, 'nothing' otherwise
o first :
the first instruction in the current block
o last :
the last instruction in the current block
NOTE : a single instruction which is the body of a labeled instruction,
---- case, default, if, switch, while, for or do..while instruction has
no previous nor next instruction. In these cases the features previous
and next contain 'nothing' while the features first and last contain
the instruction itself.
5.1 Labeled instructions
------------------------
labeled_instruction
-------------------
o label :
the identifier token corresponding to the instruction's label
o colon :
the colon token separating the label and the instruction
o body :
the labeled instruction
case
----
o keyword ;
the token 'case'
o condition :
the expression of the case statement
o colon :
the colon token separating the condition and the instruction
o body :
the corresponding instruction
default
-------
o keyword ;
the token 'default'
o colon :
the colon token separating the keyword 'default' and the instruction
o body :
the corresponding instruction
5.2 Compound instruction
------------------------
block
-----
o left_brace :
the "{" token corresponding to the brace beginning the block
o block_declarations :
a 'block_declarations' structure corresponding to the local
declarations and definitions made within the block (see section 4.2)
o instructions :
the list of instructions of the block (see section II for a description
of the list data type)
o right_brace :
the "}" token corresponding to the brace ending the block
5.3 Expression instruction
--------------------------
expression_instruction
----------------------
o expression :
the corresponding expression (see section 6)
o semi_colon :
the semicolon token ending the instruction
5.4 Selection instructions
--------------------------
if
--
o keyword :
the token 'if'
o left_parenthesis :
the token "(" beginning the condition
o condition :
the corresponding expression (see section 6)
o right_parenthesis :
then token ")" ending the condition
o then_body :
the corresponding instruction
o else_keyword :
the token 'else' if any, 'nothing' otherwise
o else_body :
the instruction corresponding to the 'else' case if any, 'nothing'
otherwise
switch
------
o keyword :
the token 'switch'
o left_parenthesis :
the token "(" beginning the condition
o condition :
the corresponding expression (see section 6)
o right_parenthesis :
then token ")" ending the condition
o body :
a block instruction containing a sequence of 'case' statements.
There is currently no verification of the coherence of this block
at parse time.
5.5 Iteration instructions
--------------------------
while
-----
o keyword :
the token 'while'
o left_parenthesis :
the token "(" beginning the loop condition
o condition :
the corresponding expression (see section 6)
o right_parenthesis :
then token ")" ending the loop condition
o body :
the instruction corresponding to the body of the loop
do_while
--------
o do_keyword :
the token 'do'
o body :
the instruction corresponding to the body of the loop
o while_keyword :
the token 'while'
o left_parenthesis :
the token "(" beginning the loop condition
o condition :
the corresponding expression (see section 6)
o right_parenthesis :
then token ")" ending the loop condition
o semi_colon :
the semicolon token endin the instruction
for
---
o keyword :
the token 'for'
o left_parenthesis :
the token "(" beginning the for
o initialization :
the initialization expression of the loop if any (see section 6),
'nothing' otherwise
o first_semi_colon :
the semicolon token following the initialization
o condition :
the expression corresponding to the loop condition if any (see section 6),
'nothing' otherwise
o second_semi_colon :
the semicolon token following the condition
o step :
the last expression of the 'for' statement if any (see section 6),
'nothing' otherwise
o right_parenthesis :
the token ")" ending the for
o body :
the instruction corresponding to the body of the loop
5.6 Jump instructions
---------------------
continue, break
---------------
o keyword :
the corresponding keyword token
o semi_colon :
the semicolon token ending the instruction
goto
----
o keyword :
the token 'goto'
o label :
the identifier token corresponding to the goto label
o semi_colon :
the semicolon token ending the instruction
return
------
o keyword :
the token 'return'
o value :
the returned expression
o semi_colon :
the semicolon token ending the instruction
5.7 Void instruction
--------------------
void_instruction
----------------
o semi_colon :
the semicolon token in which this instruction consists
6 - Expressions
---------------
6.1 Primary expressions
-----------------------
Primary expressions are the identifier, number, characters_string and
character tokens. We must add to this list the parenthesized expressions
which have the following structure :
protected_expression
--------------------
o left_parenthesis :
the token "("
o expression :
the parenthesized expression
o right_parenthesis :
the token ")"
All non primary expressions have the common features :
o operator :
an operator token corresponding to the expression's one if any,
'nothing' otherwise
o mode :
it may be :
- prefix
- postfix
- infix
- special
since the expression's operator is prefix, postfix, infix or has
a special behaviour. When describing an operator we will systematically
give its mode.
6.2 Unary expressions
---------------------
The root sort of unary expressions is the corresponding unary operator's
symbol. They have one common feature :
o expression :
it contains the expression which is applied to the unary operator
The unary expressions are
Incrementation expression
-------------------------
The operator is '++' or '--'. Their mode feature contains 'prefix' or
'postfix', according to the position of the operator.
Unary arithmetic expression
---------------------------
The operator is '+', '-' or '~' and the mode is 'prefix'.
Unary logical expression
------------------------
The operator is '!' and the mode is 'prefix'.
Pointer reference
-----------------
The operator is '*' and the mode is 'prefix'.
Address value
-------------
The operator is '&' and the mode is 'prefix'.
Size of expression
------------------
The operator is 'sizeof' and the mode is 'prefix'.
6.3 Binary expressions
----------------------
The root sort of binary expressions is the corresponding binary operator's
symbol.
The binary expressions are
Regular binary expressions
--------------------------
Their mode is always 'infix'. They have two common features :
o left_expression :
it contains the expression on the left side of the operator
o right_expression :
it contains the expression on the right side of the operator
The operator is one of : '+', '-', '*', '/', '<', '>', '|', '%', '^', '<<',
'>>', '=', '+=', '-=', '*=', '/=', '%=', '&=', '^=',
'|=', '<<=', '>>=', '!=', '.', '->', ','
The array reference is a binary expression, but it has a particular
representation since its operator is discontinuous : [ ... ]
array_reference
---------------
o array :
the expression corresponding to the array
o mode :
'special'
o operator :
'nothing'
o left_bracket :
the token "[" preceding the element's index in the array
o index :
the expression corresponding to the element's index in the array
o right_bracket :
the token "[" following the element's index in the array
6.4 Conditional expression
--------------------------
The root sort of this expression is '?'. The feature mode contains 'special'
and the feature operator the token of the corresponding question mark.
Its features are :
o condition :
the expression corresponding to the test
o then :
the resulting expression if condition evaluates to non zero value
o colon :
the colon token of the expression
o else :
the resulting expression if condition evaluates to zero
6.5 Function call
-----------------
function_call
-------------
o call :
the expression corresponding to the function being called
o left_parenthesis :
the token "(" beginning the arguments list
o arguments :
the arguments of the call. It is
arguments
---------
o arguments_number :
the number of arguments
o 1 .. arguments_number :
the arguments (if arguments_number is not 0). Each feature
contains an expression. All arguments have the common features :
o previous :
the previous argument
o next :
the next argument
o first :
the first argument
o last :
the last argument
o right_parenthesis :
the token "(" ending the arguments list
6.6 Cast expression
-------------------
cast
----
o left_parenthesis :
the token "(" at the beginning of the type specification
o type :
the cast type (see section 2)
o right_parenthesis :
the token ")" at the end of the type specification
o expression :
the cast expression
6.7 Initialization expressions
------------------------------
They are expressions which set the initial value of an identifier within
its declaration.
setting
-------
o keyword :
a token ":" (for the size of a tag) or a token "=" for the classical
initialization of an identifier in its declaration.
o setting :
the initalization expression. It's either a constant expression
either a complex initialization (e.g. the initialization of the elements
of an array). In the latter case the expression is
complex_initialization
----------------------
o left_brace :
the token "{" beginning the initialization
o body :
the initialization expression (a constant expression or a
sequence of expressions with perhaps complex initializations inside)
o right_brace :
the token "}" ending the initialization
II. Building analyzers
======================
1 - Principles
--------------
A shell script named 'slc' is provided with the SuperLint package. It
automatically calls the rules compiler and produces an analyzer from
one or several files containing the analyzer's specification.
After creating an analyzer from a rules file, you can run the analyzer
on C files using the 'sl' shell script. See the README file in the
SuperLint package for complete information concerning the use of sl and
slc.
2 - Description of the rules specification language
---------------------------------------------------
2.1 Analyzer's design
---------------------
An analyzer's specification consists of one or several files which may contain
verification rules, functions or procedures written in the C-like language
as well as pure LIFE code.
The entry point of the analyzer is a procedure (see section 2.4) called
'main' which is invoked when the analyzer is launched with 'sl'. The parse
of the C files specified on the command line is automatically effected and
the result stored in the global variable named 'syntactic_tree' which
contains a 'toplevel' structure (see section 4.1).
When the procedure 'main' is absent, SuperLint launches an automatic
verification of all the rules of the analyzer's specification. But if you
write your own main procedure you have to handle by yourself all the
verification operations.
2.2 Structures and aliases
--------------------------
A structure is designed by an alphanumerical name always beginning by
a lower case letter. You don't need to declare structures like in C since
they don't have predefined fields, i.e. you can add as many features
to your structure as you want. You access the features of a structure with
the '.' operator, just like in C. When you try to get the value of a
nonexisting feature an error occurs.
You add a new feature to a structure with the 'set_feature' procedure.
This assignment is destructive, in the sense that if you set a preexisting
feature with 'set_feature', the previous value of the feature is lost.
Note that a structure ought not to have any feature. In the following we
will name identifiers such featureless structures.
The basic way to pass arguments or to denote particular structures in
SuperLint is to use aliases. An alias is an alphanumerical name beginning
by an uppercase letter or an underscore. You don't need to declare aliases.
You set an alias value using the '=' operator. Once declared an alias cannot
be reassigned. The scope of an alias is the rule/procedure/function in which
it is declared. It is destroyed when exiting the scope.
An alias is much like a local variable in C, but with restrictions in the way
is is assigned.
Example :
MyStructure = my_structure,
set_feature(name, "Mr. Struct", MyStructure),
set_feature(foo, bar, MyStructure),
X = MyStructure.foo,
set_feature(foo, nothing, MyStructure)
These operations declare that the alias MyStructure designs the structure
'my_structure', which has two features : name and foo with the initial
values "Mr. Struct" and 'bar'. X is an alias for the feature foo of
MyStructure. At this moment the value of X is 'bar'. But after the destrcutive
reassignment : 'set_feature(foo, nothing, MyStructure)' the value of X is
'nothing'. The alias X doesn't design the structure 'bar' but the feature
foo of MyStructure independently of its content.
Note for the purpose of the LIFE programmer :
---------------------------------------------
Structures are actually psi-terms and aliases logical variables. The
'set_feature' operation is the non-backtrackable assignment '<<-'.
It's not dangerous to use it for all the syntactic structures produced by
the parser are persistent.
It will not cause any problem as long as the user who programs with the C-like
language, programs *effectively* like in C and doesn't mix both imperative and
declarative styles.
2.3 Rules
---------
A rule is a specific procedure which is called for each node during the
automatic scanning of the syntactic tree.
A rule specification has the following form :
[ rule ] <RULE NAME> (<NODE NAME>, <DOMAIN NAME>) :->
{
entry_test(<ENTRY TEST>),
<RULE BODY>
}.
o <RULE NAME> is an identifier. Two different rules cannot have the same
name
o <NODE NAME> is an alias designing the syntactic node currently scanned
o <DOMAIN NAME> is an identifier designing a set of rules. This can
be useful when using libraries of rules since it's possible to tell
SuperLint to perform verifications only for particular rules sets.
See section 3 for further details.
o <ENTRY TEST> is a logical expression selecting the nodes to which the
rule must apply. If the expression is true the rule's body is executed
else the verification process continues with another rule or another node.
See section 2.6 for further details on logical expressions.
o <RULE BODY> is a sequence of instructions constituting the body of the
rule. See section 2.5 for further details on instructions.
NOTE : the keyword 'rule' before the rule's name is optional.
----
2.4 Procedures and functions
----------------------------
Procedures and functions behave much like in C. Their syntax is :
procedure <PROCEDURE NAME> (<PARAMETER 1>, ..., <PARAMETER N>) :->
{
<PROCEDURE BODY>
}.
o <PROCEDURE NAME> is an identifier.
o <PARAMETER 1>, ..., <PARAMETER N> are aliases designing the parameters
of the procedure.
o <PROCEDURE BODY> is a sequence of instructions constituting the body of
the procedure. See section 2.5 for further details on instructions.
function <FUNCTION NAME> (<ARGUMENT 1>, ..., <ARGUMENT N>) :->
{
<FUNCTION BODY>
}.
o <FUNCTION NAME> is an identifier.
o <ARGUMENT 1>, ..., <ARGUMENT N> are aliases designing the arguments
of the procedure.
o <FUNCTION BODY> is a sequence of instructions constituting the body of
the function. If you forget to specify a return value for the function,
the identifier 'unknown' is returned. See section 2.5 for further details
on instructions.
NOTE : If you call a function or a procedure with aliases which have not
---- been previously set you are in deep trouble. Currently there is no
runtime verification insuring that all the arguments of a procedure
or function call are defined.
Note for the purpose of the LIFE programmer :
---------------------------------------------
Functions are actually translated into LIFE functions and procedures into
predicates.
2.5 Instructions
----------------
An instruction may be :
o A block instruction : { ... }
It contains a sequence of instructions, two instructions being separated
by a comma.
Example :
{
X = f(1, name),
Y1 = X.field1,
Y2 = X.field2,
p(Y1.foo, Y2)
}
WARNING : Aliases defined within a block are NOT local to this block.
------- Remember that the scope of an alias is the rule/function/procedure
in which it appears.
o An alias definition (see section 2.2) :
<ALIAS NAME> = <EXPRESSION>
See section 2.6 for further details on expressions.
o A procedure call :
<PROCEDURE NAME> (<PARAMETER 1>, ..., <PARAMETER N>)
where <PARAMETER 1>, ..., <PARAMETER N> are expressions.
o A 'donothing' instruction which does nothing.
o A 'return' statement :
return <EXPRESSION>
where <EXPRESSION> is an expression.
It can only occur in a function body for it is intended to specify the
return value of the function.
WARNING : Unlike in C the function doesn't return after the 'return' has
------- been executed. If there are other instructions following the
'return' instruction they are executed in turn.
o An 'if' statement :
if(<CONDITION>) then
<FIRST CASE>
[ else
<SECOND CASE> ]
where <CONDITION> is a logical expression an <FIRST CASE> and <SECOND CASE>
are instructions.
When <CONDITION> is true <FIRST CASE> is executed, <SECOND CASE> otherwise.
The <SECOND CASE> specification is optional.
WARNING : Because of the method used to parse LIFE (operator's precedence)
------- it is difficult to parse correctly nested if...then...else
statements. Currently if one of the <FIRST CASE> or <SECOND CASE>
instructions needs to be an if statement, it MUST be protected
in a block instruction.
Example :
if(...) then {
if(...) then
...
} else {
if (...) then
...
else
...
}
Moreover the symbol 'if' being defined as an operator of the
SuperLint rules language, you must put in between parentheses
if you want to use it as an identifier (when handling an 'if'
syntactic node produced by the parser, for example). Otherwise
it will be parsed incorrectly by Wild_LIFE.
This remark remains true for the 'return' symbol.
o A 'switch' statement :
switch(<EXPRESSION>) {
case(<CASE 1>, <INSTRUCTION 1>),
.
.
.
case(<CASE N>, <INSTRUCTION N>) [,
default(<DEFAULT INSTRUCTION>) ]
}
where <EXPRESSION> is an expression and <INSTRUCTION 1> ... <INSTRUCTION N>
and <DEFAULT INSTRUCTION> are instructions.
The <CASE 1> .. <CASE N> are either expressions or multiple choice cases,
namely :
{ <EXPRESSION 1>, ..., <EXPRESSION M> }
where <EXPRESSION 1>, ..., <EXPRESSION M> ar expressions.
When executing this instruction <EXPRESSION> is first evaluated and
its value V compared with each case.
If i is the least integer in 1..N such that V is equal to the value of
<CASE i>, if this one is an expression, or V is equal to the value of one
of the <EXPRESSION j>, if <CASE i> is a multiple choice case, then
<INSTRUCTION i> is executed and the execution of the 'switch' is over.
If there is a 'default' case, the <DEFAULT INSTRUCTION> is executed when
the i defined above doesn't exist.
2.6 Expressions
---------------
2.6.1 Standard expressions
--------------------------
A standard expression may be
o a structure with or without features
o a number
o a boolean value : true or false
o a string
o a function call. Note that any LIFE function may be used in a function
call. Therefore we encourage you to search in the Wild_LIFE manual for
the built-in function that you may need.
o a feature access with the '.' operator
Example :
MyStruct.the_feature
X.tree.node.label
o an arithmetical expression with operators : '+', '*', '-', '/', '^'
o a boolean expression with operators : 'or', 'and', 'not' or a comparison
expression.
A comparison expression is a binary expression whose operator may be
'==', '<>', '<', '=<', '>' or '>=' and whose operands are standard
expressions.
The '==' operator is the same as in C, except that it can be used with
structures. For example :
X == foo
evaluates to 'true' iff X designs a structure named foo, independently
from its features.
The '<>' operator is the negative form of the precedent. It's much like
the '!=' operator in C.
2.6.2 Logical expressions
-------------------------
A logical expression is an expression occuring only in an 'entry_test' or 'if'
statement. A logical expression may be :
o a boolean expression
o a binary expression with operators '&&' or '||'
They have the same meaning as the corresponding ones in C.
BUT you cannot use them in a boolean expression. The major difference
comes from the evaluation's order. These operators evaluate from left to
right. The boolean operators 'and' and 'or' have no insured evaluation's
order.
3 - The toolkit
---------------
SuperLint comes with several built-in structures, functions and procedures.
o Lists
-----
A list may be given by :
o [] : the empty list with no elements
o newlist(E, L) : where E is a structure and L a list (it's the lists
constructor function)
A non empty list has two features :
o first_element : the first element in the list (corresponding to the
structure designed by E above)
o tail : the list except its first element (corresponding to the
list designed by L above)
A list may be specified extensively by giving all its elements
Example :
L = [1, 2, 3],
E = L.tail.first_element,
L2 = newlist("hello", L)
In this example the value of E is 2 and the value of L2 is
["hello", 1, 2, 3].
o features_list(What)
-------------------
Function returning the list of features of the structure designed by
'What' if any, [] otherwise.
o has_feature(Feature, What)
--------------------------
Tells if the structure designed by 'What' has the feature designed by
'Feature'.
May only be used in a logical expression.
o syntactic_tree
--------------
This function returns the 'toplevel' structure of the syntactic tree.
Note for the purpose of the LIFE programmer :
---------------------------------------------
Actually it's not a function but a persistent variable. This information
is however meaningless for the user of the C-like language.
o parse_tree(Operation, Node)
---------------------------
This procedure scans the syntactic tree from the node 'Node' applying
to each of its subtrees the operation 'Operation' which must be a procedure
with one parameter.
In fact the procedure 'Operation' is applied twice on the node :
o when the node is encountered for the first time
o when the operation has been applied to all the subtrees of the node
To distinguish these two cases the function 'parse_mode' is provided.
In the first case it returns 'prefix', in the second case it returns
'postfix'.
Three other functions are provided :
o current_declaration : the declaration whose subtrees are currently
explored, 'nothing' otherwise (see section 4, part I).
o current_block : the block whose subtrees are currently explored,
'nothing' otherwise (see section 5, part I).
o current_instruction : the instruction whose subtrees are currently
explored, 'nothing' otherwise (see section 5, part I).
o scan_tree [ ({ <DOMAIN 1>, ..., <DOMAIN N> }) ]
-----------------------------------------------
This operation launches the scanning of the syntactic tree from the
'toplevel' node, applying the rules whose domain is one of :
<DOMAIN 1>, ..., <DOMAIN N>.
If no domain name is specified all the rules are applied : it's the
default behaviour of an analyzer.
This procedure uses 'parse_tree', therefore all the associated functions
are available.
o error_msg(Message [, Token])
----------------------------
This procedure writes the error message 'Message', giving informations
about file and line from the token 'Token' (see section 1, part I).
The second argument is optional.
This procedure can only be called in a rule or in a function/procedure
which is accessed from a rule.
The name of the rule in which this procedure is called is also displayed.
You can retrieve this option by adding the directive :
display_rules(no).
in one of your rules files.
o is_token(What), is_not_token(What)
----------------------------------
Tells whether the structure designed by 'What' is a token or not.
May only be used in a logical expression.
o is_operator(What)
-----------------
Tells if 'What' designed by 'What' is an operator or not, i.e. an
expression (see section 6, part I).
May only be used in a logical expression.
o is_assignment_operator, is_logical_operator, is_arithmetic_operator,
is_relation_operator, is_conditional_operator, is_unary_operator,
is_struct_reference, is_assignment :
Like the previous one, but for specific expressions (see section 6, part I).
They may only be used in a logical expression.
o expand_type(Type)
-----------------
This function returns the expanded form of the type structure designed
by 'Type' (see section 3, part I), i.e. the equivalent type where all
the type names defined in a typedef statement have been recursively
replaced by their equivalent type.
o basename(FilePath)
------------------
'FilePath' designing an absolute UNIX path of a file, this function
returns the single file name.
o file_extension(FilePath)
------------------------
'FilePath' designing an absolute UNIX path of a file, this function
returns the extension of the file name if any, the empty string ""
otherwise.
o writeln
-------
This procedure writes all its arguments in the standard output, the last
being followed by a newline.
o string2id(Identifier)
---------------------
This function transforms the string 'Identifier' into an identifier of
same name. This function is useful when handling structures names which
cannot be literally written because they would cause problems to the
LIFE parser (e.g. '!', '(', ...etc).
You should always use this function when using non alphanumerical structures
names like the tokens or expressions nodes (see section 1 and 6, part I).