Solo Programadores 22

Text File | 1996-04-29 | 57.4 KB | 1,187 lines

GNAT-NOTE #1

Jan 31, 1993
Revised: April 8, 1993
Revised: April 12, 09:51

Robert B. K. Dewar

A LIBRARY DESIGN FOR GNAT

This design is based on discussions in the GNU-Ada design group at NYU, as well as taking into account contributions from others, including especially Richard Stallman. The basic philosophy is to provide an environment which is fully flexible, and at the same time has a natural and intuitive style of use both for Ada programmers used to the more conventional Ada library model, and to Unix programmers.

The original version of this note was generated in January, but the design has undergone extensive modification since then. The approach described here is that implemented in GNAT as of April, 1993.

Background -- The Ada Library Model of Compilation
--------------------------------------------------

This document addresses the issue of representing what the Ada RM calls the library "file", and implementing the semantics associated with this entity.

First, let's review the Ada model. We use the term Ada model to describe the common interpretation of the intention of the reference manual. As we shall see later, the RM can be read in a rather flexible manner (the basic issue being the extent to which its discussion of the library is talking about a conceptual or physical entity). Existing Ada implementations have in fact taken a particular interpretation, which is what we describe here.

An Ada library (we will always use this terminology to distinguish it from other uses of the word library) is a data structure that gathers the results of a set of compilations of Ada source files. A compilation is performed in the context of such a library, and the information in the library is used to enforce type consistency between separately compiled modules.
Unlike some other language environments, all such type checking is performed at compile time, and Ada guarantees at the language level that separately compiled modules of a complete Ada program are type consistent.

Building an Ada program consists of selecting a main program (typically this is a parameterless procedure compiled into the Ada library), and all the other modules on which this main program depends. These modules are then bound into a single executable program. For the most part this process is similar to the normal link step which is familiar from other language environments, but there are some Ada-specific semantics which are intended to be enforced at link time.

Let's look at some specific examples of how the Ada library model works. Suppose that we have a program consisting of the following elements, called compilation units, each of which is separately compiled.

   1. -- Specification of MAIN procedure

         procedure MAIN;

   2. -- Body (implementation) of MAIN procedure

         with PROC1, PACKG1;  -- units needed by MAIN program
         procedure MAIN is    -- not required to be called MAIN
            ...
         end;

   3. -- Specification of PROC1 procedure

         with PACKG1;
         procedure PROC1 (....);

   4. -- Body of PROC1 procedure

         procedure PROC1 (....) is
            ...
         end;

   5. -- Specification of package PACKG1

         package PACKG1 is
            ...
         end;

   6. -- Body of package PACKG1

         package body PACKG1 is
            ...
         end;

Note: in this discussion we use all upper case for unit names to clearly distinguish them from file names, which are all lower case. Actual casing requirements are more flexible of course. In particular, we prefer to use the mixed case convention (e.g. Utility_Package) in our actual Ada code, but the clear font difference helps avoid confusion in a document of this type.

Notice first of all that for each procedure and package, there are two separate parts.
First we have the specification, which gives the name and types of the procedure parameters, and is essentially similar in function to a function prototype (or a collection of prototypes in the case of a package) in C. The other part is the body, which is the implementation. These two parts can in general be compiled separately.

A compilation unit may "depend" on other compilation units. The most typical way of creating such a dependence is by use of a "with" clause. For example, in the above set of units, procedure MAIN depends on procedure PROC1.

A definite order of compilation is enforced by the language semantics (and implemented by use of the Ada library). In our example here, the compilation order must respect the following partial ordering:

   Spec of MAIN must be compiled before Body of MAIN
   Spec of PROC1 must be compiled before Body of PROC1
   Spec of PACKG1 must be compiled before Body of PACKG1
   Spec of PROC1 must be compiled before Body of MAIN
   Spec of PACKG1 must be compiled before Body of MAIN
   Spec of PACKG1 must be compiled before Spec of PROC1

Basically the idea is that you must compile the specs of anything you depend on before compiling the dependent unit, and in addition, the spec of a unit must be compiled before its corresponding body. Within these rules there is a fair amount of freedom in the compilation order. For example, in the current example, there is no rule about the order in which the bodies must be compiled.

An important idea here is one of "obsolete" units. If a unit is recompiled, then units which depend on it are obsolete, and must be recompiled. Again the Ada library is the data structure which is used to implement this requirement. In our example, if the spec of PACKG1 is recompiled, then the body of MAIN, and the spec and body of PROC1 must be recompiled (furthermore, in accordance with the ordering rules given above, the spec of PROC1 must be recompiled before the body of MAIN).
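
The ordering and obsolescence rules above can be modelled as a small dependency graph. The following Python sketch is illustrative only: the unit labels, the MUST_PRECEDE table and both function names are invented here, and it models the conventional library rules described in this section rather than the behaviour of any particular compiler. Note that it also reports the body of PACKG1 as obsolete when the spec of PACKG1 is recompiled, since a body always depends on its own spec.

```python
# Hypothetical model of the six-unit example. Each unit maps to the set
# of units that must be compiled before it (the partial ordering above).
MUST_PRECEDE = {
    "spec MAIN": set(),
    "spec PACKG1": set(),
    "spec PROC1": {"spec PACKG1"},
    "body MAIN": {"spec MAIN", "spec PROC1", "spec PACKG1"},
    "body PROC1": {"spec PROC1"},
    "body PACKG1": {"spec PACKG1"},
}


def compilation_order(deps):
    """Return one compilation order that respects the partial ordering."""
    order, done = [], set()
    while len(done) < len(deps):
        # A unit is ready once everything it depends on has been compiled.
        ready = sorted(u for u in deps if u not in done and deps[u] <= done)
        if not ready:
            raise ValueError("circular dependency between units")
        order.extend(ready)
        done.update(ready)
    return order


def obsoleted_by(unit, deps):
    """Return every unit made obsolete (directly or not) by recompiling unit."""
    obsolete = set()
    changed = True
    while changed:
        changed = False
        for name, needs in deps.items():
            if name not in obsolete and (unit in needs or needs & obsolete):
                obsolete.add(name)
                changed = True
    return obsolete


order = compilation_order(MUST_PRECEDE)
print(order)
print(sorted(obsoleted_by("spec PACKG1", MUST_PRECEDE)))
```

Any order produced this way is only one of several valid ones; the bodies, for instance, could be swapped freely.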

There are a few more fine points in the model.

A compiler must be able to take as input a compilation, which is a series of one or more compilation units. The normal model is that a single source file can contain several compilation units, although the Ada RM says nothing about source files, so this is not a necessary convention. In particular, it would be possible to declare that the representation of a compilation consisting of several units consists of a series of files, each containing one unit. However, most, but not all, implementations have just assumed that "compilation = file", so that submitting a file to the compiler involves submitting a series of compilation units.

If two files contain the same unit, then the one which gets into the library is the one compiled latest. The meaning of the program thus depends on the order of compilation of its components. A particularly confusing case is when multiple units appear in a file. If file F1 contains units A, B, C and file F2 contains unit B, then compiling F2 after F1 will remove the old version of B from the library, but leave A and C intact.

It is permissible to compile the body of a procedure without compiling the corresponding spec. In this case the body acts as a spec, and has the same dependencies as the spec. In the example above, we could omit compilation unit number 1, and compilation unit 2 would act as the spec for MAIN. The exact details of how this works are a little tricky. In particular, a body that is serving as a spec in this way will be made obsolete, as usual, by the introduction of a separate spec. Once a spec has been introduced, a body which is incompatible with the spec must be rejected.

In Ada/83, certain packages may have optional package bodies (these are typically packages containing only type and variable declarations).
In Ada/9X, such packages may *not* have associated package bodies.

If the specification of a procedure contains a pragma inline, or the specification of a package contains one or more inlined procedures, then any unit that depends on the specification also depends on its body, since it needs the body to do the inlining. In this case the body containing the inlined procedures must be compiled before the with'ing unit.

In the Ada Reference Manual, there are specific references to a "library file", and this is often taken to mean that the Ada library should be or must be represented using a file in the normal sense. Most Ada systems do in fact implement the Ada library in this manner, so that a compilation specifies a source file and an Ada library, and the effect of the compilation is to generate object and listing output *and* to update the library file.

However, it is clearly accepted that the RM does not require this implementation approach. In this view, an Ada library is a conceptual entity that can be implemented in any manner that provides the required semantics.

Note: in the model where a library file is maintained, special Ada specific utilities are required to rename, move or copy units between libraries, since the Ada library information must be maintained in an Ada specific form known only to components of the Ada system.

Note: an Ada purist will note that the proper technical term for what we have called a specification or spec here is "declaration", but the (mis)use of the term spec(ification) is essentially universal in the Ada world, so we follow this de facto standard in our terminology, except that from now on we will adopt the internal GNAT terminology: specification, spelled out in full, is a syntactic term, referring to the defined Ada grammar. The abbreviation spec is reserved for referring to declarations of units.
We actually find the use of spec in this context helpful, since for example if one refers to the spec for a given body, the meaning is clear, whereas if you refer to the declaration for a package body, it is not clear whether you are talking about the declaration of the body itself, or the package declaration.

Some Relevant Ada Language Features
-----------------------------------

This section summarizes some important features of Ada that are relevant to this discussion. Ada knowledgeable people can skip this, but it will be helpful to those whose knowledge comes from the non-Ada world.

Subunits and Stubs

A nested body, such as a nested procedure body, or nested package body, can be made into a subunit. This means that it is in a separate file, and at least in some sense is compiled separately. We say in some sense here because it must be compiled in the context of its parent, just as though it had been inline. In the parent, we have a "stub" that stands for the missing body, e.g.

      procedure JUNK is separate;

The body is then placed in a separate compilation unit, typically in a separate file, and looks like:

      separate (PARENT)
      procedure JUNK is
         .. <normal procedure body code> ..

where PARENT is the name of the unit containing the stub. The overall effect of this structure should be semantically equivalent to including the subunit inline, although that isn't quite exactly right in Ada terms, since the subunit can have its own context clause (with'ed units), and, although there is no conceptual reason for this restriction (i.e. it stems from methodological considerations, rather than technical considerations), Ada does not permit with clauses other than at the start of the compilation.

Child Units

A child unit in Ada 9X is an extension of its parent unit, which is a library package. Child units have qualified names indicating the parent (e.g. unit XYZ.ARN is a child of unit XYZ). A child unit has both a spec and a body.
The spec acts as an extension of the parent spec, and the body acts as an extension of the parent body.

Inlined Subprograms

The spec of a subprogram can be marked using a pragma Inline, which means that an attempt should be made to inline the code of the body. This creates a dependence of the unit containing a call on the body. Actually the rule in the RM is that this dependence is only established if the body has been compiled before the unit containing the call. This is a natural consequence of the library model in the RM, and means for instance that if two packages call inlined routines in one another, one cannot expect both requests to be satisfied (which one is satisfied depends on the order of compilation).

Generic Units

A generic unit is essentially a macro for a subprogram or package where the parameters can be types as well as normal procedure parameters. To use a generic it must be instantiated giving specific values for the parameters. This conceptually creates a copy of the spec and body which are appropriately customized.

An obvious implementation is simply to inline the customized copies at the point of instantiation. However this creates a problem since it means that a dependency is created from the unit containing the instantiation to the body. As we discussed for the inlined subprogram case, that can cause some restrictions in cases where two packages instantiate generics declared in the other. In the case of inlined subprograms, we could just ignore the inlining request, but in the generic case we get stuck.

There are approaches for getting around these limitations, but they are complicated. We won't go into them further here. We note that the Ada/83 RM specifically allows an implementation to place restrictions on the use of generics consistent with this model of inline expansion, but in any case the GNAT scheme is simple as we shall see and has no such restrictions.

Background -- The GNU Model of compilation
------------------------------------------

The GNU model of compilation is that separate files which constitute the program are separately compiled and each compilation produces a corresponding object file. These object files are then linked together into a program by specifying a list of object files. A library consists of a set of such object files and there is no library file as such, although there is a notion of dependence on headers (which are of course source files). In this model, standard system utilities (rm, mv, cp) can be used to remove, rename, and copy modules.

In the case of C and C++ programs, a given source file can #include header files. In this case to compile the file, the header files must be available. The make utility in GNU usage in general specifies for each object file which source files must be around to generate it, i.e. it establishes a dependency of the object file on a set of sources.

As long as the dependencies in the make file are correct, and as long as all compilations are performed using this make file, then consistency of the system is guaranteed. However there is nothing to stop compilations being carried out without the use of make, and in such cases, it is possible to generate executables which are inconsistent, e.g. more than one incompatible version of a given header file appears in separate object modules.

The Design Goal - Unification
-----------------------------

The goal in this design is to reconcile the Ada and GNU models of compilation. On the one hand, we want the inter-module type integrity that is guaranteed by the Ada language specification -- in particular it should be essentially impossible to link a type inconsistent program. On the other hand, we want to fit into the GNU model in which separate compilations generate separate object files (and which has no place for a global library file).

The Basic Model of GNAT Compilation
-----------------------------------

In this section we will describe the basic model of GNAT compilation. Before starting, we should warn Ada programmers that they are likely to react that the GNAT approach is at best peculiar and at worst wrong, because it is quite different from conventional Ada models. However, we ask such readers to read ahead with an open mind. Later on we will describe how the system can be used in a manner that has identical semantics to typical library based Ada systems if that is desirable.

The fundamental point is that we use the GNU view of compilation as our starting point, and in particular we are entirely source based. A GNAT compilation specifies a source file, and generates a single object file. There are *no* library files, or any centralized library information of any kind.

A GNAT source file contains a single compilation unit (a compilation is represented as a series of source files, each containing one compilation unit). Furthermore there is a mapping from unit names to file names, so that from a unit name one can always determine the file name. This mapping is quite flexible, as we shall describe later, but for the examples in this document we will use the default file naming convention as follows:

   The file name is the expanded name of the unit with dots replaced by minus signs. An additional minus sign is appended to specs to distinguish them from bodies. The extension .ada is included in all files.

Some examples of these default mapping rules are:

   Unit name                    File name

   PACKGE1 (spec)               packge1-.ada
   PACKGE2 (body)               packge2.ada
   SCN.NLIT (subunit)           scn-nlit.ada
   CHILD.PKG (child spec)       child-pkg-.ada
   XYZ.ARG.LMS (subunit)        xyz-arg-lms.ada
   ABC.DEF.GHI (child spec)     abc-def-ghi-.ada

The corresponding object file has the same file name with the extension .o (which is why the spec and body of a unit have to have different file names, not just different extensions).
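
The default naming rule is simple enough to state as code. This is a minimal Python sketch, not part of GNAT: the function names are invented for illustration, and the lower-casing step is an assumption based on the convention in this note that file names are written in lower case.

```python
def default_file_name(unit_name, is_spec):
    """Map an Ada unit name to its default source file name.

    Dots in the expanded unit name become minus signs, a spec gets one
    extra trailing minus sign, and every file gets the .ada extension.
    Lower-casing is an assumption made for this sketch.
    """
    base = unit_name.lower().replace(".", "-")
    if is_spec:
        base += "-"
    return base + ".ada"


def object_file_name(source_name):
    """Return the object file name: same name with .o instead of .ada."""
    if not source_name.endswith(".ada"):
        raise ValueError("expected a .ada source file name")
    return source_name[: -len(".ada")] + ".o"


print(default_file_name("CHILD.PKG", is_spec=True))
print(default_file_name("SCN.NLIT", is_spec=False))
print(object_file_name(default_file_name("XYZ", is_spec=True)))
```

Because the spec marker is part of the base name rather than the extension, the spec and body of XYZ map to the distinct object files xyz-.o and xyz.o.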

As in a C file with #include'd header files, a GNAT source file may require other source files for its compilation. These include:

The corresponding spec for a body. For example if we compile a package body xyz.ada, we will reference the source of the package spec in xyz-.ada.

The parent spec of a child library spec. Child libraries are extensions of their parent library, so to compile a child library, we must have the files for its parent available (and since this principle is applied recursively, the entire set of ancestors will be needed). For example, if we are compiling the child spec abc-def-.ada, we will need the source of its parent in abc-.ada.

With'ed specifications. The context clause of an Ada compilation unit specifies a series of units whose specs contain entities that may be referenced in the compilation. The sources of all such specs must be available. For example if we compile xyz.ada, and unit XYZ with's unit ABC, then we will need the source file abc-.ada containing the spec of ABC.

Parent body for a subunit. If we are compiling a subunit, then it can reference entities declared in its parent, so certainly we must have the source of the parent around. For example, if we are compiling the subunit in file abc-def.ada, then we will need the source of its parent in abc.ada.

Bodies of inlined subprograms. If we call an inlined procedure declared in some spec, then we need not only the source of that spec, but also the body. For example, if unit ABC with's the inlined subprogram RAPID, then the compilation of abc.ada will require not only the source of the spec in rapid-.ada, but also the body in the file rapid.ada.

Bodies of instantiated generics. This is exactly the same situation.
For example if unit TOP1 instantiates a generic subprogram GENERAL1, then the compilation of top1.ada will require not only the source of the spec in general1-.ada, but also the generic body in general1.ada.

Bodies of packages containing either inlined subprograms that are called, or generic bodies that are instantiated. This is a similar case. Suppose that unit JUNK1 with's the package PACK1, and makes a call to the inlined subprogram XYZ declared in PACK1, or instantiates the generic spec GEN1 declared in PACK1. Then the compilation of junk1.ada will require not only the package spec in pack1-.ada, but also the package body in pack1.ada.

All these rules probably seem quite reasonable to a C programmer, since they are similar to the requirement that compilation of a C source containing a #include for a header requires the header to be around. However, an Ada programmer is likely to be puzzled. The key understanding is that in GNAT, dependencies are not from one compilation unit to another, but from object files to corresponding sources. Let's take another look at the example at the start of this note:

   1. -- Specification of MAIN procedure (in file main-.ada)

         procedure MAIN;

   2. -- Body (implementation) of MAIN procedure (in file main.ada)

         with PROC1, PACKG1;  -- units needed by MAIN program
         procedure MAIN is    -- not required to be called MAIN
            ...
         end;

   3. -- Specification of PROC1 procedure (in file proc1-.ada)

         with PACKG1;
         procedure PROC1 (....);

   4. -- Body of PROC1 procedure (in proc1.ada)

         procedure PROC1 (....) is
            ...
         end;

   5. -- Specification of package PACKG1 (in file packg1-.ada)

         package PACKG1 is
            ...
         end;

   6. -- Body of package PACKG1 (in file packg1.ada)

         package body PACKG1 is
            ...
         end;

Now we have a number of dependencies of object files on source files as follows:

   main-.o     depends on   main-.ada
   main.o      depends on   main.ada, main-.ada, proc1-.ada, packg1-.ada
   proc1-.o    depends on   proc1-.ada, packg1-.ada
   proc1.o     depends on   proc1.ada, proc1-.ada, packg1-.ada
   packg1-.o   depends on   packg1-.ada
   packg1.o    depends on   packg1.ada, packg1-.ada

Note that the dependencies are transitive; in this example the dependency of proc1.o on packg1-.ada is such a transitive dependence. This is similar to a situation in C where a header #include's another header, and of course both header files must be around to compile a file including the first header.

In this approach, we are reinterpreting the "order of compilation" rules to be "dependency on source files" rules. A rule that says that the body of MAIN cannot be compiled until the spec of MAIN has been compiled is reinterpreted to mean that the body of MAIN cannot be compiled unless the source of the spec of MAIN is available.

The rules about compilations obsoleting other compilations are similarly reinterpreted. The rule that says that recompiling the spec of MAIN obsoletes the body is taken to mean that reediting the source of the spec of MAIN requires the body to be recompiled.

One interesting consequence of the GNAT approach is that if all the sources of a program are available, there are in fact no restrictions on the order of compilation; the units can be compiled in any order. We can even compile bodies before the corresponding specs if we want.

This model of source dependencies has a number of significant advantages. It's certainly much more familiar to non-Ada programmers, and we believe that it is fundamentally much simpler than conventional Ada library models. Furthermore, there are a number of technical difficulties relating to circular dependencies in the conventional model (where two units depend on one another) that completely disappear. For instance, consider the following situation:

   1. -- Specification of PACKG1 (in file packg1-.ada)

         package PACKG1 is
            procedure PROC1;
            pragma Inline (PROC1);
            ...
         end PACKG1;

   2. -- Body (implementation) of PACKG1 (in file packg1.ada)

         with PACKG2;
         package body PACKG1 is
            ...
            PACKG2.PROC2;
            ...
         end PACKG1;

   3. -- Specification of PACKG2 (in file packg2-.ada)

         package PACKG2 is
            procedure PROC2;
            pragma Inline (PROC2);
            ...
         end PACKG2;

   4. -- Body (implementation) of PACKG2 (in file packg2.ada)

         with PACKG1;
         package body PACKG2 is
            ...
            PACKG1.PROC1;
            ...
         end PACKG2;

This is the case of mutually recursive inline references that causes trouble in the conventional model, since to accomplish both inlining actions, the units for the bodies of the two packages would have to depend on one another. Note incidentally that we are not talking about a case of actual recursive inlining. We assume in this example that the call to PROC1 is not in the body of PROC2, but in some other subprogram, and similarly the call to PROC2 is not in the body of PROC1, but also in some other subprogram, so this situation is perfectly sensible, and it would be desirable to have both inline actions achieved.

In the GNAT model there is no special problem. The dependencies are:

   packg1-.o   depends on   packg1-.ada
   packg1.o    depends on   packg1.ada, packg1-.ada, packg2.ada, packg2-.ada
   packg2-.o   depends on   packg2-.ada
   packg2.o    depends on   packg1.ada, packg1-.ada, packg2.ada, packg2-.ada

No big surprises, no particular problems! It's just that, as one might expect, any change to any of the four sources requires that the bodies of the two packages be recompiled.

Now the failure of the normal Ada library model in this case is not critical, since the semantic effect of failing to achieve inlining is just a loss of efficiency. However, consider a similar example with mutual generic instantiation:

   1. -- Specification of PACKG1 (in file packg1-.ada)

         package PACKG1 is
            generic
               type X is private;
            procedure PROC1 (M : X);
            ...
         end PACKG1;

   2. -- Body (implementation) of PACKG1 (in file packg1.ada)

         with PACKG2;
         package body PACKG1 is
            ...
            procedure NEW1 is new PACKG2.PROC2 (Integer);
            ...
         end PACKG1;

   3. -- Specification of PACKG2 (in file packg2-.ada)

         package PACKG2 is
            generic
               type X is private;
            procedure PROC2 (M : X);
            ...
         end PACKG2;

   4. -- Body (implementation) of PACKG2 (in file packg2.ada)

         with PACKG1;
         package body PACKG2 is
            ...
            procedure NEW2 is new PACKG1.PROC1 (Integer);
            ...
         end PACKG2;

Once again, we are not talking about an actual recursive instantiation, which would be illegal in Ada. The instantiation of PROC2 does not occur in the body of PROC1, and the instantiation of PROC1 does not occur in the body of PROC2, so this program is perfectly legal.

Now we are in trouble with the Ada dependency model if we are trying to inline generics, because once again this would generate a mutual dependency between the two package bodies. In the conventional Ada model, we have two ways out of this:

   o Take advantage of the permission in Ada/83 to refuse to compile this particular program. The Ada programmer may be annoyed, but you are still conforming. This is a bit of "subsetting" that is specifically permitted by the standard. Note however that it is either possible or likely, depending on your point of view, that Ada/9X will withdraw this subsetting permission, and in any case, this subsetting is not desirable from an Ada programmer's point of view.

   o Figure out how to avoid the dependencies. There are two approaches. One is to use shared implementations of generics, which causes all kinds of implementation problems. The other is to compile the instantiated copies in separate object files, and then defer their compilation till the necessary information is at hand. This approach is also tricky, and certainly does not conform with our "one source, one object" approach.

Now let's look at what happens in the GNAT model.
We simply get the same set of dependencies as in the inline case:

   packg1-.o   depends on   packg1-.ada
   packg1.o    depends on   packg1.ada, packg1-.ada, packg2.ada, packg2-.ada
   packg2-.o   depends on   packg2-.ada
   packg2.o    depends on   packg1.ada, packg1-.ada, packg2.ada, packg2-.ada

Again, no particular problems! It's just that we have to recompile both package bodies if any of the four sources is modified. Furthermore we can use the simple generic inlining model without introducing any of the restrictions usually associated with this model.

Ensuring Consistency
--------------------

One thing that will be worrying Ada programmers at this point is how we ensure that an executable Ada program is guaranteed to be consistent. In the C case, we answer this question by saying "generate a correct make file with the proper dependencies, preferably with a tool, and then jolly well use it whenever you compile -- caveat emptor those who don't follow this rule!"

Well that doesn't sound good enough for Ada programmers, who have a much more strenuous view of safety and correctness -- indeed this is a principal aspect of the appeal of Ada. In particular, suppose we have the six files of our first example:

   1. -- Specification of MAIN procedure

         procedure MAIN;

   2. -- Body (implementation) of MAIN procedure

         with PROC1, PACKG1;  -- units needed by MAIN program
         procedure MAIN is    -- not required to be called MAIN
            ...
         end;

   3. -- Specification of PROC1 procedure

         with PACKG1;
         procedure PROC1 (....);

   4. -- Body of PROC1 procedure

         procedure PROC1 (....) is
            ...
         end;

   5. -- Specification of package PACKG1

         package PACKG1 is
            ...
         end;

   6. -- Body of package PACKG1

         package body PACKG1 is
            ...
         end;

Now we do the following:

   Compile packg1-.ada to generate packg1-.o
   Compile packg1.ada to generate packg1.o
   Compile proc1-.ada to generate proc1-.o
   Compile proc1.ada to generate proc1.o
   Compile main-.ada to generate main-.o
   Compile main.ada to generate main.o

So far so good, six nice consistent object files.
Now let's do the following:

   Edit source of packg1-.ada
   Recompile packg1-.ada to generate new version of packg1-.o
   Recompile packg1.ada to generate new version of packg1.o

Now if we were using a proper make file, the dependencies in this make file would force us to recompile the spec and body of PROC1 and the body of MAIN. But suppose we don't use the make file. Well, we have six objects that are certainly NOT consistent.

GNAT has two lines of defence against an attempt to construct a program from a set of inconsistent objects. First, when we said we generated no centralized library information, the operative word was centralized. In fact we do generate some library information for each object file. We call this information the ADL (Ada Library) information, and the most important component is a recording of the time stamps of all sources on which this unit depends.

Before a program is linked, the Ada binder (you could also call it a prelinker to use the more familiar GNU terminology) must be run. Ada semantics require this step for two reasons. First, initialization calls must be made to initialize unit specs and bodies (this initialization activity is called elaboration in Ada), and you can't tell the order of these calls until you have the whole program. Second, it is possible to construct a situation in which no possible order of elaboration exists. Such a situation is considered a compile time error, and must be diagnosed prior to execution.

Part of the processing in the GNAT binder makes sure that the program is consistent by looking at time stamps in the ADL information associated with the object modules of the program. In our attempted subversion of the system above, the binder will detect an error, because the time stamp of the source file packg1-.ada in the ADL for packg1-.o and packg1.o will not match the time stamp of this same source file in the ADL for the other modules.
The binder will then give a message something like:

   Please recompile proc1-.ada (source of packg1-.ada has been modified)
   Please recompile proc1.ada (source of packg1-.ada has been modified)
   Please recompile main.ada (source of packg1-.ada has been modified)

These correspond to messages typically obtained from Ada library systems if they are kind enough to keep traces of obsoleted modules around. Many existing Ada libraries are *not* kind enough to do this, and so will simply generate messages saying that these three units are missing from the library (because they were removed from the library when packg1-.ada was recompiled).

Note that only the time stamps of the source files are relevant. The time when the source file was compiled is irrelevant, and in particular if you recompile the same source file without having edited anything, you'll get the same object file, and nothing will get obsoleted, which makes sense of course, but conventional Ada library systems will obsolete things in this situation and require quite unnecessary recompilations.

Suppose we have a more devious programmer, who has saved the object file from a previous bind operation on this program (the binder generates an object file containing the elaboration calls in the required order), and who tries to link the program without calling the binder. Well the second level of GNAT defence steps in. The object files themselves contain external references which include time stamp information, and the linker will not be able to link the program. The error messages are a little bit more mysterious, you will get something like:

   Unresolved external symbol: packg1%s-1993-04-03:00:00.00

which is to be interpreted to mean that someone wanted the version of the spec whose source has the given time stamp, but there is no corresponding object file, meaning that the source has been modified and recompiled.
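
The first line of defence, the binder's time stamp comparison, can be sketched in a few lines of Python. Everything here is hypothetical: the ADL is modelled as a plain dictionary, the time stamp values are made up, and the rule that the newest recorded stamp for a source is the current one is an assumption of this sketch rather than a description of what the GNAT binder actually does.

```python
OLD = "1993-04-01 10:00"  # made-up stamp of the original sources
NEW = "1993-04-03 00:00"  # made-up stamp of the edited packg1-.ada

# Hypothetical ADL data: object file -> {source file: stamp seen when compiled}.
ADL = {
    "main-.o": {"main-.ada": OLD},
    "main.o": {"main.ada": OLD, "main-.ada": OLD,
               "proc1-.ada": OLD, "packg1-.ada": OLD},
    "proc1-.o": {"proc1-.ada": OLD, "packg1-.ada": OLD},
    "proc1.o": {"proc1.ada": OLD, "proc1-.ada": OLD, "packg1-.ada": OLD},
    "packg1-.o": {"packg1-.ada": NEW},
    "packg1.o": {"packg1.ada": OLD, "packg1-.ada": NEW},
}


def check_consistency(adl):
    """Return one message per object compiled against a stale source stamp."""
    # Assumption: the newest stamp recorded for a source is the current one.
    newest = {}
    for stamps in adl.values():
        for source, stamp in stamps.items():
            newest[source] = max(stamp, newest.get(source, stamp))

    messages = []
    for obj in sorted(adl):
        for source, stamp in sorted(adl[obj].items()):
            if stamp != newest[source]:
                unit_source = obj[: -len(".o")] + ".ada"
                messages.append(
                    f"Please recompile {unit_source} "
                    f"(source of {source} has been modified)"
                )
    return messages


messages = check_consistency(ADL)
for line in messages:
    print(line)
```

With the made-up data above, the three objects that still carry the old stamp for packg1-.ada are reported, which matches the three units named in the binder messages quoted earlier.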
These two lines of defence ensure the same level of security that is provided by conventional Ada library systems (actually some such systems don't provide the second level of defence). A really determined programmer can still cheat by deliberately modifying the time stamps of files. We don't particularly encourage this, but we don't try to prevent it. After all, in an environment where the programmer can change any bits in sight, we can only make it harder to subvert the consistency requirement, not impossible. The important thing is to have sufficient defences that we could never get an inconsistent program other than by very deliberate subversion of the defences.

As an example of the use of such subversion, consider a programmer who wants to add an entry to a spec, and guesses, correctly as it turns out, that files currently with'ing the old version of the spec don't really need to be recompiled. Well, it will in fact work to edit the spec, add the new declaration, and then change the time stamp of the source back to its original value. However, this sort of thing is obviously risky, not guaranteed to work, and definitely in the caveat emptor range!

Order of Compilation Issues
---------------------------

As we have observed, the GNAT model doesn't really place restrictions on the order of compilation. In particular, if the sources are all around, it is perfectly possible to compile a package body before compiling the corresponding package spec.

However, a consequence of such an inverted compilation order may be that when the package body is compiled, the package spec will be found to have syntax errors. Of course the compilation cannot proceed in this case. GNAT will generate messages clearly identifying the syntax errors in the spec, and will refuse to generate an object file.

Normal Ada practice is of course to compile the spec first, and then only compile the body if the spec is error free. This practice is still generally desirable in the GNAT environment.
Furthermore, as a result of the Ada semantic requirements, if you compile a spec without errors, then you are absolutely guaranteed that any subsequent compilation that makes use of this spec will not encounter errors from the recompilation of the spec that occurs as a normal part of the GNAT processing.

Note the contrast here with the use of C headers, which one generally does not compile in isolation, and even if you can compile them in isolation, the fact that compiling a header generates no errors is no guarantee that its incorporation by #include into some other file will not generate additional context dependent errors.

It may be desirable in practice to enforce the spec-before-body order of compilation. That's easily done by using make files that introduce additional dependencies of object files on other object files for referenced specs. For instance, going back to our standard six file example, the normal GNAT make file looks like:

    main-.o    depends on main-.ada
    main.o     depends on main.ada, main-.ada, proc1-.ada, packg1-.ada
    proc1-.o   depends on proc1-.ada, packg1-.ada
    proc1.o    depends on proc1.ada, proc1-.ada, packg1-.ada
    packg1-.o  depends on packg1-.ada
    packg1.o   depends on packg1.ada, packg1-.ada

If you want to ensure that specs are compiled before bodies, additional dependencies can be added:

    main-.o    depends on main-.ada
    main.o     depends on main.ada, main-.ada, proc1-.ada, packg1-.ada
               and also on main-.o, proc1-.o, packg1-.o
    proc1-.o   depends on proc1-.ada, packg1-.ada
               and also on packg1-.o
    proc1.o    depends on proc1.ada, proc1-.ada, packg1-.ada
               and also on proc1-.o, packg1-.o
    packg1-.o  depends on packg1-.ada
    packg1.o   depends on packg1.ada, packg1-.ada
               and also on packg1-.o

Now if you run make using this set of dependencies you get the normal spec before body rules. Suppose for example you edit packg1-.ada and run make.
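To make the effect of the extra object-file dependencies concrete, here is a small Python sketch of how a make-like tool would order the rebuilds once the spec source packg1-.ada has been edited. It is not make itself: the rebuild_order function is a hypothetical helper, the table is the augmented dependency list transcribed by hand, and the sketch assumes the dependency graph has no cycles.

```python
from typing import Dict, List

# Augmented dependency table from the text: each object depends on its
# sources and also on the objects of the specs it references.
DEPS: Dict[str, List[str]] = {
    "main-.o": ["main-.ada"],
    "main.o": ["main.ada", "main-.ada", "proc1-.ada", "packg1-.ada",
               "main-.o", "proc1-.o", "packg1-.o"],
    "proc1-.o": ["proc1-.ada", "packg1-.ada", "packg1-.o"],
    "proc1.o": ["proc1.ada", "proc1-.ada", "packg1-.ada",
                "proc1-.o", "packg1-.o"],
    "packg1-.o": ["packg1-.ada"],
    "packg1.o": ["packg1.ada", "packg1-.ada", "packg1-.o"],
}


def rebuild_order(deps: Dict[str, List[str]], changed: str) -> List[str]:
    """List the objects to rebuild after `changed` is edited, each one
    placed after every stale object it depends on."""
    order: List[str] = []        # stale targets, prerequisites first
    state: Dict[str, bool] = {}  # target -> is it stale?

    def visit(target: str) -> bool:
        if target in state:
            return state[target]
        stale = False
        for prereq in deps[target]:
            if prereq in deps:
                # Prerequisite is itself an object: settle it first.
                stale = visit(prereq) or stale
            elif prereq == changed:
                # Prerequisite is the edited source file.
                stale = True
        state[target] = stale
        if stale:
            order.append(target)
        return stale

    for target in deps:
        visit(target)
    return order


order = rebuild_order(DEPS, "packg1-.ada")
print(order)
```

In the resulting list packg1-.o always precedes packg1.o, proc1-.o and main.o, which is the spec-before-body order, while main-.o is not rebuilt at all because nothing it depends on has changed.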
Clearly in the resulting make run packg1-.ada must be compiled before packg1.ada, since the compilation of packg1.ada depends on output from the compilation of packg1-.ada and therefore must be done after it.

We anticipate a make-depend type utility for GNAT that will have a switch to specify whether or not you want this type of enforcement of compilation order. The compiler itself certainly does not need this enforcement, and so our approach provides maximum flexibility for the programmer in this regard.

Note that you probably don't want to introduce dependencies on object files for bodies, even if you are dependent on the corresponding sources. Such additional dependencies wouldn't provide any methodological advantages, and would have the disadvantage of creating restrictions on the use of pragma Inline and generic instantiations.

Handling Subunits
-----------------

Subunits could be handled with no further special considerations in the above model. In particular, the object files for the subunit bodies would depend on the source files of their parents, and the usual GNAT model would apply, including the user option of whether or not to force the normal Ada order of compilation that requires the parent to be compiled first.

However, we take a much more radical view of subunits. The reasons for this view are essentially orthogonal to the considerations given so far, and are fundamentally the following:

1. There are a number of situations where you would normally expect the compiler to know things at compile time (e.g. which outer level variables are referenced by inner level procedures, which packages declare tasks, etc.) which you can't know in a conventional Ada system, because there may be subunits present which you can't see when you are compiling the parent. This results in a degradation of the code.
For example, consider the following:

    procedure XYZ is
       A : Integer;
       B : Integer;
       package Inner is
          procedure Munge;
       end Inner;
       package body Inner is separate;
    begin
       ...
    end;

Now we are compiling the parent. We would like to know if tasks are present so that we know whether or not to establish a task master for this procedure, or we would like to know if A is referenced by an inner procedure, so that we know if it can be kept in a register. Neither of these questions can be answered in a conventional system when compiling the parent, so we have to assume the worst, and the effect is that the presence of subunits can degrade the code quality considerably.

2. Package subunits are a huge mess to implement. Consider in the above example that the body for Inner looks like:

    separate (XYZ)
    package body Inner is
       M : Integer;
       ...
    end;

Semantically the integer M belongs to the stack frame of its enclosing procedure, and in particular it has the lifetime of this stack frame. Where the heck shall we put it? We can't easily put it in that stack frame directly, since when we compiled the enclosing procedure, we didn't know that M existed. This problem (one might say headache) is well known to Ada implementors. There are a number of schemes, none of them fully satisfactory, and many of them introduce significant implementation complexity.

3. GNAT is making use of the existing backend of GCC, which certainly is not set up for separate compilation of inner procedures, let alone package subunits. We could presumably teach it what it needs to know, and make the necessary modifications, but they are rather language specific, and we prefer to avoid the need for making this kind of modification to the backend of GCC.

These factors combine to make subunits a big headache. In GNAT we choose to get rid of all of them at a stroke by deciding that we will not attempt to generate an object file for a subunit tree unless the sources of all necessary subunits are present.
We then essentially macro-substitute the bodies for their stubs, and all the above problems disappear. If you want to think of this in C terms, consider that the way you would model subunits in C is to use #include to drag in the separate bodies, and then of course all the sources would have to be around to compile the parent.

In the context of GNAT, there are two consequences. First, subunits themselves do not generate object files and do not need to be separately compiled. In this respect they are similar to C include files, which are not separately compiled and do not have corresponding object files. Second, the parent unit can only be compiled to generate its object module if the sources of the subunits are all available.

There are two immediate reactions that an Ada programmer will have. First there are efficiency concerns -- "Boy, you're forcing a lot of extra compilation, that's going to be very slow!" We'll deal with this concern in a separate section.

The more significant concern is that the whole point of using subunits is to separate concerns. Consider the following scenario:

    Susan develops the parent unit XYZ, which has two subunits XYZ.A and XYZ.B. She creates the source file xyz.ada and then gives the task of writing the two subunits to Jose and Jack.

    Jose creates the source file xyz-a.ada containing the subunit XYZ.A.

    Jack creates the source file xyz-b.ada containing the subunit XYZ.B.

In a conventional Ada system, Susan will compile her parent unit before giving the tasks to Jose and Jack to be sure that it is syntactically and semantically correct. She can't test it, except possibly with dummy stubs, but she still wants to make sure it doesn't contain obvious compile errors before checking it into the configuration management system. Similarly Jose and Jack will want to compile their subunits, using the compiled version of the parent, to check that they are syntactically and semantically correct.
Again they can't easily test them, but they want to be able to catch obvious errors early on. When all components of the system are ready, then testing can begin with the assurance that no syntax or semantic errors will appear when the system is assembled.

Are we going to lose that important capability in GNAT, given its approach of compiling the whole thing together? The answer is no. It's true that we can't make an object file of the whole structure until all units are there, but that of itself is not really a limitation, because we can't test things till we have all the subunits anyway. What GNAT does permit is to run the compilations of the parent on its own, or the bodies of the subunits in the presence of their parent sources, in syntax/semantic check only mode. No object file will be generated, but the same assurances that the component is syntactically and semantically correct apply. Since the primary purpose of the compilations that Susan, Jose and Jack did was to ensure freedom from such errors, the GNAT system has exactly the same functional capabilities as a conventional Ada system.

What About Efficiency?
----------------------

There are two efficiency concerns presented by this source-based approach. First, we are constantly recompiling units in the simple case from their source. For example, given the procedure:

    with XYZ, MNO, TEXT_IO; use TEXT_IO;
    procedure JFK is
    begin
       Put (XYZ.WHO);
       Put (MNO.SHOT);
       Put ("JFK?:");
    end;

the GNAT compiler, asked to compile file jfk.ada, is going to have to recompile the specs of XYZ, MNO and TEXT_IO. That sounds bad, but let's look at the alternative. In conventional Ada library based systems, the result of a compilation is to place information, typically some kind of intermediate tree, in the library. A subsequent WITH then fetches this tree from the library. In practice, this tree information can be huge, often much bigger than the source.
It's not at all clear that rereading and recompiling the source is less efficient than writing and reading back in these trees. It's true that recompiling means redoing syntax and semantic checking, but there may be less I/O to do, and reading and writing linked structures can be complex. Of course we won't know how this really compares till we have detailed performance figures, but from the performance we see so far, we don't think our approach will be significantly slower than the conventional library approach, and it may well be faster. The second efficiency concern has to do with our "recompile-the-whole-tree" approach to subunits. In the case where a complete program is being compiled anyway, there is of course no disadvantage in our approach, since each subunit has to be compiled once in any case. The situation in which the GNAT approach is obviously "inefficient" is when a modification is made to a single subunit, and the whole tree must be recompiled. Obviously one can construct examples where the amount of extra recompilation required is significant. We know this, and it's a conscious trade off. In return for this extra recompilation effort, we are in a position to generate much more efficient code for subunits, and also we simplify our implementation effort considerably. Furthermore, we think that the GNAT compiler will be fast enough that in practice, there will be few cases in which the general performance of GNAT will not be competitive with, or better than conventional systems. Again, time will tell. Note once more that there is nothing in the source based approach that mandates the compile-everything-at-once approach to subunits. This is a quite independent decision, and indeed we could revisit this decision later on, but remember that the only disadvantage in our approach is possible additional compilation time requirements. From every other point of view, we are clearly ahead in taking this approach to subunits. 
Finding Source Files
--------------------

The GNAT approach involves the ability to find a source file given the Ada unit name. There are two issues to be addressed. First, how do we find the file name from the unit name? There are two approaches in the GNAT system for addressing this question.

First, algorithmic mappings are provided. The default mapping is the one we mentioned at the start of this document:

    The file name is the expanded name of the unit with dots replaced by minus signs. An additional minus sign is appended to specs to distinguish them from bodies. The extension .ada is included in all files.

Via command line switches, this algorithm can be modified by specifying a different character than minus to replace dots (dot itself can be used), and different suffixes to distinguish bodies and specs. One interesting possibility is to specify that dots are to be converted to slashes (or whatever the system uses for subdirectory indications), in which case the subunits of a parent unit are gathered in a subdirectory of that name. This in fact may be a useful enough option to build into the compiler in some more direct form (e.g. if you can't find a-b.ada, then automatically go look for a/b.ada).

In the second approach, again activated by a command line switch or environment variable, a separate file can be constructed that provides a mapping of unit names to file names. This mapping file is then consulted to determine the file name, given the unit name.

The second issue is how to find the source file, once the source file name has been determined. In GNAT this is done using a search path which specifies a list of directories to be checked in sequence to find the source file. This is analogous to the method that some C compilers use to locate header files.
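The default mapping and the search path lookup described in this section can be sketched as follows. This is a minimal Python illustration under stated assumptions, not GNAT code: the function names and parameters are hypothetical, and the lowercasing of unit names is inferred from the examples in this note (unit XYZ.A lives in xyz-a.ada) rather than stated as a rule.

```python
import os
from typing import List, Optional


def unit_file_name(unit: str, is_spec: bool, sep: str = "-",
                   spec_suffix: str = "-", ext: str = ".ada") -> str:
    """Map an expanded Ada unit name to a source file name.

    Default rule from the text: dots become minus signs, specs get an
    extra minus sign, and every file carries the .ada extension.
    """
    base = unit.lower().replace(".", sep)  # lowercasing is an assumption
    return base + (spec_suffix if is_spec else "") + ext


def find_source(file_name: str, search_path: List[str]) -> Optional[str]:
    """Return the first match for file_name along search_path, or None."""
    for directory in search_path:
        candidate = os.path.join(directory, file_name)
        if os.path.isfile(candidate):
            return candidate
    return None


print(unit_file_name("Packg1", is_spec=True))        # spec of Packg1
print(unit_file_name("XYZ.A", is_spec=False))        # subunit body
print(unit_file_name("XYZ.A", is_spec=False, sep="/"))  # subdirectory style
```

Passing a different sep or spec_suffix plays the role of the command line switches mentioned above; with sep="/" the subunits of a parent end up in a subdirectory named after the parent.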
Advantages of the GNAT Model
----------------------------

In addition to the advantages that have already been discussed, there are two other respects in which the GNAT model is superior to the conventional Ada library model.

First, all source files are simply normal system files; they can be copied around, deleted or organized using normal system utilities. In the case of a conventional library based system, the library is often an Ada-specific object that has to be manipulated with special Ada-specific tools. For instance, to delete a unit that is no longer needed in the GNAT system, simply use the system delete command on its source and object files, but in most Ada systems, a special library-delete command must be used. Similarly, the effect of multiple libraries can be achieved simply by having multiple directories of source files that are searched in an appropriate order. The conventional Ada library system often requires complex, non-portable, special features to support multiple libraries.

Second, many of the anomalies that arise from special cases in the Ada library model are avoided. For example, suppose that there are two source files that both contain the spec of a procedure Util. In a conventional system, whichever source is compiled later "wins" without notification of any kind, which means that the semantics of the program can silently depend on the order of compilation. This can't happen in the normal use of GNAT, since two files with the same unit have to have the same file name, and can't accidentally coexist in the same directory.

Similarly, in a system that permits multiple units in the same file, various anomalies arise as a result of other files which recompile some, but not all of these units. You then get a program which does not correspond to any set of coherent sources. That can never happen in GNAT.
Every executable program must correspond to a particular set of source files, and could be recreated by compiling these source files without knowledge of the original order of compilation.

Support of ASIS-Like Interfaces
-------------------------------

Specifications like ASIS provide an interface from Ada programs to information stored in the Ada program library, and at least from a presentational point of view seem to depend strongly on the notion of a program library which contains all the necessary information.

The GNAT implementation of such an interface understands the library in this case to be the set of source files used to compile the program. To access the information in this "library" at the required semantic level, the source files must be recompiled. Again, this may or may not be more efficient than reading in the necessary information from the precompiled library file, but it's certainly functionally and semantically equivalent.

But It Doesn't Sound Like Ada to Me
-----------------------------------

We believe that the Ada/83 reference manual can be read in a sufficiently flexible and abstract manner that nothing we are doing in the above approach in any sense violates the requirements of Ada. Basically we consider that the rules in the Ada/83 RM are essentially oriented to ensuring consistency in an Ada program, and that a lot of the description in chapter 10 of the RM is essentially the description of one possible approach to achieving this end. Furthermore, the Ada/9X reference manual will be written in a way that tries hard to avoid over-specification of the implementation approach.

Nevertheless, most, in fact essentially all, existing Ada compilers have implemented the model in chapter 10 quite literally, and as a result, Ada programmers have come to expect a model of the world in which the monolithic library is the center of the Ada universe.
Furthermore, some of our rules in GNAT, in particular the rule about mapping of unit names to file names, and the rule about only one compilation unit per source file, may seem to be unacceptable restrictions. However, GNAT is sufficiently flexible that in fact we think any particular approach to Ada library maintenance, including the various multi-library features provided by various vendors, can be faithfully copied from a functional point of view by adopting appropriate procedures.

In particular, how would one model a conventional library system in which source files can contain multiple compilation units and have no naming restrictions? Here is one approach.

Create a directory called Adalib, which will represent the library. In this directory we will place source files that meet the GNAT requirements and their corresponding object files.

To compile an arbitrary Ada source file, first syntax check it. This can be done using GNAT, because in syntax check only mode, the restrictions on one unit per file, and on the names of the units, are ignored. If there are syntax errors, forget it (GNAT sets a return code indicating that syntax errors were found, so this is easy to implement in a shell script or batch file).

Otherwise, run it through a utility which breaks it up into separate source files with GNAT naming conventions. Put these source files in a temporary directory.

Compile these source files with GNAT, but don't generate code yet. Instead just do syntax and semantic checking. (Note that the only required action of an Ada compiler at compile time is to generate error messages and not update the library if there are errors.) If there are no syntax or semantic errors in any of the units, then copy the sources to the library directory.

When the program is to be bound, first do the actual compilation of all the units (which we know will work because we did a syntax and semantics check already). Then bind the resulting objects and we are done.
Note that Ada does not specify the division of labor between the compiler and binder, except to either require or strongly imply that syntax and semantic errors should be caught at the compiler level. Thus the fact that we are doing the actual code generation at what is logically bind time in the above scheme is perfectly permissible (it just seems to a user that the compilations are very quick and the binder somewhat slow!). This entire procedure can be implemented by appropriate shell scripts or batch files.

We generally don't think that many people using GNAT will take this approach. In particular it succeeds in faithfully reintroducing some of the anomalies and limitations that we have worked to eliminate. However, it may be useful for dealing with existing Ada source code, and in particular the ACVC suite takes various liberties in its assumptions about chapter 10 implications. For example, it assumes that a source file can contain more than one compilation unit. Thus this kind of mode will be helpful for running the ACVC suite.

Of course this is just one possible scenario. Many others are possible. Since the fundamental capabilities of the GNAT compiler are free of many restrictions normally associated with Ada compilers, there is a lot of freedom in how such scenarios might be constructed.

What do we Lose?
----------------

We do lose one feature that some may consider important. It is impossible with the GNAT approach to distribute a package for someone to use without at least giving them the source of the package specification. There is no way to distribute black-box libraries with this system that contain hidden information. Clearly one can imagine proprietary software situations in which this would seem like a restriction, but in the GCC world where we are committed to the free distribution of sources, this seems like an advantage.
Similarly, it's hard to make proprietary tools that read information from our "library", since you have to use the compiler to read the library, because the library has to be created by recompiling the source. That means that your proprietary tool would have to include the GNAT compiler, and you can't do that since the licensing of the GNAT source, while very liberal, has one important restriction, namely that you can't incorporate it in proprietary products. Again this "restriction" seems like an advantage to us, given our commitment to maintaining full access to the sources of GNAT and related tools.

Summary
-------

Although somewhat radical by conventional Ada standards, we think that a good case can be made that the GNAT approach is clearly superior. Certainly it meets the important goals of being consistent with the Ada standard, and being far less unfamiliar to non-Ada programmers. We also think it's much easier to understand than the conventional library based model.