home *** CD-ROM | disk | FTP | other *** search
- .sh 1 "IMPLEMENTATION ISSUES" 4
- .pp
- This section does not attempt to describe in detail the actual implementation
- of the POSTGRES optimizer.
- Rather, it focuses on important decisions made in building the optimizer.
- .sh 2 "Choice of Language"
- .pp
- Query optimization requires a great deal of element manipulation.
- The optimizer must separate, modify, and regroup elements within a query's
- target list and qualification to create new components
- for individual nodes in the plan tree.
- A programming language like LISP is well-suited for this type of processing
- because the language contains functions and data structures specifically
- designed for object manipulation tasks.
- Consequently, for ease of implementation, we chose to write the POSTGRES
- query optimizer in LISP.
- .pp
- Franz LISP, Opus 42 [FRAN85] was the selected LISP dialect.
- It was chosen because it was readily available and also because it supports
- a foreign function interface.
- A foreign function interface allows LISP code to call routines written
- in other programming languages.
- This feature is of utmost importance because
- the optimizer must call C routines
- to access information from database system catalogs,
- given that the access method code in POSTGRES is being written in C.
- .pp
- Franz LISP, Opus 42 is also fairly compatible with CommonLISP [STEE84], an
- emerging standard LISP dialect.
- Therefore, in the future, if translation from Franz to CommonLISP is necessary,
- this will require minimal effort.
- .pp
- In general, compiled LISP code executes less efficiently than compiled C code.
- Therefore, an optimizer written in LISP will execute more slowly than
- an optimizer written in C.
- This, however, is not a problem.
- As discussed in section 3.1, POSTGRES compiles query plans and caches,
- for later use,
- plans and tuples resulting from query preexecution.
- Because of these two features, a single query plan produced by the optimizer
- may be used several times.
- As a result, the cost of optimization is amortized
- over several executions.
- This significantly reduces a query's planning cost,
- yielding a figure that is minimal relative to
- the overall cost of execution.
- Therefore, in terms of optimizer efficiency,
- the choice of language is not a major concern.
- .sh 2 "Representing Query Plans in LISP"
- .pp
- In general, the cost of query processing constitutes the most
- significant portion of a query's execution cost.
- Therefore, the query processor must execute as cost efficiently as possible.
- To meet this goal, every node in the plan tree is constructed
- using one-dimensional arrays.
- These are known as \*(lqvectors\*(rq in LISP.
- Each element within a vector corresponds to some property of a node.
- By indexing appropriate vector entries, all properties can be accessed in
- constant time.
- .pp
- Among the properties within each plan node are the left and
- right subtrees of the node, target lists, and qualifications.
- The left and right subtrees either point to another plan node or
- nothing (\fBnil\fR in LISP).
- The target list and qualification entries respectively point to a
- list of individual target list elements and a list of restriction clauses.
- Lists are used to represent these structures because
- both sets of items are variable length,
- and random access to individual entries within these lists is not required.
- Each target list item consists of two items,
- also grouped together in a list.
- The first item in the list is a \*(lqresdom\*(rq node.
- It contains information about its corresponding
- target list entry \*-
- its type, destination, and if relevant, sort or hash information.
- Each resdom node is implemented using a vector.
- The second element, an \fIexpr\fR, is an arbitrary arithmetic expression
- consisting of
- variables, constants, parameters, functions, and operators.
- Each of these subcomponents is also a vector, and these vectors are linked
- together in a list if they
- represent the arguments to a particular operation or function.
- A restriction clause is a boolean \fIexpr\fR; therefore the preceding
- description applies to qualifications as well.
- .pp
- In addition, every plan node contains an empty slot that the query processor
- uses to store runtime-specific query information.
- Figure 4.1 shows the internal representation of a query plan that accesses
- two attributes and processes a single join clause using nested iteration.
- .(z
- .hl
- .GS
- file figure4.1
- .GE
- .sp
- .ce 2
- \fBFigure 4.1\fR
- Internal representation of a nested iteration join node
- .hl
- .)z
- .pp
- Constructs analogous to records in Pascal and \*(lqstructs\*(rq in C
- are used to build
- the different vector types associated with each node type.
- These constructs are called \*(lqdefstructs\*(rq in LISP.
- With defstructs, LISP programmers can combine primitive data types to create
- structured items.
- These new data structures, in turn, can be combined and nested to create even
- more complex structures.
- After defining a defstruct, LISP automatically provides a
- set of routines to dynamically create objects of these structured types
- and to access information within a structure.
- As a result, although a vector is the underlying data type used to
- implement defstructs, users can access node properties by specifying
- field names, as opposed to indexing vector entries.
- Figure 4.2 shows a Franz LISP defstruct definition and associated routines
- for a nestloop node.
- .(z
- .hl
- .(l
- .sz \n(ep
- .ft I
- ; Plan information common to all plan nodes
- .ft R
-
- (\fBdefstruct\fR (plannode
- (\fB:conc-name\fR get_))
- (state \fBnil\fR)
- targetlist
- (lefttree \fBnil\fR)
- (righttree \fBnil\fR))
- \fI; initialize lefttree, righttree, and state to nil\fR
-
- .ft I
- ; Nestloop node
- .ft R
-
- (\fBdefstruct\fR (nestloop
- (\fB:include\fR plannode)
- \fI; node contains defstruct defined above\fR
- (\fB:conc-name\fR get_)
- (\fB:constructor\fR make_nestloop (targetlist qual lefttree righttree)))
- (nodetype \*(lqNESTLOOP\*(rq)
- qual)
- .sp
- .ft I
- ;
- ; LISP routines provided as a result of the above definitions:
- ;
- ; Routines to retrieve property fields:
-
- (\fBget_state \fRnode)
- (\fBget_targetlist\fR node)
- (\fBget_lefttree\fR node)
- (\fBget_righttree\fR node)
- (\fBget_nodetype\fR node)
- (\fBget_qual \fRnode)
-
- \fI; Routine to construct a nestloop node:\fR
-
- (\fBmake_nestloop\fR targetlist qual lefttree righttree)
- .sz \n(pp
- .)l
- .sp 2
- .ce 2
- \fBFigure 4.2\fR
- Sample defstruct definition
- .hl
- .)z
- .sh 2 "Internal Data Structures"
- .pp
- In the process of creating possible query plans, the optimizer generates
- a great deal of information.
- To keep track of all this information, a more flexible structure of
- LISP is used \*- property lists.
- Every LISP object may have associated with it a list of characteristics,
- set by the user, called a property list.
- A major advantage of property lists is that one does not have to
- preallocate space for property slots, as required for defstructs.
- As a result, at any given time,
- every object may have an arbitrary number of properties associated with it.
- .pp
- Property lists are implemented using linked lists.
- Thus, access to information within a property list
- requires a linear search.
- The inefficiency of linear search is not a problem here because
- generally, the optimizer does not store any more than four or five items
- within a single property list,
- and as indicated in section 4.1, efficiency is not a primary
- consideration in this part of the system.
- Therefore, because property lists are simpler to work with, they are
- used extensively within the optimizer.
- .sh 2 "LISP-C Interface"
- .pp
- The foreign function interface in LISP is fairly easy to work with, provided
- a set of stringent rules are followed.
- For example, the standard way to pass structured information (e.g.
- a character string) from a C function is
- to return a pointer to the object.
- From here, the user can manipulate information within the
- object by referencing the pointer.
- This, however, will not work when LISP calls C because
- LISP cannot manipulate objects that C has allocated.
- It presents problems for the LISP garbage collector.
- .pp
- To work around this, C can return structured information by
- loading the data into variables that LISP has passed as parameters.
- Space for these return variables must be allocated by LISP prior to the C call.
- This is straightforward provided LISP knows the size of the returning object
- and can set aside a sufficient amount of memory.
- However, this is not always the case because tuples returned by
- C access method routines are variable length.
- .pp
- Fortunately, the optimizer never requires the contents of an entire tuple;
- on all occasions, it only needs a fixed set of attributes from
- within a single tuple.
- Therefore, rather than attempt to directly manipulate arbitrary tuples
- returned by access method routines, a layer written in C was built between
- the optimizer and the access method code.
- When the optimizer needs information from system catalogs, it calls
- some routine within this layer, which then calls access method routines to
- retrieve tuples.
- Desired information within these tuples are either returned explicitly
- as integers and floats, or they are passed back
- within variables allocated by LISP.
- .pp
- As an example, the optimizer may call a C routine, within the layer, called
- \*(lqretrieve_index\*(rq to retrieve information about a secondary index.
- In calling the routine, LISP passes a pointer to an integer array
- \*(lqindexinfo.\*(rq
- \*(lqRetrieve_index\*(rq then calls the access method routine \*(lqgetnext\*(rq
- until an appropriate tuple from the index catalog has been located.
- The index identifier, the number of pages in the index, and any other
- relevant information are extracted from the tuple and
- passed back to the optimizer in the array \*(lqindexinfo.\*(rq
- Consequently, variable length tuples are handled solely by C,
- resulting in a cleaner and simpler LISP-C interface.
- .sh 2 "An Evaluation of Using LISP"
- .pp
- Overall, writing the POSTGRES optimizer required about 6500 lines of LISP
- code and another 700 lines of C code.
- Having written the optimizer, using LISP was an excellent choice.
- There was a close match between our processing needs and the
- constructs and functions LISP provides.
- As a result, the programming effort was simplified.
- Had we used a language like C, we would have had to explicitly implement
- structures and routines equivalent to those LISP provides.
- .pp
- While writing the optimizer, it was also evident that
- other features of LISP were instrumental in simplifying code development.
- For instance, LISP allows you to either interpret or compile code written
- in the language.
- Naturally, compiled code is used in POSTGRES, but in developing
- the optimizer, the interpretive option was used.
- This significantly reduced development time because debugging was simpler and
- compilation time was eliminated.
- .pp
- LISP also supports dynamic allocation and implicit recollection of free
- space.
- The latter is implemented using garbage collection.
- As a result of these two properties, the optimizer can easily create
- objects of any type when needed, and LISP automatically handles memory
- management issues.
- .pp
- Last of all, LISP is a weakly typed language and
- because no single type is associated with variables in weakly
- typed languages, union types
- were implicit and variable declarations were unnecessary.
- This further resulted in simpler data structure definitions because
- declaration of field types was also unnecessary, as shown in figure 4.2.
- Another advantage of weakly typed languages is the absence of strict type
- checking.
- As a result,
- there is a certain degree of independence between the
- parameters a routine accepts and those that are actually passed.
- For example, if a routine accepts an identifier as a parameter but does
- not manipulate its actual value, then whether the identifier is an integer or
- string is not significant; choosing one or the
- other will not affect code within the routine.
- In many situations, this characteristic allowed us to make changes without
- modifying other relevant pieces of code.
- Changes could be made much more quickly as a result.
- So to briefly summarize, LISP was a simpler and much more flexible language to
- work with.
-