OS/2 Shareware BBS: 8 Other

Text File | 1993-12-26 | 59KB | 649 lines

This article is designed to teach you about POET, an object-oriented database system for C++. We will start with an overview of POET and of object oriented databases, then proceed to source code fragments which illustrate how common database tasks are actually accomplished using POET. Information about how to contact POET Software is available at the end of the article.

WHAT IS POET?

POET is the best selling object oriented database system today. It has received a variety of awards, including Computer Language's Jolt award for product excellence, Computer Language's productivity award, and BYTE Magazine's award of distinction.

POET is truly object oriented. It stores your C++ objects in the database. POET understands your C++ class declarations, so you never have to tell it how to load or store your objects. Because POET works with objects you never have to figure out how to translate your design into the two-dimensional tables used by conventional database systems. When you read an object back from the database it looks and acts exactly like the original object: it has the same data and functions. Even the pointers in an object are valid after loading it from the database. When POET loads an object it checks for pointer references, loads any referenced objects into memory, and sets each pointer to the address of the object it just loaded. If you do not want this to happen automatically you can decide when referenced objects should be loaded by using POET's on-demand references.

Would you like that in tech-speak? POET implements persistence for C++ objects, fully supporting encapsulation, inheritance, polymorphism, object identity, and references among objects. We will discuss these terms in this article, explaining the meaning of each. POET also gives you everything you expect from a database library: queries, device-independent storage formats, sharing objects on a network, nested transactions, and locking.

POET uses sets of objects which are analogous to tables in relational systems, but sets hold real C++ objects. Every class stored in a database also has a set which contains all objects belonging to that class. Every POET query operates on a set of objects and returns a set of objects which can be used for further queries.

Finally, POET gives you some innovative features. When you are performing queries on huge databases you can keep track of the progress and let the user abort the query if it is taking too long. Interested in what other users are doing with your objects? You can install callback functions that will be invoked when someone else stores, deletes, locks, or unlocks an object. Do you want the most recent version of an object? You can ask POET to update the objects in your memory if someone else changes them in the database.

POET is available on MS-DOS, MS-Windows, MS-Windows NT, Macintosh, OS/2, NeXT, SCO, Interactive, Sun, and various other flavors of UNIX. Both single-user and client/server versions are available. Heterogeneous networks are also supported: no matter what computer your application runs on, it can use any server on the network. Because POET uses the same database format for all systems, objects stored by one computer can be used on any other computer.

WHAT IS AN OBJECT ORIENTED DATABASE?

Object oriented database systems are a powerful new tool for software developers. Unlike relational and table-oriented systems, they provide full support for the object oriented programming model used in languages like C++ and Smalltalk. This model is intuitive, good at modeling relationships, and very suitable for large software projects.

Conventional databases are good at managing large amounts of data, sharing data among programs, and fast value-based queries.
They are not very good at modeling the relationships among data; everything must be represented as a series of two dimensional tables.

An object oriented database combines the semantics of an object oriented programming language with the data management and query facilities of a conventional database system. This makes it easy both to manage large amounts of data and to model the relationships among the data. If an object oriented database is integrated with an object oriented language then it should support the semantics of that language: relationships established in the program should automatically be represented in the database when objects are stored.

This section discusses object oriented systems, databases, and object oriented database systems. We will see that object oriented databases have significant advantages compared to conventional table-oriented and relational databases. Small applications will be less complex and easier to understand. Large or complex applications gain a simple, intuitive structure, which may mean the difference between success and failure.

The Object Oriented Programming Model

The first object oriented programming language, Simula, was designed for simulations, as was C++. Natural modeling of objects and their relationships was a driving factor in the design of these languages. Programming in this model is fun; your programs consist of lots of objects asking each other to do things. The early popularity of object oriented languages like Smalltalk was due to the fact that people enjoyed programming in them, while people who wanted to do software engineering used more bureaucratic and boring languages.

Today object oriented techniques are often seen as a modern form of software engineering. Over the years we have come to realize that modeling is a good way to develop well-structured software systems.
The same structures that help us to express the semantic relationships among objects can be used to design programs that are modular, contain well defined interfaces, and are structured along the lines of the problem that is to be solved. Intelligent use of the object oriented programming model results in programs that are better structured and easier to understand. To understand why this is so we must introduce you to the basics of the object oriented paradigm.

Classes and Objects

Traditional programming languages allow related data to be grouped using data structures. Related code may be grouped by placing it in one program file. A data structure is often directly related to a set of functions which provide it with a certain behavior.

For instance, a C program might have a data structure called a circle. This circle might have a radius, a center, a color, and a width and color for drawing the margin. A set of functions will also have to be written for manipulating circles; these functions might draw the circle on the screen, resize the circle, or change its color. These functions make sense only in relation to the circle data structure.

In a traditional programming environment we tend to think of the data structure as the circle and of the functions as manipulating the circle. However, nothing in the data structure is round; it is actually just a set of parameters that can be used to create a circle. The function that draws a circle converts the data structure into a circle on the screen. Neither the data structure nor the functions are a circle by themselves.

Object oriented programs allow the programmer to combine related code and data in one structure called an object. The definition of this object is called a class. In our example there might be a class called Circle which contains both the data for a circle and the functions needed to draw it, change it, or report its characteristics.

For instance, this could be the class declaration for our Circle:

    class Circle {
    private:
        int radius;
        int center_x;
        int center_y;
    public:
        void  Draw();
        float ComputeArea();
        void  SetRadius (int NewRadius);
        int   SetCenter (int x, int y);
        // ...
    };

Now we can create a circle, set its center and radius, and draw it on the screen:

    Circle c;
    c.SetRadius (50);
    c.SetCenter (100, 200);
    c.Draw();
    float area = c.ComputeArea();

Classes and objects are simple and elegant even for small programs, but they really shine when used in large, complex projects.

Encapsulation

Encapsulation refers to two properties of objects which have already been alluded to. First, related code and data are grouped together into one entity called an object. This simplifies the structure of a program by stating explicitly how code and data are related. Second, the class definition can hide some of its members from the rest of the program. In our example all of the data members are 'private'. This means that they can be accessed only by the functions which are members of the Circle class. The rest of the members are 'public', and these serve as the interface to the class. You can change anything in the private part of a class without affecting code that uses the class. If the public interface changes then code using the class may also have to be changed.

Inheritance

Our circle will probably have other data and functions which are needed by any shape. For instance, our circle may have a color, a position, or a line width, and there may be functions that change these values.
If every shape has to have these then we can group them into a class called Shape:

    class Shape {
    private:
        Color shape_color;    // Color is a struct or class
        Color line_color;
        int   x_position;
        int   y_position;
        int   line_width;
    public:
        void SetColor (Color NewColor);
        void SetLineColor (Color NewColor);
        void SetLineWidth (int Width);
        void MoveTo (int x, int y);
    };

Now we can give our circle everything that a shape has simply by saying that a circle is a shape. The C++ syntax looks like this:

    class Circle : public Shape {
    private:
        int radius;
        // ...
    public:
        void  Draw();
        float ComputeArea();
        void  SetRadius (int NewRadius);
        int   SetCenter (int x, int y);
        // ...
    };

In object oriented systems we call this inheritance. Since a Circle is a Shape it inherits everything that a Shape has. For instance, we can create a circle and set its color and position:

    Circle c;
    c.SetColor (Chartreuse);
    c.MoveTo (100, 200);

Inheritance simplifies the maintenance of code. If we decide to add a new function or data member to every shape we do not have to search through our files and make our changes to circles, ovals, squares, trapezoids, etc. Instead, we simply change the Shape class. Since all shapes are derived from Shape these changes affect every shape in the program.

Polymorphism

Every shape should be able to draw itself on the screen. It is not very easy to write a Draw() function which works for any arbitrary shape, so we will probably have to write a separate Draw() function for each shape. However, it would be nice to be able to write general purpose functions which work properly with any shape. For instance, we might want to write a function which moves an arbitrary shape:

    void Move (Shape *shape, int new_x, int new_y)
    {
        shape->Erase();
        shape->MoveTo (new_x, new_y);
        shape->Draw();
    }

The function call shape->Draw() needs to be able to draw any shape, but we have already said that we need a different Draw() function for each shape.
If the shape is a circle the function Circle::Draw() should be called; if it is a trapezoid then Trapezoid::Draw() should be called. This ability to call the right function for whatever kind of object is actually present is called polymorphism.

C++ implements polymorphism with virtual functions. In our example the Shape class declares Draw() and Erase() as virtual functions:

    class Shape {
    public:
        virtual void Draw();
        virtual void Erase();
    };

    class Circle : public Shape {
    public:
        virtual void Draw();     // Circle::Draw() overrides Shape::Draw()
        virtual void Erase();    // Circle::Erase() overrides Shape::Erase()
    };

Since Draw() is a virtual function, Circle::Draw() overrides Shape::Draw(). This means we can call our Move() function with the address of a circle, and Move() will call Circle::Draw() and Circle::Erase().

User Defined Data Types

In object oriented languages it is possible to add new data types. In fact, a new data type is just a class. For instance, we can define a class called Complex which contains both the data structures and the functions needed to implement complex arithmetic. When we define a new data type it is often nice to redefine operators like +, -, *, or /. In our current example, we might define these operators to support complex arithmetic. Now we can write statements like:

    Complex f = 2.43;
    f *= 3.141592654;

Identity

Conventional database systems distinguish entities by the values they contain. Object oriented systems give each object its own identity: an object can be distinguished from any other object, even if the other object contains exactly the same values, and references can be made to any object. In C++ the identity of an object is its address, which can be used in pointer references throughout the program.

Natural modeling of relationships among objects

The major problem in large, complex software systems is managing the relationships among components. Objects are well suited to the natural modeling of relationships, which is a major reason that object oriented languages are so useful for large projects. Computer scientists often talk about modeling the world with ISA and HASA relationships.
A programmer ISA person, a laser printer ISA printer, and a help window ISA window. A programmer HASA salary, a laser printer HASA current font, and a help window HASA border.

The object oriented programming model has direct support for these concepts. ISA is modeled through the class hierarchy; since a programmer ISA person, Person will be the base class for Programmer. HASA is modeled either by containment or by pointer references. For instance, since a laser printer HASA current font, our LaserPrinter object might contain a Font object or a pointer to a Font object.

Our laser printer probably has lots of fonts besides the active one. We need some way to represent a set of references, which we might call a HASMANY relationship. Object oriented languages often do this through container classes, but there is no set of container classes available in all C++ implementations. Later we will show how we have solved this problem by implementing our own containers to be used with the database.

Summary

The strength of object oriented programming is that programs closely reflect the structure of the problem to be solved. Related code and data are grouped into objects, each of which has a clearly defined public interface. The logical relationships among objects can be explicitly stated using inheritance and polymorphism. Object oriented programs are modular, easy to understand, and easy to maintain.

LIMITS OF CONVENTIONAL DATABASE SYSTEMS

Database systems are designed for managing large amounts of data, and they provide many important features that object oriented programming languages do not: permanent storage, fast queries, sharing of objects among programs, device independent formats, and sophisticated error handling for database operations.

Relational database systems (RDBS) and table-oriented systems based on B-Tree or Indexed Sequential Access Method (ISAM) are the standard systems currently used in most software development.
Each requires that all data be portrayed as a series of two dimensional tables. The relational model defines the structures, operations, and design principles to be used with these tables. These systems are quite appropriate for some applications and were a real breakthrough in their time, but software developers are rapidly learning that life is not a series of two dimensional tables.

The growing complexity of modern programs and the increasing use of dynamic data models have pushed traditional databases to their limit. The limited data models they support can result in significant software development costs, since they do not allow program designs that closely match the problem domain. For some application areas, like Computer Aided Design (CAD), Computer Aided Engineering (CAE), Multimedia, and Office Automation, they are not even worth considering.

Limited data types

Conventional databases have a fixed set of data types. The better systems include both simple data types like INTEGER, FLOAT, or CHAR and complex data types like DATE, TIME, or CURRENCY. New data types cannot be added by the user; if your database does not have the data type you need, you are stuck. Aggregate data types like arrays are rarely available, and the only way to group data is to put it in a table.

Modern software systems, however, often contain data types which are not easily modeled using such predefined types. For example, a CAD program might have an array of shapes, or a desktop publishing program might model a page as a series of frames which may contain bit maps, paragraphs, or vector drawings. We have already seen that object oriented programs allow us to declare new data types as needed.

Limited modeling of data relationships

In conventional database systems each item is represented as a row in a table. Tables may be accessed sequentially or by searching for values. The only way to express relationships among items is by setting values in the rows.
In each table one or more columns is chosen as the primary key, whose value must be unique for each row in the table. For instance, the primary keys for a student, a teacher, and a class might each be represented as identification numbers.

The relational model is weak at showing many-to-many relationships, which generally require the introduction of a new table. In our example, the only way to show which students are taking a class is to create an 'enrollment' table which has one row for each student in each class, containing the student identification number and the class identification number.

Since relational databases have no concept of hierarchy it is difficult to model the ISA relationship. Suppose we have a 'people' table, a 'students' table, and a 'teachers' table. Every student is also a person, and some of the student's fields are in each table. To update all of a student's information you must find the rows of each table whose identification numbers match. Every level in the hierarchy requires a new table, and every program using the database must update every relevant table appropriately. The hierarchy is not explicitly represented in the database; you simply have to know why the various tables are there.

No way of grouping code with data

We have already seen that object oriented programming languages allow related code and data to be combined to form objects. There is no way to do this in a conventional database system. If you know the name of a table you may use it, and the system will not prevent you from changing the wrong table. As long as you have the right password, everything in the database is globally accessible to all of your code.

Limited manipulation of data

Database languages are often very poor at manipulating data. SQL, for instance, offers only limited computation: beyond simple arithmetic and aggregate functions, it does not let you perform general computations on your data as input to a query, or on the result of a query.
To be formal, we would say that SQL is not computationally complete even though it is relationally complete; a normal human being might say that SQL is great for searching but lousy for anything else. Because of this, most serious applications are written in conventional programming languages using some kind of SQL-based interface to the database.

Poor integration with programming languages

In the last paragraph we mentioned that most serious database applications are written in conventional programming languages. Since the database and the host programming language use two different models and different data types, the programmer must either perform all operations directly in the database or constantly convert between the two systems. The first method does not let the programmer use many features of the host language; the second means a great deal of overhead and frustration, since the relationships among data must be constantly converted to support both programming models. Such a program has two distinct designs, one for the program itself and one for the database.

Summary

To store data in a conventional database it must be dissected into a series of two dimensional tables. Only predefined data types are supported. Object oriented programming languages have a rich set of features for creating data types and representing the relationships among data, and these features are not supported in such databases.

POET: PERSISTENCE AND OBJECT-ORIENTED DATABASE FOR C++

An object oriented database is a database which fully supports the object oriented model. Like an object oriented programming language, it is designed for expressing the relationships among data. Like a conventional database, it is designed for managing large amounts of data and fast value-based queries.

Persistence is a language extension which allows the programmer to store and retrieve objects.

We have chosen to implement persistence as an extension to C++, which is widespread, portable, efficient, easily extensible, and particularly good at expressing the relationships among objects. You program in standard C++ and use your favorite compiler. Our language extensions are limited to the declaration syntax for persistent classes. We provide a precompiler which converts your persistent classes to ANSI C++ code, and a class library which implements the object oriented database.

The rest of this section briefly discusses the basic features of POET. It is somewhat abstract; a later section, A POET Tutorial, will give you a concrete understanding of POET with short programs that show it actually being used.

Persistence

The original implementation of Smalltalk had a simple method for storing objects: the program's entire memory image could be dumped to disk and restored when running the program later. This scheme has some real advantages. It is very simple to implement, requires almost no effort from the programmer, and fully implements all aspects of the programming language (after all, the program sticks everything in memory somewhere!). It also has some real disadvantages. The number of objects that can be stored depends on the amount of available main memory, only the whole programming context may be stored and retrieved, objects may not be shared among programs or retrieved on another kind of computer, and there is no way to implement intelligent error recovery.

In POET, a class is persistent if it is defined using the 'persistent' keyword. Every object of a persistent class has the ability to store itself in a database.

Persistence and object orientation

If you store an object in a database and read it back, it should behave exactly as though it had never been stored. The object you read from the database must have the same identity, encapsulation, inheritance structure, polymorphism, and references as the original object.
Many "object oriented" database systems flunk this test! POET correctly handles all aspects of an object's identity and behavior.

Resolving references

POET automatically converts your C++ pointers and references to a form that can be stored in the database. The objects or data to which they refer are also stored. This means that POET does not just store objects; it also stores the relationships among objects that are found in your C++ program. When you read an object from the database all references are resolved, the referenced objects or data are loaded into memory, and your pointers are set to the appropriate RAM addresses. This means that you can access objects directly using the C++ pointers in your object: everything you need is sitting at the right place in your RAM!

If your data structures are densely connected or large then you may want to decide when to load referenced data and objects. POET allows you to do this with on-demand references.

To resolve references correctly, POET needs to know the location and type of all pointers and references in your persistent objects. The PTXX precompiler parses your persistent class definitions and stores them in the database.

One-to-many references

Objects often need to reference many objects. For instance, a father may have many children. C++ does not have a standard way of expressing one-to-many relationships. POET provides a container class called a set which can be used to hold a variable number of items. You can place sets in your objects to hold references when you have one-to-many relationships.

Queries

You can find objects in your database using queries. The result of a query is stored in a set. This set can be sorted based on any values in the object. Queries can also be performed using the values of objects referenced by an object. To speed up data access you can define indexes for your classes.

Object management

Each object may exist only once in memory.
This ensures that changes made to an object in one part of a program will not be overwritten by another part of the same program. POET is careful to avoid duplicating objects: whenever a database operation would load an object, POET first checks to see if it is already in memory. If so, it simply returns a pointer to the existing object.

Since each object may have any number of references to it, it is not safe to simply delete the object. But your memory fills up quickly if you never delete anything! POET keeps track of the number of references made to each object with a counter. When you are done with a reference you call its Forget() method. If there are no other active references then the object is deleted.

Transient members

Objects sometimes contain data or references to data that should not be stored. For instance, an object may contain a pointer to a bit image which is needed only temporarily. You can define these members to be transient so that they will not be stored in the database.

Summary

POET is integrated with the C++ programming language. It lets you program in standard C++ using your favorite compiler. POET is fully object oriented: the objects you read from the database look and act just like the objects you stored. Moreover, POET automatically resolves your pointer references and stores referenced objects and data in the database. This means that POET does not just store objects; it also stores the relationships among objects that are found in your C++ program. When you load an object these relationships are restored. POET also allows you to do value-based queries as in conventional database systems.

SECTION 2: WHAT CAN YOU EXPECT FROM POET?

POET gives you an object oriented database with full support for the semantics of C++. It is powerful and easy to use.

When we say that POET is object oriented we mean that it uses classes and objects to provide these features:

  * Encapsulation
  * Inheritance
  * Polymorphism
  * User-defined data types
  * Identity
  * Natural modeling of relationships among objects

When we say that it is a database we mean that it provides these features:

  * Long term storage
  * Large capacity for storage
  * Value-based queries
  * Sharing objects among programs
  * Device independent formats
  * Transactions
  * Locking
  * Sophisticated error handling for database operations

We find that an object oriented database should support:

  * Resolution of references in the program
  * One-to-many references
  * Value-based queries with sorted results
  * Indexes
  * Intelligent object management
  * Transient members

Object oriented programming is well suited for large and complex programs. Databases are well suited for large amounts of data and for fast value-based queries. POET gives you both. When you use POET you have all the tools you need for complex programs which access large amounts of data.

A POET TUTORIAL

The rest of this article shows you what you can do with POET. We use a series of short programs to illustrate POET's major features. The coverage is brief, but we try to show you why each feature is provided and how you can use it in your programs. Numerous code fragments are provided. When you have finished reading this document you will have a pretty good idea what programming in POET is all about.

You will learn how to declare and store persistent objects, find all objects of a particular class, navigate using pointer references, represent one-to-many references, perform value-based queries, sort sets, manage objects, and initialize transient class data members. This will introduce you to most of the vocabulary and concepts used in POET.

Storing an object

A class is persistent if it is declared using the 'persistent' keyword:

    persistent class Person {
    private:
        char    name [30];
        short   age;
        Address address;
    public:
        // ...
    };

The above class declaration is just a normal C++ class declaration except for the 'persistent' keyword, but your compiler cannot read it. Persistent class declarations must be placed in separate header files, which generally use the .HCD extension. These files are compiled with POET's PTXX precompiler, which creates standard C++ header files for you to include in your application, creates the database, and registers classes in the database's class dictionary.

After you run PTXX you can include the generated header files in your application, create an object of this type, assign it to a database, and store it:

    #include <stdlib.h>
    #include <poet.hxx>      // General include file.
    #include "base.hxx"      // BASE.HXX is generated by PTXX for
                             // the database BASE
    #include "person.hxx"    // PERSON.HXX is generated by
                             // PTXX when it processes PERSON.HCD

    int main ()
    {
        PtBase objbase;
        objbase.Connect( "LOCAL" );   // Connect to the server
        objbase.Open( "test" );       // Open the database named test.

        Person * Susie = new Person( objbase );
        Susie -> Store();

        objbase.Close();
        objbase.DisConnect();
        delete Susie;
        return 0;
    }

AllSets: Finding all objects of a class

How do we find the people that have been stored in a database? When PTXX encounters a persistent class declaration it creates a set (or container class) to hold all objects of that type. This set is called an AllSet, and it represents all objects of a particular type that have been stored. AllSets are one of the tool classes that PTXX can automatically generate for you. For the persistent class Person, PTXX will generate a class named PersonAllSet.

You can step through the AllSet sequentially to find all the objects of a given type:

    PersonAllSet * allPersons = new PersonAllSet( objbase );
    Person * thisPerson;
    long i;

    for ( i = 0; allPersons -> Get( thisPerson, i, PtSTART ) == 0; i++ )
    {
        thisPerson -> DoSomething();
        allPersons -> Unget( thisPerson );   // deletes the object if nobody
    }                                        // else is using it.
    delete allPersons;

An AllSet also contains objects whose classes are derived from the class. If we have classes called Student, TaxCollector, and Mortician that are derived from Person, then all members of these classes are also found in the PersonAllSet. POET preserves polymorphism: if a Mortician is read from the PersonAllSet then it is a complete Mortician with all the right member functions.

Note that we call Unget() when we are done with the object. Every object you read takes up memory, so you should get into this habit.

Finding objects using pointer references

C++ programs often use pointers and references to show the relationships among objects. We want our object-oriented database to be able to resolve these references so that the relationships it finds in the program can be maintained in the database. Suppose that we represent the parents in our Person class with pointers:

    persistent class Person {
        // ...
    public:
        Person * father;
        Person * mother;
        // ...
    };

If we have stored a person in the database we would like to be able to read a Person from the database and access the parents directly using these pointers. In other words, the fact that we got this person from a database should not change the way we structure the relationships in the program. This raises two problems: our database needs to be able to find pointers in user-defined objects, and it needs to be able to represent pointer references in an address-independent manner.

Finding the references

To find pointers in user-defined objects we need access to the declaration of the class. Since compilers have no standard way to pass this information on, we have decided to implement our own precompiler which parses all persistent classes and registers their declarations in a class dictionary contained in the database. Class dictionary declarations are also generated for any non-persistent class contained in or referenced by a persistent class.
The class dictionary allows POET to follow the connections found in the C++ program in much the same way that a source-level debugger does. If a pointer is set to any value other than zero, POET assumes it points to an object of the correct type, so uninitialized or dangling pointers can cause problems when you store an object. Make sure that any unused pointers are set to zero; the best place to do this is usually the constructor for the class. Although POET does not require it, we feel that a constructor should usually initialize all members of an object so that they do not contain random garbage, because C++ has no default initialization for the data in an object.

Object Identity

Now that we can follow these references, how do we represent them in our database? In a C++ program a reference contains the address of an object, and there is only one object at a given address. To be formal, we can say that the address of an object is the object's identity. Suppose that several children all have pointers to one father. Since the pointers all contain the same address, we know that they all refer to the same father.

Obviously, the physical address of an object is not very useful in a database: when we retrieve an object from the database it will probably be loaded to a different address, and addresses are terribly machine-dependent. Neither can we depend on the contents of an object to establish its identity; someone might read an object, change its contents, and write it back to the database. Instead, the system automatically generates a unique identifier for each object when it is first assigned to the database. This identifier is called the object identity. It is used in all references to the object, and can be thought of as the logical address of the object. In our example, each child has a pointer which identifies the father; this pointer is stored as the father's object identity.
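The following sketch, again in plain C++ and with hypothetical names (ObjId, PersonRecord, ToyDatabase), shows identity used in place of addresses: an object receives an identifier the first time it is stored, and a pointer is written out as the identity of the object it refers to. It is a model of the idea, not POET's storage format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

using ObjId = std::uint32_t;    // hypothetical logical identifier type
constexpr ObjId kNullId = 0;    // stored in place of a null pointer

struct Person {
    Person* father = nullptr;
    ObjId id = kNullId;         // assigned the first time the object is stored
};

// What actually goes to disk: the pointer is replaced by an identity.
struct PersonRecord {
    ObjId fatherId;
};

class ToyDatabase {
public:
    // Stores p and, recursively, the object it references. An object that
    // already has an identity is updated in place, never duplicated.
    // (This sketch assumes the chain of fathers has no cycles.)
    ObjId store(Person& p) {
        if (p.id == kNullId) p.id = nextId_++;   // identities are never reused
        ObjId fatherId = kNullId;
        if (p.father != nullptr) fatherId = store(*p.father);
        records_[p.id] = PersonRecord{fatherId};
        return p.id;
    }

    std::size_t size() const { return records_.size(); }
    const PersonRecord& record(ObjId id) const { return records_.at(id); }

private:
    ObjId nextId_ = 1;
    std::map<ObjId, PersonRecord> records_;
};
```

Storing two children who share a father leaves three records, not four, and both child records hold the same father identity.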
The object identity is similar to certain concepts used in conventional database systems. If you have worked with relational database systems, you may want to think of it as the object's primary key; however, the object identity is generated automatically by the system and is never recycled or used for an object of another type. If you have worked with hierarchical or network databases, you are used to the idea of pointers in a database. The object identity is not, however, an address on your hard disk. It is a logical identifier which is associated with a hard disk address.

Multiple references to a single object

Now that we have object identity we can use it to resolve multiple references to the same object. Consider the following example:

    Person Adam( objbase ), Cain( objbase ), Abel( objbase );
    ...
    Cain.father = &Adam;
    Abel.father = &Adam;
    Cain.Store();      // Also stores Adam!
    Abel.Store();      // Adam has already been stored!

The database needs to be smart enough to store only one Adam. When Cain is stored, Adam is also stored, because Cain references Adam and Adam has not yet been stored. When Abel is stored, the system can see that Adam has an object identity, which means that he already exists in the database, so it must update the existing Adam instead of creating a new one.

Restoring references

So far we have ensured that our database understands the pointer references used by C++, but we also have to ensure that these references are restored when an object is read from the database. In our example, Cain has a pointer which is supposed to refer to his father. If we want to keep our C++ programmers happy, we had better make sure that Adam is also loaded into memory and that Cain's father pointer is set to Adam's new address.

Ondemand References

In the last section we stated that references should be restored automatically when loading an object from the database. This can cause some memory management problems.
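A small model shows why. In the sketch below (hypothetical names, standard C++ only, not POET's loader), loading a person also loads his father and replaces the stored identity with the father's new address, so asking for one object from a 1,000-generation family line brings all 1,000 into memory. The map of already-loaded objects also guarantees that each identity is loaded only once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>

using ObjId = std::uint32_t;
constexpr ObjId kNullId = 0;

struct PersonRecord { ObjId fatherId; };      // stored form: reference as identity
struct Person { Person* father = nullptr; };  // in-memory form: real pointer

// Eager loader: reading one object also reads everything it references and
// turns each stored identity back into a memory address.
class EagerLoader {
public:
    explicit EagerLoader(const std::map<ObjId, PersonRecord>& disk) : disk_(disk) {}

    Person* load(ObjId id) {
        if (id == kNullId) return nullptr;
        auto cached = inMemory_.find(id);
        if (cached != inMemory_.end()) return cached->second.get();  // one copy per identity
        auto obj = std::make_unique<Person>();
        Person* raw = obj.get();
        inMemory_[id] = std::move(obj);             // register before recursing
        raw->father = load(disk_.at(id).fatherId);  // restore the pointer
        return raw;
    }

    std::size_t loadedCount() const { return inMemory_.size(); }

private:
    const std::map<ObjId, PersonRecord>& disk_;
    std::map<ObjId, std::unique_ptr<Person>> inMemory_;
};
```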
As long as our example stops at Cain and Abel there is no problem, but a large genealogy tree may not fit into memory. We need some way to avoid loading large networks when everything is connected to everything else.

One way to do this is with transparent buffering: we could overload the pointer operator and load the object before returning its address. This is elegant, but it has one significant drawback: there is no way to return an error if, for instance, the reference is to an object that no longer exists.

POET solves this problem by implementing a class called 'ondemand', which can be used to store references which should not be resolved automatically. We use template syntax to specify the data type for the reference (our precompiler generates appropriate C++ version 2.0 code, since many compilers do not yet support templates). An ondemand reference can be declared like this:

    persistent class Person {
    public:
        ondemand< Person > odChild;
    };

If I have a father and a child, and the father is assigned to the database:

    Person * Father = new Person;
    Person * pChild = new Person;
    Father -> Assign( objbase );

then I can set the ondemand reference to say which child is the father's child:

    Father -> odChild.SetReference( pChild );
    Father -> Store();

When I want to load the child into memory, I can do this using the Get() method:

    Person * p2Child;
    Father -> odChild.Get( p2Child );   // p2Child now points to the child

Dependent objects

Sometimes an object only makes sense in relationship to another object. We might decide that Adam's children are only relevant to our application if Adam is in the database, that employees are only relevant if the firm is in the database, or that the parts of an engine are only relevant if the engine is in the database. These objects should be deleted if the parent object is deleted.
For instance, if a supplier stops selling parts for a particular engine, all of these parts should disappear from the catalog when the engine is deleted. One way to do this is to make the dependency explicit in the class declaration:

    persistent class Person {
        depend Person * alter_ego;
    };

In this example, a person's alter ego will no longer exist if the person is deleted. If Superman is killed then POET makes sure that Clark Kent also disappears. Wow!

Sets: One-to-many references

In a previous example each person contained pointers to his father and mother. Suppose we wanted each person to contain pointers to its children. Since a person may have many children, a simple pointer will not do. An array is also inappropriate, because we cannot dimension an array when we have no idea how large it will be; however, it would be nice to be able to access items as though they were contained in an array. A linked list would also do the job, but we don't want to force our programmers to implement one.

C++ programmers often solve such problems using container classes. For POET we defined a new data type called a 'set', which acts rather like an array but does not have a fixed number of elements. Like an array, our set should be able to hold objects of any data type, so we use templates (our precompiler generates type-safe C++ version 2.0 container classes, since many compilers do not yet support templates). The type is given when we declare a set:

    cset< Person * > people;     // a set of people
    cset< int > integers;        // a set of integers

You may wonder why the declaration is 'cset' instead of just 'set'. Since small sets can be implemented much more efficiently than sets which do not necessarily fit into memory, we actually implemented several kinds of sets. A cset is a compact set: all elements are managed directly, and on a PC it all fits in one segment. We also have large sets, which can use all conventional memory on PCs, and huge sets, which are not limited by available RAM.
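For comparison, a container with no fixed dimension and array-like positional access can be sketched with std::vector. ToySet is a hypothetical name and says nothing about how POET lays out csets, large sets or huge sets in memory; its Get() only imitates the return-zero-on-success style of the AllSet loop shown earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a compact set: no fixed size, positional access.
template <typename T>
class ToySet {
public:
    void Append(const T& value) { items_.push_back(value); }

    // Copies the element at position pos (counted from the start) into out.
    // Returns 0 on success and nonzero past the end, so it can drive a loop
    // in the style of the AllSet example.
    int Get(T& out, long pos) const {
        if (pos < 0 || static_cast<std::size_t>(pos) >= items_.size()) return 1;
        out = items_[static_cast<std::size_t>(pos)];
        return 0;
    }

    std::size_t Count() const { return items_.size(); }

private:
    std::vector<T> items_;   // grows as needed; no dimension chosen up front
};
```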
Now let us assume that our Person class includes a set of children:

    persistent class Person {
    public:
        cset< Person * > children;
    };

Now we can add children to the set and read them back:

    Person Adam, Cain, Abel;               // Sets work even if not
                                           // assigned to a database
    Adam.children.Append( &Cain );         // Cain is child 0
    Adam.children.Append( &Abel );         // Abel is child 1

    Person * second_child;
    Adam.children.Get( second_child, 1L, PtSTART );   // Are there now two
                                                      // Abels or only one?

You can also have ondemand references in a set:

    cset< ondemand< Person > > children;   // keep the space in '> >'

The Surrogate pointer table

In the last example we created a pointer to Adam's second child by getting it out of the set. We had better make sure that there is only one Abel in the system and that this pointer refers to it! If our system creates a second Abel then we have violated the C++ concept of identity. This means that we must know whether an object is already in memory when we get it from a set. If it is, we simply return a pointer to it; if not, we load it and then return the pointer.

To do this POET uses the surrogate pointer table. A surrogate is a representation of the object's identity, and this table associates an object's identity with its current location in memory. Every object which is in memory and has been assigned to a database appears in the surrogate pointer table.

The Link Count

Suppose we have several pointers to Abel in memory and someone decides to delete Abel. Every other pointer now points to an invalid object. Before deleting an object, our system needs to know how many users are holding it. POET keeps track of this using a counter, the link count. When a persistent object is created the counter is set to 1. If someone else gets the object from a set, the counter is incremented and is now 2. The first user can then safely dispose of the object using the Forget() method: if the link count is greater than 1, the object remains in memory.
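The surrogate table and the link count work together. The sketch below is a minimal model in standard C++, not POET's code; SurrogateTable and its members are hypothetical, although Forget() is named after the POET method mentioned above. Get() returns the existing copy when the identity is already in memory, and Forget() frees an object only when its count drops to zero.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using ObjId = std::uint32_t;

struct Person {
    explicit Person(ObjId i) : id(i) {}
    ObjId id;
    int linkCount = 0;
};

// Hypothetical model of a surrogate pointer table with link counts.
class SurrogateTable {
public:
    SurrogateTable() = default;
    SurrogateTable(const SurrogateTable&) = delete;
    SurrogateTable& operator=(const SurrogateTable&) = delete;
    ~SurrogateTable() { for (auto& entry : table_) delete entry.second; }

    // Returns the single in-memory copy of the object, loading it only if
    // it is not already present, and records one more user.
    Person* Get(ObjId id) {
        auto it = table_.find(id);
        Person* p = nullptr;
        if (it != table_.end()) {
            p = it->second;
        } else {
            p = new Person(id);   // stands in for reading the object from disk
            table_[id] = p;
            ++loads_;
        }
        ++p->linkCount;
        return p;
    }

    // Releases one user; the object is freed only when nobody holds it.
    void Forget(Person* p) {
        if (--p->linkCount == 0) {
            table_.erase(p->id);
            delete p;
        }
    }

    bool InMemory(ObjId id) const { return table_.count(id) != 0; }
    int Loads() const { return loads_; }

private:
    std::map<ObjId, Person*> table_;   // identity -> current address
    int loads_ = 0;
};
```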
Queries: Finding objects using values

Programmers frequently need to find objects based on the values they contain. If you are reading in words from a file, you may want to know whether you have already stored a word in your database, or you may need to find other words starting with the same letter. You may want to know which basketball players are shorter than Ernie DiGregorio, which tennis players are older than John McEnroe, or which hockey players have fewer teeth than Guy Lafleur. This kind of query is easy to do in most databases but not in most object-oriented languages.

Because C++ does not provide any standard language mechanism for queries, we have implemented them with our own classes and methods. Since each of your classes has different data members, our precompiler generates a query specification class for each class in the database; this class lets you set a condition on any member of your class to build a query specification for that class.

Every query operates on a set of objects, and the result of a query is also a set of objects. Queries support the standard relational operators (less than, greater than, equal to, less than or equal to, and greater than or equal to) as well as the standard Boolean operators (and, or, not, and exclusive or). Parentheses are used to set precedence.

Simple queries

A query class contains methods for setting the conditions of a query. For example, the query class for a Person class will have one member function for each member of the Person class. If Person is declared like this:

    persistent class Person {
        PtString name;          // PtString is a variable length
    };                          // string type in POET

then the precompiler will generate the following query class:

    class PersonQuery: public PtQuery {
        ...
    public:
        Setname( PtString &value, PtCmpOp condition = PtEQ );
        ...
    };

We can use this query class to look for all people whose names start with the letter "M".
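Before turning to POET's own calls, it may help to see what evaluating such a specification involves. The sketch below is a model in standard C++, not the code PTXX generates: PersonSpec, CmpOp, RunQuery and the trailing-'*' wildcard rule are all assumptions made for illustration, and only the AND of conditions is modeled. Like a POET query, it takes a set of objects and returns a set of objects.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

enum class CmpOp { EQ, LT, GT, LTE, GTE };   // hypothetical; mirrors PtEQ, PtLT, ...

struct Person {
    std::string name;
    int iq;
};

// Only a single trailing '*' is supported: "M*" matches any name starting
// with "M". POET's own wildcard rules may differ.
bool matches(const std::string& value, const std::string& pattern) {
    if (!pattern.empty() && pattern.back() == '*') {
        std::string prefix = pattern.substr(0, pattern.size() - 1);
        return value.compare(0, prefix.size(), prefix) == 0;
    }
    return value == pattern;
}

bool compareInts(int a, int b, CmpOp op) {
    switch (op) {
        case CmpOp::EQ:  return a == b;
        case CmpOp::LT:  return a < b;
        case CmpOp::GT:  return a > b;
        case CmpOp::LTE: return a <= b;
        case CmpOp::GTE: return a >= b;
    }
    return false;
}

// One optional condition per member; every condition that is set must hold.
struct PersonSpec {
    std::optional<std::string> namePattern;
    std::optional<std::pair<int, CmpOp>> iqCondition;

    void SetName(const std::string& pattern) { namePattern = pattern; }
    void SetIQ(int value, CmpOp op) { iqCondition = std::make_pair(value, op); }

    bool Accepts(const Person& p) const {
        if (namePattern && !matches(p.name, *namePattern)) return false;
        if (iqCondition && !compareInts(p.iq, iqCondition->first, iqCondition->second))
            return false;
        return true;
    }
};

// A query takes a set of objects and yields a set of objects.
std::vector<Person> RunQuery(const std::vector<Person>& all, const PersonSpec& spec) {
    std::vector<Person> result;
    for (const Person& p : all)
        if (spec.Accepts(p)) result.push_back(p);
    return result;
}
```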
We need a PersonAllSet, since we want to look at all people; we also need a set to hold the results and a PersonQuery object.

In the .hcd file:

    typedef lset< Person * > PersonSet;

In the source file:

    PersonAllSet * all = new PersonAllSet( objbase );
    PersonSet * result = new PersonSet;
    PersonQuery q;

    q.Setname( "M*", PtEQ );       // PtString supports wildcard comparisons
    all -> Query( &q, result );
    // ... use the objects in result ...

    delete all;
    delete result;

Complex queries

When sets or references are present in an object, the database must provide some way to perform queries based on them. POET uses nested queries for this: if you want to find all the parents whose children meet some criterion, you first create a query specification for the child, then install it as part of the parent's query specification. For instance, you may want to find all parents who have children with an I.Q. under 80:

    PersonAllSet * all = new PersonAllSet( objbase );
    PersonSet * result = new PersonSet;
    PersonQuery parent, children;

    children.SetIQ( 80, PtLT );                     // Child's IQ below 80
    parent.SetChildren( 1, PtGTE, &children );      // At least one child has
                                                    // IQ below 80
    all -> Query( &parent, result );

    delete all;
    delete result;

In the above example the SetChildren() function uses the parameters 1 and PtGTE to say "greater than or equal to one child meeting this specification."

Simple sorts

When you perform a query you often want to specify the sort order of the result. Therefore, the query classes that the precompiler generates provide member functions for sorting:

    class PersonQuery: public PtQuery {
        ...
    public:
        SortByname( PtSortOp mode );
        ...
    };

If we wanted to sort the results of the earlier simple query by name, we could do this:

    PersonAllSet * all = new PersonAllSet( objbase );
    PersonSet * result = new PersonSet;
    PersonQuery q;

    q.SortByname( PtASCENDING );
    q.Setname( "M*", PtEQ );
    all -> Query( &q, result );

    delete all;
    delete result;

Complex sorts

You could also ask to have the parents from the complex query sorted according to the IQ of their children. That would look like this:

    PersonAllSet * all = new PersonAllSet( objbase );
    PersonSet * result = new PersonSet;
    PersonQuery parent, children;

    children.SortByIQ( PtASCENDING );      // specifies the sort order
                                           // based on child's IQ
    children.SetIQ( 80, PtLT );
    parent.SetChildren( 1, PtGTE, &children );
    all -> Query( &parent, result );

    delete all;
    delete result;

Indexes

Value-based queries can be very slow, especially if they have to examine every object in the database. Indexes can dramatically speed up queries without changing the way they are programmed. If you expect a member of your class to be used often in queries, you can build an index on it by specifying the 'useindex' keyword in your class declaration and defining the index as a separate class:

    persistent class Person {
        ....
    public:
        PtString name;
    protected:
        useindex PersonNameIndex;
        ....
    };

    indexdef PersonNameIndex : Person {
        name[[10]];        // Just the first 10 characters go in the index
    };

A class may have any number of indexes. Don't get carried away, though: only use indexes for fields that are likely to be used in queries. Each index takes up space on your hard disk and forces POET to update the index tree every time an object is added or an indexed field is changed. If fast updates and small databases are important to you, use indexes sparingly.

Queries provide flexibility at the cost of efficiency. Navigation is much faster than queries, and references require much less disk space than indexes. If you know what relationships exist in your data, you should use pointers.
If you need to explore the current values of objects, you should use queries. If fast queries are more important than fast updates or small databases, you can build indexes on the members most likely to be used in queries.

Transient members

You don't always want to store everything associated with an object. For instance, if you have a windowing system and use a window to display the object, you would not want to store the window structure in your database. An obvious solution to this problem is to allow members of a class to be declared as transient, which means that they are not stored when the object is stored:

    persistent class Person {
    private:
        PtString name;
        transient WINDOW * viewer;
    public:
        Person() : viewer( NULL ) {}
    };

When this object is read from the database, the viewer pointer will be undefined. Since this can cause unexpected results, you will want to initialize this pointer using the class factory constructor, as described in the next section.

The class factory constructor

POET assumes no responsibility for transient members, and they will not be initialized when an object is read from the database. The best place to initialize them is in the constructor that POET uses to build the object before reading it from the database. This constructor is normally hidden from the programmer; in most POET programs you write and use constructors for your persistent classes exactly like any other C++ constructors. However, our precompiler creates one additional constructor, called the class factory constructor, and POET always calls this constructor when it reads an object from the database. For the Person class in our examples the class factory constructor looks like this:

    Person( PtBase * base, PtObjId * surr, Ptr2SurrTuple * & info )
        : PtObject( base, surr, info ) {}

This constructor takes three parameters and passes them on to the constructor of the persistent object's base class.
Don't worry about the meaning of these parameters: PtObject knows what to do with them, and you don't have to. If you want to write your own class factory constructor, declare it in your persistent class declaration:

    persistent class Person {
    private:
        PtString name;
        transient WINDOW * viewer;
    public:
        Person( PtBase * base, PtObjId * surr, Ptr2SurrTuple * & info )
            : PtObject( base, surr, info )
        {
            viewer = new WINDOW;
        }
    };

PTXX will notice that you have provided your own class factory constructor and will not generate another one.

Locking an Object

If many programs can access a database at the same time, you may want to make sure that an object does not change while you are processing it. POET allows you to lock an object when you read it, to lock the results of a query, or to lock all objects in an AllSet. Various kinds of locks can be specified, and you can specify whether the lock should also affect other objects which your object references.

This is not the place to show all the possibilities, but we will give one simple example. Suppose you want to prevent any other program from rewriting an object while you are processing it. You can set a lock when you read the object by passing a lock specification to the Get() method, and lift it when you call Unget(). The code looks like this:

    PtLockSpec LockSpec( PtLK_WRITEvWRITE, PtSHALLOW );  // Write lock, shallow

    MyAllSet.Get( p, 0, PtSTART, &LockSpec );   // Set the lock, get object
    p -> ChangeYourself();
    p -> Store();
    MyAllSet.Unget( p, &LockSpec );             // Frees the lock

Transactions

Transactions allow you to make a series of changes tentatively. This simplifies error handling when a series of database operations are logically related: if any one operation fails, all of the changes can be undone. Once you start a transaction using PtBase::BeginTransaction(), all database changes are kept in a transaction cache instead of being written to the database.
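A transaction cache of this kind can be modeled in a few lines of standard C++. In the hypothetical ToyBase below, each open transaction has its own cache; committing an inner transaction merges its changes into the enclosing one, and only the outermost commit writes to the database, which is the nesting behavior stated for POET's transactions. Reads from inside a transaction are not modeled, and none of this is POET's API.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: the "database" is a map from key to value.
class ToyBase {
public:
    void BeginTransaction() { caches_.emplace_back(); }

    void Store(const std::string& key, int value) {
        if (caches_.empty()) data_[key] = value;   // no transaction: write directly
        else caches_.back()[key] = value;          // inside a transaction: tentative
    }

    // Both calls below assume a matching BeginTransaction().
    void CommitTransaction() {
        std::map<std::string, int> finished = std::move(caches_.back());
        caches_.pop_back();
        // An inner commit merges into the enclosing cache; the outermost
        // commit merges into the database itself.
        std::map<std::string, int>& target = caches_.empty() ? data_ : caches_.back();
        for (const auto& [key, value] : finished) target[key] = value;
    }

    void AbortTransaction() { caches_.pop_back(); }   // discard tentative writes

    bool InDatabase(const std::string& key) const { return data_.count(key) != 0; }

private:
    std::map<std::string, int> data_;
    std::vector<std::map<std::string, int>> caches_;   // one cache per open level
};
```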
If you call PtBase::CommitTransaction(), all changes in the transaction cache are written to the database. You can undo all changes made during the transaction by calling PtBase::AbortTransaction(). These changes are also lost if your program crashes before the transaction commits.

    // objbase is a pointer to an open database.

    int AddMarriedCouple( Person * pMan, Person * pWife )
    {
        objbase -> BeginTransaction();
        if ( pMan -> Store() != 0 || pWife -> Store() != 0 )
        {
            objbase -> AbortTransaction();   // don't store one if you
            return -1;                       // can't store both
        }
        else
        {
            objbase -> CommitTransaction();
            return 0;
        }
    }

POET's transactions nest, which means that they are well suited to the nested function calls that are so common in C++ programming. With nested transactions, no changes are written to the database until the outermost transaction commits.

Event handling

Sometimes your program needs to know what is going on in the database. POET allows you to install functions in your program as callback functions; it then calls these functions when the conditions you have specified are met. There are two main uses for event handling: progress callbacks can periodically tell your program what percentage of a database operation has been completed, and watch & notify can tell your program what other users are doing with objects in the database.

Progress callbacks

Suppose you have written a program which allows users to perform queries on a huge database. Your users will occasionally make stupid queries which take a long time to complete. For instance, a user may ask for all the books in the Library of Congress, sorted by price. If your application simply waits until the query finishes, your user has no idea what is happening; after a few minutes of waiting he starts asking himself how much longer the query will take, and wishing there were some way to cancel the request.
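The remedy is a callback that the long-running operation calls periodically and that can ask it to stop. Stripped of POET's classes, the pattern looks like the sketch below; Progress, ScanAll and the 10-percent reporting interval are hypothetical choices, with the two enum values standing in for PtEXCCONTINUE and PtEXCABORT.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

enum class Progress { Continue, Abort };   // hypothetical; mirrors PtEXCCONTINUE / PtEXCABORT

// Examines every record, reporting progress about every 10 percent. The
// handler can stop the work early; the return value is how many records
// were examined before stopping.
std::size_t ScanAll(const std::vector<int>& records,
                    const std::function<Progress(int)>& onProgress) {
    const std::size_t n = records.size();
    int nextReport = 0;
    for (std::size_t i = 0; i < n; ++i) {
        int percent = static_cast<int>(i * 100 / n);
        if (percent >= nextReport) {
            if (onProgress(percent) == Progress::Abort) return i;
            nextReport = percent + 10;
        }
        // ... examine records[i] here ...
    }
    onProgress(100);
    return n;
}
```

A handler that checks a cancel flag and returns Progress::Abort plays the role of the cancel button described for POET.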
You can install a member function of any persistent class as a callback function, which POET then calls periodically during a database operation. Both the address of the object and the function must be specified when this function is installed. POET passes your function a parameter telling it what percentage of the operation has been completed, so your function must take an integer parameter. If your function returns PtEXCABORT, POET terminates the operation; if it returns PtEXCCONTINUE, POET continues:

    int MyPersistentClass::PendingHandler( int percent )
    {
        Message( "%d percent finished...", percent );
        if ( pressed( CancelButton ) )
            return PtEXCABORT;
        else
            return PtEXCCONTINUE;
    }

To install this function you use SetActionPending(), which is a member function of the exception manager. Every PtBase has its own exception manager:

    MyPersistentClass Obj;
    base -> GetExcMgr() -> SetActionPending( &Obj,
                (PtFuncInt) &MyPersistentClass::PendingHandler );

Once you have installed your function you can perform the query. While the query is being processed, POET will call your pending handler, which tells the user what percentage of the query has been completed. If the user presses the cancel button, your function tells POET to stop the query.

Watch & notify

Sometimes your program needs to know what other users are doing with the database. For instance, you may want to make sure that the objects you are using are updated whenever somebody saves a new version of an object in the database. Since you probably don't want to reread every object before you use it, the best strategy is to have POET update the object in your memory. You may also want to notify your user that an object has been changed. In POET you can do this by setting a watch on the object; the term "watch" was chosen because this is analogous to setting a watch on data in a debugger.

Suppose you have a class called Paragraph, and you want a paragraph to be refreshed if somebody else stores a new version of this paragraph.
Paragraph has a function called Display() which shows the paragraph on the screen:

    persistent class Paragraph {
    public:
        ....
        Display( PtOnDemand * root, PtOnDemand * object, PtWatchMode mode );
    };

First you create a PtMethodRef which contains a pointer to the object and the member function which should be called. Then you create a PtWatchSpec to tell POET that you want it to update the object in your memory and call your display function whenever someone stores a new version in the database. Finally, you call the Watch() function of your object to set the watch:

    PtObject * p = new Paragraph( base );
    PtMethodRef Target( p, (PtFuncCallback) &Paragraph::Display );
    PtWatchSpec WatchSpec( PtWATCH_UPDATE, PtDEEP, Target );
    p -> Watch( WatchSpec );

Now the watch has been set. If anybody stores a new version of this paragraph, POET will update the object in your RAM to reflect the changes and call the paragraph's Display() function.

SUMMARY

Object-oriented databases combine the semantic power of object-oriented languages with the data management facilities of a database system. POET integrates its powerful database functionality with the semantics of C++, so you simply program in C++ using your favorite compiler. You don't have to learn a new programming language to use our database, and our API is simple and easy to learn. Since POET manages your C++ objects directly, you never need to write code to translate your objects into two-dimensional tables or to tell POET how to load or store them. And POET is as portable as it is powerful: it is available for a wide variety of operating systems and compilers, uses the same file format on all systems, works well in heterogeneous networks, and lets applications share objects simultaneously even when the programs are running on different operating systems.
SALES INFORMATION

To purchase POET or get more information about POET, please contact:

    POET Software Corporation
    4633 Old Ironsides Drive, Suite 110
    Santa Clara, CA 95054

    Sales  (408) 970-4640
    Tech   (408) 970-4647
    Fax    (408) 970-4630

Thank you for your interest in POET.

Brad Sturtevant
POET Software Technical Support
Compuserve: 70402,74
Internet: 70402.74@compuserve.com
          brad@poettech.win.net