We're always interested in getting feedback. E-mail us if you like this guide, if you think that important material is omitted, if you encounter errors in the code examples or in the documentation, if you find any typos, or generally just if you feel like e-mailing. Mail to Frank Brokken or use an e-mail form. Please state the concerned document version, found in the title. If you're interested in a printable PostScript copy, use the form, or better yet, pick up your own copy by ftp from ftp.icce.rug.nl/pub/http.
When using a base class pointer to address an object of a derived class, the pointer type (i.e., the base class type) normally determines which actual function will be called. This means that the code example from section 10.5.2, which uses the storage class VStorage, will incorrectly compute the combined weight when a Truck object (see section 10.2) is in the storage: only one weight field, that of the engine part of the truck, is taken into consideration. The reason for this is straightforward: a Vehicle *vp calls the function Vehicle::getweight() and not Truck::getweight(), even when that pointer actually points to a Truck.
However, the opposite is also possible. In C++ it is possible for a Vehicle *vp to call a function Truck::getweight() when the pointer actually points to a Truck.
The terminology for this feature is polymorphism: it is as though the pointer vp assumes the type of the object it points to, rather than keeping its own (base class) type. So, vp might behave like a Truck * when pointing to a Truck, or like an Auto * when pointing to an Auto, etc. (In one of the Star Trek movies, Capt. Kirk was in trouble, as usual. He met an extremely beautiful lady who, however, thereupon changed into a hideous troll. Kirk was quite surprised, but the lady told him: ``Didn't you know I am a polymorph?'')
A second term for this characteristic is late binding. This name refers to the fact that the decision which function to call (a base class function or a function of a derived class) cannot be made at compile time, but is postponed until the program is actually executed: the right function is selected at run time.
By default, a Vehicle* will activate Vehicle's member functions, even when pointing to an object of a derived class. This is referred to as early or static binding, since the function to call is known at compile time. Late or dynamic binding is achieved in C++ with virtual functions.
A function becomes virtual when its declaration starts with the keyword virtual. Once a function is declared virtual in a base class, it remains virtual in all derived classes, even when the keyword virtual is not repeated in the derived classes.
As far as the vehicle classification system is concerned (see section 6 ff.), the two member functions getweight() and setweight() might be declared as virtual. The class definitions below illustrate the classes Vehicle (which is the overall base class of the classification system) and Truck, which has Vehicle as an indirect base class. The functions getweight() of the two classes are also shown:
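A minimal sketch of such class definitions could look as follows. It assumes that the weight is kept in an int data member, and it reduces the intermediate classes Land and Auto to the bare essentials; the constructor parameters are assumptions made for this illustration only.

```cpp
// Sketch under stated assumptions: Land and Auto are trimmed down, and
// all constructor parameters are illustrative.
class Vehicle
{
    public:
        Vehicle(int wt = 0) : weight(wt) {}
        virtual int getweight() const { return weight; }
        virtual void setweight(int wt) { weight = wt; }
    private:
        int weight;
};

class Land : public Vehicle
{
    public:
        Land(int wt = 0, int sp = 0) : Vehicle(wt), speed(sp) {}
        int getspeed() const { return speed; }
    private:
        int speed;
};

class Auto : public Land
{
    public:
        Auto(int wt = 0, int sp = 0) : Land(wt, sp) {}
};

class Truck : public Auto
{
    public:
        Truck(int engine_wt = 0, int sp = 0, int trailer_wt = 0)
            : Auto(engine_wt, sp), trailer_weight(trailer_wt) {}

        // Virtual, although the keyword is not repeated here.
        int getweight() const
        {
            return Auto::getweight() + trailer_weight;
        }
    private:
        int trailer_weight;
};
```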
Note that the keyword virtual appears only in the definition of the base class Vehicle; it need not be repeated in the derived classes (though a repetition would be no error).
The effect of the late binding is illustrated in the next fragment:
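A fragment of this kind might be sketched as shown here. To keep the example self-contained, Truck is derived directly from Vehicle and holds its own speed, which is a simplification of the classification system; the function name fragment() and the numbers are made up for the illustration.

```cpp
#include <iostream>

class Vehicle
{
    public:
        explicit Vehicle(int wt = 0) : weight(wt) {}
        virtual int getweight() const { return weight; }
    private:
        int weight;
};

// Simplification: Truck derives directly from Vehicle in this sketch.
class Truck : public Vehicle
{
    public:
        Truck(int engine_wt, int sp, int trailer_wt)
            : Vehicle(engine_wt), speed(sp), trailer_weight(trailer_wt) {}
        int getweight() const
        {
            return Vehicle::getweight() + trailer_weight;
        }
        int getspeed() const { return speed; }
    private:
        int speed;
        int trailer_weight;
};

void fragment()
{
    Vehicle v(1200);                // vehicle with a weight of 1200
    Truck t(6000, 115, 15000);      // engine and cabin 6000, trailer 15000
    Vehicle *vp;

    // (1): vp points to a Vehicle
    vp = &v;
    std::cout << vp->getweight() << '\n';   // calls Vehicle::getweight()

    // (2): vp points to a Truck
    vp = &t;
    std::cout << vp->getweight() << '\n';   // calls Truck::getweight()

    // (3): rejected by the compiler, getspeed() is no member of Vehicle
    // std::cout << vp->getspeed() << '\n';
}
```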
Since the function getweight() is defined as virtual, late binding is used here: in the statements below the (1) mark, Vehicle's function getweight() is called. In contrast, the statements under (2) use Truck's function getweight().
Statement (3), however, will produce a compilation error: getspeed() is not a member of Vehicle, and hence cannot be called via a Vehicle*.
The rule is that when using a pointer to a class, only the functions which are members of that class can be called. These functions can be virtual, but this only affects the type of binding (early vs. late).
When functions are defined as virtual in a base class (and hence in all derived classes), and when these functions are called using a pointer to the base class, the pointer can, as it were, assume more forms: it is polymorphic. In this section we illustrate the effect of polymorphism on the manner in which programs in C++ can be developed.
A vehicle classification system in C might be implemented with Vehicle being a union of structs, and having an enumeration field to determine which actual type of vehicle is represented. A function getweight() would typically first determine what type of vehicle is represented, and then inspect the relevant fields:
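The C-style approach could be sketched as follows. The enumeration values and the field names are hypothetical, and only two kinds of vehicle are represented; the code is written in the C subset so that it also compiles as C++.

```cpp
// Hypothetical C-style layout: a type tag plus a union of structs.
enum Vtype { is_vehicle, is_truck };

struct PlainData
{
    int weight;
};

struct TruckData
{
    int weight;             // engine and cabin
    int trailer_weight;
};

struct Vehicle
{
    Vtype type;             // tells which union member is in use
    union
    {
        PlainData plain;
        TruckData truck;
    } thing;
};

// Every new vehicle type requires another case in this function.
int getweight(Vehicle const *vp)
{
    switch (vp->type)
    {
        case is_vehicle:
            return vp->thing.plain.weight;
        case is_truck:
            return vp->thing.truck.weight + vp->thing.truck.trailer_weight;
    }
    return 0;               // unknown type tag
}
```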
A disadvantage of this approach is that the implementation cannot be easily changed. E.g., if we wanted to define a type Airplane, which would, e.g., add the functionality to store the number of passengers, then we'd have to re-edit and re-compile the above code.
In contrast, C++ offers the possibility of polymorphism. The advantage is that `old' code remains usable. The implementation of an extra class Airplane would in C++ mean one extra class, possibly with its own (virtual) functions getweight() and setweight(). A function like:
would still work; the function wouldn't even need to be recompiled, since late binding is in effect.
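As an illustration, such an `old' function and a class Airplane added afterwards might look like this. The name reportWeight(), the constructor parameters and the assumed 80 weight units per passenger are all hypothetical; the function also returns the weight so that its result can be inspected.

```cpp
#include <iostream>

class Vehicle
{
    public:
        explicit Vehicle(int wt = 0) : weight(wt) {}
        virtual ~Vehicle() {}
        virtual int getweight() const { return weight; }
        virtual void setweight(int wt) { weight = wt; }
    private:
        int weight;
};

// `Old' code: written without any knowledge of Airplane.
int reportWeight(Vehicle const &any_vehicle)
{
    int wt = any_vehicle.getweight();       // late binding
    std::cout << "Weight: " << wt << '\n';
    return wt;
}

// Added later: one extra class, no change to reportWeight().
class Airplane : public Vehicle
{
    public:
        Airplane(int wt, int npassengers)
            : Vehicle(wt), passengers(npassengers) {}
        // Assumption: each passenger adds 80 units of weight.
        int getweight() const
        {
            return Vehicle::getweight() + 80 * passengers;
        }
    private:
        int passengers;
};
```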
The fundamental idea of polymorphism is that the C++ compiler does not know which function to call at compile time; the appropriate function will be selected at run time. That means that the address of the function must be stored somewhere, to be looked up prior to the actual call. This `somewhere' place must be accessible from the object in question. E.g., when a Vehicle *vp points to a Truck object, then vp->getweight() calls a member function of Truck; the address of this function is determined from the actual object which vp points to.
A common implementation is the following. An object containing virtual functions holds as its first data member a hidden field, pointing to an array of pointers holding the addresses of the virtual functions. It must be noted that this implementation is compiler-dependent, and is by no means dictated by the C++ ANSI definition.
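The table of addresses of virtual functions is shared by all objects of the class. It even may be the case that two classes share the same table. The overhead in terms of memory consumption is therefore one extra pointer field per object, plus one table of function pointers per class (or per group of classes sharing a table).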
Consequently, a statement like vp->getweight() first inspects the hidden data member of the object pointed to by vp. In the case of the vehicle classification system, this data member points to a table of two addresses: one pointer for the function getweight() and one pointer for the function setweight(). The actual function which is called is determined from this table.
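The lookup can be imitated by hand with plain structs and function pointers. The following is only a sketch of the idea, not of what any particular compiler generates; all names (VTable, VehicleRep, TruckRep, callGetweight() and so on) are invented for this illustration.

```cpp
struct VehicleRep;                          // forward declaration

// One table per class, shared by all objects of that class.
struct VTable
{
    int  (*getweight)(VehicleRep const *);
    void (*setweight)(VehicleRep *, int);
};

struct VehicleRep
{
    VTable const *vptr;                     // the `hidden' first data member
    int weight;
};

struct TruckRep                             // a VehicleRep plus extra data
{
    VehicleRep base;
    int trailer_weight;
};

int vehicleGetweight(VehicleRep const *self) { return self->weight; }
void vehicleSetweight(VehicleRep *self, int wt) { self->weight = wt; }

int truckGetweight(VehicleRep const *self)
{
    // base is the first member of TruckRep, so both share one address.
    TruckRep const *truck = reinterpret_cast<TruckRep const *>(self);
    return truck->base.weight + truck->trailer_weight;
}

// Truck replaces getweight() but reuses Vehicle's setweight().
VTable const vehicleTable = { vehicleGetweight, vehicleSetweight };
VTable const truckTable   = { truckGetweight,   vehicleSetweight };

// What a call like vp->getweight() boils down to in this model.
int callGetweight(VehicleRep const *vp)
{
    return vp->vptr->getweight(vp);
}
```

In this model an object costs one extra pointer, and each class costs one table, which matches the overhead described above.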
The internal organization of the objects having virtual functions is further illustrated in figure 7.
As can be seen from figure 7, all objects which use virtual functions must have one (hidden) data member to address a table of function pointers. The objects of the classes Vehicle and Auto both address the same table. The class Truck, however, introduces its own version of getweight(): therefore, this class needs its own table of function pointers.
So far the base class Vehicle contained its own, concrete, implementations of the virtual functions getweight() and setweight(). In C++ it is however also possible only to mention virtual functions in a base class, and not define them. The functions are concretely implemented in a derived class. This approach defines a protocol, which has to be followed in the derived classes.
The special feature of only declaring functions in a base class, and not defining them, is that derived classes must take care of the actual definition: the C++ compiler will not allow the definition of an object of a class which doesn't concretely define the function in question. The base class thus enforces a protocol by declaring a function by its name, return value and arguments; but the derived classes must take care of the actual implementation. The base class itself is therefore only a model, to be used for the derivation of other classes. Such base classes are also called abstract classes.
The functions which are only declared but not defined in the base class are called pure virtual functions. A function is made pure virtual by preceding its declaration with the keyword virtual and by postfixing it with = 0. An example of a pure virtual function occurs in the following listing, where the definition of a class Sortable requires that all subsequent classes have a function compare():
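A class Sortable along these lines could be sketched as shown here. The static_assert is an addition for this illustration (it needs C++11 or later) and merely confirms that the compiler treats Sortable as an abstract class.

```cpp
#include <type_traits>

class Sortable
{
    public:
        // Pure virtual: declared here, to be defined by derived classes.
        virtual int compare(Sortable const &other) const = 0;
};

// No object of type Sortable itself can be defined.
static_assert(std::is_abstract<Sortable>::value,
              "Sortable is expected to be an abstract class");
```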
The function compare() must return an int and receives a reference to a second Sortable object. Possibly its action would be to compare the current object with the other one. The function is not allowed to alter the other object, as other is declared const. Furthermore, the function is not allowed to alter the current object, as the function itself is declared const.
The above base class can be used as a model for derived classes. As an example consider the following class Person (a prototype of which was introduced in section 5), capable of comparing two Person objects by the alphabetical order of their names and addresses:
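A sketch of such a class Person is given here. To keep memory management out of the picture it uses std::string data members instead of the char * members mentioned in the text, which is an assumption of this illustration, as are the constructor parameters.

```cpp
#include <string>

class Sortable
{
    public:
        virtual int compare(Sortable const &other) const = 0;
};

class Person : public Sortable
{
    public:
        Person(std::string const &nm, std::string const &ad)
            : name(nm), address(ad) {}

        // Same signature as in Sortable: the argument is a Sortable.
        int compare(Sortable const &other) const;
    private:
        std::string name;
        std::string address;
};

int Person::compare(Sortable const &o) const
{
    // Assumption: the caller guarantees that o really is a Person.
    Person const &other = static_cast<Person const &>(o);

    int cmp = name.compare(other.name);     // first compare the names
    if (cmp != 0)
        return cmp;
    return address.compare(other.address);  // names equal: use addresses
}
```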
Note in the implementation of Person::compare() that the argument of the function is not a reference to a Person but a reference to a Sortable. Remember that C++ allows function overloading: a function compare(Person const &other) would be an entirely different function from the one required by the protocol of Sortable. In the implementation of the function we therefore cast the Sortable& argument to a Person& argument.
A concrete implementation of compare() may need to know what the actual class of the other object is. E.g., the function Person::compare() should make the comparison only if the other object is a Person too: imagine what the statement strcmp(name, other.name) would do if the other object were in fact not a Person and hence did not have a char *name data member.
We therefore present here an improved version of the protocol of the class Sortable. This class is expanded to require that each derived class implements a function int getsignature():
The concrete function Person::compare() can now compare names and addresses only if the signatures of the current and other object match:
The crux of the matter is of course the function getsignature(). This function should return a unique int value for its particular class. An elegant implementation is the following:
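The signature mechanism might be sketched as follows. Two deviations from the text are assumptions of this sketch: the signature has type std::intptr_t rather than int, so that an address fits on platforms where pointers are wider than an int, and compare() returns 0 when the signatures differ. The class Number is hypothetical and only serves to show two distinct signatures.

```cpp
#include <cstdint>
#include <string>

class Sortable
{
    public:
        virtual int compare(Sortable const &other) const = 0;
        virtual std::intptr_t getsignature() const = 0;
};

class Person : public Sortable
{
    public:
        Person(std::string const &nm, std::string const &ad)
            : name(nm), address(ad) {}

        int compare(Sortable const &o) const
        {
            if (getsignature() != o.getsignature())
                return 0;                   // assumption: not comparable
            Person const &other = static_cast<Person const &>(o);
            int cmp = name.compare(other.name);
            return cmp != 0 ? cmp : address.compare(other.address);
        }
        std::intptr_t getsignature() const
        {
            static int tag;                 // one variable for all Persons
            return reinterpret_cast<std::intptr_t>(&tag);
        }
    private:
        std::string name;
        std::string address;
};

class Number : public Sortable              // hypothetical second class
{
    public:
        explicit Number(int v) : value(v) {}

        int compare(Sortable const &o) const
        {
            if (getsignature() != o.getsignature())
                return 0;
            Number const &other = static_cast<Number const &>(o);
            return (value > other.value) - (value < other.value);
        }
        std::intptr_t getsignature() const
        {
            static int tag;                 // a different variable
            return reinterpret_cast<std::intptr_t>(&tag);
        }
    private:
        int value;
};
```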
For the reader who's puzzled by our `elegant solution': the static int tag defined in the Person::getsignature() function is just one variable, no matter how many Person objects exist. Being static, it keeps one fixed location for the whole run of the program, just like a global variable. Hence, there's only one variable tag for the Person class, and its address is uniquely connected to that class. This address is cast to an int, which thus becomes the (unique) signature of Person objects. Note that this assumes that an int is wide enough to hold an address; where it is not, a wider integral type is needed for the signature.
When the operator delete releases memory which is occupied by a dynamically allocated object, a corresponding destructor is called to ensure that internally used memory of the object can also be released. Now consider the following code fragment, in which the two classes from the previous sections are used:
In this example an object of a derived class (Person) is destroyed using a base class pointer (Sortable*). For a `standard' class definition this will mean that the destructor of Sortable is called, instead of the destructor of Person.
C++ however allows virtual destructors. By preceding the declaration of a destructor with the keyword virtual we can ensure that the right destructor is activated even when called via a base class pointer. The definition of the class Sortable would therefore become:
Should the virtual destructor of the base class be a pure virtual function or not? In general, the answer to this question would be no: for a class such as Sortable the definition should not force derived classes to define a destructor. In contrast, compare() is a pure virtual function: in this case the base class defines a protocol which must be adhered to.
By defining the destructor of the base class as virtual, but not as purely so, the base class offers the possibility of redefinition of the destructor in any derived classes. The base class doesn't enforce the choice.
The conclusion is therefore that the base class must define a destructor function, which is used in the case that derived classes do not define their own destructors. Such a destructor could be an empty function:
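A sketch of a Sortable with an empty virtual destructor, together with a stripped-down Person, is shown here. The static counter destroyed and the function cleanup() are hypothetical additions that only make the destructor call observable; compare() is a placeholder.

```cpp
class Sortable
{
    public:
        virtual ~Sortable() {}      // virtual, not pure; empty body
        virtual int compare(Sortable const &other) const = 0;
};

class Person : public Sortable
{
    public:
        static int destroyed;       // hypothetical: counts destructor calls

        ~Person() { ++destroyed; }
        int compare(Sortable const &) const { return 0; }  // placeholder
};

int Person::destroyed = 0;

// The object is released through a base class pointer. Since the
// destructor is virtual, Person's destructor runs first, followed by
// the destructor of Sortable.
void cleanup(Sortable *sp)
{
    delete sp;
}
```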
A slight difficulty in multiple inheritance may arise when more than one `path' leads from the derived class to the base class. This is illustrated in the code fragment below: a class Derived is doubly derived from a class Base:
Due to the double derivation, the functionality of Base now occurs twice in Derived. This leads to ambiguity: when the function setfield() is called for a Derived object, which function should that be, since there are two? For such a duplicate derivation C++ compilers will refuse to generate code and will (correctly) report an error.
The above code clearly duplicates its base class in the derivation. Such a duplication can be easily avoided here. But duplication of a base class can also occur via nested inheritance, where an object is derived from, say, an Auto and from an Air (see the vehicle classification system, section 6). Such a class would be needed to represent, e.g., a flying car (such as the one in James Bond vs. the Man with the Golden Gun...). An AirAuto would ultimately contain two Vehicles, and hence two weight fields, two setweight() functions and two getweight() functions.
Let us look more closely at why an AirAuto introduces ambiguity when derived from Auto and Air.
An AirAuto is an Auto, hence a Land, and hence a Vehicle.
An AirAuto is also an Air, and hence a Vehicle.
The duplication of Vehicle data is further illustrated in figure 8. The internal organization of an AirAuto object is shown in figure 9.
The question of which member function getweight() should be called cannot be resolved by the compiler. The programmer has two possibilities to resolve the ambiguity explicitly:
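First, the call in which the ambiguity occurs can be qualified with a class name and the scope resolution operator, as in Auto::getweight(). Note the place of the scope operator and the class name: immediately before the name of the member function itself.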
Second, a dedicated function getweight() could be created for the class AirAuto:
The second possibility from the two above is preferable, since it relieves the programmer who uses the class AirAuto of special precautions.
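Both possibilities can be sketched in one small example. The classes are reduced to what is needed for the weight, and the function weightViaScope() as well as the constructor parameters are hypothetical.

```cpp
class Vehicle
{
    public:
        explicit Vehicle(int wt = 0) : weight(wt) {}
        int getweight() const { return weight; }
    private:
        int weight;
};

class Land : public Vehicle
{
    public:
        explicit Land(int wt = 0) : Vehicle(wt) {}
};

class Auto : public Land
{
    public:
        explicit Auto(int wt = 0) : Land(wt) {}
};

class Air : public Vehicle
{
    public:
        explicit Air(int wt = 0) : Vehicle(wt) {}
};

// Ordinary (non-virtual) derivation: an AirAuto holds two Vehicles.
class AirAuto : public Auto, public Air
{
    public:
        explicit AirAuto(int wt = 0) : Auto(wt), Air(wt) {}

        // Second possibility: a dedicated function makes the choice once.
        int getweight() const { return Auto::getweight(); }
};

// First possibility: the caller selects a route with the scope operator.
int weightViaScope(AirAuto const &cool)
{
    return cool.Auto::getweight();
}
```

With the dedicated member in place, a plain cool.getweight() compiles without further qualification, which is what makes the second possibility the more convenient one for users of the class.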
However, besides these explicit solutions, there is a more elegant one. This will be discussed in the next section.
As we have seen, more than one object of the type Vehicle is present in one AirAuto. The result is not only an ambiguity in the functions which access the weight data, but also the presence of two weight fields. This is somewhat redundant, since we can assume that an AirAuto has just one weight.
We can ensure that only one Vehicle is contained in an AirAuto. This is done by defining the base class which is multiply present in a derived class as a virtual base class. The behavior of virtual base classes is the following: when a base class B is a virtual base class of a derived class D, then the members of B are included in D only once, no matter via how many routes B is reached. The compiler leaves out the inclusion of the members of B when these are already present in D.
For the class AirAuto this means that the derivation of Land and Air is changed:
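A minimal sketch of the changed derivation is given here; constructors are left out and the classes contain only what is needed to show that a single weight is shared.

```cpp
class Vehicle
{
    public:
        Vehicle() : weight(0) {}
        int getweight() const { return weight; }
        void setweight(int wt) { weight = wt; }
    private:
        int weight;
};

class Land : virtual public Vehicle     // virtual derivation
{
};

class Auto : public Land
{
};

class Air : virtual public Vehicle      // virtual derivation
{
};

// Only one Vehicle is embedded, so getweight() and setweight() can be
// called without any qualification.
class AirAuto : public Auto, public Air
{
};
```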
The virtual derivation ensures that via the Land route, a Vehicle is only added to a class when not yet present. The same holds true for the Air route. This means that we can no longer say by which route a Vehicle becomes a part of an AirAuto; we can only say that there is one Vehicle object embedded.
The internal organization of an AirAuto object when the base classes are virtual is shown in figure 10.
With respect to virtual derivation we note:
Both Land and Air must be derived virtually from Vehicle. If only one of the two derivations is virtual, the AirAuto still contains two Vehicle objects: the virtual one and the one that was inherited in the ordinary way. Only when both derivations are virtual is one definition of a Vehicle in an AirAuto dropped.
The fact that the Vehicle in an AirAuto is no longer `embedded' in Auto or Air has a consequence for the chain of construction. The constructor of an AirAuto will directly call the constructor of a Vehicle; this constructor will not be called from the constructors of Auto or Air.
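The consequence for the chain of construction can be made visible with a small sketch. The constructor parameters are assumptions of this illustration, and the value -1 is passed on purpose to show that it never reaches the Vehicle of an AirAuto.

```cpp
class Vehicle
{
    public:
        explicit Vehicle(int wt = 0) : weight(wt) {}
        int getweight() const { return weight; }
    private:
        int weight;
};

class Land : virtual public Vehicle
{
    public:
        explicit Land(int wt = 0) : Vehicle(wt) {}
};

class Auto : public Land
{
    public:
        // When an Auto is the complete object, Auto itself must
        // initialize the virtual base class.
        explicit Auto(int wt = 0) : Vehicle(wt), Land(wt) {}
};

class Air : virtual public Vehicle
{
    public:
        explicit Air(int wt = 0) : Vehicle(wt) {}
};

class AirAuto : public Auto, public Air
{
    public:
        // Vehicle's constructor is called directly from here. The
        // Vehicle(...) initializers inside Auto, Land and Air are
        // skipped for an AirAuto, so the -1 values have no effect.
        explicit AirAuto(int wt) : Vehicle(wt), Auto(-1), Air(-1) {}
};
```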
Summarizing, virtual derivation has the consequence that ambiguity in the calling of member functions of a base class is avoided. Furthermore, duplication of data members is avoided.
In contrast to the previous definition of a class such as AirAuto, situations may arise where the double presence of the members of a base class is appropriate. To illustrate this, consider the definition of a Truck from section 10.2:
This definition shows how a Truck object is constructed to hold two weight fields: one via its derivation from Auto and one via its own int trailer_weight data member. Such a definition is of course valid, but could be rewritten. We could let a Truck be derived from an Auto and from a Vehicle, thereby explicitly requesting the double presence of a Vehicle; one for the weight of the engine and cabin, and one for the weight of the trailer.
A small item of interest here is that a derivation like class Truck: public Auto, public Vehicle is not usable as it stands: a Vehicle is already part of an Auto, so the directly inherited Vehicle cannot be addressed unambiguously, and compilers reject the construction or warn about it. An intermediate class resolves the problem: we derive a class TrailerVeh from Vehicle, and Truck from Auto and from TrailerVeh. All ambiguities concerning the member functions are then resolved in the class Truck:
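Such a construction might be sketched as follows. Auto is derived directly from Vehicle to keep the example short, and the constructor parameters and the two-argument setweight() are assumptions made for the illustration.

```cpp
class Vehicle
{
    public:
        explicit Vehicle(int wt = 0) : weight(wt) {}
        int getweight() const { return weight; }
        void setweight(int wt) { weight = wt; }
    private:
        int weight;
};

class Auto : public Vehicle             // simplification: Land is omitted
{
    public:
        explicit Auto(int wt = 0) : Vehicle(wt) {}
};

class TrailerVeh : public Vehicle       // the intermediate class
{
    public:
        explicit TrailerVeh(int wt = 0) : Vehicle(wt) {}
};

// A Truck deliberately contains two Vehicles: one via Auto (engine and
// cabin) and one via TrailerVeh (trailer).
class Truck : public Auto, public TrailerVeh
{
    public:
        Truck(int engine_wt = 0, int trailer_wt = 0)
            : Auto(engine_wt), TrailerVeh(trailer_wt) {}

        void setweight(int engine_wt, int trailer_wt)
        {
            Auto::setweight(engine_wt);
            TrailerVeh::setweight(trailer_wt);
        }
        int getweight() const
        {
            return Auto::getweight() + TrailerVeh::getweight();
        }
};
```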