C/C++ Users Journal 1990 - 1995
A Short Floating-Point Type in C++
William Smith
William Smith is the engineering manager at Montana Software, a software
development company specializing in custom applications for MS-DOS and
Windows. You may contact him by mail at P.O. Box 663, Bozeman, MT 59771-0663.
Introduction
Even though a typical microcomputer can have up to ten times the memory of one
from just a few years ago, there are still programming problems where memory
is a limiting factor. I frequently bump into memory limitations in embedded
and data acquisition applications. Numerous times I have had to work with a
large quantity of floating-point numbers in a confining space. A common
situation is the acquisition of large amounts of data through a 14-bit (or
smaller) A-to-D (Analog to Digital) converter.
Storing these numbers as 32-bit floats always seemed like overkill to me and a
waste of space. This was especially annoying when I had to store tens of
thousands of points in an array and would hit some kind of a memory limitation
such as a segment boundary, physical memory limit, or even a file or disk size
limit. The standard float type works, but it is a poor match for the problem
to be solved. Matching the floating-point size to what an application
needs can result in significant memory savings in data-intensive programs.
I really only needed a 16-bit floating-point type instead of the native
32-bit float. At first, I played some games and stored the data as short int.
But this forced me to convert the data to float to do anything useful with it.
I wanted a short floating-point type. I even implemented one, albeit crudely,
in C. C allowed me to do it, but the conversion process never was clean or
transparent. With C++, I was finally able to do what I wanted. I was able to
create a short floating-point type that I could use naturally in my
applications. C++ can hide all the dirty work, such as conversions.
The new type, which I call sfloat, even allowed me to control range and
precision. Some situations called for a floating-point type that ranged
between 0 and 10.0 and maximized the precision within that range. Other
situations required a larger signed range but less precision. Being able to
tailor the characteristics of the type to meet an application's needs was a
practical feature I built into sfloat.
I implemented the sfloat type in "Standard C++" (if there is such a beast).
The code works with Microsoft C++ and Borland C++ under MS-DOS and MS-Windows.
It has some dependencies on the size of the standard types float, unsigned
short int, and long. It assumes that:
a float is 32 bits
an unsigned short int is 16 bits
a long is 32 bits
It also assumes that the float type is that defined by the IEEE standard for
32-bit floating-point values. Table 1 gives the IEEE details. As long as a
compiler and operating system conform to these restrictions, the code for
sfloat will probably work in other environments.
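These assumptions can be tested before any sfloat code is compiled. What
follows is a minimal sketch, assuming a C++11 or later compiler (newer than
the compilers named above). The constant names are invented for illustration
and do not come from the listings.

```cpp
#include <climits>   // CHAR_BIT
#include <limits>    // std::numeric_limits

// One compile-time flag per assumption listed in the text.
// All names here are hypothetical.
constexpr bool float_is_32_bits  = sizeof(float) * CHAR_BIT == 32;
constexpr bool ushort_is_16_bits = sizeof(unsigned short) * CHAR_BIT == 16;
constexpr bool long_is_32_bits   = sizeof(long) * CHAR_BIT == 32;
constexpr bool float_is_ieee     = std::numeric_limits<float>::is_iec559;

// True only when every assumption holds on the current platform.
constexpr bool sfloat_assumptions_hold =
    float_is_32_bits && ushort_is_16_bits && long_is_32_bits && float_is_ieee;
```

Passing sfloat_assumptions_hold to static_assert would stop compilation on a
platform that breaks any of the assumptions. The long check is the one most
likely to fail today, since many 64-bit Unix systems use a 64-bit long.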
Implementation
Listing 1, sfloat.hpp, defines a C++ class called sfloat. The class has
numerous private static members, one protected member and numerous public
member functions. There are even some non-member functions prototyped in
sfloat.hpp.
The static data provides a workspace for conversion between sfloat and float.
This static data is class specific. All instances, or objects, of class sfloat
share the same static data. The protected member s is the only object instance
data. This member is unique to each instance of sfloat. In fact, the sizeof
operator will report the size of sfloat to be the size of this member, 2
bytes.
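The effect of static members on object size can be shown with a small
stand-in class. This is a sketch only: the class name and the names and types
of the static members are hypothetical. Only the member s follows the
description above.

```cpp
// Hypothetical stand-in for the article's class. Only the member name s
// comes from the text; everything else is invented for illustration.
class tiny
{
    static float         work_f;  // shared workspace: one copy for the class
    static unsigned long work_l;  // shared workspace: one copy for the class
protected:
    unsigned short s;             // instance data: one copy per object
public:
    tiny() : s(0) {}
};

// Static data members are defined once, outside the class body.
float         tiny::work_f = 0.0f;
unsigned long tiny::work_l = 0UL;

// Static members add nothing to the size of each object (this holds on
// mainstream compilers, which add no padding to such a class).
static_assert(sizeof(tiny) == sizeof(unsigned short),
              "object size should equal the size of s");
```

An array of 10,000 such objects therefore occupies 20,000 bytes where
unsigned short is 2 bytes, half the space of the same number of floats.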
Constructors
One of the most elemental functions for a C++ class is the constructor. A
constructor has the same name as its class. Furthermore, you can
overload the constructor to provide construction from (conversion from)
different types. The sfloat class has three constructors.
sfloat();
sfloat(float f);
sfloat(sfloat& sf);
sfloat() defines the "default" construction of an sfloat object, such as on
the stack. The compiler generates this function automatically only for a class
that declares no constructors at all, so sfloat, which declares others, must
declare it explicitly. sfloat(float f) converts a floating-point number to an
sfloat to initialize the stored value. sfloat(sfloat& sf) initializes the new
object by making a copy of another sfloat object. These three constructors
provide the functionality needed to support the following declarations using
sfloat.
sfloat sf1;          // uses sfloat();
sfloat sf2 = 1.0f;   // uses sfloat(float f);
sfloat sf3 = sf2;    // uses sfloat(sfloat& sf);
These three types of construction and initialization cover the minimum
required to use the sfloat type naturally. The code for the constructor
functions
resides in Listing 2, sfloat.inl. sfloat() and sfloat(sfloat& sf) are very
simple. On the other hand, sfloat(float f) has to do a bit of work. It has to
convert a float to an unsigned short and assign it to the object instance data
member s.
The conversion process used in sfloat(float f) truncates the mantissa bits to
a lower precision. It also lowers the range of the exponent by discarding
higher-order bits. The conversion process utilizes some of the static data
members of class sfloat as a work space and to hold intermediate values. The
bitwise shift operators << and >> move the bits that will be kept from the
float value into place before they are packed into an unsigned short.
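The shift-and-pack idea can be sketched as follows. The 16-bit layout used
here is an assumption, not the layout from Listing 2: it keeps the sign bit,
all 8 exponent bits, and the top 7 mantissa bits, which is simply the upper
half of an IEEE float. Because it keeps the whole exponent, it shows only the
mantissa truncation, not the exponent narrowing described above. The function
name pack_float is hypothetical, and std::memcpy stands in for the static
workspace members.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "this sketch assumes a 32-bit float");

// Hypothetical helper: keep the upper 16 bits of an IEEE float
// (1 sign bit, 8 exponent bits, 7 mantissa bits) and discard the rest.
inline std::uint16_t pack_float(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);            // copy the float's bit pattern
    return static_cast<std::uint16_t>(bits >> 16);  // drop the low 16 mantissa bits
}

// Small self-check with values whose bit patterns are easy to verify by hand.
inline void pack_float_demo()
{
    assert(pack_float(1.5f)  == 0x3FC0);        // 1.5f  is 0x3FC00000
    assert(pack_float(-2.0f) == 0xC000);        // -2.0f is 0xC0000000
    assert(pack_float(1.00390625f) == 0x3F80);  // 1 + 2^-8 truncates to 1.0
}
```

The last check shows the cost of truncation: 1.00390625 lies below the
resolution of a 7-bit mantissa near 1.0, so it is stored as exactly 1.0.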
Since none of the constructor functions allocate memory on the heap (free
store) using new, there is no need to define a destructor function. C++ will
provide a default destructor that does nothing.
Conversion to float
We also need a way to convert an sfloat object to a float. To use conventional
notation, we need to define the operator function
sfloat::operator float()
Listing 2, sfloat.inl, contains the definition of this function. You will
notice that its logic is just the reverse of sfloat::sfloat(float f). The
shift operators once again move the bits of the sfloat into the proper
locations in the 32 bits of a float. The extra bits are filled with zeros.
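The reverse direction can be sketched the same way. As before, the layout is
an assumption rather than the one in Listing 2: the 16 stored bits are taken
to be the upper half of an IEEE float. The function name unpack_float is
hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "this sketch assumes a 32-bit float");

// Hypothetical helper: place 16 stored bits in the upper half of a
// 32-bit pattern and fill the low 16 bits with zeros.
inline float unpack_float(std::uint16_t s)
{
    std::uint32_t bits = static_cast<std::uint32_t>(s) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);  // reinterpret the pattern as a float
    return f;
}

// Small self-check with patterns that are easy to verify by hand.
inline void unpack_float_demo()
{
    assert(unpack_float(0x3FC0) == 1.5f);
    assert(unpack_float(0xC000) == -2.0f);
    assert(unpack_float(0x4049) == 3.140625f);  // nearest value below pi
}
```

The last check shows what zero fill means in practice: the float closest to
pi is 0x40490FDB, and once its low 16 bits are gone the value that comes back
is 3.140625.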
Overloaded Operators
Operator overloading is one of the features of C++ that allows you to use
newly defined types just like the standard existing types. Operator
overloading is
not so much an object-oriented feature as a convenience. Table 2, an extract
from Listing 1, lists the operator functions defined for sfloat. This list
includes all the operators that one commonly uses on floating-point numbers.
These operator functions allow you to use objects of the class sfloat just
like you would a standard floating-point type.
Operator overloading is a fairly straightforward feature of C++ and is covered
well elsewhere. I recommend the "Stepping Up To C++" series of articles on
"Operator Overloading" by Dan Saks (see CUJ January, March, May, and July
1992). I took a very simple approach to implementing these operators. I
convert to float, use the predefined operations, then convert back to sfloat.
For example, here is the code for the add-assignment operator:
inline sfloat &sfloat::operator+=(sfloat sf)
{
    float f = (float)*this;
    f += (float)sf;
    *this = (sfloat)f;
    return ( *this );
} // operator+=
This technique is not the most efficient (it has to do three type
conversions), but it sure is simple. My needs for the sfloat type were
data-size driven, not code-speed or code-size driven. Consequently I can live
with the overhead of all those conversions. If you cannot, you could rewrite
some of these routines to operate directly on the sfloat type.
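To illustrate the difference, the sketch below puts a round-trip operator
next to one that works directly on the stored bits. It is not the code from
the listings. The class name half_sketch is hypothetical, and the layout is
an assumption: the 16 stored bits are the upper half of an IEEE float, with
the sign in the top bit. Under that assumption, unary minus needs no
conversion at all, because negation only flips the sign bit.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "this sketch assumes a 32-bit float");

// Hypothetical stand-in for sfloat; the bit layout is assumed, not the article's.
class half_sketch
{
    std::uint16_t s;  // the only per-object data
public:
    half_sketch() : s(0) {}

    half_sketch(float f)  // float -> 16 bits: keep the upper half
    {
        std::uint32_t bits;
        std::memcpy(&bits, &f, sizeof bits);
        s = static_cast<std::uint16_t>(bits >> 16);
    }

    operator float() const  // 16 bits -> float: zero-fill the lower half
    {
        std::uint32_t bits = static_cast<std::uint32_t>(s) << 16;
        float f;
        std::memcpy(&f, &bits, sizeof f);
        return f;
    }

    // Round-trip approach, as in the text: three conversions per call.
    half_sketch& operator+=(half_sketch rhs)
    {
        float f = static_cast<float>(*this);
        f += static_cast<float>(rhs);
        *this = half_sketch(f);
        return *this;
    }

    // Direct approach: flip the sign bit in place, no conversions.
    half_sketch operator-() const
    {
        half_sketch r;
        r.s = static_cast<std::uint16_t>(s ^ 0x8000u);
        return r;
    }
};
```

With this sketch, adding half_sketch(2.0f) to half_sketch(1.5f) yields 3.5,
and negating the result yields -3.5. Both values fit in a 7-bit mantissa, so
nothing is lost. Comparisons could be handled directly in a similar way, but
addition and multiplication need real floating-point work, which is why the
round trip through float is the simpler choice for them.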
I would like to emphasize that you can get trapped into inefficienc