C/C++ Users Journal 1990





A Short Floating-Point Type in C++

William Smith

William Smith is the engineering manager at Montana Software, a software development company specializing in custom applications for MS-DOS and Windows. You may contact him by mail at P.O. Box 663, Bozeman, MT 59771-0663.

Introduction

Even though a typical microcomputer can have up to ten times the memory of one from just a few years ago, there are still programming problems where memory is a limiting factor. I frequently bump into memory limitations in embedded and data acquisition applications. Numerous times I have had to work with a large quantity of floating-point numbers in a confining space. A common situation is the acquisition of large amounts of data through a 14-bit (or smaller) A-to-D (Analog to Digital) converter. Storing these numbers as 32-bit floats always seemed like overkill to me and a waste of space. This was especially annoying when I had to store tens of thousands of points in an array and would hit some kind of memory limitation, such as a segment boundary, a physical memory limit, or even a file or disk size limit.

The standard float type works, but it is a poor match for the problem to be solved. Matching the floating-point size to what an application needs can result in significant memory savings in data-intensive programs. I really needed only a 16-bit floating-point type instead of the native 32-bit float.

At first, I played some games and stored the data as short int. But this forced me to convert the data to float to do anything useful with it. I wanted a short floating-point type. I even implemented one, albeit crudely, in C. C allowed me to do it, but the conversion process was never clean or transparent. With C++, I was finally able to do what I wanted. I was able to create a short floating-point type that I could use naturally in my applications. C++ can hide all the dirty work, such as conversions.
The new type, which I call sfloat, even allowed me to control range and precision. Some situations called for a floating-point type that ranged between 0 and 10.0 and maximized the precision within that range. Other situations required a larger signed range but less precision. Being able to tailor the characteristics of the type to meet an application's needs was a practical feature I built into sfloat.

I implemented the sfloat type in "Standard C++" (if there is such a beast). The code works with Microsoft C++ and Borland C++ under MS-DOS and MS-Windows. It has some dependencies on the size of the standard types float, unsigned short int, and long. It assumes that:

    a float is 32 bits
    an unsigned short int is 16 bits
    a long is 32 bits

It also assumes that the float type is the one defined by the IEEE standard for 32-bit floating-point values. Table 1 gives the IEEE details. As long as a compiler and operating system conform to these restrictions, the code for sfloat will probably work in other environments.

Implementation

Listing 1, sfloat.hpp, defines a C++ class called sfloat. The class has numerous private static members, one protected member, and numerous public member functions. There are even some non-member functions prototyped in sfloat.hpp.

The static data provides a workspace for conversion between sfloat and float. This static data is class specific: all instances, or objects, of class sfloat share the same static data. The protected member s is the only object instance data. This member is unique to each instance of sfloat. In fact, the sizeof operator will report the size of sfloat to be the size of this member, 2 bytes.

Constructors

One of the most elemental functions for a C++ class is the constructor. A constructor has the same name as its class. Furthermore, you can overload the constructor to provide construction from (conversion from) different types. The sfloat class has three constructors.
    sfloat();
    sfloat(float f);
    sfloat(sfloat& sf);

sfloat() defines the "default" construction of an sfloat object, such as on the stack. The compiler generates this function automatically only when a class declares no constructors at all; because sfloat declares others, it has to be defined explicitly. sfloat(float f) converts a floating-point number to an sfloat to initialize the stored value. sfloat(sfloat& sf) initializes the new object by making a copy of another sfloat object. These three constructors provide the functionality needed to support the following declarations using sfloat.

    sfloat sf1;         // uses sfloat();
    sfloat sf2 = 1.0f;  // uses sfloat(float f);
    sfloat sf3 = sf2;   // uses sfloat(sfloat& sf);

These three types of construction and initialization cover the minimum required to use the sfloat type naturally.

The code for the constructor functions resides in Listing 2, sfloat.inl. sfloat() and sfloat(sfloat& sf) are very simple. On the other hand, sfloat(float f) has to do a bit of work. It has to convert a float to an unsigned short and assign it to the object instance data member s. The conversion process used in sfloat(float f) truncates the mantissa bits to a lower precision. It also lowers the range of the exponent by discarding higher-order bits. The conversion process uses some of the static data members of class sfloat as a workspace and to hold intermediate values. The bitwise shift operators << and >> move the bits that will be kept from the float value into place before they are packed into an unsigned short.

Since none of the constructor functions allocate memory on the heap (free store) using new, there is no need to define a destructor function. C++ will provide a default destructor that does nothing.

Conversion to float

We also need a way to convert an sfloat object to a float. To use conventional notation, we need to define the operator function

    sfloat::operator float()

Listing 2, sfloat.inl, contains the definition of this function. You will notice that its logic is just the reverse of sfloat::sfloat(float f).
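
The shift-based packing and its reverse can be sketched as follows. This is a deliberately simplified illustration and not the code from Listing 2: it assumes a fixed layout that keeps the sign bit, all 8 exponent bits, and only the top 7 mantissa bits, whereas sfloat also narrows the exponent and lets you tune range and precision. The function names pack_float16_sketch and unpack_float16_sketch are hypothetical, and the sketch uses the fixed-width types from <cstdint> (which postdate the article) instead of the class's static workspace.

```cpp
#include <cstdint>
#include <cstring>

// Same size assumption the article states: a float is 32 bits.
static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit IEEE float");

// Hypothetical helper (not from Listing 2): keep the upper 16 bits of
// the float, i.e. sign, 8 exponent bits, and the top 7 mantissa bits.
inline std::uint16_t pack_float16_sketch(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);            // read the float's bit pattern
    return static_cast<std::uint16_t>(bits >> 16);  // truncate the low 16 mantissa bits
}

// Hypothetical reverse helper: shift the stored bits back into place.
// The 16 mantissa bits that were discarded come back as zeros.
inline float unpack_float16_sketch(std::uint16_t s)
{
    std::uint32_t bits = static_cast<std::uint32_t>(s) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);               // reinterpret as a float
    return f;
}
```

With this assumed layout, 1.0f survives a round trip unchanged, while 3.14159f comes back as 3.140625f because only 7 mantissa bits are kept. The truncation rounds toward zero; a version that rounds to nearest would need extra logic before the shift.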
The shift operators once again move the bits of the sfloat into the proper locations in the 32 bits of a float. The extra bits are filled with zeros.

Overloaded Operators

Operator overloading is one of the features of C++ that allow you to use newly defined types just like the standard existing types. Operator overloading is not so much an object-oriented feature as a convenience. Table 2, an extract from Listing 1, lists the operator functions defined for sfloat. This list includes all the operators that one commonly uses on floating-point numbers. These operator functions allow you to use objects of the class sfloat just as you would a standard floating-point type.

Operator overloading is a fairly straightforward feature of C++ and is covered well elsewhere. I recommend the "Stepping Up To C++" series of articles on "Operator Overloading" by Dan Saks (see CUJ January, March, May, and July 1992).

I took a very simple approach to implementing these operators. I convert to float, use the predefined operations, then convert back to sfloat. For example, here is the code for the add-assignment operator:

    inline sfloat &sfloat::operator+=(sfloat sf)
    {
        float f = (float)*this;   // convert the left operand to float
        f += (float)sf;           // convert the right operand and add
        *this = (sfloat)f;        // convert the result back to sfloat
        return ( *this );
    } // operator+=

This technique is not the most efficient (it has to do three type conversions), but it sure is simple. My needs for the sfloat type were data-size driven, not code-speed or code-size driven. Consequently, I can live with the overhead of all those conversions. If you cannot, you could rewrite some of these routines to operate directly on the sfloat type. I would like to emphasize that you can get trapped into inefficienc