C/C++ Users Journal 1990 - 1995
Portable Byte Ordering in C++
Philip J. Erdelsky
Philip J. Erdelsky, Ph.D., is an R&D Engineer for Data/Ware Development, Inc.,
in San Diego, California. He has been writing software in C and C++ for more
than ten years. He can be reached at 75746.3411@compuserve.com.
Which End is Up?
I once facetiously asked a computer scientist where he stood on one of the
greatest issues facing computing science. He must have sensed my mood, because
he answered "squarely on the fence" before I could tell him what the issue
was.
The issue, of course, was byte order. Some CPUs, such as the Intel 80X86
family, store multi-byte words in little-endian order, with the least
significant byte (the little end) at the lowest address. Others, such as the
Motorola 680X0 family, store them in big-endian order, with the most
significant byte (the big end) at the lowest address. The terms "big endian"
and "little endian" are a literary allusion to Jonathan Swift's
classic novel Gulliver's Travels, in which two nations fought a war to
determine whether soft-boiled eggs should be opened at the big end or the
little end.
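To make the two layouts concrete, the following minimal sketch copies the bytes of a 16-bit value out of memory and looks at the one stored at the lowest address. The helper names first_byte_of and stored_little_end_first are illustrative and do not come from the article's listings, and std::uint16_t is assumed to be available.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Return the byte stored at the lowest address of a 16-bit value.
// (first_byte_of is a hypothetical helper name.)
inline unsigned char first_byte_of(std::uint16_t w)
{
    unsigned char bytes[sizeof w];
    std::memcpy(bytes, &w, sizeof w);   // copy the in-memory representation
    return bytes[0];
}

// For 0x1234, a little-endian CPU stores 0x34 first;
// a big-endian CPU stores 0x12 first.
inline bool stored_little_end_first()
{
    return first_byte_of(0x1234) == 0x34;
}

// Whatever the CPU, the first byte must be one end or the other.
inline void check_byte_order_is_recognized()
{
    unsigned char first = first_byte_of(0x1234);
    assert(first == 0x34 || first == 0x12);
}
```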
Each byte order has its own small advantage. Due to historical accident,
numbers are written in big-endian order in English and other major western
languages. That makes a memory dump easier to read if big-endian order is
used. Addition, subtraction, and multiplication proceed from the least
significant digit upward, that is, in little-endian order, so a little-endian
CPU sometimes has a slight speed advantage when performing these operations
in multiple precision.
Could the world standardize on one byte order in the foreseeable future? It
does not seem likely. The world cannot even decide which side of the road to
drive on. The difference becomes a problem when little-endian and big-endian
devices communicate with each other. It can be even more of a problem when
their operating code has to be ported from one CPU to another with a different
byte order.
Writing a conversion routine is no problem. An experienced C programmer can
whip one up in a minute. However, finding all the places where conversions are
required can be difficult, unless the code was written with conversion in
mind. That is where the techniques of C++ come in.
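As an illustration of how short such a routine can be, here is one possible pair of byte-swapping functions. The names swap16 and swap32 and the fixed-width types from <cstdint> are assumptions made for this sketch, not part of the article's code.

```cpp
#include <cassert>
#include <cstdint>

// Exchange the two bytes of a 16-bit word.
inline std::uint16_t swap16(std::uint16_t w)
{
    return static_cast<std::uint16_t>((w << 8) | (w >> 8));
}

// Reverse the four bytes of a 32-bit double word.
inline std::uint32_t swap32(std::uint32_t d)
{
    return ((d & 0x000000FFu) << 24) |
           ((d & 0x0000FF00u) << 8)  |
           ((d & 0x00FF0000u) >> 8)  |
           ((d & 0xFF000000u) >> 24);
}

// Swapping twice must give back the original value.
inline void check_swaps()
{
    assert(swap16(0x1234) == 0x3412);
    assert(swap32(0x12345678u) == 0x78563412u);
    assert(swap32(swap32(0xDEADBEEFu)) == 0xDEADBEEFu);
}
```

The routine itself is the easy part. It says nothing about where in a program it has to be called, which is the problem the classes described next are meant to address.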
First of all, a communication standard has to be established. If two devices
communicate through a SCSI channel, all multi-byte values should be sent over
the channel in big-endian order, which is the SCSI standard. Then conversions
must be made to and from the CPU byte order, so the program can perform valid
arithmetic operations on the data.
The Types and Classes
Listing 1 shows the header file endian.h, which contains nearly all the code.
Listing 2 shows the file endian.cpp, which defines a useful union -- all the
other code needed besides the header. The header defines three simple types,
with names that are fairly standard:
BYTE -- a single unsigned byte
WORD -- a two-byte unsigned word
DWORD -- a four-byte unsigned double word
Variables of type WORD and DWORD are implicitly assumed to be in the order
appropriate for the CPU, so the program can compute with them freely.
The code also defines four classes of single and double words in specific byte
orders:
BEWORD -- a big-endian WORD
BEDWORD -- a big-endian DWORD
LEWORD -- a little-endian WORD
LEDWORD -- a little-endian DWORD
Of course, two of these types are substantially the same as WORD and DWORD,
but the programmer does not need to know that while coding. The restrictions
of C++ will prevent the program from performing arithmetic operations directly
on them. This is important because such operations will become invalid when
the program is ported to a CPU with a different byte order.
Conversions
Conversions from CPU order to big-endian or little-endian order are performed
by a member function or by a class constructor. For example:
LEWORD y(0xABCD);
BEWORD x;
x.set(0x1234);
It is also possible to overload operator=, but this can cause problems in some
implementations when unions containing these special types are initialized or
assigned.
Conversions from big-endian or little-endian order to CPU order are performed
by a member function called value. For example, the following code adds 3 to a
big-endian WORD:
BEWORD x;
x.set(x.value() + 3);
An attempt to do this in a nonportable fashion will be flagged as a
compile-time error:
BEWORD x;
x = x + 3; // ERROR!
In this case, Turbo C++ reports, "Operator cannot be applied to these operand
types."
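A minimal sketch of how a class like BEWORD might be built is shown here. It is not the code from Listing 1: it stores two raw bytes, always converts by shifting, and uses std::uint16_t as a stand-in for WORD. The explicit constructor is also a choice made for this sketch. Because the class defines no arithmetic operators, the nonportable statement is rejected while the set/value form compiles.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint16_t WORD;   // assumption: exactly two bytes, CPU order

class BEWORD
{
    unsigned char bytes[2];   // bytes[0] is the most significant byte
public:
    BEWORD() { set(0); }
    explicit BEWORD(WORD w) { set(w); }

    // CPU order -> big-endian order
    void set(WORD w)
    {
        bytes[0] = static_cast<unsigned char>(w >> 8);
        bytes[1] = static_cast<unsigned char>(w & 0xFF);
    }

    // big-endian order -> CPU order
    WORD value() const
    {
        return static_cast<WORD>((bytes[0] << 8) | bytes[1]);
    }
};

inline void check_beword_arithmetic()
{
    BEWORD x(0x1234);
    x.set(static_cast<WORD>(x.value() + 3));   // the portable way to add 3
    assert(x.value() == 0x1237);
    // x = x + 3;   // would not compile: BEWORD has no operator+
}
```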
The compiler knows the byte order of the CPU on which its object code will
run, but will not reveal it at preprocessing time. If the programmer has this
information, the code can be made more efficient by defining either _BIG_ENDIAN
or _LITTLE_ENDIAN (but not both!) to indicate the byte order to be used. For
example, if _BIG_ENDIAN is defined, then x.set(0x1234), when x is of type
BEWORD, will generate the code for a simple assignment.
If neither _BIG_ENDIAN nor _LITTLE_ENDIAN is defined, the compiler will
generate less efficient code that will work on any CPU. For example, if x is
of type BEWORD, x.set(0x1234) will generate code that performs the following
operations:
y = 0x1234
first byte of x = y >> 8
second byte of x = low-order byte of y
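The difference between the two cases can be sketched with a free function. The macro SKETCH_BIG_ENDIAN_TARGET is a hypothetical stand-in for _BIG_ENDIAN, chosen so that the sketch cannot collide with macros that some system headers define, and store_be16 is an illustrative name rather than anything from Listing 1.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Store a CPU-order 16-bit value into two bytes in big-endian order.
inline void store_be16(unsigned char *dest, std::uint16_t y)
{
#ifdef SKETCH_BIG_ENDIAN_TARGET
    // Known big-endian target: CPU order already matches, so a plain
    // two-byte copy is enough.
    std::memcpy(dest, &y, 2);
#else
    // Byte order unknown: the shifting form is correct on any CPU.
    dest[0] = static_cast<unsigned char>(y >> 8);     // first byte  = y >> 8
    dest[1] = static_cast<unsigned char>(y & 0xFF);   // second byte = low byte of y
#endif
}

inline void check_store_be16()
{
    unsigned char x[2] = {0, 0};
    store_be16(x, 0x1234);
    assert(x[0] == 0x12);
    assert(x[1] == 0x34);
}
```

Defining the macro on a little-endian target would silently produce the wrong byte order, which is why the portable path is the default in this sketch.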
If shifting is a particularly slow operation, it might be advisable to include
a quick test for byte order at run time, and skip the shifting if the byte
order of the CPU is the same as that of the word or double word being
converted.
If _RUN_TIME_ENDIAN is defined, the code will define a quick test, big_endian,
which returns a true value (1) if it is executed on a big-endian CPU and a
false value (0) otherwise. The code will also define a similar test called
little_endian. These tests are used to skip the shifts where possible. In most
implementations of C++, the tests involve no more code than testing a flag.
Indeed, that is precisely how they are implemented. The initialized union
_endian compiles with a 1 in either _endian.half[0] or _endian.half[1],
depending on the byte order of the CPU.
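One way such a test might look is sketched here. It follows the idea of the initialized union, a double word holding 1 viewed as two words, but copies the bytes with memcpy instead of reading a union member, and it assumes fixed-width 32-bit and 16-bit types. The function names are illustrative, not those of Listing 1.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// View a 32-bit 1 as two 16-bit halves, lowest address first, and return
// the requested half. (endian_half is a hypothetical helper.)
inline int endian_half(int index)
{
    const std::uint32_t whole = 1;
    std::uint16_t half[2];
    std::memcpy(half, &whole, sizeof whole);
    return half[index];
}

// On a big-endian CPU the 1 lands in the second half;
// on a little-endian CPU it lands in the first.
inline int big_endian_cpu()    { return endian_half(1); }
inline int little_endian_cpu() { return endian_half(0); }

inline void check_exactly_one_order()
{
    assert(big_endian_cpu() + little_endian_cpu() == 1);
}
```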
It is possible to overload operators, but it is generally more efficient to
convert all multi-byte values to the CPU's byte order and use the regular
operators. The only exceptions are operator== and operator!=, which do not
depend on byte order as long as both operands use the same byte order, and
tests against zero, which are implemented as the member functions zero and
nonzero.
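The byte-order independence of these tests can be seen in a small sketch. The class here is a simplified stand-in for LEDWORD, not the Listing 1 version: it keeps four raw bytes and compares them directly, without ever converting to CPU order. Only the member names zero and nonzero are taken from the text.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

class LEDWORD
{
    unsigned char bytes[4];   // bytes[0] is the least significant byte
public:
    explicit LEDWORD(std::uint32_t d = 0)
    {
        for (int i = 0; i < 4; ++i)
            bytes[i] = static_cast<unsigned char>((d >> (8 * i)) & 0xFF);
    }

    // Both operands use the same byte order, so comparing the stored
    // bytes gives the same answer as comparing the converted values.
    bool operator==(const LEDWORD &other) const
    {
        return std::memcmp(bytes, other.bytes, sizeof bytes) == 0;
    }
    bool operator!=(const LEDWORD &other) const { return !(*this == other); }

    // Zero is all zero bytes in either byte order.
    int zero() const    { return (bytes[0] | bytes[1] | bytes[2] | bytes[3]) == 0; }
    int nonzero() const { return !zero(); }
};

inline void check_ledword_tests()
{
    LEDWORD a(0x12345678u), b(0x12345678u), c;
    assert(a == b);
    assert(a != c);
    assert(c.zero());
    assert(a.nonzero());
}
```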
All member functions have been defined inline, which makes them run fast and
generates absolutely no code for member functions that are not called. If
minimizing code size is desirable, it may be advisable to code some of them
separately.
Listing 1 The File ENDIAN.H
// Portable Byte Ordering in C++
// by Philip J. Erdelsky
// Public Domain -- No Restrictions on Use
// If the byte order of the target machine is known, include ONE of
// the following statements:
// #define _BIG_ENDIAN
// #define _LITTLE_ENDIAN
// If the byte order of the target machine is to be determined at run
// time for each conversion, include the following statement:
// #define _RUN_TIME_ENDIAN
#ifndef _ENDIAN
#define _ENDIAN 1
typedef unsigned char BYTE;
typedef unsigned short WORD; // two-byte word
typedef unsigned long DWORD; // four-byte double word
#ifdef _RUN_TIME_ENDIAN
extern union _endian_union
{
  DWORD whole;
  WORD half[2];
} _endian;
inline int big_endian(void) {return _endian.half[1];}
inline int little_endian(void) {return _endian.half[0];}
#endif
// check for consistent parameter definitions
#ifdef _BIG_ENDIAN
#ifdef _LITTLE_ENDIAN
#error _BIG_ENDIAN and _LITTL