Day 14: Advanced Data Manipulation

It is time to step back from the project at hand and go over some of the more advanced data-manipulation techniques that C++ inherits from C. Today you will learn

Bit Twiddling

Often, you will want to set flags in your objects to keep track of the states of your objects. (Has the object been initialized yet? Is this the first usage? Are you coming or going?)

You can set flags with user-defined Booleans, but when you have many flags, and when storage size is an issue, it is convenient to be able to use the individual bits as flags.

Each byte has 8 bits, so in a 4-byte long code snippet, you can hold 32 separate flags. A bit is set if its value is 1, and it is clear if its value is 0. When you set a bit, you make its value 1, and when you clear it, you make its value 0. You can set and clear bits by changing the value of the long, but that can be tedious and confusing.

C++ provides bitwise operators that act on the individual bits. These are presented in table 14.1.

Table 14.1 The Bitwise Operators

    Symbol     Operator

    &          AND

    |          OR

    ^          exclusive OR

    ~          complement

Operator AND

The AND operator (&) is a single ampersand, as opposed to the logical AND, which is two ampersands. When you AND 2 bits, the result is 1 if both bits are 1, but 0 if either bit is 0 or both bits are 0. The way to think of this is the result is 1 if bit 1 is set and if bit 2 is set.

Operator OR

The second bitwise operator is OR (|). Again, this is a single vertical bar, as opposed to the logical OR, which is two vertical bars. When you OR 2 bits, the result is 1 if either bit is set, or if both bits are set.

Operator Exclusive OR

The third bitwise operator is exclusive OR (^). When you exclusive OR 2 bits, the result is 1 if the two bits are different. The way to think of this is the result is 1 if bit 1 is set and bit 2 is clear, or if bit 1 is clear and bit 2 is set.

The Complement Operator

The complement operator (~) clears every bit in a number that is set and sets every bit that is clear. If the current value of the number is 1010 0011, the complement of that number is 0101 1100.

Setting Bits

When you want to set or clear a particular bit, you use masking operations. If you have a 4-byte flag and you want to set bit 8 TRUE, you need to OR the flag with the value 128; 128 is 1000 0000 in binary, so the value of the eighth bit is 128. Whatever the current value of that bit (set or clear), if you OR it with the value 128, you will set that bit and not change any of the other bits. Assume that the current value of the 8 bits is 1010 0110 0010 0110. ORing 128 to it looks like this:

                9 8765 4321
        1010 0110 0010 0110   // bit 8 is clear
    |   0000 0000 1000 0000   // 128
    -----------------------
        1010 0110 1010 0110   // bit 8 is set

You should note a few things. First, as usual, bits are counted from right to left. Second, the value 128 is all zeros except for bit 8, the bit you want to set. Third, the starting number 1010 0110 0010 0110 is left unchanged by the OR operation, except that bit 8 was set. Had bit 8 already been set, it would have remained set, which is what you want.

Clearing Bits

If you want to clear bit 8, you can AND the bit with the complement of 128. The complement of 128 is the number you get when you take the bit pattern of 128 (1000 0000), set every bit that is clear, and clear every bit that is set (0111 1111). When you AND these numbers, the original number is unchanged, except for the eighth bit, which is forced to zero:

        1010 0110 1010 0110   // bit 8 is set
    &   1111 1111 0111 1111   // ~128
    -----------------------
        1010 0110 0010 0110   // bit 8 cleared

To fully understand this solution, do the math yourself. Each time both bits are 1, write 1 in the answer. If either bit is 0, write 0 in the answer. Compare the answer with the original number. It should be the same except that bit 8 was cleared.

Flipping Bits

Finally, if you want to flip bit 8, no matter what its state, you exclusive OR the number with 128:

        1010 0110 1010 0110   // number
    ^   0000 0000 1000 0000   // 128
    -----------------------
        1010 0110 0010 0110   // bit flipped
    ^   0000 0000 1000 0000   // 128
    -----------------------
        1010 0110 1010 0110   // flipped back

DO and DON'T: Working with Bits

DO set bits by using masks and the OR operator.

DO clear bits by using masks and the AND operator.

DO flip bits using masks and the exclusive OR operator.

Bit Fields

There are circumstances under which every byte counts, and saving six or eight bytes in a class can make all the difference. If your class or structure has a series of Boolean variables, or variables that can have only a very small number of possible values, you may save some room by using bit fields.

Using the standard C++ data types, the smallest type you can use in your class is a type char, which is 1 byte. More often, you will end up using an int, which is 2 or, more often, 4 bytes. By using bit fields, you can store eight binary values in a char and 32 binary values in a long.

Here's how bit fields work: Bit fields are named and accessed like any class member. Their type always is declared to be unsigned int. After the bit field name, write a colon followed by a number. The number is an instruction to the compiler as to how many bits to assign to this variable. If you write 1, the bit will represent either the value 0 or 1. If you write 2, the bit can represent 0, 1, 2, or 3, a total of four values. A 3-bit field can represent eight values, and so on. Appendix C reviews binary numbers. Listing 14.1 illustrates the use of bit fields.

Listing 14.1 Using Bit Fields

    1:     #include <iostream.h>
    2:     #include <string.h>
    3:
    4:     enum STATUS { FullTime, PartTime } ;
    5:     enum GRADLEVEL { UnderGrad, Grad } ;
    6:     enum HOUSING { Dorm, OffCampus };
    7:     enum FOODPLAN { OneMeal, AllMeals, WeekEnds, NoMeals };
    8:
    9:     class student
    10:    {
    11:    public:
    12:        student():myStatus(FullTime),myGradLevel(UnderGrad),myHousing(Dorm),
               myFoodPlan(NoMeals){}
    13:        ~student(){}
    14:        STATUS GetStatus();
    15:        void SetStatus(STATUS);
    16:        FOODPLAN GetPlan() { return myFoodPlan; }
    17:
    18:    private:
    19:       unsigned myStatus : 1;
    20:       unsigned myGradLevel: 1;
    21:       unsigned myHousing : 1;
    22:       unsigned myFoodPlan : 2;
    23:    };
    24:
    25:    STATUS student::GetStatus()
    26:    {
    27:       if (myStatus)
    28:          return FullTime;
    29:       else
    30:          return PartTime;
    31:    }
    32:    void student::SetStatus(STATUS theStatus)
    33:    {
    34:       myStatus = theStatus;
    35:    }
    36:
    37:
    38:    void main()
    39:    {
    40:       student Jim;
    41:
    42:       if (Jim.GetStatus()== PartTime)
    43:          cout << "Jim is part time" << endl;
    44:       else
    45:          cout << "Jim is full time" << endl;
    46:
    47:       Jim.SetStatus(PartTime);
    48:
    49:       if (Jim.GetStatus())
    50:          cout << "Jim is part time" << endl;
    51:       else
    52:          cout << "Jim is full time" << endl;
    53:
    54:       cout << "Jim is on the " ;
    55:
    56:       char Plan[80];
    57:       switch (Jim.GetPlan())
    58:       {
    59:          case  OneMeal: strcpy(Plan,"One meal"); break;
    60:          case  AllMeals: strcpy(Plan,"All meals"); break;
    61:          case WeekEnds: strcpy(Plan,"Weekend meals"); break;
    62:          case NoMeals: strcpy(Plan,"No Meals");break;
    63:          default : cout << "Something bad went wrong!\n"; break;
    64:       }
    65:       cout << Plan << " food plan." << endl;
    66:
    67:    }
Output:
    Jim is part time
    Jim is full time
    Jim is on the No Meals food plan.

In lines 4 through 7, a number of enumerated types are defined. These serve to define the possible values for the bit fields within the Student class.

Student itself is declared in lines 9 through 23. Although this is a trivial class, it is interesting in that all the data is packed into 5 bits. The first bit represents the student's status: full-time or part-time. The second bit represents whether the student is an undergraduate. The third bit represents whether the student lives in a dorm. The final two bits represent the four possible food plans.

The class methods are written as for any other class, and are in no way affected by the fact that these are bit fields and not integers or enumerated types.

The member function GetStatus() reads the Boolean bit and returns an enumerated type, but this is not necessary. It could just as easily have been written to return the value of the bit field directly. The compiler would have done the translation.

To prove that to yourself, replace the GetStatus() implementation with this code:

    STATUS student::GetStatus()
    {
         return myStatus;
    }

There should be no change whatsoever to the functioning of the program. It is a matter of clarity when reading the code; the compiler isn't particular.

Note that the code in line 42 must check the status and then print the meaningful message. It is tempting to write this:

    cout << "Jim is " << Jim.GetStatus() << endl;

That will simply print this:

    Jim is 0

The compiler has no way to translate the enumerated constant PartTime into meaningful text.

In line 57, the program switches on the food plan. Then, for each possible value, it puts a reasonable message into the buffer, which then is printed in line 65. Note again that the switch statement could have been written as this:

    case  0: strcpy(Plan,"One meal"); break;

    case  1: strcpy(Plan,"All meals"); break;

    case  2: strcpy(Plan,"Weekend meals"); break;

    case  3: strcpy(Plan,"No Meals");break;

The most important thing about using bit fields is that the client of the class does not need to worry about the data-storage implementation. Because the bit fields are private, you can feel free to change them later, and the interface will not need to change.

Learning How Memory Works

When you call a function, the code branches to the called function, parameters are passed in, and the body of the function is executed. When the function completes, a value is returned (unless the function returns void), and control returns to the calling function.

How is this task accomplished? How does the code know where to branch to? Where are the variables kept when they are passed in? What happens to variables that are declared in the body of the function? How is the return value passed back out? How does the code know where to resume?

If you don't understand how this works, you will find that programming remains a fuzzy mystery and debugging (covered on day 20) is nearly impossible. The explanation requires a brief tangent into a discussion of computer memory.

Examining the Levels of Abstraction

One of the principal hurdles in programming is grappling with the many layers of intellectual abstraction. Computers, of course, are just electronic machines. They don't know about windows and menus, they don't know about programs or instructions, and they don't even know about 1s and 0s. All that is really going on is that voltage is being measured at various places on an integrated circuit. Even this is an abstraction; electricity itself is just an intellectual concept, representing the behavior of subatomic particles.

Few programmers bother much with any level of detail below the idea of values in RAM. After all, you don't need to understand particle physics to drive a car, make toast, or hit a baseball, and you don't need to understand the electronics of a computer to program one.

You do need to understand how memory is organized, however. Without a reasonably strong mental picture of where your variables are when they are created, and how values are passed among functions, it will all remain an unmanageable mystery.

Partitioning RAM

When you begin your program, your operating system (DOS, UNIX, OS/2, or Windows) sets up various areas of memory based on the requirements of your compiler. As a C++ programmer, you often will be concerned with the global name space, the free store, the registers, the code space, and the stack.

Global variables and their close cousins, static variables, are (surprise!) in global name space. Local variables are on the stack, and variables you create with operator new are on the heap. All this gets much more complex when you zero in on how memory is managed on specific platforms (the Intel x86 platform, for example) and by specific operating systems (Windows, for example) but the essence is the same: You have global name space, a stack, and a heap.

Using Registers and the Instruction Pointer

Registers are a special area of memory built right into the central processing unit (or CPU). Registers take care of internal housekeeping. Much of what goes on in the registers is beyond the scope of this book, but what you are concerned about is the set of registers responsible for pointing, at any given moment, to the next line of code. This book refers to those registers as the instruction pointer. It is the job of the instruction pointer to keep track of which line of code is to be executed next.

The code itself is in code space, which is that part of memory set aside to hold the binary form of the instructions you created in your program. Each line of source code is translated into a series of instructions, and each of these instructions is at a particular address in memory. The instruction pointer has the address of the next instruction to execute.

Using the Stack

The stack is a special area of memory allocated for your program to hold the data required by each of the functions in your program. It is called a stack because it is a last-in-first-out queue, much like a stack of dishes at a cafeteria.

Last-in-first-out means that whatever is added to the stack last will be the first thing taken off. Most queues are like a line at a theater: The first one in line is the first one off. A stack is more like a stack of coins: If you stack 10 pennies on a tabletop and then take some back, the last three you put on will be the first three you take off.

When data is "pushed" onto the stack, the stack grows; as data is "popped" off the stack, the stack shrinks. It isn't possible to pop a dish off of the stack without first popping off all the dishes placed after that dish.

A stack of dishes is the common analogy. It is fine as far as it goes, but it is wrong in a fundamental way. A more accurate mental picture is of a series of cubbyholes aligned top to bottom. The top of the stack is whatever cubby the stack pointer (which is another register) happens to be pointing to.

Each of the cubbies has a sequential address, and one of those addresses is kept in the stack pointer register. Everything below that magic address, known as the top of the stack, is considered to be on the stack. Everything above the top of the stack is considered to be off the stack and invalid.

When data is put on the stack, it is placed into a cubby above the stack pointer, and then the stack pointer is moved to the new data. When data is popped off the stack, all that really happens is that the address of the stack pointer is changed by moving it down the stack.

Understanding How the Stack Works with Functions

When a program running on a PC under DOS branches to a function, the following things happen:

  1. The address in the instruction pointer is incremented to the next instruction past the function call. That address then is placed on the stack, and it will be the return address when the function returns.

  2. Room is made on the stack for the return type you have declared. On a system with 2-byte integers, if the return type is declared to be int, another 2 bytes are added to the stack, but no value is placed in these bytes.

  3. The address of the called function, which is kept in a special area of memory set aside for that purpose, is loaded into the instruction pointer, so the next instruction executed will be in the called function.

  4. The current top of the stack now is noted and is held in a special pointer called the stack frame. Everything added to the stack from now until the function returns will be considered "local" to the function.

  5. All the arguments to the function are placed on the stack.

  6. The instruction now in the instruction pointer is executed, thus executing the first instruction in the function.

  7. Local variables are pushed onto the stack as they are defined.

When the function is ready to return, the return value is placed in the area of the stack reserved in step 2. The stack then is popped all the way up to the stack frame pointer, which effectively throws away all the local variables and the arguments to the function.

The return value is popped off the stack and assigned as the value of the function call itself, and the address stashed away in step 1 is retrieved and put into the instruction pointer. The program thus resumes immediately after the function call, with the value of the function retrieved.

Some of the details of this process change from compiler to compiler, or between computers, but the essential ideas are consistent across environments. In general, when you call a function, the return address and the parameters are put on the stack. During the life of the function, local variables are added to the stack. When the function returns, these are all removed by popping the stack.

Using the Heap

Free memory dedicated to your program but not assigned to the stack is available for your use and is referred to as the heap. On some computers and in some memory models, the stack and the heap actually can run into one another if you put more on the stack than it can hold. It therefore is important to tell your linker how much room to allocate for your stack so that this doesn't happen.

Most often you will use the built-in, compiler-provided operators new and delete for managing the heap, although on day 15 you will see how to override these operators as required.

The best way to think of the heap is as a pile of memory in no particular order and with no particular organization. This is absurd, of course, because it is exactly the same type of memory as the stack (and the code segment, for that matter) uses. You have no orderly access to this memory, however, except through the pointers returned by operator new.

The fact that two allocated variables abut each other in the heap is totally invisible to you. Even if you do become aware of the fact, you cannot count on it not changing over time unless you create a union of the two values. The best approach therefore is not to worry about the physical organization of the heap and, instead, to hang onto that pointer and use it to access your variables on the heap.

Working with Binary and Hexadecimal Values

When you examine memory in a debugger, you often will be confronted with values listed in hexadecimal or binary. Binary is the closest representation of the "real" value, as it corresponds to the bit values that underlie, at least at one level, all your programs and data.

Hexadecimal is simply a convenient way to aggregate binary data so that it is more easily "grokked" by humans. (For those few of you who are not Robert L. Heinlein fans, to grok something is to attain a deep and full understanding of it.) You soon will see that there is a trick for quickly and easily converting between binary and hexadecimal.

What Are These Strange Numbers?

You learned the fundamentals of arithmetic so long ago, it is hard to imagine what it would be like without that knowledge. When you look at the number 145, you instantly see one-hundred forty-five without much reflection.

Understanding binary and hexadecimal requires that you reexamine the number 145 and see it not as a number, but as a code for a number.

Start small: Examine the relationship between the number three and "3." The numeral 3 is a squiggle on a piece of paper, but the number three is an idea. The numeral is used to represent the number.

The distinction can be made clear by realizing that three, 3, |||, III, and *** all can be used to represent the same idea of three.

In base 10 (decimal) math, you use the numerals 0,1,2,3,4,5,6,7,8,9 to represent all numbers. How is the number ten represented?

One can imagine that we would have evolved a strategy of using the letter A to represent ten; or we might have used IIIIIIIIII to represent that idea. The Romans used X. The Arabic system, which we use, makes use of position with numerals to represent values. The first (far right) column is used for "ones," and the next column is used for tens. Thus, the number fifteen is represented as 15 (read that one-five); that is 1 ten and 5 ones.

Certain rules emerge, from which some generalizations can be made:

  1. Base 10 uses the digits 0 through 9

  2. The columns are powers of ten: 1s, 10s, 100s, and so on.

  3. If the third column is 100, the largest number you can make with two columns is 99. More generally, with n columns, you can represent 0 to (10n-1). Thus, with three columns, you can represent 0 to (103-1) or 0-999.

Using Other Bases

It is not a coincidence that we use base 10; we have 10 fingers. One can imagine a different base, however. Using the rules found in base 10, you can describe base 8:

  1. The digits used in base 8 are 0 through 7.

  2. The columns are powers of 8: 1s, 8s, 72s, and so on.

  3. With n columns, you can represent 0 to 8n-1.

To distinguish numbers written in each base, write the base as a subscript next to the number. The number fifteen in base ten would be written as 1510 and read as "one-five, base ten."

Thus, to represent the number 1510 in base 8, you would write 178. This is read "one seven base eight." Note that it also can be read "fifteen," because that is the number it continues to represent.

Why 17? The 1 means 1 eight, and the 7 means 7 ones. One eight plus seven ones equals fifteen. Consider fifteen asterisks:

    *****       *****
    *****

The natural tendency is to make two groups: a group of ten asterisks and another group of five asterisks. This would be represented in decimal as 15 (1 ten and 5 ones). You also can group the asterisks as

    ****        *******
    ****

That is, eight asterisks and seven. That would be represented in base eight as 178--one eight and seven ones.

You can represent the number fifteen in base 10 as 15, in base 9 as 169, in base 8 as 178, and in base 7 as 217. Why 217? In base 7, there is no numeral 8. In order to represent fifteen, you will need two sevens and one 1.

How do you generalize the process? To convert a base 10 number to base 7, think about the columns: in base 7 they are 1s, 7s, 49s, 343s, and so on. Why these columns? They represent 70, 71, 72, 73, and so on. Create a table for yourself:

    4       3       2       1

    73      72      71      70

    343     49      7       1

The first row represents the column number. The second row represents the power of 7. The third row represents the binary value of each number in that row.

To convert from a decimal value to base 7, examine the number and decide which column to use first. If the number is 200, for example, you know that column 4 (343) is 0, and you don't have to worry about it.

To find out how many 49s there are, divide 200 by 49. The answer is 4, so put 4 in column 3 and examine the remainder: 4. There are no 7s in 4, so put a 0 in the 7s column. There are 4 1s in 4, so put a 4 in the 1s column. The answer is 4047.

To convert the number 968 to base 6:

    5       4       3       2       1

    64      63      62      61      60

    1296    216     36      6       1

There are no 1296s in 968, so column 5 has 0. Dividing 968 by 216 yields 4, with a remainder of 104. Column 4 has the value 4. Dividing 104 by 36 yields 2, with a remainder of 32. Column 3 has the value 2. Dividing 32 by 6 yields 5, with a remainder of 2. The answer therefore is 42526.

    5       4       3       2       1

    64      63      62      61      60

    1296    216     36      6       1

    0       4       2       5       2

There is a shortcut when converting from one base to another base (such as 6) to base 10. You can multiply:

    4   *   216     =   864
    2   *    36     =    72
    5   *     6     =    30
    2   *     1     =     2
                        968

Working with Binary Values

Base 2 is the ultimate extension of this idea. There are only two digits: 0 and 1. The columns follow:

    Column:     8       7       6       5       4       3       2       1

    Power:      27      26      25      24      23      22      21      20

    Value:      128     64      32      16      8       4       2       1

To convert the number 88 to base 2, you follow the same procedure: there are no 128s, so column 8 is 0.

There is one 64 in 88, so column 7 is 1 and 24 is the remainder. There are no 32s in 24, so column 6 is 0.

There is one sixteen in 24, so column 5 is 1. The remainder is 8. There is one 8 in 8, so column 4 is 1. There is no remainder, so the rest of the columns are 0:

    0     1     0     1     1     0     0     0

To test this answer, convert it back:

    1   *   64      =    64
    0   *   32      =     0
    1   *   16      =    16
    1   *    8      =     8
    0   *    4      =     0
    0   *    2      =     0
    0   *    1      =     0
                         88

Why Base 2?

The power of base 2 is that it corresponds so cleanly to what a computer needs to represent. Computers do not really know anything at all about letters, numerals, instructions, or programs. At their core, they are just circuitry, at a given juncture, there is a great deal of power or there is very little.

To keep the logic clean, engineers do not treat this as a relative scale"a little power, some power, more power, lots of power, tons of power." Instead, they treat it as a binary scale "enough power or not enough power." Rather than saying even enough or not enough, they simplify it down to "yes or no." Yes or no, or TRUE or FALSE, can be represented as 1 or 0. By convention, 1 means TRUE or Yes, but that is just a convention; it just as easily could have meant FALSE or No.

After you make this great leap of intuition, the power of binary becomes clear. With 1s and 0s, you can represent the fundamental truth of every circuit: There is power or there isn't power. All a computer ever knows is "Is you is, or is you ain't?" Is you is = 1; is you ain't = 0.

Bits and Bytes

After the decision is made to represent TRUE and FALSE with 1s and 0s, binary digits (or bits) become very important. Because early computers could send 8 bits at a time, it was natural to start writing code using 8-bit numbers, called bytes.

With 8 binary digits, you can represent up to 256 different values. Examine the columns: If all 8 bits are set (1), the value is 255. If none is set (all the bits are clear or 0), the value is 0. 0 through 255 represents 256 possible states.

What Is a K?

It turns out that 210 (1,024) is roughly equal to 103 (1,000). This coincidence was too good to pass up, so computer scientists started referring to 210 bytes as 1K or 1 kilobyte, based on the scientific prefix of kilo for thousand.

Similarly, 1,024 times 1,024 (1,048,576) is close enough to 1 million to receive the designation 1M or 1 megabyte, and 1,024 megabytes is called 1 gigabyte (giga implies thousand-million or billion.)

Binary Numbers

Computers use patterns of 1s and 0s to encode everything they do. Machine instructions are encoded as a series of 1s and 0s, and are interpreted by the fundamental circuitry. Arbitrary sets of 1s and 0s can be translated back into numbers by computer scientists, but it would be a mistake to think that these numbers have intrinsic meaning.

The Intel 80x6 chip set, for example, interprets the bit pattern 1001 0101 as an instruction. You certainly can translate this into decimal (149), but that number has no meaning.

Sometimes the numbers are instructions, sometimes they are values, and sometimes they are codes. One important standardized code set is ASCII. In ASCII, every letter and punctuation is given a 7-digit binary representation. The lowercase letter a, for example, is represented by 0110 0001. This is not a number, although you can translate it to the number 97 (64 + 32 + 1). It is in this sense that people say that the letter a is represented by 97 in ASCII; but the truth is that the binary representation of 97, 0110 0001, is the encoding of the letter a, and the decimal value 97 is a human convenience.

Note: Note that the value for the uppercase A is different from the value for the lowercase a. This is because the real meaning of the bit pattern 01100001 is tied to the key pressed (in this case, the a key), and Shift+A is represented by 0100 0001 or 65.

Working with Hexadecimal Values

Because binary numbers are difficult to read, a simpler way to represent the same values is needed. Translating from binary to base 10 involves a fair bit of manipulation of numbers, but it turns out that translating from base 2 to base 16 is very simple because there is a very good shortcut.

To understand this, first you must understand base 16, which is known as hexadecimal. In base 16, there are 16 numerals: 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F. The last six are arbitrary; the letters A through F were chosen because they are easy to represent on a keyboard. The columns in hexadecimal follow:

    4       3       2       1

    163     162     161     160

    4096    256     16      1

To translate from hexadecimal to decimal, you can multiply. The number F8C represents the following:

    F   *   256     =   15  *   256     =   3840
    8   *    16     =                        128
    C   *     1     =   12  *     1     =     12
                                            3980

Translating the number FC to binary is best done by translating first to base 10, and then to binary:

    F   *   16      =   15  *    16     =    240
    C   *    1      =   12  *     1     =     12
                                             252

Converting 25210 to binary requires the chart:

    Column:     9       8       7       6       5       4       3       2       1

    Power:      28      27      26      25      24      23      22      21      20

    Value:      256     128     64      32      16      8       4       2       1
    There are no 256s.
                   1 128 leaves 124
                        1 64 leaves 60
                             1 32 leaves 28
                                  1 16 leaves 12
                                       1 8 leaves 4
                                            1 4 leaves 0
                                                 0
                                                      0
                   1    1    1    1    1    1    0    0

The answer in binary therefore is 1111 1100.

It turns out that if you treat this binary number as two sets of four digits, you can do a magical transformation.

The right set is 1100. In decimal, that is 12; or in hexadecimal, it is C.

The left set is 1111, which in base 10 is 15, or in hexadecimal is F.

You therefore have the following:

    1111 1100
    F     C

Putting the two hexadecimal numbers together is FC, which is the real value of 1111 1100. This shortcut always works. You can take any binary number of any length and reduce it to sets of 4, translate each set of four to hexadecimal, and put the hexadecimal numbers together to get the result in hexadecimal. Here's a much larger number:

    1011 0001 1101 0111

The columns are 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, and 32768.

    1 * 1 =           1
    1 * 2 =           2
    1 * 4 =           4
    0 * 8 =           0

    1 * 16 =         16
    0 * 32 =          0
    1 * 64 =         64
    1 * 128 =       128

    1 * 256 =       256
    0 * 512 =         0
    0 * 1024 =        0
    0 * 2048 =        0

    1 * 4096 =    4,096
    1 * 8192 =    8,192
    0 * 16384 =       0
    1 * 32768 =  32,768

    Total:       45,527

Converting this to hexadecimal requires a chart with the hexadecimal values:

    65535   4096    256     16      1

There are no 65,535s in 45,527, so the first column is 4,096. There are 11 4096s (45,056), with a remainder of 471. There is 1 256 in 471, with a remainder of 215. There are 13 16s (208) in 215, with a remainder of 7. The hexadecimal number therefore is B1D7.

Check the math:

    B (11)  *   4096    =   45,056
    1       *    256    =      256
    D (13)  *     16    =      208
    7       *      1    =        7

    Total                   45,527

The shortcut version would be to take the original binary number, 1011000111010111, and break it into groups of 4: 1011 0001 1101 0111. Each of the four then is evaluated as a hexadecimal number:

    1011 =
    1 * 1 =        1
    1 * 2 =        2
    0 * 4 =        0
    1 * 8 =        8
    Total            11
    Hex:              B

    0001 =
    1 * 1 =        1
    0 * 2 =        0
    0 * 4 =        0
    0 * 8 =        0
    Total             1
    Hex:              1

    1101 =
    1 * 1 =        1
    0 * 2 =        0
    1 * 4 =        4
    1 * 8 =        8
    Total            13
    Hex =             D

    0111 =
    0 * 1 =        1
    1 * 2 =        2
    1 * 4 =        4
    0 * 8 =        0
    Total             7
    Hex:              7

    Total Hex:  B1D7

Summary

Today you learned a bit more about the layout of memory, including the stack, the heap, and code space. You also saw how to create and manipulate bit fields, and how to turn bits on and off using by bit masks. Finally, you saw in detail how binary and hexadecimal numbers can be manipulated and used.

Q&A

Q: Why do I need to understand bit twiddling?

A: Although new programs have done a good job of hiding much of the internal representation of data from the user, professional programmers simply must understand how the data is stored and manipulated in order to make efficient use of the available machinery.

Q: When would you use a bit field?

A: Use a bit field when you need to store a number of small bits of information and storage size matters to your program. Often it is easier to read a program that "wastes" the space of devoting 2 bytes to a Boolean operator, instead of stuffing more information into a bit field than a programmer easily can understand.

Q: Why is hexadecimal used?

A: There is a convenient mapping between 4 binary bits and one hexadecimal digit.

Workshop

The Workshop provides quiz questions to help you solidify your understanding of the material covered, and exercises to provide you with experience in using what you have learned. Try to answer the quiz and exercise questions before checking the answers in Appendix A, and make sure that you understand the answers before continuing to the next chapter.

Quiz

  1. What is the difference between OR and Exclusive-Or?

  2. How do you set a bit?

  3. How do you clear a bit?

  4. What is the difference between the heap and the stack?

  5. What is the value 1011 1101 in hexadecimal? Explain two ways to compute the answer.

[Click here for Answers]

Exercises

  1. Write a function that will accept an unsigned short and will return the value with the high and low bits swapped.

  2. Write masks to access the different values in the specification that follows. (Don't worry yet what the different values mean.) The masks all should be const unsigned chars. In the nomenclature that follows, D0 is the lowest bit in the byte, and D7 is the highest.

    For the fields that are more than a single bit, also include a mask that will cover the entire field (all of the relevant bits set). These will be needed in exercise 3.

  3. Exercise 2 shows the specifications for a serial communications chip. Design a class that will encapsulate its behavior. (Write the public part of the class declaration.)

    The chip is configured into your computer to appear at a fixed memory location, as if it were regular memory. The values in the status byte and the received character byte, however, are changed by the chip to reflect the events on the communications line, rather than those set by you. If you read the status byte, for example, the lowest bit (receive ready) would be 0 until some data is sent through the communication line (presumably from some other computer or a modem). When a full byte of data is received, the chip sets the data received bit to 1, and the received character byte to the value that was received. When the byte is read from the receive buffer, the bit is reset (set to 0).

    The four bytes defined later in this exercise will appear to your computer as if they are in three successive memory locations. Your class should accept, in its constructor, the address of byte 0, and it will find the next three bytes in the next three memory locations. In other words, it should treat the passed-in address as a pointer to an unsigned char array with a length of 4.

    Your class should provide the capability to read and write over the serial port. If Read is called and there are no characters available, it should return 00 (the NULL character). If a read error is detected, your serial port class always should return 00 until a special ResetError method is called. (The chip will reset the error bits when you read from the receive data location.) Your Write method should make sure that the chip is ready for the next character to be transmitted (by testing for transmit ready). If the chip isn't ready for another character to be transmitted (it's still working on the last character), your Write function should wait.

    Your class also should have methods to set the different configuration values. (One common error to watch out for: When setting one of the values in the configuration byte, don't wipe out the other values. Use the masks cleverly.)

    Note: The parity error bit is set when the last received character has the incorrect parity, according to the parity setting in the configuration byte. An overrun error occurs when a character has been fully received before the preceding character has been read. This would cause the receiving computer to lose one of the characters.

  4. Write the entire class discussed in exercise 3. Be sure to start the communication port with reasonable defaults for the configuration parameters, and be sure to clear any errors it might have pending.

Go to: Table of Contents | Next Page