Character Messages

Earlier in this chapter, I discussed the idea of translating keystroke messages into character messages by taking shift-state information into account. I warned you that shift-state information is not enough: you also need to know about country-dependent keyboard configurations. For this reason, you should not attempt to translate keystroke messages into character codes yourself. Instead, Windows does it for you. You've seen this code before:

while (GetMessage (&msg, NULL, 0, 0))
{
     TranslateMessage (&msg) ;
     DispatchMessage (&msg) ;
}

This is a typical message loop that appears in WinMain. The GetMessage function fills in the msg structure fields with the next message from the queue. DispatchMessage calls the appropriate window procedure with this message.

Between these two functions is TranslateMessage, which takes on the responsibility of translating keystroke messages to character messages. If the keystroke message is WM_KEYDOWN or WM_SYSKEYDOWN, and if the keystroke in combination with the shift state produces a character, TranslateMessage places a character message in the message queue. This character message will be the next message that GetMessage retrieves from the queue after the keystroke message.

The Four Character Messages

There are four character messages:
                        Characters     Dead Characters
Nonsystem Characters:   WM_CHAR        WM_DEADCHAR
System Characters:      WM_SYSCHAR     WM_SYSDEADCHAR

The WM_CHAR and WM_DEADCHAR messages are derived from WM_KEYDOWN messages. The WM_SYSCHAR and WM_SYSDEADCHAR messages are derived from WM_SYSKEYDOWN messages. (I'll discuss what a dead character is shortly.)
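
The relationship between the two keystroke messages and the four character messages can be written down as a small lookup. This is only a sketch in portable C, not anything Windows exposes: the function name character_message_for and the is_dead_key flag are made up for illustration, and the message identifiers are redefined locally (with the same values the Windows headers use) so that the fragment compiles without windows.h. In a real Windows program you would include windows.h and drop the enum.

```c
#include <stdbool.h>

/* Message identifiers, redefined here with the values from the Windows
   headers so that this sketch compiles on its own. */
enum {
    WM_KEYDOWN     = 0x0100,
    WM_CHAR        = 0x0102,
    WM_DEADCHAR    = 0x0103,
    WM_SYSKEYDOWN  = 0x0104,
    WM_SYSCHAR     = 0x0106,
    WM_SYSDEADCHAR = 0x0107
};

/* Hypothetical helper: which character message is derived from a given
   keystroke message. Returns 0 for messages that never produce one. */
static unsigned character_message_for(unsigned keystroke_msg, bool is_dead_key)
{
    switch (keystroke_msg) {
    case WM_KEYDOWN:
        return is_dead_key ? WM_DEADCHAR : WM_CHAR;
    case WM_SYSKEYDOWN:
        return is_dead_key ? WM_SYSDEADCHAR : WM_SYSCHAR;
    default:
        return 0;   /* key-up and other messages yield no character */
    }
}
```

For example, character_message_for (WM_SYSKEYDOWN, false) gives WM_SYSCHAR, matching the second row of the table.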

Here's the good news: In most cases, your Windows program can process the WM_CHAR message while ignoring the other three character messages. The lParam parameter that accompanies the four character messages is the same as the lParam parameter for the keystroke message that generated the character message. However, the wParam parameter is not a virtual key code. Instead, it is an ANSI or Unicode character code.

These character messages are the first messages we've encountered that deliver text to the window procedure. They're not the only ones. Other messages are accompanied by entire zero-terminated text strings. How does the window procedure know whether this character data is 8-bit ANSI or 16-bit Unicode? It's simple: Any window procedure associated with a window class that you register with RegisterClassA (the ANSI version of RegisterClass) gets messages that contain ANSI character codes. Window procedures whose window class was registered with RegisterClassW (the wide-character version of RegisterClass) get messages that come with Unicode character codes. If your program registers its window class using RegisterClass, that's really RegisterClassW if the UNICODE identifier was defined and RegisterClassA otherwise.

Unless you're explicitly doing mixed coding of ANSI and Unicode functions and window procedures, the character code delivered with the WM_CHAR message (and the three other character messages) is

(TCHAR) wParam

The same window procedure might be used with two window classes, one registered with RegisterClassA and the other registered with RegisterClassW. This means that the window procedure might get some messages with ANSI character codes and some messages with Unicode character codes. If your window procedure needs help to sort things out, it can call

fUnicode = IsWindowUnicode (hwnd) ;

The fUnicode variable will be TRUE if the window procedure for hwnd gets Unicode messages, which means the window is based on a window class that was registered with RegisterClassW.
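
As a rough illustration of what the (TCHAR) cast amounts to in each case, here is a portable C sketch. The helper name char_code_from_wparam is hypothetical, and the is_unicode argument simply stands in for the value IsWindowUnicode would return. The sketch also assumes single-byte ANSI characters and ignores double-byte character sets.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: pull the character code out of the wParam of a
   character message. is_unicode stands in for IsWindowUnicode (hwnd). */
static uint32_t char_code_from_wparam(uintptr_t wparam, bool is_unicode)
{
    if (is_unicode)
        return (uint16_t) wparam;   /* 16-bit Unicode character code */

    return (uint8_t) wparam;        /* 8-bit ANSI character code */
}
```

A window procedure shared between an ANSI class and a Unicode class could branch this way before storing the character.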

Message Ordering

Because the character messages are generated by the TranslateMessage function from WM_KEYDOWN and WM_SYSKEYDOWN messages, the character messages are delivered to your window procedure sandwiched between keystroke messages. For instance, if Caps Lock is not toggled on and you press and release the A key, the window procedure receives the following three messages:
Message        Key or Code
WM_KEYDOWN     Virtual key code for 'A' (0x41)
WM_CHAR        Character code for 'a' (0x61)
WM_KEYUP       Virtual key code for 'A' (0x41)

If you type an uppercase A by pressing the Shift key, pressing the A key, releasing the A key, and then releasing the Shift key, the window procedure receives five messages:
Message        Key or Code
WM_KEYDOWN     Virtual key code VK_SHIFT (0x10)
WM_KEYDOWN     Virtual key code for 'A' (0x41)
WM_CHAR        Character code for 'A' (0x41)
WM_KEYUP       Virtual key code for 'A' (0x41)
WM_KEYUP       Virtual key code VK_SHIFT (0x10)

The Shift key by itself does not generate a character message.
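
The two sequences above can be reproduced with a toy translation function. This is deliberately not how Windows does it: the real translation depends on the installed keyboard layout, which is exactly why you leave it to TranslateMessage. The sketch below handles only the letter keys of a U.S. layout, and the name toy_translate_key is invented for this example.

```c
#include <stdbool.h>

enum { VK_SHIFT = 0x10 };   /* same value as in the Windows headers */

/* Toy translation, U.S. letter keys only. Returns the character code that
   would accompany WM_CHAR, or 0 if the key produces no character message. */
static unsigned toy_translate_key(unsigned vk, bool shift_down, bool caps_lock_on)
{
    if (vk >= 'A' && vk <= 'Z') {
        /* Shift and Caps Lock cancel each other for letter keys. */
        bool upper = (shift_down != caps_lock_on);
        return upper ? vk : vk + ('a' - 'A');
    }
    return 0;   /* VK_SHIFT and every other key: no character in this toy */
}
```

With Caps Lock off, the A key (0x41) yields 0x61 unshifted and 0x41 shifted, while VK_SHIFT on its own yields nothing.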

If you hold down the A key so that the typematic action generates keystrokes, you'll get a character message for each WM_KEYDOWN message:
Message        Key or Code
WM_KEYDOWN     Virtual key code for 'A' (0x41)
WM_CHAR        Character code for 'a' (0x61)
WM_KEYDOWN     Virtual key code for 'A' (0x41)
WM_CHAR        Character code for 'a' (0x61)
WM_KEYDOWN     Virtual key code for 'A' (0x41)
WM_CHAR        Character code for 'a' (0x61)
WM_KEYDOWN     Virtual key code for 'A' (0x41)
WM_CHAR        Character code for 'a' (0x61)
WM_KEYUP       Virtual key code for 'A' (0x41)

If some of the WM_KEYDOWN messages have a Repeat Count greater than 1, the corresponding WM_CHAR message will have the same Repeat Count.
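
If your program doesn't want to lose typematic characters, it can honor the Repeat Count, which occupies the low 16 bits of lParam. The following portable C sketch shows one way to do that. The helper names repeat_count and append_repeated are hypothetical, and lParam is modeled as a plain 32-bit integer so that the code compiles without windows.h.

```c
#include <stdint.h>

/* The Repeat Count is stored in bits 0 through 15 of lParam. */
static unsigned repeat_count(uint32_t lparam)
{
    return lparam & 0xFFFFu;
}

/* Hypothetical helper: append ch to a zero-terminated buffer once per
   repeat, stopping when the buffer is full. Requires capacity >= 1 and
   length < capacity. Returns the new length. */
static unsigned append_repeated(char *buf, unsigned capacity,
                                unsigned length, char ch, uint32_t lparam)
{
    unsigned n = repeat_count(lparam);

    while (n-- > 0 && length + 1 < capacity)
        buf[length++] = ch;

    buf[length] = '\0';
    return length;
}
```

A WM_CHAR handler could call append_repeated with (char) wParam and lParam rather than storing the character only once.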

The Ctrl key in combination with a letter key generates ASCII control characters from 0x01 (Ctrl-A) through 0x1A (Ctrl-Z). Several of these control codes are also generated by the keys shown in the following table:
Key          Character Code   Duplicated by   ANSI C Escape
Backspace    0x08             Ctrl-H          \b
Tab          0x09             Ctrl-I          \t
Ctrl-Enter   0x0A             Ctrl-J          \n
Enter        0x0D             Ctrl-M          \r
Esc          0x1B             Ctrl-[

The rightmost column shows the escape code defined in ANSI C to represent the character codes for these keys.
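
The "Duplicated by" column follows the usual ASCII convention: a control code is the key's character code with all but the low five bits stripped. Here is a minimal sketch of that arithmetic. The function name ctrl_code is made up, and this shows only the ASCII relationship, not the layout-driven translation that Windows actually performs.

```c
/* ASCII convention: Ctrl + key keeps only the low five bits of the
   key's character code, so 'A' (0x41) becomes 0x01 and '[' (0x5B)
   becomes 0x1B. */
static unsigned ctrl_code(unsigned char key)
{
    return key & 0x1Fu;
}
```

So ctrl_code ('H') equals '\b', ctrl_code ('M') equals '\r', and ctrl_code ('[') equals 0x1B, the code for Esc.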

Windows programs sometimes use the Ctrl key in combination with letter keys for menu accelerators (which I'll discuss in Chapter 10). In this case, the letter keys are not translated into character messages.

Control Character Processing

The basic rule for processing keystroke and character messages is this: If you need to read keyboard character input in your window, you process the WM_CHAR message. If you need to read the cursor keys, function keys, Delete, Insert, Shift, Ctrl, and Alt, you process the WM_KEYDOWN message.

But what about the Tab key? Or Enter or Backspace or Escape? Traditionally, these keys generate ASCII control characters, as shown in the preceding table. But in Windows they also generate virtual key codes. Should these keys be processed during WM_CHAR processing or WM_KEYDOWN processing?

After a decade of considering this issue (and looking back over Windows code I've written over the years), I seem to prefer treating the Tab, Enter, Backspace, and Escape keys as control characters rather than as virtual keys. My WM_CHAR processing often looks something like this:

case WM_CHAR:
     [other program lines]
     switch (wParam)
     {
     case '\b':          // backspace
          [other program lines]
          break ;

     case '\t':          // tab
          [other program lines]
          break ;

     case '\n':          // linefeed
          [other program lines]
          break ;

     case '\r':          // carriage return
          [other program lines]
          break ;

     default:            // character codes
          [other program lines]
          break ;
     }
     return 0 ;
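
To make the skeleton concrete, here is one possible way to fill in the cases for a single-line input buffer. Everything in it is an assumption made for illustration: the LineBuffer type, the on_char function, the choice to store a Tab as a space and ignore linefeed, and the restriction to printable ASCII. It is written in portable C so that it can be compiled and tried outside a window procedure.

```c
#include <stddef.h>

#define LINE_MAX_CHARS 64        /* hypothetical buffer size */

typedef struct {
    char   text[LINE_MAX_CHARS + 1];
    size_t length;
    int    submitted;            /* set to 1 when Enter arrives */
} LineBuffer;

/* Hypothetical handler: apply one WM_CHAR character code to the buffer. */
static void on_char(LineBuffer *line, unsigned ch)
{
    switch (ch) {
    case '\b':                   /* backspace: remove the last character */
        if (line->length > 0)
            line->text[--line->length] = '\0';
        break;

    case '\n':                   /* linefeed: ignored in this sketch */
        break;

    case '\r':                   /* carriage return: line is complete */
        line->submitted = 1;
        break;

    case '\t':                   /* tab: stored as a single space here */
        ch = ' ';
        /* fall through */

    default:                     /* character codes: printable ASCII only */
        if (ch >= 0x20 && ch < 0x7F && line->length < LINE_MAX_CHARS) {
            line->text[line->length++] = (char) ch;
            line->text[line->length] = '\0';
        }
        break;
    }
}
```

In a window procedure, the WM_CHAR case would pass wParam to on_char and then invalidate the window so that the text is repainted.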

Dead-Character Messages

Windows programs can usually ignore WM_DEADCHAR and WM_SYSDEADCHAR messages, but you should definitely know what dead characters are and how they work.

On some non-U.S. English keyboards, certain keys are defined to add a diacritic to a letter. These are called "dead keys" because they don't generate characters by themselves. For instance, when a German keyboard is installed, the key that is in the same position as the +/= key on a U.S. keyboard is a dead key for the grave accent (`) when shifted and the acute accent (´) when unshifted.

When a user presses this dead key, your window procedure receives a WM_DEADCHAR message with wParam equal to the ANSI or Unicode character code for the diacritic by itself. When the user then presses a letter key that can be written with this diacritic (for instance, the A key), the window procedure receives a WM_CHAR message where wParam is the ANSI or Unicode character code for the letter 'a' with the diacritic.

Thus, your program does not have to process the WM_DEADCHAR message because the WM_CHAR message gives the program all the information it needs. The Windows logic even has built-in error handling: If the dead key is followed by a letter that can't take a diacritic (such as 's'), the window procedure receives two WM_CHAR messages in a row. The first has wParam equal to the character code for the diacritic by itself (the same wParam value delivered with the WM_DEADCHAR message), and the second has wParam equal to the character code for the letter 's'.
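
The behavior just described can be modeled in a few lines of portable C. This is a simplified sketch, not the Windows implementation: it knows only the acute accent and five lowercase vowels, and the names compose_acute and chars_after_dead_key are invented here. The character codes are the ones shared by the Windows ANSI character set and Unicode.

```c
#include <stddef.h>

enum { ACUTE_ACCENT = 0x00B4 };  /* the acute accent by itself */

/* Returns the code for the letter with an acute accent, or 0 if this
   small table has no such combination. */
static unsigned compose_acute(unsigned letter)
{
    switch (letter) {
    case 'a': return 0x00E1;
    case 'e': return 0x00E9;
    case 'i': return 0x00ED;
    case 'o': return 0x00F3;
    case 'u': return 0x00FA;
    default:  return 0;
    }
}

/* Hypothetical model: fills out[] with the character codes that would
   arrive in WM_CHAR messages after a dead key followed by a letter.
   Returns how many WM_CHAR messages there would be (1 or 2). */
static size_t chars_after_dead_key(unsigned dead_char, unsigned letter,
                                   unsigned out[2])
{
    unsigned composed =
        (dead_char == ACUTE_ACCENT) ? compose_acute(letter) : 0;

    if (composed != 0) {
        out[0] = composed;       /* one WM_CHAR: the accented letter */
        return 1;
    }

    out[0] = dead_char;          /* first WM_CHAR: the diacritic alone */
    out[1] = letter;             /* second WM_CHAR: the plain letter */
    return 2;
}
```

The acute accent followed by 'a' produces the single code 0xE1, while the acute accent followed by 's' produces 0xB4 and then 's'.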

Of course, the best way to get a feel for this is to see it in action. You need to load a foreign keyboard that uses dead keys, such as the German keyboard that I described earlier. You do this in the Control Panel by selecting Keyboard and then the Language tab. Then you need an application that shows you the details of every keyboard message a program can receive. That's the KEYVIEW1 program coming up next.