ADDITIONAL CONTROL PICTURES FOR UNICODE
Frank da Cruz
The Kermit Project
Columbia University
New York City USA
fdc@columbia.edu
http://www.columbia.edu/kermit/
Tue Nov 10 00:00:00 1998
THIS IS A PREFORMATTED PLAIN-TEXT ASCII DOCUMENT. IT IS DESIGNED TO BE
VIEWED AS-IS IN A FIXED-PITCH FONT. ITS WIDEST LINE IS 79 COLUMNS. IT
CONTAINS NO TABS. IF IT LOOKS MESSY TO YOU, PLEASE FEEL FREE TO PICK UP
A CLEAN COPY OF THIS OR THE RELATED PROPOSALS BY ANONYMOUS FTP:
HEX BYTE PICTURES FOR UNICODE (plain text)
ftp://kermit.columbia.edu/kermit/ucsterminal/hex.txt
ADDITIONAL CONTROL PICTURES FOR UNICODE (plain text)
ftp://kermit.columbia.edu/kermit/ucsterminal/control.txt
TERMINAL GRAPHICS FOR UNICODE (plain text)
ftp://kermit.columbia.edu/kermit/ucsterminal/ucsterminal.txt
Glyph Map (PDF, contributed by Michael Everson)
ftp://kermit.columbia.edu/kermit/ucsterminal/terminal-emulation.pdf
Clarification of SNI Glyphs (Microsoft Word 7.0)
ftp://kermit.columbia.edu/kermit/ucsterminal/sni-charsets.doc
Discussion (plain text)
ftp://kermit.columbia.edu/kermit/ucsterminal/mail.txt
(Note, the Exhibits are on paper and not available at the FTP site.)
ABSTRACT
Extensions are proposed to augment Unicode's repertoire of Control Pictures
at U+2400 with control pictures for other well-known control sets.
Please refer to the TERMINAL GRAPHICS FOR UNICODE proposal for a discussion
of terminal emulation, including motivation for supporting it in Unicode, as
well as for acknowledgements to those who helped with this set of proposals.
CONTENTS
1. Introduction
2. Background
3. C0 Control Pictures
4. C1 Control Pictures
5. EBCDIC Control Pictures
6. IBM 3270 Terminal Orders and Controls
7. Additional Control-Like Pictures
8. Unicode Control Pictures
9. Summary of Proposed Additional Characters
10. References
11. Exhibits
NOTATION
. Numbers in (parentheses) are footnote references, keyed to footnotes
at the bottom of the section in which they appear.
. Numbers in [brackets] are keyed to the References in Section 10.
. Letter-Digit in brackets refers to an Exhibit in Section 11.
For consistency, the References and Exhibits are the same as those in the
accompanying proposals, even though most of the items are not referenced here.
1. INTRODUCTION
In the interest of "show[ing] the presence of ... control codes and the
SPACE unequivocally when data is displayed" [24,p.6-84], Unicode includes a
selection of control pictures. Makers (and supporters, and users) of
terminal emulators, PC-based data monitors and protocol analyzers, and most
other types of software could use this feature of Unicode to better
advantage if it were extended to cover a greater portion of the control
space.
Why are Unicode characters needed for this purpose?
a. This was deemed a worthwhile enough concept in the original Unicode
design to include a block of control pictures for the C0 set.
b. C1 and EBCDIC control sets are also widely used.
c. Real physical terminals include these glyphs.
d. Debug modes of these terminals (as well as data monitors, etc) show
these glyphs in a single fixed-width character cell, of the same
size used for regular characters.
e. Since many communications-oriented applications might make use of these
glyphs, they should be standardized for interoperability, not only with
each other, but also with email, word-processing, and printing
applications to aid in help-desk and documentation procedures.
While this proposal asks that "display-controls" symbols for C1 and EBCDIC
control characters be added to Unicode, it does not ask that the
corresponding control characters themselves be added.
The characters proposed in this document are assigned temporary Unicode
values from the Private Use area, strictly for reference within (or to)
this document only. Final values should be assigned outside of the Private
Use range.
2. BACKGROUND
Digital VT220 and higher terminals, as well as Televideo, Wyse, HP, Data
General, Perkin Elmer, and other models, allow the user (or, in some cases,
the host) to select whether control characters are acted upon or displayed
graphically. Unicode itself includes its own "control characters" such as
line and paragraph separators, directionality controls, etc.
Normally control characters are used to affect the format and presentation
of glyphs on the screen. In "display controls", "transparent", or "debug"
mode (the terminology varies with the terminal vendor), control characters
are shown graphically rather than performing their normal functions; this
allows analysis and debugging of the host-terminal data stream using a
terminal, emulator, protocol analyzer, or line monitor. It also allows a
more readable form of file dumping and analysis.
A block of control pictures is already found in Unicode at U+2400, but:
a. The illustrations in the Unicode book do not look like the control
pictures that are actually used on terminals;
b. They are for C0 only; there is no corresponding set of C1 control
pictures;
c. There are no pictures for the control characters unique to EBCDIC;
d. Certain other terminal-specific control pictures are missing.
A control picture allows the user to unequivocally determine the identity
and position of control characters in the data stream by displaying each
control character as a unique (and mnemonic) glyph in a single terminal
screen cell.
Terminals do this by arranging the letters (or letter-digit combinations) of
the official abbreviation for the control character diagonally from upper
left to lower right, as shown in Figure 2.1.
Figure 2.1: Control Picture Display
+---+   +---+
|L  |   |D  |   (except the two-character abbreviation appears on the
|   |   | C |    screen with the characters closer together)
|  F|   |  1|
+---+   +---+
The Unicode illustration for control pictures at U+2400, however, depicts
the abbreviations horizontally. While the description of this block
[24,p.6-84] states that "only the semantic is encoded... a particular
application [can] use the graphic representation it prefers," a horizontal
arrangement is chosen in the illustration (on p.7-188) for all characters
except NL. But if they are implemented this way in a real font, it would be
very difficult for the user to discern the boundary between one control
picture and the next when several of them appear in a row.
It is suggested, therefore, that the next edition of the Unicode Standard
illustrate these characters with the diagonal representation shown in Figure
2.1 (and in ISO 10646 [19]), since it is more likely that Unicode font
designers will follow the illustrations in the Unicode Standard than attempt
to procure the actual terminals or manuals to see how they do it.
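As a rough illustration of the diagonal arrangement in Figure 2.1, the
following Python sketch lays out a 2- or 3-character abbreviation in a 3x3
text cell. The function names (diagonal_cell, render_cells) are invented for
this example, and the 3x3 text cell is a simplification; real terminals draw
the letters within a bitmap character cell.

```python
def diagonal_cell(abbrev: str) -> list[str]:
    """Return the three rows of a 3x3 cell holding a 2- or 3-character
    abbreviation on the upper-left to lower-right diagonal."""
    if len(abbrev) == 2:
        first, middle, last = abbrev[0], " ", abbrev[1]
    elif len(abbrev) == 3:
        first, middle, last = abbrev
    else:
        raise ValueError("abbreviation must be 2 or 3 characters")
    return [first + "  ", " " + middle + " ", "  " + last]


def render_cells(abbrevs: list[str]) -> str:
    """Draw the cells side by side, each in its own box."""
    cells = [diagonal_cell(a) for a in abbrevs]
    border = "   ".join("+---+" for _ in cells)
    rows = ["   ".join("|" + cell[r] + "|" for cell in cells) for r in range(3)]
    return "\n".join([border, *rows, border])


print(render_cells(["LF", "DC1"]))
```

Because each abbreviation stays inside its own cell, a run of adjacent
control pictures keeps visible boundaries, which is the point made above
against the horizontal arrangement.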
The following sections discuss the different control sets, and propose a new
set of control picture glyphs for each set except the C0 set. Each
subsection is to be considered separately except insofar as they overlap.
Control picture characters should have the following properties:
Case: No
Combining Class: 0
Combining Jamo: No
Directionality: Other Neutral (ON)
Jamo Short Name: No
Numeric Value: No
Private Use: No
Surrogate: No
Mirrored: No
Mathematical: No
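The existing control pictures at U+2400 already carry properties of this
kind, which can be checked against a Unicode character database. The sketch
below uses Python's standard unicodedata module; the helper name
picture_properties is an assumption made for this example, and the values
reported depend on the Unicode version bundled with the interpreter.

```python
import unicodedata


def picture_properties(ch: str) -> dict:
    """Collect the properties of one character that correspond to the
    list above (combining class, directionality, numeric value, mirroring)."""
    return {
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),
        "combining": unicodedata.combining(ch),
        "bidirectional": unicodedata.bidirectional(ch),
        "numeric": unicodedata.numeric(ch, None),
        "mirrored": unicodedata.mirrored(ch),
    }


# Inspect a few of the existing C0 control pictures.
for code in (0x2400, 0x240A, 0x2421):
    print(f"U+{code:04X}", picture_properties(chr(code)))
```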
3. C0 CONTROL PICTURES
Table 3.1 lists the C0 Control Characters from the ASCII Standard [1] (and
also in ISO 646 and ISO 6429). Each C0 control character has an official
designator (from the appropriate ANSI [1] or ISO [18] standard): a 2- or
3-character sequence of (ASCII) alphanumeric characters.
In some terminals, such as the DEC VT320 and above [B1,B2,C1], the control
picture shows the designation in full. In most others, such as the VT220 and
240 [A1-A2], Data General [D1], Televideo [M1], HP [K1], and Perkin Elmer
[20], each 3-character designator is replaced by a 2-character short form,
referred to in this document as the "2X" form. For example, the character
called DELETE has an official abbreviation DEL and a 2X form DT.
The columns of Table 3.1 are as follows:
Code: The Unicode value in hexadecimal.
Val: The value of the control character's code in hexadecimal.
Name: The full ASCII abbreviation for the control character's name.
2X: The 2-character abbreviation used on Televideo, Wyse, HP, etc.
Description: "Symbol for" followed by the character's standard name.
Table 3.1: C0 Control Characters
Code  Val  Name  2X  Description
2400  00   NUL   NU  Symbol for Null
2401  01   SOH   SH  Symbol for Start of Heading
2402  02   STX   SX  Symbol for Start of Text
2403  03   ETX   EX  Symbol for End of Text
2404  04   EOT   ET  Symbol for End of Transmission
2405  05   ENQ   EQ  Symbol for Enquiry
2406  06   ACK   AK  Symbol for Acknowledge
2407  07   BEL   BL  Symbol for Bell
2408  08   BS    BS  Symbol for Backspace
2409  09   HT    HT  Symbol for Horizontal Tab (1)
240A  0A   LF    LF  Symbol for Line Feed (1)
240B  0B   VT    VT  Symbol for Vertical Tab (1)
240C  0C   FF    FF  Symbol for Form Feed (2)
240D  0D   CR    CR  Symbol for Carriage Return (1)
240E  0E   SO    SO  Symbol for Shift Out
240F  0F   SI    SI  Symbol for Shift In
2410  10   DLE   DL  Symbol for Data Link Escape
2411  11   DC1   D1  Symbol for Device Control 1 (2)
2412  12   DC2   D2  Symbol for Device Control 2 (2)
2413  13   DC3   D3  Symbol for Device Control 3 (2)
2414  14   DC4   D4  Symbol for Device Control 4 (2)
2415  15   NAK   NK  Symbol for Negative Acknowledge
2416  16   SYN   SY  Symbol for Synchronous Idle
2417  17   ETB   EB  Symbol for End of Transmission Block
2418  18   CAN   CN  Symbol for Cancel
2419  19   EM    EM  Symbol for End of Medium
241A  1A   SUB   SU  Symbol for Substitute
241B  1B   ESC   EC  Symbol for Escape
241C  1C   FS    FS  Symbol for File Separator (3)
241D  1D   GS    GS  Symbol for Group Separator (3)
241E  1E   RS    RS  Symbol for Record Separator (3)
241F  1F   US    US  Symbol for Unit Separator (3)
2420  20   SP    SP  Symbol for Space (4)
2421  7F   DEL   DT  Symbol for Delete (4)
Notes:
(1) This symbol is also used in the DEC Special Graphics Set.
(2) Note the conflict/coincidence of these 2-character forms with hex
bytes; see Note (3) in Section 4.
(3) These C0 controls have alternative names, listed in Section 7.
(4) Not, strictly speaking, a control character, but not a visible
one either.
Summary and Status:
No new characters, but it is recommended that C0 control pictures be
illustrated diagonally in the Unicode Standard, and that the "2X" forms be
listed as alternatives for font designers, especially for low resolutions
or small point sizes.
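The relationship between the Val and Code columns of Table 3.1 is regular:
values 00 through 20 map to U+2400 plus the value, and DEL (7F) maps to
U+2421. A minimal Python sketch of a "display controls" filter built on that
rule follows. The function names are hypothetical, and passing all other
bytes through as ISO 8859-1 is an assumption of the example, not part of
this proposal.

```python
def c0_picture(byte: int) -> str:
    """Return the existing Unicode control picture for a C0 control,
    SPACE, or DEL; pass any other byte through unchanged."""
    if 0x00 <= byte <= 0x20:
        return chr(0x2400 + byte)      # U+2400..U+2420
    if byte == 0x7F:
        return "\u2421"                # SYMBOL FOR DELETE
    return chr(byte)                   # assumption: treat as ISO 8859-1


def show_controls(data: bytes) -> str:
    """Render a byte string with its controls made visible."""
    return "".join(c0_picture(b) for b in data)


print(show_controls(b"login:\r\n\x1b[2J"))
```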
4. C1 CONTROL PICTURES
Since Unicode is used as the internal character set in applications (such as
terminal emulators) that deal with non-Unicode character sets externally --
e.g. on network or modem connections -- the other widely-used control sets
should also have control-picture glyphs, just as the C0 set does now.
C1 Control characters are specified in ISO 6429 [18] (ISO Registration Number
77 [28]) and used, among other places, in the VT220 family of terminals [5-9],
Data General terminals [2], and the Wyse 370 [26], where they are represented
in the right half of the "display controls" font as shown in Table 4.1 (DEC
VT320 and higher terminals use the full name [B1-B2], Wyse terminals use the
2X name [G1-G4]; the DEC VT220 puts the hex value in a single character cell
[A1,A2]). As with C0 controls, the "name" is displayed diagonally within the
character cell in all these terminals. Unicode presently includes no C1
control pictures.
The "Code" column in the table shows the temporary Unicode value for reference
within this document only; actual code assignments should be outside the
Private Use area. The other columns are labeled as in Table 3.1.
Table 4.1: C1 Control Characters
Code  Val  Name  2X  Description
      80   80    80  (1)
      81   81    81  (1)
E022  82   BPH   82  Symbol for Break Permitted Here (2)
E023  83   NBH   83  Symbol for No Break Here (2)
E024  84   IND   IN  Symbol for Index (3)
E025  85   NEL   NL  Symbol for Next Line (4)
E026  86   SSA   SS  Symbol for Start Selected Area
E027  87   ESA   ES  Symbol for End Selected Area
E028  88   HTS   HS  Symbol for Character Tabulation Set
E029  89   HTJ   HJ  Symbol for Character Tabulation with Justification
E02A  8A   VTS   VS  Symbol for Line Tabulation Set
E02B  8B   PLD   PD  Symbol for Partial Line Forward
E02C  8C   PLU   PU  Symbol for Partial Line Backward
E02D  8D   RI    RI  Symbol for Reverse Line Feed
E02E  8E   SS2   S2  Symbol for Single Shift 2
E02F  8F   SS3   S3  Symbol for Single Shift 3
E030  90   DCS   DC  Symbol for Device Control String
E031  91   PU1   P1  Symbol for Private Use 1
E032  92   PU2   P2  Symbol for Private Use 2
E033  93   STS   SE  Symbol for Set Transmit State
E034  94   CCH   CC  Symbol for Cancel Character
E035  95   MW    MW  Symbol for Message Waiting
E036  96   SPA   SP  Symbol for Start Protected (Guarded) Area
E037  97   EPA   EP  Symbol for End Protected (Guarded) Area
E038  98   SOS   98  Symbol for Start of String (2)
      99   99    99  (1)
E03A  9A   SCI   9A  Symbol for Single Character Introducer (2)
E03B  9B   CSI   CS  Symbol for Control Sequence Introducer (5)
E03C  9C   ST    ST  Symbol for String Terminator
E03D  9D   OSC   OS  Symbol for Operating System Command
E03E  9E   PM    PM  Symbol for Privacy Message
E03F  9F   APC   AP  Symbol for Application Program Command
Notes:
(1) Undefined in ISO-6429, shown on VT320/WY370 terminal by hex byte
symbols (see text just below these notes).
(2) Defined in ISO-6429, but shown on VT320/WY370 terminal by hex value.
(3) Removed from ISO-6429 in the third edition, but shown as indicated on
VT320 and WY370 terminals. Data General terminals show "ID" rather
than "IN" [D7].
(4) Note the unfortunate coincidence of the 2X form of this character,
"NL", with the EBCDIC Newline (NL) control. Data General Terminals
show "NE" rather than "NL" [D7]. Also see notes in Section 5.
(5) Data General terminals show "CI" rather than "CS" [D7].
As the table indicates, three of the C1 control pictures are unassigned (the
ones marked by "(1)", that would be at U+E020, U+E021, and U+E039 if these
were assigned). These positions should be left vacant in case names are
assigned to these characters in a future revision of ISO 6429, or terminals
are discovered with control pictures for these codes. In the meantime, hex
bytes are used (because this is what the real terminals do); if a hex-byte
block (separate proposal) is defined, they can be taken from that block;
otherwise, the particular values shown here (80, 81, and 99, and possibly
also 98 and 9A) must be defined for this block.
As with C0 controls, it is a matter for the font designer to choose the
full designator from the Name column, or the 2-character alternatives from
the 2X column.
Summary:
29 New characters (if hex bytes are also approved) or 32 (if they are not).
Status:
Needed to replicate the debugging functions of (at least) VT320/420/520
and WY370 terminals, and for debugging any data stream that contains
ISO 6429 C1 controls.
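The temporary codes in Table 4.1 follow a simple rule: each C1 value from 80
to 9F maps to E020 plus its offset from 80, with the three positions for
80, 81, and 99 left vacant. The Python sketch below expresses that rule; the
names are hypothetical, and returning None for a vacant position (so that
the caller can fall back to a hex-byte picture) is a choice made for this
example.

```python
C1_VACANT = {0x80, 0x81, 0x99}   # undefined in ISO 6429; shown as hex bytes


def c1_picture_code(byte: int):
    """Return the temporary code for a C1 control picture, or None if the
    position is vacant and a hex-byte picture should be used instead."""
    if not 0x80 <= byte <= 0x9F:
        raise ValueError("not a C1 control value")
    if byte in C1_VACANT:
        return None
    return 0xE020 + (byte - 0x80)


assigned = [b for b in range(0x80, 0xA0) if c1_picture_code(b) is not None]
print(len(assigned), "assigned,", 32 - len(assigned), "vacant")
```

The count of assigned positions agrees with the 29 new characters given in
the Summary above.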
5. EBCDIC CONTROL PICTURES
The EBCDIC family of character sets [13,14,29] includes its own repertoire
of control characters. Many of them, like NUL, SOH, FF, SO, SI, and so on,
are coincident with ASCII C0 controls in name and semantics, and sometimes
also in encoding. Others are unique to EBCDIC.
Table 5.1 shows the EBCDIC control characters [29], in EBCDIC order. The Code
column shows the Unicode value; those starting with 24 are already in Unicode
block U+2400; those starting with E need to be added (these are also marked
with "+" for emphasis). The Val column shows the EBCDIC value (hex). The
Name column shows the EBCDIC abbreviation for the code, and the description
lists "Symbol for" plus the EBCDIC name. No known "2X" forms exist.
Table 5.1: EBCDIC Control Characters
  Code  Val  Name  Description
  2400  00   NUL   Symbol for Null
  2401  01   SOH   Symbol for Start of Heading
  2402  02   STX   Symbol for Start of Text
  2403  03   ETX   Symbol for End of Text
+ E040  04   SEL   Symbol for Select (6)
  2409  05   HT    Symbol for Horizontal Tab
+ E041  06   RNL   Symbol for Required New Line (6)
  2421  07   DEL   Symbol for Delete
+ E042  08   GE    Symbol for Graphic Escape
+ E043  09   SPS   Symbol for Superscript
+ E044  0A   RPT   Symbol for Repeat (6)
  240B  0B   VT    Symbol for Vertical Tab
  240C  0C   FF    Symbol for Form Feed (1)
  240D  0D   CR    Symbol for Carriage Return
  240E  0E   SO    Symbol for Shift Out
  240F  0F   SI    Symbol for Shift In
  2410  10   DLE   Symbol for Data Link Escape
  2411  11   DC1   Symbol for Device Control 1
  2412  12   DC2   Symbol for Device Control 2
  2413  13   DC3   Symbol for Device Control 3 (6)
+ E045  14   RES   Symbol for Restore
  2424  15   NL    Symbol for New Line (2)
  2408  16   BS    Symbol for Backspace
+ E046  17   POC   Symbol for Program Operator Communication (6)
  2418  18   CAN   Symbol for Cancel
  2419  19   EM    Symbol for End of Medium
+ E047  1A   UBS   Symbol for Unit Back Space (6)
+ E048  1B   CU1   Symbol for Customer Use 1
+ E049  1C   IFS   Symbol for Interchange File Separator
+ E04A  1D   IGS   Symbol for Interchange Group Separator
+ E04B  1E   IRS   Symbol for Interchange Record Separator
+ E04C  1F   IUS   Symbol for Interchange Unit Separator (3)
+ E04D  20   DS    Symbol for Digit Select
+ E04E  21   SOS   Symbol for Start of Significance
  241C  22   FS    Symbol for Field Separator
+ E04F  23   WUS   Symbol for Word Underscore
+ E050  24   BYP   Symbol for Bypass
  240A  25   LF    Symbol for Line Feed
  2417  26   ETB   Symbol for End of Transmission Block
  241B  27   ESC   Symbol for Escape
+ E051  28   SA    Symbol for Set Attribute
+ E052  29   SFE   Symbol for Start Field Extended
+ E053  2A   SM    Symbol for Set Mode (4)
+ E054  2B   CSP   Symbol for Control Sequence Prefix (6)
+ E055  2C   MFA   Symbol for Modify Field Attribute
  2405  2D   ENQ   Symbol for Enquiry
  2406  2E   ACK   Symbol for Acknowledge
  2407  2F   BEL   Symbol for Bell
+ E056  30         (Reserved by IBM for future use)
+ E057  31         (Reserved by IBM for future use)
  2416  32   SYN   Symbol for Synchronous Idle
+ E058  33   IR    Symbol for Index Return
+ E059  34   PP    Symbol for Presentation Position (6)
+ E05A  35   TRN   Symbol for Transparent (6)
+ E05B  36   NBS   Symbol for Numeric Backspace (6)
  2404  37   EOT   Symbol for End of Transmission
+ E05C  38   SBS   Symbol for Subscript
+ E05D  39   IT    Symbol for Indent Tabulation
+ E05E  3A   RFF   Symbol for Reverse Form Feed
+ E05F  3B   CU3   Symbol for Customer Use 3 (5)
  2414  3C   DC4   Symbol for Device Control 4
  2415  3D   NAK   Symbol for Negative Acknowledge
+ E060  3E         (Reserved by IBM for future use)
  241A  3F   SUB   Symbol for Substitute
Notes:
(1) Conflict/coincidence with a hex byte.
(2) Conflict/coincidence with C1 2X form; see text just below these notes.
Also note that the NL glyph is part of the DEC Special Graphics
character set [3-9].
(3) The IUS control is sometimes also labeled ITB.
(4) The SM control is sometimes also labeled SW (= Switch).
(5) Note: There is no longer a Customer Use 2 (see Table 5.2).
(6) Supersedes old name from Table 5.2.
The fact that the EBCDIC control character name "NL" is the same as one of
the 2X forms of the C1 control character name "NEL" (the form used by DG
terminals is "NE", not "NL"), together with the fact that the semantics of
these two control characters are similar (though not identical) in their
respective domains, does not necessarily make them candidates for
unification, since the purpose of these sections is to encode the names of
the controls in each domain (ASCII/ISO, EBCDIC, Unicode), not the controls
themselves. If NEL and NL can be unified, then by this logic, so could
numerous other C0, C1, EBCDIC, and Unicode controls whose names were less
similar, e.g. C1 CSI (Control Sequence Introducer) and EBCDIC CSP (Control
Sequence Prefix), or C1 BPH (Break Permitted Here) and Unicode ZWS (Zero
Width Space), and this would defeat the advantage of encoding glyphs for the
names used in each control-character domain, namely that the glyphs would
contain names that are familiar to users of that domain.
Summary:
33 new characters, E040-E060, including 3 reserved.
Status:
Needed for debugging EBCDIC data streams. This block of characters is
separate and distinct from, and independent of, all other blocks in this
proposal. In particular, it is independent of the C1 controls.
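The code assignments in Table 5.1 can be reproduced mechanically: walk the
EBCDIC controls in order, reuse the existing U+24xx picture wherever the
abbreviation already has one, and give every other position (including the
reserved ones) the next temporary code starting at E040. The Python sketch
below does this. The variable and function names are hypothetical, and "-"
is used here only as a placeholder for a reserved position.

```python
# Abbreviations from Table 5.1 in EBCDIC order (00-3F); "-" = reserved.
EBCDIC_NAMES = (
    "NUL SOH STX ETX SEL HT RNL DEL GE SPS RPT VT FF CR SO SI "
    "DLE DC1 DC2 DC3 RES NL BS POC CAN EM UBS CU1 IFS IGS IRS IUS "
    "DS SOS FS WUS BYP LF ETB ESC SA SFE SM CSP MFA ENQ ACK BEL "
    "- - SYN IR PP TRN NBS EOT SBS IT RFF CU3 DC4 NAK - SUB"
).split()

# Existing pictures: the C0 names in ASCII order sit at U+2400 + value,
# plus DEL at U+2421 and NL at U+2424.
C0_NAMES = (
    "NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI "
    "DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US"
).split()
EXISTING = {name: 0x2400 + i for i, name in enumerate(C0_NAMES)}
EXISTING.update({"DEL": 0x2421, "NL": 0x2424})


def ebcdic_picture_codes(first_new: int = 0xE040) -> dict:
    """Map each EBCDIC control value to the code of its control picture."""
    codes = {}
    next_new = first_new
    for value, name in enumerate(EBCDIC_NAMES):
        if name in EXISTING:
            codes[value] = EXISTING[name]
        else:
            codes[value] = next_new
            next_new += 1
    return codes


codes = ebcdic_picture_codes()
new = [c for c in codes.values() if c >= 0xE000]
print(len(new), "new codes,", f"{min(new):04X}-{max(new):04X}")
```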
For reference, Table 5.2 shows the original names for EBCDIC control
characters [13] that have been superseded by the names shown in Table 5.1.
This proposal does not advocate additional glyphs for these names.
Table 5.2: Obsolete EBCDIC Control Characters
Val  Name  Description              Replaced By
04   PF    Punch Off                SEL
06   LC    Lower Case               RNL
0A   SMM   Start of Manual Message  RPT
13   TM    Tape Mark                DC3
17   IL    Idle                     POC
1A   CC    Cursor Control           UBS
2B   CU2   Customer Use 2           CSP
34   PN    Punch On                 PP
35   RS    Record Separator         TRN
36   UC    Upper Case               NBS
6. IBM 3270 TERMINAL ORDERS AND CONTROLS
Names for IBM 3270(1) terminal orders and controls [27] that are not already
listed in Tables 3.1-5.1 are shown in Table 6.1, to be used in debugging
3270 data streams. Columns are as in the previous tables, except the Type
column, in which:
O = 3270 Terminal Order [27,Table 4-1]
D = 3270 Terminal Order in normal display [27,p.E-3]
L = LU 1 SCS Control Codes [27,Table 8-2]
F = 3270 Format Control Order [27,Table 4-3]
Notes:
(1) "3270" refers to the IBM 3270 terminal architecture, and not to any
specific 3270 terminal model, such as 3277, 3278, etc.
Table 6.1: 3270 Control Characters
Code  Val  Name  Type  Description
E070  1D   SF    O     Symbol for Start Field
E071  11   SBA   O     Symbol for Set Buffer Address
E072  2C   MF    O     Symbol for Modify Field
E073  13   IC    O     Symbol for Insert Cursor
E074  05   PT    O     Symbol for Program Tab
E075  3C   RA    O     Symbol for Repeat to Address
E076  12   EUA   O     Symbol for Erase to Unprotected Address
E077  04   VCS   L     Symbol for Vertical Channel Select
E078  14   ENP   L     Symbol for Enable Presentation
E079  24   INP   L     Symbol for Inhibit Presentation
E07A  2B   FMT   L     Symbol for Format
E07B  1C   DUP   F     Symbol for Duplicate
E07C  1C   DUP   D     Overscore asterisk (1)
E07D  1E   FM    F     Symbol for Field Mark
E07E  1E   FM    D     Overscore semicolon (1)
E07F  FF   EO    F     Symbol for Eight Ones
Notes:
(1) When displayed by an actual 327x terminal, as opposed to an emulator
in "display controls" mode.
Summary:
16 new characters, E070-E07F.
Status:
Needed for debugging IBM 3270 data streams. This block of characters is
supplementary to the one in Section 5, and should not be approved unless
the EBCDIC control picture glyphs are also approved.
7. ADDITIONAL CONTROL-LIKE PICTURES
Table 7.1 shows additional characters included in "display controls" mode on
various terminals.
Table 7.1: Additional Control-Like Pictures
Code  Name  Description
E090  LS1   Symbol for Locking Shift 1 (1)
E091  LS0   Symbol for Locking Shift 0 (2)
E092  CEX   Symbol for Control Extension (3)
E093  IS4   Symbol for Information Separator 4 (4)
E094  IS3   Symbol for Information Separator 3 (5)
E095  IS2   Symbol for Information Separator 2 (6)
E096  IS1   Symbol for Information Separator 1 (7)
E097        Picture of Bell (8)
E098  BP    Word Processing Symbol BP (9)
E099  BE    Word Processing Symbol BE (9,10)
E09A  FN    Word Processing Symbol FN (9)
E09B  FE    Word Processing Symbol FE (9,10)
E09C  HF    Word Processing Symbol HF (9)
2426        Symbol for Substitute Form Two (Reverse Question Mark) (11)
Notes:
(1) ISO name for SO [18].
(2) ISO name for SI [18].
(3) From JIS C 6225-1979 / ISO # 74 [28].
(4) ISO Name for FS [18].
(5) ISO Name for GS [18].
(6) ISO Name for RS [18].
(7) ISO Name for US [18].
(8) Used on HP terminals in place of Symbol for BEL (U+2407) [K1].
(9) From the Data General Word Processing Set [2].
(10) Conflict/Coincidence with Hex Byte; see Note (3) in Section 4.
(11) The upright reverse question mark is used by DEC VT terminals to
indicate that an invalid code was received. It also stands for SUB
and/or RS in Wyse 370 [G2] and VT220 [A1] display controls mode, and is
a glyph in its own right in the DEC Technical Character Set [C2], the
DG Special Graphics Character Set [D4], and several others. This one
is not in Unicode at present, but is encoded in Amendment 18 to ISO
10646 at the code point shown, with the requisite shape of reverse
upright question mark.
Note that several other C0 controls have distinctive ISO names, such as TC1
for SOH, TC2 for STX, TC3 for ETX...; FE0 for BS, FE1 for HT, FE2 for LF, etc
[28, Registration #001, the ISO 646 Control Set], but I have never seen these
used outside the standard itself.
Summary:
13 characters, E090-E09C.
Status:
The ISO names LS1, LS0, IS4, IS3, IS2, IS1 are suggested for standards
compliance; these might be suggested as glyph variants for SO, SI, FS, GS,
RS, and US rather than encoded separately. However, the HP and DG
symbols, as well as the reverse question mark, are needed by terminal
emulators.
8. UNICODE CONTROL PICTURES
Table 8.1 lists the nonprinting Unicode characters used for spacing,
directionality control, and general formatting. These characters are in
the U+2000 block, and are indicated by mnemonics inside broken-line squares.
The Code column contains the temporary code value for the proposed symbol.
The Val column contains the Unicode value of the character for which the
symbolic representation is proposed. The Name column contains the
designator shown in the broken-line square in the Unicode code table, with
a space standing for a line break (but see Note 1).
The suggested glyphs are those shown in the Unicode Standard.
Table 8.1: Unicode Control Characters
Code  Val   Name     Description
E000  2000  NQ SP    Symbol for En Quad
E001  2001  MQ SP    Symbol for Em Quad
E002  2002  EN SP    Symbol for En Space
E003  2003  EM SP    Symbol for Em Space
E004  2004  3/M SP   Symbol for Three-Per-Em-Space
E005  2005  4/M SP   Symbol for Four-Per-Em-Space
E006  2006  6/M SP   Symbol for Six-Per-Em-Space
E007  2007  F SP     Symbol for Figure Space
E008  2008  P SP     Symbol for Punctuation Space
E009  2009  TH SP    Symbol for Thin Space
E00A  200A  H SP     Symbol for Hair Space
E00B  200B  ZW SP    Symbol for Zero-Width Space
E00C  200C  ZW NJ    Symbol for Zero-Width Non-Joiner
E00D  200D  ZW J     Symbol for Zero-Width Joiner
E00E  200E  LRM      Symbol for Left-to-Right Mark
E00F  200F  RLM      Symbol for Right-to-Left Mark
E010  2028  L SEP    Symbol for Line Separator
E011  2029  P SEP    Symbol for Paragraph Separator
E012  202A  LRE      Symbol for Left-to-Right Embedding
E013  202B  RLE      Symbol for Right-to-Left Embedding
E014  202C  PDF      Symbol for Pop Directional Formatting
E015  202D  LRO      Symbol for Left-to-Right Override
E016  202E  RLO      Symbol for Right-to-Left Override
E017  206A  I SS     Symbol for Inhibit Symmetric Swapping
E018  206B  A SS     Symbol for Activate Symmetric Swapping
E019  206C  I AFS    Symbol for Inhibit Arabic Form Shaping
E01A  206D  A AFS    Symbol for Activate Arabic Form Shaping
E01B  206E  NA DS    Symbol for National Digit Shapes
E01C  206F  NO DS    Symbol for Nominal Digit Shapes
E01D  FEFF  ZWN BSP  Symbol for Zero Width No Break Space
E01E  FFFE  FF FE    Symbol for Not A Character (Byte Order) (1)
E01F  FFFF  FF FF    Symbol for Not A Character (1)
Notes:
(1) No mnemonic or abbreviation is given for the "not-a-character" characters
in the Unicode Standard. Glyphs are suggested for these characters so that
Unicode-based debugging software or monitors can unambiguously indicate
their presence in the data stream.
Summary:
32 characters, E000-E01F.
Status:
Controversial. Unicode control pictures are not needed for terminal
emulation (at least not unless and until a Unicode-based terminal is
defined), but are included for symmetry with the situation for C0
controls, and for completeness and reference. Makers of word processors,
Web browsers, and other Unicode-based applications might find it desirable
to add debugging features to their products using these glyphs.
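Unlike the C0 and C1 cases, the Val column of Table 8.1 is not contiguous,
so the mapping to the temporary codes is easiest to state as a list of
source ranges numbered consecutively from E000. The following Python sketch
builds that mapping; UNICODE_CONTROLS and PICTURE_FOR are names made up for
this example.

```python
# Source characters of Table 8.1, in table order.
UNICODE_CONTROLS = (
    list(range(0x2000, 0x2010))      # spaces, joiners, LRM/RLM
    + list(range(0x2028, 0x202F))    # separators, embeddings, overrides
    + list(range(0x206A, 0x2070))    # swapping, shaping, digit shapes
    + [0xFEFF, 0xFFFE, 0xFFFF]       # ZWNBSP and the two not-a-characters
)

# Temporary picture codes are assigned consecutively from E000.
PICTURE_FOR = {val: 0xE000 + i for i, val in enumerate(UNICODE_CONTROLS)}

for val in (0x200B, 0x2028, 0x206F, 0xFEFF):
    print(f"U+{val:04X} -> {PICTURE_FOR[val]:04X}")
```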
9. SUMMARY OF PROPOSED ADDITIONAL CHARACTERS
The following control pictures are proposed:
Unicode Controls: 32 new characters, E000-E01F
C0 Controls: 0 new characters
C1 Controls: 32 new characters, E020-E03F
EBCDIC Controls: 33 new characters, E040-E060
3270 Controls: 16 new characters, E070-E07F
Misc Controls: 13 new characters, E090-E09C
Total Control Pics: 126
Without Unicode: 94
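The totals above follow directly from the code ranges. As a quick check, the
Python sketch below recomputes them from the first and last temporary code
of each block; the dictionary name BLOCKS is hypothetical. Note that the C1
and EBCDIC counts include the reserved positions.

```python
# First and last temporary code of each proposed block (inclusive).
BLOCKS = {
    "Unicode Controls": (0xE000, 0xE01F),
    "C1 Controls":      (0xE020, 0xE03F),
    "EBCDIC Controls":  (0xE040, 0xE060),
    "3270 Controls":    (0xE070, 0xE07F),
    "Misc Controls":    (0xE090, 0xE09C),
}

sizes = {name: last - first + 1 for name, (first, last) in BLOCKS.items()}
total = sum(sizes.values())
without_unicode = total - sizes["Unicode Controls"]

for name, size in sizes.items():
    print(f"{name + ':':20}{size:3}")
print(f"{'Total:':20}{total:3}")
print(f"{'Without Unicode:':20}{without_unicode:3}")
```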
If all the proposed new characters are added to the UCS, this will enable
terminal emulators to fully handle at least the following terminal character
sets, which were not previously covered in full:
ASCII/ISO Display Controls for DEC, Hewlett Packard, Wyse, Televideo,
and others.
EBCDIC Display Controls for the IBM 3270
Table 9.1: Census of New Characters
Code Description
E000 Symbol for En Quad
E001 Symbol for Em Quad
E002 Symbol for En Space
E003 Symbol for Em Space
E004 Symbol for Three-Per-Em-Space
E005 Symbol for Four-Per-Em-Space
E006 Symbol for Six-Per-Em-Space
E007 Symbol for Figure Space
E008 Symbol for Punctuation Space
E009 Symbol for Thin Space
E00A Symbol for Hair Space
E00B Symbol for Zero-Width Space
E00C Symbol for Zero-Width Non-Joiner
E00D Symbol for Zero-Width Joiner
E00E Symbol for Left-to-Right Mark
E00F Symbol for Right-to-Left Mark
E010 Symbol for Line Separator
E011 Symbol for Paragraph Separator
E012 Symbol for Left-to-Right Embedding
E013 Symbol for Right-to-Left Embedding
E014 Symbol for Pop Directional Formatting
E015 Symbol for Left-to-Right Override
E016 Symbol for Right-to-Left Override
E017 Symbol for Inhibit Symmetric Swapping
E018 Symbol for Activate Symmetric Swapping
E019 Symbol for Inhibit Arabic Form Shaping
E01A Symbol for Activate Arabic Form Shaping
E01B Symbol for National Digit Shapes
E01C Symbol for Nominal Digit Shapes
E01D Symbol for Zero Width No Break Space
E01E Symbol for Not A Character (Byte Order)
E01F Symbol for Not A Character
E020 (Reserved)
E021 (Reserved)
E022 Symbol for Break Permitted Here
E023 Symbol for No Break Here
E024 Symbol for Index
E025 Symbol for Next Line
E026 Symbol for Start Selected Area
E027 Symbol for End Selected Area
E028 Symbol for Character Tabulation Set
E029 Symbol for Character Tabulation with Justification
E02A Symbol for Line Tabulation Set
E02B Symbol for Partial Line Forward
E02C Symbol for Partial Line Backward
E02D Symbol for Reverse Line Feed
E02E Symbol for Single Shift 2
E02F Symbol for Single Shift 3
E030 Symbol for Device Control String
E031 Symbol for Private Use 1
E032 Symbol for Private Use 2
E033 Symbol for Set Transmit State
E034 Symbol for Cancel Character
E035 Symbol for Message Waiting
E036 Symbol for Start Protected (Guarded) Area
E037 Symbol for End Protected (Guarded) Area
E038 Symbol for Start of String
E039 (Reserved)
E03A Symbol for Single Character Introducer
E03B Symbol for Control Sequence Introducer
E03C Symbol for String Terminator
E03D Symbol for Operating System Command
E03E Symbol for Privacy Message
E03F Symbol for Application Program Command
E040 Symbol for Select
E041 Symbol for Required New Line
E042 Symbol for Graphic Escape
E043 Symbol for Superscript
E044 Symbol for Repeat
E045 Symbol for Restore
E046 Symbol for Program Operator Communication
E047 Symbol for Unit Back Space
E048 Symbol for Customer Use 1
E049 Symbol for Interchange File Separator
E04A Symbol for Interchange Group Separator
E04B Symbol for Interchange Record Separator
E04C Symbol for Interchange Unit Separator
E04D Symbol for Digit Select
E04E Symbol for Start of Significance
E04F Symbol for Word Underscore
E050 Symbol for Bypass
E051 Symbol for Set Attribute
E052 Symbol for Start Field Extended
E053 Symbol for Set Mode
E054 Symbol for Control Sequence Prefix
E055 Symbol for Modify Field Attribute
E056 (Reserved)
E057 (Reserved)
E058 Symbol for Index Return
E059 Symbol for Presentation Position
E05A Symbol for Transparent
E05B Symbol for Numeric Backspace
E05C Symbol for Subscript
E05D Symbol for Indent Tabulation
E05E Symbol for Reverse Form Feed
E05F Symbol for Customer Use 3
E060 (Reserved)
E070 Symbol for Start Field
E071 Symbol for Set Buffer Address
E072 Symbol for Modify Field
E073 Symbol for Insert Cursor
E074 Symbol for Program Tab
E075 Symbol for Repeat to Address
E076 Symbol for Erase to Unprotected Address
E077 Symbol for Vertical Channel Select
E078 Symbol for Enable Presentation
E079 Symbol for Inhibit Presentation
E07A Symbol for Format
E07B Symbol for Duplicate
E07C Overscore asterisk
E07D Symbol for Field Mark
E07E Overscore semicolon
E07F Symbol for Eight Ones
E090 Symbol for Locking Shift 1
E091 Symbol for Locking Shift 0
E092 Symbol for Control Extension
E093 Symbol for Information Separator 4
E094 Symbol for Information Separator 3
E095 Symbol for Information Separator 2
E096 Symbol for Information Separator 1
E097 Picture of Bell
E098 Word Processing Symbol BP
E099 Word Processing Symbol BE
E09A Word Processing Symbol FN
E09B Word Processing Symbol FE
E09C Word Processing Symbol BP
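As an illustration only, the entries in the list above can be read into a
lookup table. The sketch below is not part of the proposal. It assumes that
each entry is a four-digit hexadecimal code value followed by a name, and the
names parse_entry, SAMPLE, table, and is_private_use are hypothetical. The
range check reflects the fact that values E000 through F8FF lie in the
Unicode Private Use Area.

```python
def parse_entry(line):
    """Split one 'XXXX Name' line into (code value, name)."""
    code, name = line.split(None, 1)
    return int(code, 16), name.strip()


# A few entries copied from the list above (assumed layout: code, then name).
SAMPLE = """\
E03B Symbol for Control Sequence Introducer
E056 (Reserved)
E097 Picture of Bell
"""

# Map each code value to its name, skipping blank lines.
table = dict(parse_entry(l) for l in SAMPLE.splitlines() if l.strip())


def is_private_use(cp):
    """True if cp lies in the Private Use Area, U+E000 through U+F8FF."""
    return 0xE000 <= cp <= 0xF8FF
```

A lookup such as table[0xE097] then gives the name of a code value, and
is_private_use can be used to confirm that a value falls in the Private Use
Area.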
10. REFERENCES
[1] American National Standards Institute, ANSI X3.4-1986, Code for
Information Interchange (ASCII), 1986.
[2] Data General, Programming the Display Terminal: Models D217, D413, and
D463, Westboro, MA, 1991.
[3] Digital Equipment Corporation, VT100 User Guide, EK-VT100-UG-002,
Maynard, MA, 1979.
[4] Digital Equipment Corporation, VT102 Video Terminal User Guide,
EK-VT102-UG-003, Maynard, MA, 1982.
[5] Digital Equipment Corporation, VT220 Owner's Manual, EK-VT220-UG-003,
Maynard, MA, 1984.
[6] Digital Equipment Corporation, VT220 Series Programmer Reference
Manual, EK-VT240-RM-002, Maynard, MA, 1984.
[7] Digital Equipment Corporation, VT330/VT340 Programmer Reference Manual,
Volume 1: Text Programming, EK-VT3XX-TP-002, Maynard, MA, 1988.
[8] Digital Equipment Corporation, Installing and Using the VT420 Video
Terminal, EK-VT420-UG.002, Maynard, MA, 1988.
[9] Digital Equipment Corporation, VT520/VT525 Video Terminal Programmer
Information, EK-VT520-RM.A01, Maynard, MA, 1994.
[10] Heathkit Manual for the Video Terminal Model H19, The Heath Company,
Benton Harbor, MI, 1979.
[11] Hewlett Packard 2621A/P Interactive Terminal Owner's Manual, 1978.
[12] Hewlett Packard 2648A Graphics Terminal Reference Manual, 1977.
[13] IBM System/360 Principles of Operation, GA22-6821-8, Poughkeepsie,
NY, 1970.
[14] IBM National Language Design Guide, Volume 2: National Language
Support Reference Manual, 4th Edition, SE09-8002-03, North York
ON, 1994.
[15] IBM 3270 Information Display System, Component Description,
GA27-2749-10, 1980.
[16] IBM 3164 ASCII Color Display Station Description, GA18-2317-1, 1986.
[17] ISO International Standard 2022, Information processing -- ISO
7-bit and 8-bit coded character sets -- Code extension techniques,
Third Edition, Geneva, 1986.
[18] ISO/IEC International Standard 6429, Information technology --
Control functions for coded character sets, Third Edition, Geneva, 1992.
[19] ISO/IEC 10646-1, International Standard 10646,
Information Processing -- Multiple-Octet Coded Character Set,
1993-present.
[20] Perkin Elmer Model 1100 User's Manual, Randolph, NJ, 1978.
[21] Siemens Nixdorf, Bildschirmeinheit 97801-5xx Schnittstellen,
Benutzerhandbuch, München, 1991.
[22] Televideo 922 Video Terminal Display Operator's Manual, Sunnyvale, CA,
1984.
[23] Televideo 965 Video Terminal Display Operator's Manual, Sunnyvale, CA,
1988.
[24] The Unicode Standard, Version 2.0, Addison-Wesley Developers
Press, 1996.
[25] Wyse WY-60 Programmer's Guide, Wyse Technology, San Jose, CA, 1987.
[26] Wyse WY-370 Programmer's Guide, Wyse Technology, San Jose, CA, 1990.
[27] IBM 3270 Information Display System, Data Stream Programmer's Reference,
GA23-0059-06, 1991.
[28] ISO International Register of Coded Characters to Be Used with Escape
Sequences, European Computer Manufacturers Association (ECMA), Geneva,
1985-present.
[29] IBM Character Data Representation Architecture, Level 1 Registry, IBM
Canada Ltd., National Language Technical Centre, Ontario, SC09-1391-00,
1990 (superseded by: IBM Character Data Representation Architecture,
Registration and Registry, IBM Canada Ltd., Toronto, SC09-2190-00, 1995).
[30] Knuth, Donald, "TeX and METAFONT, New Directions in Typesetting",
American Mathematical Society / Digital Press, Bedford MA, 1979.
[31] Apple Computer Corporation, Inside Macintosh, 1984.
[32] HDS-3200 Terminal Series Owner's Manual, Philadelphia PA, 1987.
[33] Zenith Data Systems Video Terminal Z-19-CN Operation Manual, Saint
Joseph, MI, 1981.
[34] Interview 30A/40A Operator's Field Reference Guide, Atlantic Research
Corporation, ATLC-107-919-101, Alexandria, VA, 1982.
11. EXHIBITS
The following exhibits, available only on paper, are reproduced from the
terminal manuals indicated by the numeric reference number. Each exhibit is
1 page unless otherwise indicated.
[A1] VT220 Display Controls Font (Left Half) [5].
[A2] VT220 Display Controls Font (Right Half) [5].
[A3] VT220 DEC Special Graphics Character Set [5].
[B1] VT320 Display Controls Font (Left Half) [7].
[B2] VT320 Display Controls Font (Right Half) [7].
[C1] VT420 Display Controls Font (Both Halves) [8].
[C2] VT420 DEC Technical Character Set [8].
[C3] HDS-3200 DEC Technical Character Set [32].
[D1] Data General US ASCII Character Set [2].
[D2] Data General Word-Processing, Greek, and Math Character Set [2].
[D3] Data General Line Drawing Character Set [2].
[D4] Data General Special Graphics Character Set [2].
[D5] Data General VT Multinational Character Set [2].
[D6] Data General VT Special Graphics Character Set [2].
[D7] Data General ISO 8859/1.2 Character Set [2].
[E1] Siemens Nixdorf 97801 ISO 8859-1 Character Set [21].
[E2] Siemens Nixdorf 97801 Klammern (Brackets) Character Set [21].
[E3] Siemens Nixdorf 97801 Facet Character Set [21].
[E4] Siemens Nixdorf 97801 IBM Character Set [21].
[E5] Siemens Nixdorf 97801 Math Character Set [21].
[E6] Siemens Nixdorf 97801 Character Generator (8 pages) [21].
[F1] Wyse 60 Native, Multinational, PC, and ASCII Character Sets [25].
[F2] Wyse 60 Graphics 1, 2, and 3 Character Sets [25].
[F3] Wyse 60 Standard ANSI, ANSI Graphics, and UK ANSI Character Sets [25].
[G1] Wyse 370 Controls Display Mode (74Hz) [26].
[G2] Wyse 370 Controls Display Mode (60Hz) [26].
[G3] Wyse 370 C0, ASCII, and Special Graphics Character Sets [26].
[G4] Wyse 370 C1, Multinational, and Latin-1 Character Sets [26].
[H1] IBM 3270 Operator Information Area Symbols (10 pages) [15].
[I1] TeX Standard Extension Font [30].
[J1] Apple Symbol Font (2 pages) [31].
[K1] Hewlett Packard 2621A/P National Terminal Character Set [11].
[L1] Heath/Zenith-19 Graphic Symbols (2 pages) [33].
[M1] Televideo 922 ASCII, Supplemental, Special Character Sets (4 pages) [22].
[N1] Sample screen from a data analyzer showing hex display [34].
(End)