home *** CD-ROM | disk | FTP | other *** search
- <!-- $Id: doc-charset.sgm,v 1.10 1995/06/15 20:16:40 connolly Exp $ -->
- <h1 id=datachrs>Characters, Words, and Paragraphs
-
- An HTML user agent should present the body of an HTML document as a
- collection of typeset paragraphs and preformatted text. Except for
- preformatted elements (<tag/PRE/, <tag/XMP/, <tag/LISTING/,
- <tag/TEXTAREA/), each block structuring element is regarded as a
- paragraph by taking the data characters in its content and the content
- of its descendant elements, concatenating them, and splitting the
- result into words, separated by space, tab, or record end characters
- (and perhaps hyphen characters). The sequence of words is typeset as a
- paragraph by breaking it into lines.
-
- <h2 id=charlist>The HTML Document Character Set
-
- The document character set specified in <hdref refid=decl> must be
- supported by HTML user agents. It includes the graphic characters of
- Latin Alphabet No. 1, or simply Latin-1. Latin-1 comprises 191
- graphic characters, including the alphabets of most Western European
- languages.
- <note>Use the non-breaking space and soft hyphen indicator characters is
- discouraged because support for them is not widely deployed.
- </note>
- <note>
- To support non-western writing systems, a larger character repertoire
- will be specified in a future version of HTML. The document character
- set will be [ISO-10646], or some subset that agrees with [ISO-10646];
- in particular, all numeric character references must use code
- positions assigned by [ISO-10646].
- </note>
-
- In SGML applications, the use of control characters is limited in
- order to maximize the chance of successful interchange over
- heterogeneous networks and operating systems. In the HTML document
- character set only three control characters are allowed: Horizontal
- Tab, Carriage Return, and Line Feed (code positions 9, 13, and 10).
-
- The HTML DTD references the Added Latin 1 entity set, to allow
- mnemonic representation of selected Latin 1 characters using only the
- widely supported ASCII character repertoire. For example:
-
- <listing><![CDATA[
- Kurt Gödel was a famous logician and mathematician.
- ]]></listing>
-
- See <hdref refid="lat1ent"> for a table of the "Added Latin 1"
- entities, and <hdref refid="iso-latin-1"> for a table of the code
- positions of [ISO 8859-1] and the control characters in the HTML
- document character set.
-