SGMLS
Section: User Commands (1)
NAME
sgmls - a validating SGML parser
An SGML System Conforming to
International Standard ISO 8879 ---
Standard Generalized Markup Language
SYNOPSIS
sgmls
[
-acdeglprsv
]
[
-xflags
]
[
-yflags
]
filename...
DESCRIPTION
Sgmls
parses and validates
the SGML document entity in
filename...
and prints on the standard output a simple ASCII representation of its
Element Structure Information Set.
(This is the information set which a structure-controlled
conforming SGML application should act upon.)
The following options are available:
- -a
-
Detect and report ambiguous content models.
- -c
-
Describe capacity usage at the end of the parse.
- -d
-
Warn about duplicate entity declarations.
- -e
-
Describe open entities in error messages.
Error messages always include the position of the most recently
opened external entity.
- -g
-
Show the GIs of open elements in error messages.
- -l
-
Output
L
commands giving the current line number and filename.
- -p
-
Parse only the prolog.
Sgmls
will exit after parsing the document type declaration.
Implies
-s.
- -r
-
Warn about defaulted references.
- -s
-
Suppress output.
Error messages will still be printed.
- -v
-
Print the version number.
- -xflags
-
- -yflags
-
Enable debugging output;
-x
applies to the document body,
-y
to the prolog.
Each character in the
flags
argument enables tracing of a particular activity.
-
- t
-
Trace state transitions.
- a
-
Trace attribute activity.
- c
-
Trace context checking.
- d
-
Trace declaration parsing.
- e
-
Trace entities.
- g
-
Trace groups.
- i
-
Trace IDs.
- m
-
Trace marked sections.
- n
-
Trace notations.
Entity Manager
An external entity resides in one or more files.
The entity manager component of
sgmls
maps a sequence of files into an entity in three sequential stages:
- 1.
-
each carriage return character is turned into a non-SGML character;
- 2.
-
each newline character is turned into a record end character,
and at the same time
a record start character is inserted at the beginning of each line;
- 3.
-
the files are concatenated.
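The three stages above can be sketched in Python as follows. This is an illustration only, not the sgmls implementation: the function names and the symbolic tokens RS, RE and NONSGML are assumptions, because this page does not say which character codes are used. Stages 1 and 2 are applied in a single pass here, which should give the same result as running them one after the other.

```python
# Symbolic placeholders (assumptions): the page does not give character codes.
RS = "<RS>"            # record start character
RE = "<RE>"            # record end character
NONSGML = "<NONSGML>"  # stand-in for the non-SGML character that replaces CR


def file_to_tokens(text):
    """Apply stages 1 and 2 to the contents of one file."""
    tokens = []
    at_line_start = True
    for ch in text:
        if at_line_start:
            tokens.append(RS)       # stage 2: record start before each line
            at_line_start = False
        if ch == "\r":
            tokens.append(NONSGML)  # stage 1: carriage return is replaced
        elif ch == "\n":
            tokens.append(RE)       # stage 2: newline becomes record end
            at_line_start = True
        else:
            tokens.append(ch)
    return tokens


def files_to_entity(texts):
    """Stage 3: concatenate the converted files into one entity."""
    entity = []
    for text in texts:
        entity.extend(file_to_tokens(text))
    return entity
```

For example, files_to_entity(["x\n", "y"]) yields RS, "x", RE, RS, "y".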
A system identifier is
interpreted as a list of filenames separated by colons.
If no system identifier is supplied, then the entity manager will
attempt to generate a filename using the public identifier
(if there is one) and other information available to it.
Notation identifiers are not subject to this treatment.
This process is controlled by the environment variable
SGML_PATH;
this contains a colon-separated list of filename templates.
A filename template is a filename that may contain
substitution fields; a substitution field is a
%
character followed by a single letter that indicates the value
of the substitution.
The value of a substitution can either be a string
or it can be
null.
The entity manager transforms the list of
filename templates into a list of filenames by substituting for each
substitution field and discarding any template
that contained a substitution field whose value was null.
It then uses the first resulting filename that exists and is readable.
Substitution values are transformed before being used for substitution:
firstly, any names that were subject to upper case substitution
are folded to lower case;
secondly,
space characters are mapped to underscores
and slashes are mapped to percents.
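The template expansion described above can be sketched in Python as follows. This is a hedged illustration, not the code sgmls uses: the function names are hypothetical, the caller is assumed to supply the substitution values (with None standing for null), and treating an unknown field letter as null and a trailing lone % as a literal are both assumptions.

```python
import os


def transform(value, case_folded=False):
    """Transform a substitution value before it is substituted.

    case_folded should be True for names that were subject to
    upper case substitution.
    """
    if case_folded:
        value = value.lower()
    return value.replace(" ", "_").replace("/", "%")


def expand_template(template, values):
    """Return the filename for one template, or None if any field is null."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template):
            letter = template[i + 1]
            i += 2
            if letter == "%":
                out.append("%")        # %% stands for a single %
                continue
            value = values.get(letter)  # unknown letters count as null (assumption)
            if value is None:
                return None             # discard the whole template
            out.append(value)
        else:
            out.append(ch)              # ordinary character (or a trailing lone %)
            i += 1
    return "".join(out)


def candidate_filenames(sgml_path, values):
    """Expand a colon-separated template list, dropping discarded templates."""
    names = []
    for template in sgml_path.split(":"):
        name = expand_template(template, values)
        if name is not None:
            names.append(name)
    return names


def first_readable(sgml_path, values):
    """Return the first candidate that exists and is readable, else None."""
    for name in candidate_filenames(sgml_path, values):
        if os.access(name, os.R_OK):
            return name
    return None
```

With the hypothetical template list "lib/%N.%Y:pub/%P" and values {"N": "memo", "Y": "dtd", "P": None}, only "lib/memo.dtd" survives, because %P is null. A caller would typically pass os.environ.get("SGML_PATH", "") as the template list.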
The values of substitution fields are as follows:
- %%
-
A single
%.
- %D
-
The entity's data content notation.
This substitution will succeed only for external data entities.
- %N
-
The entity, notation or document type name.
- %P
-
The public identifier if there was a public identifier,
otherwise null.
- %X
-
(This is provided mainly for compatibility with ARCSGML.)
A three-letter string chosen as follows:
-
                            | No public  | With public identifier
                            | identifier | Device independent | Device dependent
Data or subdocument entity  | nsd        | pns                | vns
General SGML text entity    | gml        | pge                | vge
Parameter entity            | spe        | ppe                | vpe
Document type definition    | dtd        | pdt                | vdt
Link process definition     | lpd        | plp                | vlp
The device dependent version is selected if the public text class
allows a public text display version but no public text display
version was specified.
- %Y
-
The type of thing for which the filename is being generated:
SGML subdocument entity   | sgml
Data entity               | data
General text entity       | text
Parameter entity          | parm
Document type definition  | dtd
Link process definition   | lpd
The value of the following substitution fields will be null
unless a valid formal public identifier was supplied.
- %A
-
Null if the text identifier in the
formal public identifier contains an unavailable text indicator,
otherwise the empty string.
- %C
-
The public text class, mapped to lower case.
- %I
-
The empty string if the owner identifier in the formal public identifier
is an ISO owner identifier,
otherwise null.
- %L
-
The public text language, mapped to lower case,
unless the public text class is
CHARSET,
in which case null.
- %O
-
The owner identifier (with the
+//
or
-//
prefix stripped.)
- %R
-
The empty string if the owner identifier in the formal public identifier
is a registered owner identifier,
otherwise null.
- %S
-
The public text designating sequence if the public text class is
CHARSET,
otherwise null.
- %T
-
The public text description.
- %U
-
The empty string if the owner identifier in the formal public identifier
is an unregistered owner identifier,
otherwise null.
- %V
-
The public text display version.
This substitution will be null if the public text class
does not allow a display version or if no version was specified.
If an empty version was specified, a value of
default
will be used.
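The %X and %Y values listed earlier in this section are plain table lookups. A minimal Python sketch, assuming hypothetical dictionary keys and a hypothetical x_value helper (none of these names come from sgmls itself):

```python
# Rows of the %X table: (no public identifier,
#                        public identifier and device independent,
#                        public identifier and device dependent).
X_VALUES = {
    "data or subdocument entity": ("nsd", "pns", "vns"),
    "general SGML text entity": ("gml", "pge", "vge"),
    "parameter entity": ("spe", "ppe", "vpe"),
    "document type definition": ("dtd", "pdt", "vdt"),
    "link process definition": ("lpd", "plp", "vlp"),
}

# The %Y table: the type of thing a filename is being generated for.
Y_VALUES = {
    "SGML subdocument entity": "sgml",
    "data entity": "data",
    "general text entity": "text",
    "parameter entity": "parm",
    "document type definition": "dtd",
    "link process definition": "lpd",
}


def x_value(kind, has_public_id, device_dependent=False):
    """Pick the three-letter %X string for one row of the table."""
    no_public, independent, dependent = X_VALUES[kind]
    if not has_public_id:
        return no_public
    return dependent if device_dependent else independent
```

For instance, x_value("document type definition", has_public_id=True) selects "pdt". Whether the device dependent column applies is decided by the rule given with the %X table, which this sketch leaves to the caller.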
System declaration
The system declaration for
sgmls
is as follows:
SYSTEM "ISO 8879-1986"
CHARSET
BASESET  "ISO 646-1983//CHARSET
          International Reference Version (IRV)//ESC 2/5 4/0"
DESCSET  0 128 0
CAPACITY PUBLIC  "ISO 8879-1986//CAPACITY Reference//EN"
FEATURES
MINIMIZE DATATAG NO  OMITTAG  YES  RANK     NO  SHORTTAG YES
LINK     SIMPLE  NO  IMPLICIT NO   EXPLICIT NO
OTHER    CONCUR  NO  SUBDOC   YES 1  FORMAL YES
SCOPE    DOCUMENT
SYNTAX   PUBLIC  "ISO 8879-1986//SYNTAX Reference//EN"
SYNTAX   PUBLIC  "ISO 8879-1986//SYNTAX Core//EN"
VALIDATE
         GENERAL YES  MODEL YES  EXCLUDE YES  CAPACITY YES
         NONSGML YES  SGML  YES  FORMAL  YES
SDIF
         PACK    NO   UNPACK NO
The memory usage of
sgmls
is not a function of the capacity points used by a document;
however,
sgmls
can handle capacities significantly greater than the reference capacity set.
In some environments,
higher values may be supported for the SUBDOC parameter.
Documents that do not use optional features are also supported.
For example, if
FORMAL NO
is specified in the SGML declaration,
public identifiers will not be required to be valid formal public identifiers.
Certain parts of the concrete syntax may be changed:
-
The shunned character numbers can be changed.
Uppercase substitution can be performed or not performed
both for entity names and for other names.
Either short reference delimiters assigned by the reference delimiter set
or no short reference delimiters are supported.
The reserved names can be changed.
The quantity set can be increased within certain limits
subject to there being sufficient memory available.
The upper limit on NAMELEN is 239.
The upper limits on
ATTCNT, ATTSPLEN, BSEQLEN, ENTLVL,
LITLEN, PILEN, TAGLEN, and TAGLVL
are more than thirty times greater than the reference limits.
The upper limit on
GRPCNT, GRPGTCNT, and GRPLVL is 253.
NORMSEP
cannot be changed.
DTAGLEN
and
DTEMPLEN
are irrelevant since
sgmls
does not support the
DATATAG
feature.
Ambiguous content models are reported
(as specified by
MODEL YES)
only if the
-a
option is given.
SGML declaration
The SGML declaration may be omitted,
in which case the following declaration will be implied:
<!SGML "ISO 8879-1986"
CHARSET
BASESET  "ISO 646-1983//CHARSET
          International Reference Version (IRV)//ESC 2/5 4/0"
DESCSET    0  9 UNUSED
           9  2  9
          11  2 UNUSED
          13  1 13
          14 18 UNUSED
          32 95 32
         127  1 UNUSED
CAPACITY PUBLIC  "ISO 8879-1986//CAPACITY Reference//EN"
SCOPE    DOCUMENT
SYNTAX   PUBLIC  "ISO 8879-1986//SYNTAX Reference//EN"
FEATURES
MINIMIZE DATATAG NO  OMITTAG  YES  RANK     NO  SHORTTAG YES
LINK     SIMPLE  NO  IMPLICIT NO   EXPLICIT NO
OTHER    CONCUR  NO  SUBDOC   YES 99999999  FORMAL YES
APPINFO NONE>
with the exception that characters 128 through 254 will be assigned to
DATACHAR.
When exporting documents that use characters in this range,
an accurate description of the upper half of the document character set
should be added to this declaration.
For ISO Latin-1, an appropriate description would be:
BASESET  "ISO Registration Number 100//CHARSET
          ECMA-94 Right Part of Latin Alphabet Nr. 1//ESC 2/13 4/1"
DESCSET  128 32 UNUSED
         160 95 32
         255  1 UNUSED
Output format
The output is a series of lines.
Lines can be arbitrarily long.
Each line consists of an initial command character
and one or more arguments.
Arguments are separated by a single space,
but when a command takes a fixed number of arguments
the last argument can contain spaces.
There is no space between the command character and the first argument.
Arguments can contain the following escape sequences.
- \\
-
A
\.
- \n
-
A record end character.
- \|
-
Internal SDATA entities are bracketed by these.
- \nnn
-
The character whose code is
nnn
octal.
- \s
-
A space.
This is used only in filenames or notation identifiers
that contain a space.
(Filenames can occur in the
S,
E
and
L
commands.)
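A consumer of the output has to undo these escape sequences. The following Python sketch shows one way to do so. It is not taken from sgmls: the function name is hypothetical, and it assumes that nnn is always exactly three octal digits, that a record end can be represented as a Python newline, and that a backslash followed by anything else should be left unchanged.

```python
import re

# One escape: backslash, then \, n, |, s, or three octal digits (assumption).
_ESCAPE = re.compile(r"\\(\\|n|\||s|[0-7]{3})")


def decode_argument(arg):
    """Split an argument into (is_sdata, text) segments with escapes decoded."""
    segments = []
    buf = []
    in_sdata = False
    pos = 0
    for match in _ESCAPE.finditer(arg):
        buf.append(arg[pos:match.start()])
        pos = match.end()
        code = match.group(1)
        if code == "\\":
            buf.append("\\")
        elif code == "n":
            buf.append("\n")   # record end, shown as a newline (assumption)
        elif code == "s":
            buf.append(" ")
        elif code == "|":
            # An SDATA bracket closes the current segment and flips the mode.
            text = "".join(buf)
            if text or in_sdata:
                segments.append((in_sdata, text))
            buf = []
            in_sdata = not in_sdata
        else:
            buf.append(chr(int(code, 8)))  # \nnn: character with octal code nnn
    buf.append(arg[pos:])
    text = "".join(buf)
    if text:
        segments.append((in_sdata, text))
    return segments
```

Returning segments rather than a single string keeps the text of internal SDATA entities distinguishable from ordinary data, which a single decoded string could not do.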
The possible command characters and arguments are as follows:
- (gi
-
The start of an element whose generic identifier is
gi.
Any attributes for this element
will have been specified with
A
commands.
- )gi
-
The end of an element whose generic identifier is
gi.
- -data
-
Data.
- &name
-
A reference to an external data entity
name;
name
will have been defined using an
E
command.
- ?pi
-
A processing instruction with data
pi.
- Aname val
-
The next element to start has an attribute
name
with value
val
which takes one of the following forms:
-
- IMPLIED
-
The value of the attribute is implied.
- CDATA data
-
The attribute is character data.
This is used for attributes whose declared value is
CDATA.
- TOKEN token...
-
The attribute is a list of tokens.
This is used for attributes whose declared value is a name token group
or one of
NAME,
NMTOKEN,
NUTOKEN,
NUMBER,
NAMES,
NMTOKENS,
NUTOKENS
or
NUMBERS.
- NOTATION nname
-
The attribute is a notation name;
nname
will have been defined using an
N
command.
This is used for attributes whose declared value is
NOTATION.
- ENTITY name...
-
The attribute is a list of general entity names.
Each entity name will have been defined using an
I,
E
or
S
command.
This is used for attributes whose declared value is
ENTITY
or
ENTITIES.
- ID id
-
The attribute is an id value.
This is used for attributes whose declared value is
ID.
- IDREF id...
-
The attribute is a list of id references.
This is used for attributes whose declared value is
IDREF
or
IDREFS.
- Dename name val
-
This is the same as the
A
command, except that it specifies a data attribute for an
external entity named
ename.
Any
D
commands will come after the
E
command that defines the entity to which they apply, but
before any
&
or
A
commands that reference the entity.
- Nnname sysid pubid
-
Define a notation
nname
associated with system identifier
sysid,
and public identifier
pubid.
If no system identifier was specified,
the notation name will be used
(after being transformed as if it were a substitution value).
If no public identifier was specified,
pubid
will be omitted.
A notation will only be defined if it is to be referenced
in an
E
command or in an
A
command for an attribute with a declared value of
NOTATION.
- Eename typ nname filename...
-
Define an external data entity named
ename
with type
typ
(CDATA,
NDATA
or
SDATA),
notation
nname
and a list of files
filename...;
nname
will have been defined using an
N
command.
Data attributes may be specified for the entity using
D
commands.
An external data entity will only be defined if it is to be referenced in a
&
command or in an
A
command for an attribute whose declared value is
ENTITY
or
ENTITIES.
- Iename typ text
-
Define an internal data entity named
ename
with type
typ
(CDATA
or
SDATA)
and entity text
text.
An internal data entity will only be defined if it is referenced in an
A
command for an attribute whose declared value is
ENTITY
or
ENTITIES.
- Sename filename...
-
Define a subdocument entity named
ename
with a list of files
filename....
Such an entity will only be defined if it is referenced in an
A
command for an attribute whose declared value is
ENTITY
or
ENTITIES.
- {ename
-
The start of the SGML subdocument entity
ename.
- }ename
-
The end of the SGML subdocument entity
ename.
- Llineno file
-
- Llineno
-
Set the current line number and filename.
The
file
argument will be omitted if only the line number has changed.
This will be output only if the
-l
option has been given.
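The command list above lends itself to a small line-oriented reader. The Python sketch below is an illustration under stated assumptions, not part of sgmls: parse_line and element_starts are hypothetical names, the argument counts are read off the command descriptions above, escape sequences are left undecoded, and the sample lines in the usage note are invented rather than captured from a real run.

```python
# Maximum argument count for commands with a fixed number of arguments.
# The last argument of such a command may contain spaces.
# E and S take a variable-length file list, so they are not listed.
MAX_ARGS = {
    "(": 1, ")": 1, "-": 1, "&": 1, "?": 1,
    "A": 2, "D": 3, "N": 3, "I": 3,
    "{": 1, "}": 1, "L": 2,
}


def parse_line(line):
    """Split one output line into (command character, list of arguments)."""
    line = line.rstrip("\n")
    if not line:
        raise ValueError("empty output line")
    command, rest = line[0], line[1:]
    limit = MAX_ARGS.get(command)
    if limit is None:
        args = rest.split(" ")             # E, S: every field is separate
    else:
        args = rest.split(" ", limit - 1)  # keep spaces in the last argument
    return command, args


def element_starts(lines):
    """Yield (depth, gi, attributes) for each element start command."""
    pending = {}   # attributes from A commands waiting for the next element
    depth = 0
    for line in lines:
        command, args = parse_line(line)
        if command == "A":
            pending[args[0]] = args[1]
        elif command == "(":
            yield depth, args[0], pending
            pending = {}
            depth += 1
        elif command == ")":
            depth -= 1
```

Given the invented lines "ATYPE CDATA draft", "(MEMO", "(TO", "-Jane Doe", ")TO", ")MEMO", element_starts reports MEMO at depth 0 with the attribute TYPE set to "CDATA draft", followed by TO at depth 1 with no attributes.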
BUGS
Non-SGML characters in literals are counted as two characters for the
purposes of quantity and capacity calculations.
No error message is given for character references or marked section
declarations between the end of the DTD and the start of the document
element.
SEE ALSO
The SGML Handbook, Charles F. Goldfarb
ISO 8879 (Standard Generalized Markup Language),
International Organization for Standardization
ORIGIN
ARCSGML was written by Charles F. Goldfarb.
Sgmls
was derived from ARCSGML by James Clark (jjc@jclark.com),
to whom bugs should be reported.