Adopting Self-Describing Files
By David J. Greer
Robelle Consulting Ltd.
Unit 201, 15399-102A Ave.
Surrey, B.C. Canada V3R 7K1
Phone: (604) 582-1700
Fax: (604) 582-1799
http://www.robelle.com
Abstract
Query can generate output to self-describing (SD) files, but no
HP products read these files except the old DSG and Listkeeper
products. A self-describing file is a data file which stores a
standard description of its own record format in its user labels.
Thus, it is like a little stand-alone database (a trendy
developer might call this object-oriented). If two software
tools understand SD files, it becomes trivial to transfer data
between them. A user can archive some data in an SD file and
when it is restored five years later the SD file can tell what
the data means. The author describes the internal format of SD
files, gives examples on how to read and write SD files, and
describes problems integrating SD files into a software tool.
Copyright Robelle Consulting Ltd. 1992
Permission is granted to reprint this document (but not for
profit), provided that copyright notice is given.
Introduction
For years, Query's Save Command has been able to create a file
that is self-describing. A self-describing file is one that
contains the information about the fields in the file. Normal
MPE and KSAM files are not self-describing. In general, we know
nothing about the structure of the fields in each record.
Unfortunately, few software tools create or understand
self-describing files. While Query can produce self-describing
files, it cannot use them as input. Our product Suprtool can
both create and understand self-describing files (including KSAM
ones). In addition, Suprtool has a new self-describing format
that removes some restrictions of the original self-describing
structure. In this article we will do the following:
o Describe the format of both the original self-describing file
(this will be a summary of the information in Appendix E of
the Query User Manual) and the new Robelle self-describing
file.
o Show how to create a self-describing file.
o Give a programming example that can understand and provide a
"form" listing of any self-describing file.
o Describe KSAM self-describing files.
o Speculate on what an "open system" self-describing file would
look like.
Query Versus Robelle Self-Describing Files
Throughout this article we will refer to one of two types of
self-describing files. The first kind are equivalent to the ones
produced by Query. They are identified by the version number
" A.00.00". The second kind were designed to overcome
limitations with the original self-describing format. We
identify the revised files by calling them Robelle self-
describing files. They have a version number of " B.00.00".
Examples In This Article
Because we write code in SPL and SPLash!, we will give our
examples in these programming languages. The only word of
caution is to remember that SPL uses zero-based addressing for
all its arrays.
MPE User Labels
User labels are an optional part of an MPE file. User labels are
part of the file, but they are not part of the data (i.e., when
reading the records in the file the user labels are ignored).
User labels are a handy place to store extra information about a
file. Unfortunately, MS-DOS and UNIX have no concept similar to
MPE's user labels (see the section Future Self-Describing Formats
for ideas for UNIX and MS-DOS).
The number of user labels must be specified when the file is
created. On most versions of MPE, the only way to create a file
with user labels is via the FOPEN intrinsic. Newer versions of
MPE/iX allow the ULABEL= keyword on the Build Command to specify
the number of user labels. Each user label is 256 bytes long and
user labels are numbered from zero. You access user labels by
calling the FREADLABEL and FWRITELABEL intrinsics.
Identifying Self-Describing Files
An MPE file that is self-describing has a filecode of 1084. You
will recognize these files by seeing "SD" next to the filecode of
a :listf,2:
FILENAME  CODE  ----------LOGICAL RECORD---------  ----SPACE----
                  SIZE  TYP      EOF    LIMIT R/B  SECTORS  X MX
LOADFILE  SD      128W  FB        33    10000  35      256  1  *
Recognizing self-describing KSAM files is more difficult. KSAM
sd-files do not have a special file code. Instead, you must look
for a KSAM file with extra file labels. On MPE/iX, this is done
with a :listf ,3 (on MPE V/E use Listdir.Pub.Sys):
What Is A Self-Describing File
A self-describing file stores information in the MPE file labels
about the fields in each record of the file. File labels are
like a special file within a file. An MPE file label is 256
bytes long and an MPE file is created with 0 to 256 file labels.
The file labels are accessed via the Freadlabel and Fwritelabel
intrinsics. User labels are numbered from zero.
Customarily, tools that create self-describing files leave the
first ten file labels (numbered 0 to 9) for user applications.
The self-describing information is broken into two kinds of
labels: the header label and field labels.
Header Label
If an MPE file has n file labels, they are numbered from 0 to
n-1. The self-describing labels are always added after any file
labels needed by the user. The last file label will be the
sd-header label and the sd-field labels are arranged backwards
from this label (n-2, n-3, ...). The format of the header label
is similar, but not identical, for Query and Robelle
self-describing files.
Query Header Label
The Query header label consists of the following fields:
version (X8). Always equal to " A.00.00" for Query self-
describing files.
length (J1). The length of each record in the file in bytes. It
appears to always be identical to the MPE record length of the
file.
fields (J1). The number of fields in each file record.
labels (J1). Number of labels used for field descriptions
plus one for the header label. This is different from the
number of MPE labels for the file.
fields'per'label (J1). Each field label contains one or more
field descriptions. Do not assume a fixed number for this
field -- you must check the value of this field.
size (J1). Length of each field descriptor in 16-bit words.
Because MPE file labels are always 128 words long, the
fields'per'label should always be 128 / size. Again, do not
assume a fixed constant for the field descriptor size.
Robelle Header Label
The Robelle header label contains all of the fields of Query's
header label with one change (the version number is different)
and three additions:
1. The version number is " B.00.00" instead of " A.00.00" (note
the space at the beginning).
2. There are three new fields for handling sort keys. These
fields are identical to the fields that you would pass to
Sortinit (in compatibility-mode):
sort'max'keys (J1). Maximum number of keys allowed in this
sd-file. The sort'keys would be declared as:
integer array sort'keys(0:sort'max'keys*3-1)
sort'num'keys (J1). The actual number of keys in the table.
This value must range from zero to sort'max'keys-1.
sort'keys. The sort keys themselves using the same
conventions as Sort/3000. The byte-offsets of each key
start at one and not zero in the sort table. The byte
offsets in each field entry remain the same (i.e.,
zero-based instead of one-based offsets). The sort key
types correspond to those for the Sortinit intrinsic and
not the newer HPSortinit.
SPL Layout of the Header Label
Here is the layout in SPL notation of the Robelle header label.
Note that we use exactly the same layout for accessing Query
header labels (we just ignore all sd'sort'... variables when
accessing Query self-describing files). Each field descriptor is
fifteen words long, but even the Robelle field descriptor only
uses fourteen words. We leave the last word unspecified (our
code always sets the filler words with binary zeroes):
sdheader.srcinc:
integer array sd'header(0:sd'label'len); { 0 : 127 }
byte array sd'version(*) = sd'header;
integer array sd'reclength(*) = sd'header(4);
integer array sd'numfields(*) = sd'header(5);
integer array sd'numlabels(*) = sd'header(6);
integer array sd'fieldsperlabel(*) = sd'header(7);
integer array sd'entrylen(*) = sd'header(8);
integer array sd'sort'max'keys(*) = sd'header(9);
integer array sd'sort'num'keys(*) = sd'header(10);
integer array sd'sort'keys(*) = sd'header(11);
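For readers working outside SPL, the same header layout can be
sketched in Python. This is only an illustration under stated
assumptions: the 256-byte label has already been copied into a
bytes object by some other tool, integers are 16-bit big-endian as
on the HP 3000, and each sort key occupies three words (as implied
by the sort'keys declaration above). The names SDHeader and
parse_header are hypothetical and not part of any Robelle or HP
product.

```python
import struct
from dataclasses import dataclass, field

LABEL_BYTES = 256  # one MPE user label is 128 sixteen-bit words


@dataclass
class SDHeader:  # hypothetical name; mirrors the sd'... variables
    version: str
    reclength: int
    numfields: int
    numlabels: int
    fieldsperlabel: int
    entrylen: int
    sort_max_keys: int = 0
    sort_num_keys: int = 0
    sort_keys: list = field(default_factory=list)


def parse_header(label: bytes) -> SDHeader:
    """Decode the last user label of a self-describing file."""
    if len(label) != LABEL_BYTES:
        raise ValueError("a user label must be exactly 256 bytes")
    # Words 0-3 hold the 8-byte version; words 4-8 are five J1 integers.
    raw_version, *counts = struct.unpack_from(">8s5h", label, 0)
    version = raw_version.decode("ascii", errors="replace")
    if version not in (" A.00.00", " B.00.00"):
        raise ValueError("not a self-describing header: %r" % version)
    header = SDHeader(version, *counts)
    if version == " B.00.00":
        # Words 9 and 10 are the key counts; the key table starts at word 11.
        max_keys, num_keys = struct.unpack_from(">2h", label, 18)
        if num_keys < 0 or 22 + num_keys * 6 > LABEL_BYTES:
            raise ValueError("sort key count does not fit in the label")
        words = struct.unpack_from(">%dh" % (num_keys * 3), label, 22)
        header.sort_max_keys = max_keys
        header.sort_num_keys = num_keys
        # Assumption: three words per key, grouped in table order.
        header.sort_keys = [tuple(words[i:i + 3])
                            for i in range(0, len(words), 3)]
    return header


# Build a sample Robelle header by hand and decode it again.
sample = struct.pack(">8s7h", b" B.00.00", 256, 22, 4, 8, 15, 2, 1)
sample += struct.pack(">3h", 1, 26, 0)  # one key starting at byte 1
sample = sample.ljust(LABEL_BYTES, b"\0")
header = parse_header(sample)
print(header.version, header.numfields, header.sort_keys)
```

For a Query (" A.00.00") label the sort fields are simply left at
their defaults, which matches the way the SPL code ignores the
sd'sort'... variables for Query files.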
Field Labels
Every self-describing file has one or more field labels of 256
bytes. Each field label has one or more field descriptors. The
first fields in the file will be described in label N-2, the next
set of fields in N-3, and so on. This is opposite to what you
might expect. Self-describing files that Query produces have
eight field descriptors per user label.
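The backward numbering is easy to get wrong, so here is a small
Python sketch of the arithmetic. It assumes nothing beyond the
rules stated above (header in label n-1, field labels counting
down from n-2); the function name locate_field is hypothetical.

```python
def locate_field(num_user_labels, fields_per_label, entry_len_words,
                 field_index):
    """Return (label_number, word_offset) for a zero-based field index."""
    header_label = num_user_labels - 1           # last label holds the header
    label_number = header_label - 1 - field_index // fields_per_label
    word_offset = (field_index % fields_per_label) * entry_len_words
    return label_number, word_offset


# A file with 14 user labels, 8 descriptors per label, 15 words each:
print(locate_field(14, 8, 15, 0))    # first field
print(locate_field(14, 8, 15, 8))    # ninth field starts the next label down
```

With these numbers the first eight fields live in label 12 and the
ninth field is the first descriptor in label 11.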
Query Field Labels
Query always produces self-describing files with 15 words
reserved for each field descriptor. Each field is described as
follows:
field'name (X16). The name of the field left-justified. Field
names are in upper case.
field'type (J1). The type of the field taken from the following
list:
1. ASCII (type U and X).
2. free form ASCII numbers.
3. signed integer (type I).
4. floating point real (type R).
5. packed decimal (type P).
6. COBOL computational (type J).
7. unsigned integers (type K).
8. zoned decimal (type Z).
9. IEEE floating point (type E). This is a Robelle extension
that applies to either " A.00.00" or " B.00.00"
self-describing files.
10. IMAGE compound field.
field'offset (J1). The offset of the field in bytes. The offset
starts at zero.
field'length (J1). The length of the field in bytes.
reserved'space (4J1). Four words that are reserved for future
use.
Robelle Field Labels
Many HP 3000 applications contain repeated fields. Query
self-describing files map all repeated fields into type "10",
which is useless for applications that understand repeated
fields. It would also be nice if additional user information,
such as the number of decimal points or the format of a date were
available. The Robelle field descriptor provides for all of
these, by using three of the four words of reserved space. All
fields up to field'length are the same as Query's (note
especially that field'length is the total length of the field and
not the length of one sub-field). These are the new fields:
field'repeat (J1). In IMAGE terms, this is known as the
sub-count. For simple fields, field'repeat is one (and not
zero).
field'decplaces (J1). Logical number of decimal places in the
field. Zero means there are no decimal points. This field
must be zero if the field'type is byte.
field'date'type (J1). Zero if the field is not a date.
Otherwise, contains a constant that describes the format of
the date. These constants are described below.
reserved'space (J1). One word that is reserved for future use.
Date Format
The date format is mapped into the data type and byte-length of
the field. Here are the constants for each date format:
1 yymmdd
2 ddmmyy
3 mmddyy
4 yymm
5 calendar (MPE intrinsic format)
6 yyyymmdd
7 ddmmyyyy
8 mmddyyyy
9 phdate (PowerHouse format)
10 ask (ASK ManMan format)
SPL Layout of the Field Descriptor
Here is the layout in SPL notation of the Robelle field
descriptor. Note that we use exactly the same layout for
accessing Query field descriptors (we ignore the repeat,
decplaces, and date'type fields for Query self-describing files):
sdfield.srcinc:
integer array sd'field(0:sd'max'field'len); { 0 : 14 }
byte array sd'field'name(*) = sd'field;
integer array sd'field'type(*) = sd'field(8);
integer array sd'field'offset(*) = sd'field(9);
integer array sd'field'bytelen(*) = sd'field(10);
integer array sd'field'repeat(*) = sd'field(11);
integer array sd'field'decplaces(*)= sd'field(12);
integer array sd'field'date'type(*)= sd'field(13);
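As a cross-check of the descriptor layout, the following Python
sketch decodes one field descriptor from raw bytes. It assumes
big-endian 16-bit integers and a descriptor that has already been
sliced out of its label; parse_field and FIELD_TYPES are
hypothetical names used only for this illustration.

```python
import struct

# Type codes as listed under Query Field Labels.
FIELD_TYPES = {
    1: "ASCII (U, X)", 2: "free-form ASCII number",
    3: "signed integer (I)", 4: "floating point real (R)",
    5: "packed decimal (P)", 6: "COBOL computational (J)",
    7: "unsigned integer (K)", 8: "zoned decimal (Z)",
    9: "IEEE floating point (E)", 10: "IMAGE compound field",
}


def parse_field(entry: bytes, robelle: bool) -> dict:
    """Decode one field descriptor (14 words used, 15 reserved)."""
    if len(entry) < 28:
        raise ValueError("a descriptor needs at least 14 words (28 bytes)")
    # Words 0-7: name; words 8-13: type, offset, length, repeat,
    # decplaces, date'type.
    (name, ftype, offset, length,
     repeat, decplaces, date_type) = struct.unpack_from(">16s6h", entry, 0)
    info = {
        "name": name.decode("ascii", errors="replace").rstrip(" \0"),
        "type": ftype,
        "type_name": FIELD_TYPES.get(ftype, "unknown"),
        "offset": offset,      # zero-based byte offset
        "length": length,      # total length in bytes
    }
    if robelle:
        # These three words are reserved space in a Query descriptor.
        info.update(repeat=repeat, decplaces=decplaces, date_type=date_type)
    return info


# A 4-byte integer at byte offset 56 with two implied decimal places.
entry = struct.pack(">16s7h", b"LOADFACTOR".ljust(16), 3, 56, 4, 1, 2, 0, 0)
print(parse_field(entry, robelle=True))
```

Passing robelle=False returns only the fields that Query defines,
which mirrors the way the SPL code ignores repeat, decplaces, and
date'type for " A.00.00" files.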
Support Routines
To make our life easier, we have a standard include file with
both variables and SPL/SPLash! subroutines that we use in many of
our self-describing procedures:
sdsubr.src:
<< Standard variables and subroutines needed to access fields in
a self-describing file. This file must be included after all
variable declarations in a procedure. >>
integer
file'userlabels
,file'foptions
,file'filecode
,current'labelnum
,sd'field'index
;
integer array current'label(0:sd'label'len);
subroutine file'error(local'filenum);
value local'filenum;
integer local'filenum;
begin
xfileinfo(local'filenum);
goto error'exit;
end'subr; <<file'error>>
subroutine read'label'error(local'filenum);
value local'filenum;
integer local'filenum;
begin
p "Unable to read label from self-describing file" err;
file'error(local'filenum);
end'subr; <<read'label'error>>
subroutine read'label(local'filenum,labelnum);
value local'filenum, labelnum;
integer local'filenum, labelnum;
begin
blank(current'label,sd'label'len);
freadlabel(local'filenum
,current'label
,sd'label'len
,labelnum
);
if < then
read'label'error(local'filenum)
else
if > then
if labelnum > file'userlabels then
begin
b'blank(outbuf,bl'outbuf);
move outbuf := "Attempting to read label ";
ascii(labelnum,10,outbuf'(26));
say'errx(outbuf,35,bl'outbuf);
read'label'error(local'filenum);
end'if;
end'subr; <<read'label>>
subroutine file'info(local'filenum);
value local'filenum;
integer local'filenum;
begin
fgetinfo(local'filenum<<filenum iv>>
, <<filename ba>>
,file'foptions <<foptions l >>
, <<aoptions l >>
, <<recsize i >>
, <<devtype i >>
, <<ldnum l >>
, <<hdaddr l >>
,file'filecode <<filecode i >>
, <<recptr d >>
, <<eof d >>
, <<flimit d >>
, <<logcount d >>
, <<physcount d >>
, <<blksize i >>
, <<extsize l >>
, <<numextents i >>
,file'userlabels <<userlabels i >>
);
if <> then
begin
p "Unable to fgetinfo on file" err;
file'error(local'filenum);
end'if;
end'subr; <<file'info>>
logical subroutine get'field(local'filenum,offset);
value local'filenum, offset;
integer local'filenum, offset;
begin
get'field := false;
if sd'field'index < sd'numfields then
begin
if (sd'field'index mod sd'fieldsperlabel) = 0 then
begin
current'labelnum := current'labelnum - 1;
read'label(local'filenum,current'labelnum);
end'if;
offset := (sd'field'index mod sd'fieldsperlabel) *
sd'entrylen;
move sd'field := current'label(offset),(sd'entrylen);
sd'field'index := sd'field'index + 1;
get'field := true;
end'if;
end'subr; <<get'field>>
File'Error
To make our life easier we will take a simple approach to file
system errors. If any MPE file system intrinsic returns an
error, we call the Robelle equivalent of the printfileinfo
intrinsic and then we exit the enclosing procedure (for example,
Formselfdesc). Yes, we use a goto in the file'error subroutine.
This is a good example of where a goto enhances readability and
reliability.
File'Info
We have developed a standard set of subroutines for working with
self-describing files. The file'info subroutine initializes the
file'userlabels, file'foptions, and file'filecode variables
(declared as part of the sdsubr.src file).
Read'Label
It is important to understand the error checking in read'label.
MPE user labels may be allocated space, but they might not
actually be written. For example, after first creating a file
with user labels, none of the user labels have actually been
written to the file. If we get an end-of-file condition from
Freadlabel, we ignore the error unless a programming bug has
caused us to attempt to read a label that is greater than the
number of user labels in the file.
Get'Field
We will describe how the get'field subroutine works later in the
section Understanding Self-Describing Information.
Am I A Self-describing File
We determine if a file is self-describing in two ways:
1. If the file has a filecode of 1084 and it has one or more MPE
file labels.
2. The file is a KSAM file, has more than one label, we can read
the last label, and the last label starts with either the
string " A.00.00" or " B.00.00" (note the space at the
beginning).
Here is a procedure that returns True if the passed filenum is a
self-describing file:
$page "sd'file"
<< Return true if the passed file is self-describing.
>>
logical procedure sd'file(filenum);
value filenum;
integer filenum;
option check 3;
begin
$include sdheader.srcinc
$include sdfield.srcinc
$include sdsubr.src
$page "sd'file/mainline"
sd'file := false;
file'info(filenum);
if file'filecode = 1084 and file'userlabels <> 0 then
sd'file := true
else
if file'foptions.(2:3) = 1 or
file'foptions.(2:3) = 3 then
if file'userlabels > 1 then
begin <<ksam with extra user labels>>
read'label(filenum,file'userlabels-1);
move sd'header := current'label,(sd'label'len);
if sd'version = " A.00.00" or
sd'version = " B.00.00" then
sd'file := true;
end'else;
error'exit:
end'proc; <<sd'file>>
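The decision logic of sd'file can be restated compactly in Python.
This sketch assumes the caller has already obtained the filecode,
the number of user labels, a KSAM flag, and (for KSAM files) the
contents of the last label; how those values are obtained is
outside the sketch, and the function name is hypothetical.

```python
SD_FILECODE = 1084
SD_VERSIONS = (b" A.00.00", b" B.00.00")   # note the leading space


def is_self_describing(filecode, num_user_labels, is_ksam,
                       last_label=None):
    """Apply the two tests described above to already-fetched values."""
    # Test 1: filecode 1084 and at least one user label.
    if filecode == SD_FILECODE and num_user_labels != 0:
        return True
    # Test 2: KSAM file with more than one label whose last label
    # starts with a known version string.
    if is_ksam and num_user_labels > 1 and last_label is not None:
        return last_label[:8] in SD_VERSIONS
    return False


print(is_self_describing(1084, 14, False))
print(is_self_describing(0, 12, True, b" B.00.00" + bytes(248)))
print(is_self_describing(0, 12, True, bytes(256)))
```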
Creating A Self-Describing File
When we describe data structures we usually explain the input
routine first and then the creation/output routine second. For
self-describing files, it is easier to do it in the opposite
order. We will show the structure of a simple self-describing
file and then we will show the code that produced the
self-describing label information for the file.
HowMessy is a Robelle program that reports on database
efficiency. For years, this program has produced a report.
Unfortunately, reports must be read by humans. It would make
more sense for HowMessy to produce a self-describing MPE file
with the efficiency information from one or more databases. You
could then use a tool that understood self-describing files to
report and act on the information from the file produced by
HowMessy. We will show all of the routines in HowMessy's
self-describing module, but first we need to know the structure
of the self-describing file.
HowMessy's Loadfile
HowMessy creates a self-describing file called Loadfile. This
file has one record per database/dataset/search-field for one or
more databases. Here is a "form" listing of the Loadfile:
File: LOADFILE.GROUP.ACCT     (SD Version B.00.00)
   Entry:                     Offset
      DATABASE             X26      1
      DATASET              X16     27
      DATASETNUM           I1      43
      DATASETTYPE          X4      45
      CAPACITY             I2      49
      ENTRIES              I2      53
      LOADFACTOR           I2      57  << .2 >>
      SECONDARIES          I2      61  << .2 >>
      MAXBLOCKS            I2      65
      HIGHWATER            I2      69
      PATHSORT             X1      73
      PATHPRIMARY          X1      74
      BLOCKFACTOR          I1      75
      SEARCHFIELD          X16     77
      MAXCHAIN             I2      93
      AVECHAIN             I2      97  << .2 >>
      STDDEVIATION         I2     101  << .2 >>
      EXPECTEDBLOCKS       I2     105  << .2 >>
      AVERAGEBLOCKS        I2     109  << .2 >>
      INEFFICIENTPTRS      I2     113  << .2 >>
      ELONGATION           I2     117  << .2 >>
      FUTUREFIELDS         X136   121
Limit: 10000  EOF: 33  Entry Length: 256  Blocking: 35
Global Equates
To simplify programming, we use global constant "equates" that
define specific attributes of Query and Robelle self-describing
files. When reading a self-describing file, we don't need most
of these constants, since the necessary numbers are provided in
the self-describing file header. Here are the equates that we
use when creating self-describing files:
sdequate.srcinc:
equate
sd'max'field'len = 15
,sd'label'len = 128
,sd'max'fieldsperlabel = 8
,sd'filler'labels = 10
;
equate
sd'date'yymmdd = 1
,sd'date'ddmmyy = 2
,sd'date'mmddyy = 3
,sd'date'yymm = 4
,sd'date'calendar = 5
,sd'date'yyyymmdd = 6
,sd'date'ddmmyyyy = 7
,sd'date'mmddyyyy = 8
,sd'date'phdate = 9
,sd'date'askdate = 10
;
Computing the Number of Labels
Before opening the Loadfile, HowMessy must determine how many
labels will be needed. The following routine is used by Robelle
products to compute the number of user labels for a
self-describing file. Note that we continue the Query standard
of reserving the first ten labels (numbered 0 to 9) for other
uses:
$page "sd'compute'labels"
<< Compute how many labels an SD file should have, based only
on the number of fields. Includes the mysterious filler
labels.
>>
integer procedure sd'compute'labels (numfields);
value numfields;
integer numfields;
option check 3;
begin
sd'compute'labels :=
(numfields-1+sd'max'fieldsperlabel) /
sd'max'fieldsperlabel
+ 1 <<for the header label>>
+ sd'filler'labels;
end'proc; <<sd'compute'labels>>
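The same arithmetic, written out in Python as a worked check. The
constants are copied from the equates above; the function name is
a hypothetical translation of sd'compute'labels.

```python
SD_MAX_FIELDS_PER_LABEL = 8    # sd'max'fieldsperlabel
SD_FILLER_LABELS = 10          # labels 0 to 9, left for other uses


def sd_compute_labels(numfields):
    """Total user labels needed for a file with numfields fields."""
    # Round up to whole labels of eight descriptors each.
    field_labels = ((numfields - 1 + SD_MAX_FIELDS_PER_LABEL)
                    // SD_MAX_FIELDS_PER_LABEL)
    return field_labels + 1 + SD_FILLER_LABELS   # +1 for the header label


for n in (1, 8, 9, 22):
    print(n, sd_compute_labels(n))
```

For the 22 fields of the HowMessy Loadfile this gives 3 field
labels, 1 header label, and 10 filler labels, or 14 user labels in
total.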
Opening the Loadfile
After computing the number of user labels, we can open a new MPE
file called Loadfile. We designed the HowMessy Loadfile to have
records 256 bytes long. To make our life easier, we have a few
global equates in the HowMessy self-describing module that we'll
use throughout the rest of the examples:
$page "global equates and defines for the selfdesc module"
equate
wl'loadfile = 128
,bl'loadfile = wl'loadfile * 2
,bl'item'name = 16
,max'field = 22 ! fields in Loadfile
;
Here is the actual code to create the Loadfile:
$page "sd'open"
<< Open the Loadfile and initialize the self-describing
information.
>>
logical procedure sd'open(outfile,loadfile'filenum);
integer loadfile'filenum; ! Note by reference -- returned
integer array outfile;
option check 3;
begin
$include localvar.srcinc
byte array
loadfile'filename(0:bl'local'filename)
;
move loadfile'filename := "loadfile ";
loadfile'filenum :=
fopen(loadfile'filename
, << foptions lv >>
,1 <<write>> << aoptions lv >>
,wl'loadfile << recsize iv >>
, << device ba >>
, << formmsg ba >>
,sd'compute'labels(max'field)
,35 << blockfactor iv >>
, << numbuffers iv >>
,10000d << filesize dv >>
, << numextents iv >>
, << initialloc iv >>
,1084 << filecode iv >>
);
sd'open := false;
if loadfile'filenum = 0 then
begin
error(outfile,10);
xfileinfo(loadfile'filenum);
end'if
else
sd'open := true;
end'proc; <<sd'open>>
Note that the filecode is 1084. For non-KSAM files, this is used
to indicate a self-describing file. When you do a :Listf of such
a file, MPE translates the "1084" filecode into "SD".
Writing the Self-Describing Labels
Having successfully opened a new self-describing file, it's time
to write the self-describing information to the user labels.
Remember that the last user label (N-1) contains the header
information and the field labels are written in backward order
(N-2, N-3, ...). Our routine to write the self-describing
information to the Loadfile writes the field information first
and then updates the header label as the last step:
$page "sd'write'labels"
<< Write out the labels of a self-describing file with the
Loadfile fields.
>>
logical procedure sd'write'labels(outfile,filenum);
value filenum;
integer filenum;
integer array outfile;
option check 3;
begin
$include localvar.srcinc
integer
field'index
,field'offset
,labelnum
;
$include sdheader.srcinc
$include sdfield.srcinc
integer array sd'label(0:sd'label'len);
$include sdsubr.src
$page "sd'write'labels/subroutines"
subroutine write'label(labelnum);
value labelnum;
integer labelnum;
begin
fwritelabel(filenum,sd'label,sd'label'len,labelnum);
if <> then
begin
error(outfile,12);
file'error(filenum);
end'if;
end'subr; <<write'label>>
subroutine init'header;
begin
zero'buf(sd'header,sd'label'len);
b'blank(sd'version,8);
move sd'version := " B.00.00";
sd'numfields := max'field;
field'index := 0;
sd'numlabels := sd'compute'labels(sd'numfields) -
sd'filler'labels;
sd'fieldsperlabel:= sd'max'fieldsperlabel;
sd'entrylen := sd'max'field'len;
end'subr; <<init'header>>
subroutine init'all'labels(curr'label,num'labels);
value curr'label, num'labels;
integer curr'label, num'labels;
begin
while curr'label > num'labels do
begin
write'label(curr'label);
curr'label := curr'label - 1;
end'while;
end'subr; <<init'all'labels>>
subroutine put'field(name,bytelen,decplaces,type);
value bytelen, type, decplaces;
integer bytelen, decplaces, type;
byte array name;
begin
zero'buf(sd'field,sd'max'field'len);
move sd'field'name := name,(bl'item'name);
sd'field'type := type;
sd'field'offset := field'offset;
sd'field'bytelen := bytelen;
sd'field'repeat := 1;
sd'field'decplaces := decplaces;
move sd'label(sd'entrylen*sd'field'index) := sd'field,
(sd'entrylen);
field'offset := field'offset + sd'field'bytelen;
sd'field'index := sd'field'index + 1;
if sd'field'index >= sd'fieldsperlabel then
begin
write'label(labelnum);
labelnum := labelnum - 1;
sd'field'index := 0;
zero'buf(sd'label,sd'label'len);
end'if;
b'blank(name,bl'item'name);
end'subr; <<put'field>>
$page "sd'write'labels/mainline"
sd'write'labels := false;
init'header;
file'info(filenum);
field'offset := 0;
sd'field'index := 0;
zero'buf(sd'label,sd'label'len);
init'all'labels(file'userlabels-1,
file'userlabels-sd'numlabels-1); << stop above the filler labels >>
labelnum := file'userlabels - 2;
sd'reclength := bl'loadfile;
b'blank(inbuf,bl'inbuf);
move inbuf' := "DATABASE "; put'field(inbuf, 26,0,1);
move inbuf' := "DATASET "; put'field(inbuf, 16,0,1);
move inbuf' := "DATASETNUM "; put'field(inbuf, 2,0,3);
move inbuf' := "DATASETTYPE "; put'field(inbuf, 4,0,1);
move inbuf' := "CAPACITY "; put'field(inbuf, 4,0,3);
move inbuf' := "ENTRIES "; put'field(inbuf, 4,0,3);
move inbuf' := "LOADFACTOR "; put'field(inbuf, 4,2,3);
move inbuf' := "SECONDARIES "; put'field(inbuf, 4,2,3);
move inbuf' := "MAXBLOCKS "; put'field(inbuf, 4,0,3);
move inbuf' := "HIGHWATER "; put'field(inbuf, 4,0,3);
move inbuf' := "PATHSORT "; put'field(inbuf, 1,0,1);
move inbuf' := "PATHPRIMARY "; put'field(inbuf, 1,0,1);
move inbuf' := "BLOCKFACTOR "; put'field(inbuf, 2,0,3);
move inbuf' := "SEARCHFIELD "; put'field(inbuf, 16,0,1);
move inbuf' := "MAXCHAIN "; put'field(inbuf, 4,0,3);
move inbuf' := "AVECHAIN "; put'field(inbuf, 4,2,3);
move inbuf' := "STDDEVIATION "; put'field(inbuf, 4,2,3);
move inbuf' := "EXPECTEDBLOCKS "; put'field(inbuf, 4,2,3);
move inbuf' := "AVERAGEBLOCKS "; put'field(inbuf, 4,2,3);
move inbuf' := "INEFFICIENTPTRS "; put'field(inbuf, 4,2,3);
move inbuf' := "ELONGATION "; put'field(inbuf, 4,2,3);
move inbuf' := "FUTUREFIELDS "; put'field(inbuf,136,0,1);
if sd'field'index <> 0 then
write'label(labelnum);
move sd'label := sd'header,(sd'label'len);
write'label(file'userlabels - 1);
sd'write'labels := true;
error'exit:
end'proc; <<sd'write'labels>>
Init'Header
We start our procedure by initializing most of the fields in the
header label. We zero out the header label and then we fill in
the variables of the header label. This file has fields with an
implied decimal point, so we want to use the Robelle format of
self-describing files (version number " B.00.00"). The number of
fields is taken from our global equate. The number of
self-describing labels is our computed number less the ten
overhead labels. The number of fields in each label and the
length of each field description are taken from global equates
that match the values used by Query. We also initialize the
field'index variable which is used as an index into a single
label buffer (varies from 0 to sd'fieldsperlabel - 1).
File'Info
To make our code more general-purpose, we will not assume
anything about the HowMessy Loadfile format (this also makes it
easier to change later). Instead, we call Fgetinfo to obtain the
number of labels in our file, so that we know exactly where the
last label is. The file'info subroutine initializes the
file'userlabels variable (declared in the sdsubr.src file) with
the number of user labels in our file.
Init'All'Labels
To be on the safe side, we initialize all user labels in our file
with binary zeroes. Note that our write'label subroutine uses a
procedure-global array called sd'label for writing. We
initialize this buffer to binary zeroes and then continually
write it out to all of the self-describing labels. We don't
touch the initial ten labels reserved for other use.
Put'Field
This subroutine handles all of the details of adding a new field
to our self-describing Loadfile. It initializes a new field
record, moves this field record to the appropriate place in
sd'label, and finally writes out labels as we overflow
sd'fieldsperlabel. Each of our fields has a name (we move the
name to inbuf and initialize inbuf to blanks after adding the
field), a byte length, the implied number of decimal places, and
a type (either byte or integer for our file). Note that the
put'field subroutine looks after computing the byte offset of
each field by incrementing a counter.
We initialize each field record with binary zeroes (just to be
safe). We then fill in each portion of field information. We
then move our field record to the sd'label buffer at the correct
offset. This works well in SPL/SPLash!, but is more of a problem
in C or Pascal. In these languages, we would create a
record/structure that was an array of field records and index
into the structure using the current field index. As we filled
up the structure, we would write out a new user label to our
self-describing file.
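The array-of-records approach can be sketched in Python, where
each label is a fixed-size byte buffer that is filled slot by slot
and emitted when full. This is an illustration only: it assumes
big-endian 16-bit integers, returns the labels in a list instead of
calling Fwritelabel, and uses the hypothetical name
build_field_labels.

```python
import struct

LABEL_BYTES = 256        # 128 words per user label
ENTRY_BYTES = 30         # 15 words per field descriptor
FIELDS_PER_LABEL = 8


def build_field_labels(fields):
    """fields is a list of (name, bytelen, decplaces, type) tuples.

    Returns the field labels in write order: the first element
    belongs in label N-2, the second in N-3, and so on.
    """
    labels = []
    current = bytearray(LABEL_BYTES)   # starts as binary zeroes
    offset = 0                         # running zero-based byte offset
    slot = 0                           # descriptor index within the label
    for name, bytelen, decplaces, ftype in fields:
        entry = struct.pack(
            ">16s7h",
            name.upper().encode("ascii").ljust(16),
            ftype, offset, bytelen,
            1,              # field'repeat: simple field
            decplaces,
            0,              # field'date'type: not a date
            0)              # reserved word
        start = slot * ENTRY_BYTES
        current[start:start + ENTRY_BYTES] = entry
        offset += bytelen
        slot += 1
        if slot == FIELDS_PER_LABEL:   # label is full, start a new one
            labels.append(bytes(current))
            current = bytearray(LABEL_BYTES)
            slot = 0
    if slot:                           # flush a partly filled label
        labels.append(bytes(current))
    return labels


demo = [("DATABASE", 26, 0, 1), ("DATASET", 16, 0, 1),
        ("DATASETNUM", 2, 0, 3)]
print(len(build_field_labels(demo)))
```

Keeping the running offset inside the builder plays the same role
as the field'offset counter in put'field: the caller supplies only
lengths, never offsets.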
Finishing Up
After adding all the fields, we have to see if there is one label
record that has not been written to the Loadfile. If so, we
write it out. Finally, the header record is written out. We do
this last, since some of the variables used by put'field were
ones from the header record.
Closing the Self-Describing File
Our next routine closes the self-describing file and handles any
errors from Fclose. This is straightforward MPE programming:
$page "sd'close"
<< Close the loadfile and check for duplicate output files.
>>
logical procedure sd'close(outfile,loadfile'filenum);
value loadfile'filenum;
integer loadfile'filenum;
integer array outfile;
option check 3;
begin
$include localvar.srcinc
sd'close := false;
fclose(loadfile'filenum,2,0); ! Save temp
if <> then
begin
error(outfile,11);
xfileinfo(loadfile'filenum);
end'if
else
sd'close := true;
end'proc; <<sd'close>>
Providing a Shell
To make life easier in HowMessy, we provide one routine for the
main module to call. This routine purges any existing temporary
Loadfile, creates our new Loadfile, writes out the self-
describing information, and saves the Loadfile. The controlling
HowMessy routine then reopens Loadfile with write-access (this
may seem inefficient, but HowMessy is written in both SPL/SPLash!
and HP Pascal, so it was easier to organize the code this way):
$page "sdcreate"
<< Create Loadfile with all the self-describing information. We
purge any existing file called Loadfile, create a temporary
one, and then fill in the labels.
>>
integer procedure sdcreate(outfile);
integer array outfile;
option check 3;
begin
$include localvar.srcinc
$include mpecmd.srcinc
integer
loadfile'filenum
;
logical subroutine purge'loadfile;
begin
purge'loadfile := false;
say'str "purge loadfile,temp";
say'add rtn;
if mpecmd'execute(outbuf,mpe'print'buffer) then
purge'loadfile := true
else
error(outfile,9);
end'subr; <<purge'loadfile>>
$page "sdcreate/mainline"
sdcreate := 0;
if purge'loadfile then
if sd'open(outfile,loadfile'filenum) then
if sd'write'labels(outfile,loadfile'filenum) then
if sd'close(outfile,loadfile'filenum) then
sdcreate := 1;
end'proc; <<sdcreate>>
Understanding Self-Describing Information
Our HowMessy example showed the form of the Loadfile using a
format similar to the one Query uses, but the input was an MPE
self-describing file instead of an IMAGE dataset. Here is our
example form again:
File: LOADFILE.GROUP.ACCT     (SD Version B.00.00)
   Entry:                     Offset
      DATABASE             X26      1
      DATASET              X16     27
      DATASETNUM           I1      43
      DATASETTYPE          X4      45
      CAPACITY             I2      49
      ENTRIES              I2      53
      LOADFACTOR           I2      57  << .2 >>
      SECONDARIES          I2      61  << .2 >>
      MAXBLOCKS            I2      65
      HIGHWATER            I2      69
      PATHSORT             X1      73
      PATHPRIMARY          X1      74
      BLOCKFACTOR          I1      75
      SEARCHFIELD          X16     77
      MAXCHAIN             I2      93
      AVECHAIN             I2      97  << .2 >>
      STDDEVIATION         I2     101  << .2 >>
      EXPECTEDBLOCKS       I2     105  << .2 >>
      AVERAGEBLOCKS        I2     109  << .2 >>
      INEFFICIENTPTRS      I2     113  << .2 >>
      ELONGATION           I2     117  << .2 >>
      FUTUREFIELDS         X136   121
Limit: 10000  EOF: 33  Entry Length: 256  Blocking: 35
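To show how a stored descriptor maps onto one line of this
listing, here is a small Python sketch. It relies on two things
visible in the example (byte fields are shown with their byte
length, integers with their length in 16-bit words, and the
displayed offset is the stored zero-based offset plus one). The
handling of the other type letters follows the usual IMAGE
conventions and should be read as an assumption, as should the
hypothetical name form_line.

```python
# Type code to IMAGE-style letter; codes 2 and 10 fall back to "X".
TYPE_LETTERS = {1: "X", 3: "I", 4: "R", 5: "P",
                6: "J", 7: "K", 8: "Z", 9: "E"}


def form_line(name, ftype, offset, bytelen, decplaces=0):
    """Format one descriptor in the style of the form listing."""
    letter = TYPE_LETTERS.get(ftype, "X")
    if letter in "IJKRE":
        size = bytelen // 2      # shown in 16-bit words
    elif letter == "P":
        size = bytelen * 2       # assumption: packed size in nibbles
    else:
        size = bytelen           # shown in bytes
    # The listing shows one-based offsets; the label stores zero-based.
    line = "%-16s %-5s %d" % (name, letter + str(size), offset + 1)
    if decplaces:
        line += " << .%d >>" % decplaces
    return line


print(form_line("DATABASE", 1, 0, 26))
print(form_line("DATASETNUM", 3, 42, 2))
print(form_line("LOADFACTOR", 3, 56, 4, 2))
```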
Formselfdesc Procedure
We have developed a stand-alone procedure for producing this
output for a self-describing file. The following is the source
code that we use:
$page "formselfdesc"
<< If the passed filenum is a self-describing file, print a
description of the fields in the file on $stdlist.
>>
logical procedure formselfdesc(sd'filenum);
value sd'filenum;
integer sd'filenum;
option check 3;
begin
$include localvar.srcinc
$include sdequate.srcinc
integer
file'code
,file'userlabels
,file'foptions
,current'labelnum
,field'index
,file'recsize
,file'blkfac
;
double
file'eof
,file'limit
;
byte array
filename(0:bl'local'filename)
;
$include sdheader.srcinc
$include sdfield.srcinc
integer array current'label(0:sd'label'len);
subroutine file'error;
begin
xfileinfo(sd'filenum);
goto error'exit;
end'subr; <<file'error>>
subroutine read'label(labelnum);
value labelnum;
integer labelnum;
begin
freadlabel(sd'filenum,current'label,sd'label'len,labelnum);
if <> then
begin
p "Unable to read label from self-describing file" err;
file'error;
end'if;
end'subr; <<read'label>>
subroutine file'info(blksize);
value blksize;
integer blksize;
begin
b'blank(filename,bl'local'filename);
fgetinfo(sd'filenum <<filenum iv>>
,filename <<filename ba>>
,file'foptions <<foptions l >>
, <<aoptions l >>
,file'recsize <<recsize i >>
, <<devtype i >>
, <<ldnum l >>
, <<hdaddr l >>
,file'code <<filecode i >>
, <<recptr d >>
,file'eof <<eof d >>
,file'limit <<flimit d >>
, <<logcount d >>
, <<physcount d >>
,blksize <<blksize i >>
, <<extsize l >>
, <<numextents i >>
,file'userlabels <<userlabels i >>
);
if <> then
begin
p "Unable to fgetinfo on file" err;
file'error;
end'if;
if file'recsize <> 0 then
file'blkfac := blksize / file'recsize
else
file'blkfac := 1;
if file'recsize < 0 then
file'recsize := -file'recsize <<negative recsize is in bytes>>
else
file'recsize := file'recsize * 2; <<positive recsize is in words>>
end'subr; <<file'info>>
logical subroutine file'is'sd;
begin
file'is'sd := false;
file'info(0);
if file'code = 1084 and file'userlabels <> 0 then
file'is'sd := true
else
if file'foptions.(2:3) = 1 or file'foptions.(2:3) = 3 then
if file'userlabels > 1 then
begin <<ksam with extra user labels>>
read'label(file'userlabels-1);
move sd'header := current'label,(sd'label'len);
if sd'version = " A.00.00" or
sd'version = " B.00.00" then
file'is'sd := true;
end'else;
end'subr; <<file'is'sd>>
logical subroutine get'field(offset);
value offset;
integer offset;
begin
get'field := false;
if field'index < sd'numfields then
begin
if (field'index mod sd'fieldsperlabel) = 0 then
begin
current'labelnum := current'labelnum - 1;
read'label(current'labelnum);
end'if;
offset := (field'index mod sd'fieldsperlabel) *
sd'entrylen;
move sd'field := current'label(offset),(sd'entrylen);
field'index := field'index + 1;
get'field := true;
end'if;
end'subr; <<get'field>>
subroutine print'outbuf(len);
value len;
integer len;
begin
len := bl'outbuf;
while len > 0 and outbuf'(len-1) = " " do
len := len - 1;
print(outbuf,-len,0);
end'subr; <<print'outbuf>>
subroutine print'header(len);
value len;
integer len;
begin
b'blank(outbuf,bl'outbuf);
move outbuf'(4) := "File: ";
move outbuf'(10) := filename,(bl'local'filename);
len := bl'outbuf;
while len > 0 and outbuf'(len-1) = " " do
len := len - 1;
len := len + 5;
len := len + move outbuf'(len) := "(SD Version";
len := len + move outbuf'(len) := sd'version,(8);
len := len + move outbuf'(len) := ")";
print'outbuf(0);
b'blank(outbuf,bl'outbuf);
move outbuf'(7) := "Entry:";
move outbuf'(34) := "Offset";
print(outbuf,-50,0);
end'subr; <<print'header>>
subroutine print'trailer(len);
value len;
integer len;
begin
b'blank(outbuf,bl'outbuf);
len := move outbuf' := " ";
len := len + move outbuf'(len) := "Limit: ";
len := len + dascii(file'limit,10,outbuf'(len));
len := len + move outbuf'(len) := " EOF: ";
len := len + dascii(file'eof,10,outbuf'(len));
len := len + move outbuf'(len) := " Entry Length: ";
len := len + ascii(file'recsize,10,outbuf'(len));
len := len + move outbuf'(len) := " Blocking: ";
len := len + ascii(file'blkfac,10,outbuf'(len));
print(outbuf,-len,0);
end'subr; <<print'trailer>>
subroutine format'field'type;
begin
if 0 <= sd'field'type <= 9 then
case sd'field'type of begin
<<0>> move outbuf'(31) := "?";
<<1>> move outbuf'(31) := "X";
<<2>> move outbuf'(31) := "?";
<<3>> move outbuf'(31) := "I";
<<4>> move outbuf'(31) := "R";
<<5>> move outbuf'(31) := "P";
<<6>> move outbuf'(31) := "J";
<<7>> move outbuf'(31) := "K";
<<8>> move outbuf'(31) := "Z";
<<9>> move outbuf'(31) := "E";
end'case
else
move outbuf'(31) := "?";
end'subr; <<format'field'type>>
logical subroutine field'is'sorted(sort'index);
value sort'index;
integer sort'index;
begin
field'is'sorted := false;
if sd'field'offset + 1 = sd'sort'keys(sort'index*3) and
sd'field'bytelen = sd'sort'keys(sort'index*3+1) then
field'is'sorted := true;
end'subr; <<field'is'sorted>>
subroutine format'sort'key(sort'index);
value sort'index;
integer sort'index;
begin
sort'index := 0;
while sort'index < sd'sort'num'keys do
begin
if field'is'sorted(sort'index) then
begin
move outbuf'(42) := "<<Sort ";
ascii(sort'index+1,10,outbuf'(50));
move outbuf'(52) := ">>";
end'if;
sort'index := sort'index + 1;
end'while;
end'subr; <<format'sort'key>>
subroutine format'date'type;
begin
if 1 <= sd'field'date'type <= 10 then
case sd'field'date'type of begin
<<0>> ;
<<1>> move outbuf'(56) := "<<YYMMDD>>";
<<2>> move outbuf'(56) := "<<DDMMYY>>";
<<3>> move outbuf'(56) := "<<MMDDYY>>";
<<4>> move outbuf'(56) := "<<YYMM>>";
<<5>> move outbuf'(56) := "<<CALENDAR>>";
<<6>> move outbuf'(56) := "<<YYYYMMDD>>";
<<7>> move outbuf'(56) := "<<DDMMYYYY>>";
<<8>> move outbuf'(56) := "<<MMDDYYYY>>";
<<9>> move outbuf'(56) := "<<PHDATE>>";
<<10>>move outbuf'(56) := "<<ASK>>";
end'case;
end'subr; <<format'date'type>>
subroutine format'decplaces;
begin
if sd'field'decplaces > 0 then
begin
move outbuf'(56) := "<< .";
ascii(sd'field'decplaces,10,outbuf'(60));
move outbuf'(63) := ">>";
end'if;
end'subr; <<format'decplaces>>
subroutine print'field'desc(field'repeat);
value field'repeat;
integer field'repeat;
begin
b'blank(outbuf,bl'outbuf);
move outbuf'(10) := sd'field'name,(16);
if sd'version = " B.00.00" then
begin
field'repeat := sd'field'repeat;
sd'field'bytelen := sd'field'bytelen / field'repeat;
end'if
else
field'repeat := 1;
if field'repeat <> 1 then
ascii(field'repeat,-10,outbuf'(30));
format'field'type;
if sd'field'type = 3 or <<integer>>
sd'field'type = 4 or <<real >>
sd'field'type = 7 then <<logical>>
ascii(sd'field'bytelen/2,10,outbuf'(32))
else
if sd'field'type = 5 then <<packed>>
ascii(sd'field'bytelen*2,10,outbuf'(32))
else
ascii(sd'field'bytelen,10,outbuf'(32));
ascii(sd'field'offset+1,-10,outbuf'(39));
if sd'version = " B.00.00" then
begin
format'sort'key(0);
format'date'type;
format'decplaces;
end'if;
print'outbuf(0);
end'subr; <<print'field'desc>>
$page "formselfdesc/mainline"
formselfdesc := false;
if sd'filenum <> 0 then
begin
if file'is'sd then
begin
read'label(file'userlabels-1);
move sd'header := current'label,(sd'label'len);
current'labelnum := file'userlabels - 1;
print'header(0);
field'index := 0;
while get'field(0) do
print'field'desc(0);
print'trailer(0);
formselfdesc := true;
end'if;
end'if;
error'exit:
end'proc; <<formselfdesc>>
A Different Logical Structure
Our HowMessy/Loadfile example had a number of separate
procedures. Our Formselfdesc procedure is self-contained, but we
will describe each subroutine in this procedure.
Formselfdesc Variables
We include our standard files for the global self-describing
equates, header layout, and field layout. We also have a number
of local variables that are used for indexing through the field
labels and other variables needed to enhance the output listing
(e.g., the number of records in the self-describing file).
File'Is'Sd
This is our standard sd'file procedure, rewritten to work as a
stand-alone SPL subroutine. File'Is'Sd looks after calling the
file'info subroutine which calls Fgetinfo. We initialize a
number of variables during the file'info call. Some of these are
used for obtaining self-describing information and some are used
to enhance the format of our form output (e.g., the filename and
the file limit).
Read'Label
The basic strategy we use in this routine is to read a specific
label into a buffer called current'label. We then move this
label to the appropriate self-describing header or field buffer.
Read'label is careful to check for file system errors and abort
if it finds any.
Get'Field
This subroutine is the key to understanding self-describing
files. When get'field is called, the last label of the file has
been read into the sd'header record. The variable field'index is
initialized to zero and is used as a counter of self-describing
fields. Each call to get'field returns one field description in
the sd'field record.
When first called, current'labelnum contains the number of the
last label (minus one, since MPE numbers labels starting at
zero). We check to see if we need to read in a new label with
the statement:
if (field'index mod sd'fieldsperlabel) = 0 then
Note that we use sd'fieldsperlabel as the divisor. This is the
value from our sd'header record and not our equate that we use
when creating self-describing files. Get'Field assumes that the
current label record is in the buffer current'label.
Each user label contains one or more field descriptions (in most
cases there are eight per label). We compute an offset in the
label where the current field description is and then we move the
field description from current'label to our sd'field record.
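The label and offset arithmetic in get'field can be sketched
outside SPL. The following Python is only an illustration of the
logic described above, not part of the Formselfdesc source. The
function name locate_field and the sample values (12 user labels,
16-word entries) are assumptions made for the example; in the SPL
code the divisor comes from sd'fieldsperlabel in the header and
the entry length from the sd'entrylen equate.

```python
def locate_field(field_index, user_labels, fields_per_label, entry_len):
    """Return (label_number, offset) for one field description.

    The header occupies the last user label (user_labels - 1).
    Field descriptions fill the labels before it, working
    backward, just as get'field decrements current'labelnum.
    """
    header_label = user_labels - 1
    label_number = header_label - 1 - (field_index // fields_per_label)
    offset = (field_index % fields_per_label) * entry_len
    return label_number, offset


# Hypothetical file: 12 user labels, 8 fields per label,
# 16-word field entries (assumed values for illustration).
for k in (0, 7, 8):
    print(k, locate_field(k, 12, 8, 16))
```

With these assumed values, the first eight fields come from the
label just before the header, and the ninth field forces a read
of the label before that.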
Print'Field'Desc
This routine looks after printing out the description of one
field. We use the same routine whether we are dealing with Query
(" A.00.00") or Robelle (" B.00.00") self-describing files. We
do have to adjust the byte length for " B.00.00" self-describing
fields, so that the output looks similar to what Query would
produce for an IMAGE dataset. Note how the format'field'type
routine handles IEEE floating point for either type of
self-describing file.
For " B.00.00" self-describing files, we can produce extra
information. This is handled by the format'sort'key,
format'date'type, and format'decplaces routines (which are only
called for " B.00.00" self-describing files).
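The length adjustment in print'field'desc can be summarized in a
few lines. This Python sketch mirrors the SPL logic shown earlier
(word-based types are divided by two, packed fields are doubled,
and " B.00.00" byte lengths are first divided by the repeat
count). The function name query_style and the dictionary
TYPE_LETTERS are assumptions for the illustration; the type codes
are the ones used in format'field'type.

```python
# Type codes as they appear in format'field'type.
TYPE_LETTERS = {1: "X", 3: "I", 4: "R", 5: "P",
                6: "J", 7: "K", 8: "Z", 9: "E"}


def query_style(type_code, byte_len, repeat=1):
    """Format a field the way Query would, e.g. X26, I2 or 2I2.

    For " A.00.00" files leave repeat at 1; for " B.00.00" files
    pass the field's repeat count and its total byte length.
    """
    per_item = byte_len // repeat
    if type_code in (3, 4, 7):      # integer, real, logical: words
        size = per_item // 2
    elif type_code == 5:            # packed: two nibbles per byte
        size = per_item * 2
    else:                           # everything else: bytes
        size = per_item
    letter = TYPE_LETTERS.get(type_code, "?")
    prefix = str(repeat) if repeat != 1 else ""
    return f"{prefix}{letter}{size}"


print(query_style(1, 26))   # DATABASE field from the Loadfile form
print(query_style(3, 4))    # CAPACITY field from the Loadfile form
```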
Format'Sort'Key
The sort information is stored in the sd'header record as an
offset, a length, and a type. There is no direct way for us to
tell that a field is sorted. Instead, we index through all of
the sort keys checking if the sort key matches the current field
definition (there might not be a match). We use the index into
the sort information as our key to print for the user.
Field'Is'Sorted
To make our code clearer, we encapsulate the code for checking if
a specific sort key matches the current field in a subroutine.
By giving this subroutine a descriptive name, we make the intent
of the format'sort'key routine clearer. Our field'is'sorted
routine checks that the offset (adjusted appropriately for
one-based and zero-based offsets) and the byte length of the
field and the sort key match. We decided to ignore the data type
(the sd'type and the sort'type have different values).
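The matching rule used by field'is'sorted and format'sort'key can
be sketched as follows. This is an illustrative Python rendering,
not the SPL source: the sort keys are represented as a list of
(offset, length, type) tuples rather than a flat array indexed by
sort'index*3, and the name sort_position is an assumption. As in
the SPL loop, every key is examined and the last match wins.

```python
def field_is_sorted(field_offset, field_byte_len, sort_keys, sort_index):
    """True if the sort key at sort_index describes this field.

    Field offsets are zero-based, sort-key offsets are one-based.
    The data type in the key is deliberately ignored.
    """
    key_offset, key_len, _key_type = sort_keys[sort_index]
    return field_offset + 1 == key_offset and field_byte_len == key_len


def sort_position(field_offset, field_byte_len, sort_keys):
    """Return the 1-based sort position of a field, or None."""
    position = None
    for i in range(len(sort_keys)):
        if field_is_sorted(field_offset, field_byte_len, sort_keys, i):
            position = i + 1
    return position


# Hypothetical sort: first by DATASET (offset 27, 16 bytes),
# then by DATABASE (offset 1, 26 bytes).
keys = [(27, 16, 0), (1, 26, 0)]
print(sort_position(0, 26, keys))
print(sort_position(26, 16, keys))
print(sort_position(42, 2, keys))
```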
Summary
It's harder to understand self-describing files than it is to
create them. When creating self-describing files you often only
use a few of the self-describing features, but when understanding
them there are no features that you can leave out.
KSAM Self-Describing Files
Self-describing KSAM files are a little trickier to deal with.
The 1084 filecode used for self-describing MPE files doesn't work
well for KSAM. It is more difficult to create a new KSAM file,
since all of the key information must be passed to Fopen. Here
are a few hints for creating and understanding self-describing
KSAM files.
SD (1084) Filecode
You can create a KSAM file with a filecode of 1084, but the
resulting :listf,2 gives no hint that the file is a KSAM file.
Here's an example Build Command of a compatibility-mode KSAM file
with a filecode of 1084 and the resulting :listf,2.
:run ksamutil.pub.sys
>build file1;rec=-80,16,f,ascii;keyfile=file1key; &
key=i,6,2;code=1084
>exit
:listf file1@,2
FILENAME CODE ----------LOGICAL RECORD--------- ----SPACE----
SIZE TYP EOF LIMIT R/B SECTORS X MX
FILE1 SD 80B FA 0 1023 16 48 1 *
FILE1KEY KSAMK 128W FB 98 98 1 112 1 8
Notice how there is no way to identify file1 as being a KSAM
file. For this reason, we don't use the 1084 filecode on
self-describing KSAM files.
Creating KSAM Self-describing Files
We use three steps to create self-describing KSAM files:
1. Compute the number of labels (used in the KSAMUTIL or MPE/iX
build command). You could use our sd'compute'labels
subroutine or you can compute the number of labels as the
truncated value of:
labels = (#fields + 7) / 8 + 11
2. Build your KSAM/V file with KSAMUTIL and specify Labels=[the
number computed above]. For KSAM/XL, use the Build Command
with the userlabel keyword ;ULABEL=x (where x is the number
computed above).
3. Fopen the file as an old file with write access.
The most difficult part is computing the number of labels. For
example, if we have eight fields:
Labels = (8 + 7) / 8 + 11
= 12
MPE V/E:
:run ksamutil.pub.sys
>build file2;rec=-80,16,f,ascii;keyfile=file2k; &
key=i,6,2,,duplicate;labels=12
>exit
MPE/iX:
:build file2;rec=-80,16,f,ascii;key=(i,6,2,dup); &
ksamxl;ulabel=12
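The label arithmetic from step 1 is easy to get wrong by one, so
here is a small Python sketch of the formula. The function name
compute_labels is an assumption for the example (it is not the
sd'compute'labels subroutine itself); the integer division gives
the truncated value that the formula calls for.

```python
def compute_labels(num_fields):
    """User labels needed for a self-describing KSAM file.

    Implements labels = (#fields + 7) / 8 + 11, truncated.
    """
    return (num_fields + 7) // 8 + 11


for n in (1, 8, 9, 22):
    print(n, compute_labels(n))
```

Eight fields give the 12 labels used in the build commands above;
a ninth field needs one more label.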
Understanding KSAM Self-Describing Files
Our sd'file routine returns true if a given file is
self-describing. If you examine the code in this routine
carefully, you'll see that for KSAM files we have the statements:
if file'foptions.(2:3) = 1 or file'foptions.(2:3) = 3 then
if file'userlabels > 1 then
Note that we check for more than one user label. Why don't we
check for more than zero user labels? All self-describing files
must have at least two labels (one for the header information and
one or more for the field information). When we first
implemented our sd'file routine we only checked for more than
zero user labels. What we found was that many users had
accidentally built KSAM files with one user label (which was
almost always empty). We have no idea why this seemed to be so
common, but by checking for at least two labels we eliminated a
lot of KSAM files that were not self-describing.
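The complete test can be sketched in Python as follows. This is
only an illustration of the decision logic, with assumed names
(file_is_sd, file_type_bits). It assumes the SPL bit-field
notation foptions.(2:3) means three bits starting at bit 2, with
bit 0 the most significant bit of a 16-bit word, and it takes the
version string from the last label as a parameter instead of
calling Freadlabel.

```python
SD_FILECODE = 1084
SD_VERSIONS = (" A.00.00", " B.00.00")


def file_type_bits(foptions):
    """Extract foptions.(2:3) from a 16-bit foptions word.

    Assumes bit 0 is the most significant bit, so the field
    covers bits 2..4 and is shifted down by 16 - 5 = 11.
    """
    return (foptions >> 11) & 0b111


def file_is_sd(file_code, user_labels, foptions, last_label_version):
    """Mirror the checks made by the file'is'sd subroutine."""
    if file_code == SD_FILECODE and user_labels != 0:
        return True
    # File types 1 and 3 are the KSAM cases tested in the SPL code.
    if file_type_bits(foptions) in (1, 3) and user_labels > 1:
        return last_label_version in SD_VERSIONS
    return False


ksam = 1 << 11   # hypothetical foptions word with file type 1
print(file_is_sd(0, 1, ksam, " B.00.00"))    # one label: rejected
print(file_is_sd(0, 12, ksam, " B.00.00"))   # enough labels, valid version
```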
Future Self-Describing Formats
We were motivated to create the new Robelle format
self-describing files in order to provide a better interface
between our product Suprtool and ASKPlus from ARES of France.
Pierre Senant of ARES is the R&D Manager and the two of us worked
out the " B.00.00" self-describing format (actually we forced
most of the format on poor Pierre).
ARES have been doing significant R&D work on UNIX and a portable
version of ASKPlus. As an example of how far we can go with
self-describing information, here is an extract of Pierre's
design for a UNIX implementation of self-describing files.
SDASK Files
A C/ISAM file is composed of two files, a data file and an index
file. An SDASK file defines another file called a 'label file'
which contains the complete description of the data file. Like
MPE self-describing files, an SDASK file is composed of a header
portion and a description of each field. The data file and the
label file must be located in the same directory.
Header Format
Pierre's header contains a lot more information than our MPE
header label. Here are the parts of the header:
* Version number.
* File code (1085).
* Checksum (currently unused).
* Number of fields per record.
* Record length (in bytes).
* Number of records.
* Number of sort keys.
* Password.
* Total field area length (in the SDASK file).
* File type (flat, C/ISAM, KSAM, Unibol, ...).
* Data file name.
* Unibol area (for migration from IBM/36 to UNIX)
* Filter: logical expression defining a condition that must be
True for the entries taken into account. Originally developed
for Unibol files, but this feature can be used for any other
system.
Field Description
Each field description is variable length:
* Field type (U, X, I, J, K, P, R). For Ascii fields, there is
additional information for the character set: Roman-8, PC-8,
ANSI-8, Mac-Apple, EBCDIC, and ISO7-1 ... ISO7-13. For Integer
fields, there is additional information for Intel versus HP.
For Real fields, there is additional information for IEEE versus
Classic.
* Length (in bytes).
* Offset.
* Scale (number of decimal places).
* Repeat factor.
* Flags:
* Null value allowed. If this flag is True, each entry in
the data file is preceded by a bitmap field. Each bit
indicates whether the corresponding field value is Null or
not.
* Hidden field.
* Key.
* Duplicate key allowed.
* Field name length.
* Field name.
* Title length.
* Title.
* Edit mask length.
* Edit mask.
* Key file name length.
* Key file (reserved for future implementation on MS-DOS).
Sort Information
Sort descriptors are also variable length:
* Expression length.
* Sort expression (in ASKPlus syntax). For example,
cust-name
cust-zipcode cat cust-address
* Flag: ascending/descending.
Conclusion
Self-describing files are a great idea. As users, we almost
always create MPE and KSAM files with a fixed record structure in
mind. By default, this record structure is lost when we build a
file. With self-describing files, we can retain the structure of
our files.
A Final Example
The HP 3000 has a rich set of tools based on IMAGE. One reason
that so many good tools could be written for IMAGE was the DBINFO
intrinsic. This intrinsic let any program discover the structure
of an IMAGE database. Self-describing files provide the same
flexibility for MPE and KSAM files.
In this example, we show how two tools can be combined by using
self-describing files. Our HowMessy program reports on database
efficiency. While doing so it creates a self-describing file
with the statistics for a database. Once you have this file,
it's possible to use Suprtool to check for certain boundary
cases. For example,
:run howmessy.pub.robelle {create "loadfile"}
Enter database: test.suprtest
HowMessy creates the self-describing file called Loadfile (with
the structure that we've shown previously). We now use Suprtool
to create a file of all detail datasets that are more than 85%
full and have a capacity greater than one:
:run suprtool.pub.robelle
>input loadfile
>if datasettype = "D" and & {detail dataset}
capacity > 1 and &
loadfactor > 85.00 {more than 85% full}
>output loaddetl,link {create SD file}
>exit
At Robelle, we would use our Xpress electronic mail system to
mail the loaddetl file to the system manager. Another
alternative would be to extract the database and dataset names
and use them to create a batch job to automatically increase the
capacity of detail datasets more than 85% full. The
possibilities are endless, but only because HowMessy could
provide information to Suprtool via the self-describing file.
Software Tools
Few software tools are capable of creating or understanding
self-describing files. This is a shame, since self-describing
files are a powerful data structure. One reason that so few
tools handle self-describing files is that documentation on
self-describing files has been non-existent. I hope that by
publishing this description and the programming examples in this
paper, more vendors and users will start creating and accepting
self-describing files.