WWW Data: a World Wide Web Data Base


Detailed Description

WWW Data Relational Data Model

This section provides some basic information on the relational data model used by WWW Data. A good understanding of this data model is necessary to properly operate the system.

The Data Base Analogy

The above picture represents the classical data base analogy. A data base, like the cabinet in the picture, is a collection of files; each file is a collection of records, and each record is a collection of data elements.

The following definitions have been partially taken from Al Stevens' book on CDATA, referenced in the overview section.

Data Element Types

WWW Data supports the following data element types:

NOTE: Data Representation

These different types are all handled by WWW Data as text strings (i.e. numbers are stored in their ASCII format). This feature makes WWW Data free from any data representation problem.

NOTE: Reference Type

The reference type may contain any HTML text, e.g.

With this type it is possible to insert in a data file references to external entities such as images, audio, other HTML pages and so on. This feature gives WWW Data multimedia capabilities. On the other hand, checking that these references are correct (i.e. that they point to something valid) is the responsibility of the user. No data integrity check is performed by WWW Data on this type.

Data Dictionary

A data base Data Dictionary is a table describing all the data elements of that particular data base; for each data element the following information is contained:

The Data Dictionary is the heart of the data base: all the information needed to handle the data elements is stored in it.

Keys (primary, secondary and foreign keys)

A data file PRIMARY KEY is a set of one or more data elements which uniquely identifies a record.

First Name    Family Name    Age
John          Smith          30
Ann           Smith          30
Ann           Jordan         30

A simple file

For example, in the above file the combination FIRST_NAME, FAMILY_NAME is a primary key. No single data element, and no other pair of data elements, uniquely identifies every record.
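The uniqueness requirement can be checked mechanically. The following Python sketch is not part of WWW Data; the function name `is_unique_key` and the in-memory record layout are assumptions made purely for illustration. It tests whether a combination of data elements identifies every record of the simple file uniquely.

```python
# Illustrative sketch only; not WWW Data code.
# The records of the simple file, stored as text strings.
RECORDS = [
    {"FIRST_NAME": "John", "FAMILY_NAME": "Smith", "AGE": "30"},
    {"FIRST_NAME": "Ann", "FAMILY_NAME": "Smith", "AGE": "30"},
    {"FIRST_NAME": "Ann", "FAMILY_NAME": "Jordan", "AGE": "30"},
]


def is_unique_key(records, elements):
    """Return True if the given data elements identify each record uniquely."""
    seen = set()
    for record in records:
        value = tuple(record[name] for name in elements)
        if value in seen:
            return False  # two records share the same key value
        seen.add(value)
    return True


print(is_unique_key(RECORDS, ["FIRST_NAME"]))                 # False
print(is_unique_key(RECORDS, ["FIRST_NAME", "FAMILY_NAME"]))  # True
```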

A data file SECONDARY KEY is a combination of one or more data elements used only to introduce an order relationship across the records of a file. In the above example, the FAMILY_NAME and the AGE data elements may be used as secondary keys to order the records alphabetically or by age.

A data file FOREIGN KEY is a combination of one or more data elements of the file which form the primary key of another file.

A small example

Book Id    Book Info
1          The Bible, ...
2          War and Peace, ...
...        ...
1000       Good morning Charlie Brown, ...

The Library Books File

Member Id    Member Info
1            Albert Alans, ...
2            Bob Bertrand, ...
...          ...
2000         Walt Whitehouse, ...

The Library Members File

Book Id    Member Id    Loan Date
1          2000         28/03/1996
2          1            01/04/1996
...        ...          ...
1000       2000         01/02/1996

The Library Loans File

In this small data base there are three files: BOOKS, MEMBERS and LOANS. The primary key of BOOKS is BOOK_ID, the primary key of MEMBERS is MEMBER_ID. In the LOANS file the primary key is given by the combination BOOK_ID, MEMBER_ID (one member of the library may have more than one book on loan). The LOANS file correlates (via the LOANS relationship) the books which are on loan with the members who borrowed them. This relationship is expressed by the fact that BOOK_ID and MEMBER_ID, the two data elements belonging to the LOANS primary key, are FOREIGN keys (i.e. they are respectively the primary key of the BOOKS file and of the MEMBERS file).

Integrity Rules

WWW Data respects the following two integrity rules (the definitions here below have been derived from C. J. Date, "Relational Database - Selected Writings", Addison-Wesley 1986, ISBN 0-201-14196-5):

The first rule underlines the importance of the primary key. In WWW Data the primary key is the only way to address/access a record; without the primary key a record is lost.

The second rule derives from the fact that a foreign key is basically a relationship between two data files. Because this relationship must correlate existing records, a foreign key cannot accept values which do not exist (i.e. which do not correspond to some primary key in the referred file). Consider the example presented above. It is not possible to enter a record in the LOANS file where the BOOK_ID does not correspond to a particular book in the BOOKS file (i.e. it is not possible to lend a nonexistent book). In the same way it is not possible to enter a record in the LOANS file where the MEMBER_ID does not correspond to a particular member in the MEMBERS file (i.e. it is not possible to lend a book to a nonexistent member).
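The foreign key rule for the Library example can be sketched as follows. This Python fragment is an illustration only, not WWW Data's implementation: the dictionaries standing in for the data files, the function name `add_loan` and the use of an exception to refuse the record are all assumptions.

```python
# Illustrative sketch only; not WWW Data code.
BOOKS = {1: "The Bible", 2: "War and Peace"}      # keyed by BOOK_ID
MEMBERS = {1: "Albert Alans", 2: "Bob Bertrand"}  # keyed by MEMBER_ID
LOANS = {}                                        # keyed by (BOOK_ID, MEMBER_ID)


def add_loan(book_id, member_id, loan_date):
    """Insert a LOANS record only if both foreign keys refer to existing records."""
    if book_id not in BOOKS:
        raise ValueError(f"no book with BOOK_ID {book_id}")
    if member_id not in MEMBERS:
        raise ValueError(f"no member with MEMBER_ID {member_id}")
    LOANS[(book_id, member_id)] = loan_date


add_loan(2, 1, "01/04/1996")        # accepted: both records exist
try:
    add_loan(99, 1, "01/04/1996")   # refused: there is no book 99
except ValueError as error:
    print(error)
```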

WWW Data endorses the two integrity rules by adopting the following behaviour:

Schema

The SCHEMA is a formal description of the data base. Although various formalisms/notations have been adopted by the different Data Base Systems existing on the market, all these notations must allow at least the following to be specified:

The notation (language) used to specify the schema is usually called DDL (Data Definition Language).

WWW Data DDL (Data Definition Language)

Hereafter the Schema of the Library data base example is presented. The various parts of this schema will be described in the following sections. For now it is enough to note that:

Moreover the language is case sensitive (e.g. #dictionary is a keyword and #DICTIONARY is not).

; Schema for the Library Example

#schema LIBRARY

; Data Elements Dictionary

#dictionary
    BOOK_ID,            I, 5
    BOOK_INFO,          A, 30
    MEMBER_PIC,         R, 100
    MEMBER_ID,          I, 5
    MEMBER_INFO,        A, 30
    LOAN_DATE,          A, 10
#end dictionary


#file MEMBERS
    MEMBER_PIC,		B
    MEMBER_ID,		B
    MEMBER_INFO,	B
#end file

#file BOOKS
    BOOK_ID,		C
    BOOK_INFO,		B
#end file

#file LOANS
    BOOK_ID,		C
    MEMBER_ID,		B
    LOAN_DATE,		B
#end file


; Index Specifications

#key BOOKS          BOOK_ID
#key MEMBERS        MEMBER_ID
#key LOANS          BOOK_ID, MEMBER_ID


; Users Specifications

#users
    host1.library.edu,        A
    host2.library.edu,        A
    OTHERS,                   R
#end users
#lifetime                     20


; Max number of records listed
; in the same HTML table

#max-records			3


; Allow the dump of raw data from the data base

;#allow-raw-data-dump		|


; Specify the separator to be used between label(s)
; and input field(s)

;#label-separator               <BR>


; Web Master, Path Name, CGI-URL, Help, Background, Header and Footer

#web-master john@webserver.library.edu
#path /home/webserver/john/wwwdata
#cgi-url http://webserver.library.edu/cgi-bin/wdcgi
#help /wwwdata/html/wwwgtw.htm
#background BACKGROUND=\"http://webserver.library.edu/images/backgr.gif\"
#header myheader.htm
#footer myfooter.htm

#end schema LIBRARY

Beginning and End

Every WWW Data schema must begin with the statement


#schema <SCHEMA_NAME>

The name <SCHEMA_NAME> will be used by WWW Data in the title of all the screens generated by the WWW Data CGI.

In the same way the WWW Data schema must end with the statement


#end schema

Data Elements Dictionary

The Data Elements Dictionary part of the schema starts with the statement


#dictionary

and ends with the statement


#end dictionary

In the data dictionary there is one line per data element, containing the following information.

Example


#dictionary
    BOOK_ID,            I, 5
    BOOK_INFO,          A, 30
    MEMBER_PIC,         R, 100
    MEMBER_ID,          I, 5
    MEMBER_INFO,        A, 30
    LOAN_DATE,          A, 10
#end dictionary

NOTE: WWW Data uses for its own operations a set of data elements, whose names start with the prefix "_WD". User defined data elements cannot start with the same prefix.
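As an illustration of the line layout used in the example above, the Python sketch below splits one dictionary line into its three comma-separated parts and rejects the reserved "_WD" prefix. It is not WWW Data code: the function name `parse_dictionary_line` is hypothetical, and reading the third column as a length is an assumption inferred from the example only.

```python
# Illustrative sketch only; not WWW Data code.
RESERVED_PREFIX = "_WD"


def parse_dictionary_line(line):
    """Split one dictionary line into (name, type code, assumed length)."""
    name, type_code, length = [part.strip() for part in line.split(",")]
    if name.startswith(RESERVED_PREFIX):
        raise ValueError(f"{name}: names starting with {RESERVED_PREFIX} are reserved")
    # Assumption: the third column is a numeric length.
    return name, type_code, int(length)


print(parse_dictionary_line("    BOOK_ID,            I, 5"))  # ('BOOK_ID', 'I', 5)
```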

Files Specification

The Files Specification section of the schema defines the different data files. Each file is described by:

its name - which will be used by all WWW Data applications;

and a list containing for each data element belonging to the data file:

Although the user can set the "break line option" of the last data field in a file to either B or C, the system will automatically force it to B.

Example


#file BOOKS
    BOOK_ID,		C
    BOOK_INFO,		B
#end file

NOTE: WWW Data uses for its own operations a file, whose name is "_WDSERVICE". User defined data files cannot have the same name.

Keys Specification

The Keys Specification section defines the key(s) of a data file. Each file must have at least one key (the primary key). The key, in turn, is composed of one or more data elements.

This is the syntax of a key statement.


#key <FILE_NAME>   <DATA_ELEMENT_LIST>

The <DATA_ELEMENT_LIST> is a list of data element names, separated by commas. When the same file has more than one key, the first one is the primary key and the others are secondary keys (i.e. they are used only to introduce an order relationship across the records of the data file).

Example


#key ASSIGNMENTS   CONSULTANT_NO, PROJECT_NO
#key ASSIGNMENTS   CONSULTANT_NO
#key ASSIGNMENTS   PROJECT_NO
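The rule that the first key of a file is its primary key can be illustrated with a small Python sketch. This is not how WDCOMP is implemented; the function name `parse_keys` and the returned structure are assumptions chosen for the example.

```python
# Illustrative sketch only; not WWW Data code.
def parse_keys(schema_lines):
    """Map each file name to its keys, in order of appearance in the schema."""
    keys = {}
    for line in schema_lines:
        parts = line.split(None, 2)  # "#key", file name, data element list
        if len(parts) == 3 and parts[0] == "#key":  # keywords are case sensitive
            elements = [name.strip() for name in parts[2].split(",")]
            keys.setdefault(parts[1], []).append(elements)
    return keys


schema = [
    "#key ASSIGNMENTS   CONSULTANT_NO, PROJECT_NO",
    "#key ASSIGNMENTS   CONSULTANT_NO",
    "#key ASSIGNMENTS   PROJECT_NO",
]
keys = parse_keys(schema)
primary, *secondary = keys["ASSIGNMENTS"]
print(primary)    # ['CONSULTANT_NO', 'PROJECT_NO']
print(secondary)  # [['CONSULTANT_NO'], ['PROJECT_NO']]
```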

User Specification

In this section the end users of the data base are defined. The end users definition starts with the statement


#users

and ends with the statement


#end users

Each end user is identified by a (partial) internet address. Adopting a partial address makes it possible to identify entire internet domains as single users. The end user identifier (i.e. the partial internet address) is matched by the WWW Data CGI against the environment variables REMOTE_HOST and REMOTE_ADDR. With this mechanism only the end users (i.e. the internet locations) specified in the schema are authorised to access the data.

Together with the end user identifier it is necessary to specify the associated privileges. This is accomplished by associating to each user identifier a character, with the following meaning.

The keyword OTHERS can be used to identify all the users on the network who have not already been defined in the previous lines (i.e. all other users on the Web). This is very useful when some access right needs to be granted to everybody (e.g. a library may want to allow everybody on the internet to consult its book catalog). If there is no line with the keyword OTHERS, WWW Data assumes the following


    OTHERS,                   N

i.e. no access whatsoever is granted to non specified users.

Example


#users
    host1.library.edu,        A
    host2.library.edu,        A
    OTHERS,                   R
#end users
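The lookup of a user's privilege can be sketched in Python as follows. This is not the WWW Data CGI code. The function name `privilege_for` is hypothetical, and the matching rule (a partial address matches when it occurs as a substring of REMOTE_HOST or REMOTE_ADDR, with the first matching line winning) is an assumption, since the exact matching algorithm is not described here. Privilege characters are treated as opaque values.

```python
# Illustrative sketch only; not WWW Data code.
def privilege_for(users, remote_host, remote_addr):
    """Return the privilege character for a client, defaulting to 'N'."""
    default = "N"  # assumed when the schema has no OTHERS line
    for identifier, privilege in users:
        if identifier == "OTHERS":
            default = privilege
        # Assumption: a partial address matches as a substring.
        elif identifier in remote_host or identifier in remote_addr:
            return privilege
    return default


USERS = [
    ("host1.library.edu", "A"),
    ("host2.library.edu", "A"),
    ("OTHERS", "R"),
]
# In a real CGI the two values would come from the environment variables
# REMOTE_HOST and REMOTE_ADDR; the addresses below are made up.
print(privilege_for(USERS, "host1.library.edu", "192.0.2.10"))  # A
print(privilege_for(USERS, "pc7.example.org", "192.0.2.77"))    # R
```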

WWW Data keeps a record of the user preferences during a session. If WWW Data had to keep this record for all the possible users, the data base would grow in an uncontrolled way. To avoid this problem WWW Data deletes the end user record when the user closes the session properly. If the user quits the session abruptly without closing it properly, the record is first kept for a given number of transactions (i.e. queries) and then deleted automatically.

The parameter shown here below, "LIFETIME", specifies for how many transactions (queries) the records of end users who have not closed their session have to be kept in the data base. The bigger the number, the longer this information is kept in the data base.

Example


#lifetime                     20
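One possible way to picture the LIFETIME mechanism is sketched below in Python. It is not WWW Data's implementation: the global transaction counter, the "last seen" value stored per user, the inclusive boundary and the name `purge_stale` are all assumptions made for illustration.

```python
# Illustrative sketch only; not WWW Data code.
LIFETIME = 20  # value of the #lifetime entry in the example schema


def purge_stale(sessions, current_transaction, lifetime=LIFETIME):
    """Keep only the user records touched within the last `lifetime` transactions.

    `sessions` maps a user identifier to the number of the transaction
    in which that user was last seen (an assumed bookkeeping scheme).
    """
    return {
        user: last_seen
        for user, last_seen in sessions.items()
        if current_transaction - last_seen <= lifetime
    }


sessions = {"host1.library.edu": 100, "pc7.example.org": 75}
print(purge_stale(sessions, current_transaction=100))  # {'host1.library.edu': 100}
```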

Operational Information

The last part of the schema contains some information needed by WWW Data to properly operate.

Max Records

This number specifies how many records can be listed at the same time inside a single HTML table. If the end users enter search conditions selecting a large number of records, the resulting output is partitioned into screens of MAX_RECORDS records. If this entry is not specified, WWW Data assumes 20 as the default value.

Example


#max-records                    3
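The partitioning into screens can be illustrated with a few lines of Python. This is a sketch, not the CGI's own code; the function name `partition` is hypothetical, while the default of 20 is the one stated above.

```python
# Illustrative sketch only; not WWW Data code.
def partition(records, max_records=20):
    """Split the selected records into screens of at most max_records each."""
    return [
        records[start:start + max_records]
        for start in range(0, len(records), max_records)
    ]


# With "#max-records 3", seven selected records are shown on three screens.
print(partition(list(range(1, 8)), max_records=3))  # [[1, 2, 3], [4, 5, 6], [7]]
```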

Raw Data Dump

This command enables a WWW Data CGI function, similar to the List Records, which generates RAW data in the following format:

field11<SEP>field12<SEP>...field1N
field21<SEP>field22<SEP>...field2N
...
fieldM1<SEP>fieldM2<SEP>...fieldMN

In the above lines, <SEP> is the separator string (or character) as specified in the command itself.

Example


#allow-raw-data-dump		|
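The raw format described above amounts to joining the fields of each record with the separator, one record per line. A minimal Python sketch, with the hypothetical function name `raw_dump` and made-up sample records:

```python
# Illustrative sketch only; not WWW Data code.
def raw_dump(records, separator):
    """Render records as raw text: one record per line, fields joined by separator."""
    return "\n".join(separator.join(fields) for fields in records)


books = [["1", "The Bible"], ["2", "War and Peace"]]
print(raw_dump(books, "|"))
# 1|The Bible
# 2|War and Peace
```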

Label Separator

This command specifies the separator string that will be put by WWW Data CGI in between a label and its input field.

Example


#label-separator		<BR>

Using <BR> as label separator will force all the labels to appear on top of their fields. If no label separator is specified, the default will be " ".

Web Master

WWW Data is designed to work in a reliable way in different situations. However, there may be special conditions (like a power loss on the server machine or an enormous number of end users accessing the data base at the same time) which may lead to problems such as corruption of data indexes, termination of operation and so on. In this case an error message is presented to the end users. This error message contains an automatic link to the data base administrator so that end users can notify him (via electronic mail) of the problems they had.

The data base administrator is known to WWW Data as Web Master. Therefore in the schema there must be a line like the following.


#web-master john@webserver.library.edu

The Web Master must be a proper and complete e-mail address, with the format "user@host.domain".

Path Name

The Path Name specifies the location in the server machine (i.e. the directory) where the data files are located.

Example


#path /home/webserver/john/wwwdata

Please notice that this path name is a real directory in the server machine. If this directory does not belong to the directory hierarchy made available by the WWW server to the outside world, it can only be accessed by the WWW Data CGI, i.e. no unauthorised access or manipulation can be performed.

CGI URL

The WWW Data CGI calls itself (from the automatically generated forms) on a number of occasions. To do so the CGI must know its own Uniform Resource Locator (URL).

Example


#cgi-url http://webserver.library.edu/cgi-bin/wdcgi

Help File

The Help File entry tells the WWW Data CGI which HTML file to use as the starting point for displaying the on-line help. It is recommended to use an intermediate HTML page (a sort of gateway) so that the other pages can use relative addresses. The entry is customisable so that users can show their own help file(s) instead of the standard ones provided with WWW Data.

Please note that the help file is a real file name in the server machine. If this file name does not belong to the directory hierarchy made available by the WWW server to the outside world, it can only be accessed by the WWW Data CGI, i.e. no unauthorised access or manipulation can be performed.

Example


#help /wwwdata/html/wwwgtw.htm

Body Background

This entry specifies which body background image or color WWW Data has to use when generating HTML pages. Please note that quotes have to be prefixed with a backslash (\).

Example


#background BACKGROUND=\"http://webserver.library.edu/images/backgr.gif\"

Header and Footer

Each form (HTML page) automatically generated by the WWW Data CGI can contain a standard header and a standard footer. In these files it is possible to insert special (advertising) information about the people/organisation who want to publish their data using WWW Data. If these files are not found, the WWW Data CGI takes no action.

Please note that the header and footer are real file names in the server machine. If these file names do not belong to the directory hierarchy made available by the WWW server to the outside world, they can only be accessed by the WWW Data CGI, i.e. no unauthorised access or manipulation can be performed.

Example


#header myheader.htm
#footer myfooter.htm

WWW Data CGI Man Machine Interface

General

The WWW Data CGI Man Machine Interface is built upon a set of simple commands. Each command presents its results to the end users as soon as possible so that they get immediate feedback about the system behaviour.

These simple commands, like the operators of a relational Data Manipulation Language (DML), can be combined together to obtain more complex queries.

The WWW Data CGI behaviour depends on its status, characterised by the following parameters:

Active File

While WWW Data is able to handle data bases with more than one data file, its CGI can show and manipulate only a single data file at a time in the same browser window. It is nevertheless possible to open another window with the browser and from there access another data file. The Active File is the data file visible from the CGI at a given moment in time in a given browser window. End users can select any file in the data base as the Active File.

All the WWW Data CGI commands refer to the currently selected Active File.

Active Key

Each data file may have one or more keys. The Active Key is the currently selected key. The users can change the Active Key at any time. The Active Key establishes an order relationship across the records of the data file (i.e. the physical order in which records have been inserted or modified is not relevant).

Selected Navigation Mode

The WWW Data CGI has two different navigation modes:

Entering search conditions (values) enables the users to select only the records in the data file in which they are interested.

TAKE NOTICE: when end users give the command "Enter Search Values", the current data elements' contents are used as search values. For example, if the current value of the data element BOOK_INFO in the previous example is "Bi", then, when the command "Enter Search Values" is entered, only the records having the substring "Bi" in the data element BOOK_INFO are shown to the users.

If several search substrings are entered in a data element (separated by blanks), then all the records that have at least one of these substrings in the data element are shown to the users (i.e. conditions are combined with logical OR).

If search substrings are entered in several different data elements, then only the records satisfying the search conditions for all such data elements are shown to the users (i.e. conditions are combined with logical AND).
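The combination of search conditions (OR inside one data element, AND across data elements) can be expressed compactly. The Python sketch below is an illustration, not the CGI's own code; the function name `matches` is hypothetical and case-sensitive substring comparison is an assumption.

```python
# Illustrative sketch only; not WWW Data code.
def matches(record, search_values):
    """Return True if the record satisfies the search values.

    Within one data element the blank-separated substrings are combined
    with OR; the conditions on different data elements are combined with AND.
    """
    for element, text in search_values.items():
        substrings = text.split()
        if not substrings:
            continue  # an empty search value places no condition
        if not any(s in record.get(element, "") for s in substrings):
            return False
    return True


books = [
    {"BOOK_ID": "1", "BOOK_INFO": "The Bible"},
    {"BOOK_ID": "2", "BOOK_INFO": "War and Peace"},
]
print([b["BOOK_ID"] for b in books if matches(b, {"BOOK_INFO": "Bi"})])      # ['1']
print([b["BOOK_ID"] for b in books if matches(b, {"BOOK_INFO": "Bi War"})])  # ['1', '2']
```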

Menu Commands

Following is a list of the commands that can be started from the menu. To make the WWW Data CGI simulate the menus of a normal Windows (or X-11) application, "HTML Select Elements" enhanced by "JavaScript Callbacks" have been used. In browsers capable of executing JavaScript it is enough to select a menu item to start the corresponding action. In browsers not capable of executing JavaScript, or with JavaScript disabled, it is necessary to first select the menu item and then push the button "Exec". By pushing the button "Reset", the menu will be reset to its original status.

Records Commands

The records commands will be used frequently. This is why it is possible to call them via menu selection or via direct buttons. However, it has to be pointed out that, at least at the time of this writing (i.e. April 1996), only Netscape is able to handle such buttons properly. For other browsers it is recommended to use the normal interaction with menu selections.

Using the WWW Data Tools Set

WDCOMP

Type: Application

Description: This tool takes the data base schema written by the data base developer(s) and translates it into a C module. The C module has in turn to be compiled and linked to the other objects and libraries distributed with WWW Data.

Command line: wdcomp <SCHEMA_FILE> <OUTPUT_NAME>

Example: wdcomp library.sch library

Output(s):

WDSIZE

Type: Object File

Description: This tool computes the amount of disk space required by the data base. Of course the user needs to enter the maximum number of records expected for each file. The tool is generated by linking the distributed object with the module containing the data base schema and the data base engine library.

Command line: wdsize

Output(s): The expected size of the database is printed on the screen.

WDINIT

Type: Object File

Description: This tool creates / (re)initialises the physical files used by the relational data base engine (the data files have names like *.DAT, the index files *.Xnn). The tool is generated by linking the distributed object with the module containing the data base schema and the data base engine library.

Command line: wdinit

Output(s): The required *.DAT data files and their related *.Xnn index files.

NOTE: on UNIX systems the files generated by WDINIT have to be readable and writable by the user and group of the WWW Data CGI application. An easy way to obtain this result is to use the following command:

chmod ug+rw *.DAT *.X*

WDINDEX

Type: Object File

Description: This tool restores index files lost because of crashes, power faults and so on. The tool scans the available data files (whatever is left of them) and rebuilds the index files. The tool is generated by linking the distributed object with the module containing the data base schema and the data base engine library.

Command line: wdindex

Output(s): The restored *.Xnn index files.

WDUNLOAD

Type: Object File

Description: This tool "unloads" a data base file into a physical text file. Different formats are supported (e.g. EXCEL, 1-2-3, and so on...). The tool is generated by linking the distributed object with the module containing the data base schema and the data base engine library.

Command line: wdunload [-s <SEPARATORS>] <DATA_FILE> <OUTPUT_FILE>

Example: wdunload -s ",\"" books books.txt

Output(s): A text file with a record per line. The data elements are separated by the first separator if any, or by a <TAB>. When a second separator is present, this is used to "surround" alphanumeric fields (labels in 1-2-3 terminology).
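The role of the two separators can be sketched in Python as follows. This is an illustration of the output format only, not WDUNLOAD's code; the function name `unload_line` is hypothetical, and treating each separator as a single character is an assumption.

```python
# Illustrative sketch only; not WWW Data code.
def unload_line(fields, separators=""):
    """Format one record as a text line.

    `fields` is a list of (value, is_alphanumeric) pairs. The first
    separator character goes between data elements (a TAB if none is
    given); the second, if present, surrounds alphanumeric fields.
    """
    field_separator = separators[0] if len(separators) >= 1 else "\t"
    quote = separators[1] if len(separators) >= 2 else ""
    return field_separator.join(
        f"{quote}{value}{quote}" if is_alphanumeric else value
        for value, is_alphanumeric in fields
    )


record = [("1", False), ("The Bible", True)]
print(unload_line(record, ',"'))  # 1,"The Bible"
```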

WDLOAD

Type: Object File

Description: This tool makes it possible to "populate" a data base file from a physical text file. Different formats are supported (e.g. EXCEL, 1-2-3, and so on...). The tool is generated by linking the distributed object with the module containing the data base schema and the data base engine library.

Command line: wdload [-s <SEPARATOR>] <DATA_FILE> <INPUT_FILE>

Example: wdload -s "|" books books.txt

Output(s): A data file populated with the records present in the text file. If no separator is given then the data elements lengths are used to identify the different fields in a text line.
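The two ways of identifying fields in a text line (by separator, or by data element lengths) can be sketched in Python as follows. This is not WDLOAD's code: the function name `split_line` is hypothetical, and the stripping of padding blanks is an assumption. The lengths 5 and 30 are taken from the BOOK_ID and BOOK_INFO entries of the example dictionary, reading their third column as a length.

```python
# Illustrative sketch only; not WWW Data code.
def split_line(line, lengths, separator=None):
    """Split one text line into fields.

    With a separator, the line is split on it; without one, the data
    element lengths are used to cut fixed-width fields.
    """
    if separator is not None:
        return [field.strip() for field in line.split(separator)]
    fields, start = [], 0
    for length in lengths:
        fields.append(line[start:start + length].strip())
        start += length
    return fields


BOOKS_LENGTHS = [5, 30]  # BOOK_ID, BOOK_INFO
print(split_line("1|The Bible", BOOKS_LENGTHS, "|"))  # ['1', 'The Bible']
print(split_line("1    The Bible", BOOKS_LENGTHS))    # ['1', 'The Bible']
```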

IDENTIFY

Type: Application

Description: This tool tells the user a special unique identifier of the server machine. This information will be required to register WWW Data.

Command line: identify

Output(s): The machine (computer) Unique Identifier is printed on the screen.

REGISTR

Type: Application

Description: This tool allows the user to promote WWW Data from an unregistered version to a registered version.

Command line: registr

Output(s): The file (application or object file) specified to the tool is updated with the registration information.

WWW Data Limits

Hereafter the limits of WWW Data's data base engine are shown.

Max files in a data base         11
Max data elements in a file      100
Max keys/indexes in a file       5
Max key length (in bytes)        255
Max file size (in bytes)         2^31 = 2 Gigabytes

NOTE: The above figures apply only to WWW Data's data base engine and not to the whole client/server system, which is composed of the WWW Data CGI application, the WWW Server, the communication link and the WWW Client (i.e. the Browser).

The RAM available on the client machine may impose a limit on the maximum number of rows that an HTML table can contain. This problem can be avoided by selecting a proper value for the maximum number of records that can be listed at the same time inside a single HTML table.