A brief history of Gene
1. Gene File Format
Gene's files are stored in a flat-file text format, for ease of network
transmission and use by other applications. You can use Gene to read
any text file on your computer, whether or not that file was created by
Gene, however Gene will complain if the file it reads does not conform
to the format Gene expects. This section documents that format.
As with any Macintosh file, Gene's database files have two parts: the
resource fork and the data fork. The resource fork contains pictures as
well as descriptions of any extra nonstandard card types used by the
database. The resource fork format is described later, when we disuss
card type definitions. The data fork is text,
organized as a sequence of cards, separated by blank lines. Some of the
information about pictures is also stored here as a special kind of
card. Each card is stored as a sequence of text lines, one for each
field and a variable number for the card's unstructured text notes. The
first line of a card gives the type and name of the card, and succeeding
lines fill in field values and supply the card's text. An empty line in
the database file signals the end of the card and a start of a new card.
As written by Gene, cards of the same type will be stored
consecutively in the file, in alphabetical order by the cards' names.
However, this order is not important to Gene when it reads the file. We
describe below the details of the card format, and provide a short
example of database file text.
As discussed above, each card is stored as a sequence of lines in the database file.
Blank lines are used to separate cards from each other.
Gene's format requires all the different lines describing a card
to include a colon. In the first line, the colon separates the card's type from its name. In the field lines, it separates the field name from its value.
And in the text lines, the colon appears at the start of the line.
1.2. Format for Card Type and Name
The first line of a card supplies the card's type and name. The card
type comes first, followed by a colon. If the card has a name, the card
name follows the colon. No two cards of the same type can have the same
name. If the card type does not include a name field, or if the card
can have a name but is unnamed, the first line of the card ends
immediately after the colon.
An unnamed card should always include a link to some other card;
otherwise there would be no way to open the card. However Gene does not
check to make sure that the cards in its files have links or names.
The card format includes one line for each nonempty field of the card.
Each such line consists of the field's name and a colon, followed by the
field's value. Field values are written the same way they would be
displayed when the card is viewed, and their format is described in the
Gene User Guide.
A link field may refer to a card appearing later in the database.
If a link refers to a card that does not appear in the database,
a new card will be created with the given name.
Gene represents pictures as a special type of card. Pictures are
distinguished from other cards in Gene by having fields of type "PictID"
and "PictLink".
- A "PictID" is just a number giving the resource IDs of
the PICT or alis resources
defining the picture itself. There should be exactly one resource
of either type in the resource fork of the file.
- A "PictLink" is a special type of field that
represents the buttons of the picture. It is stored the database by
multiple lines, one per button, each line starting with the field name
and a colon. The format of the data in each line is six fields,
separated by colons: four numbers for the coordinates of the top right
and bottom left button corners, the type of card the button forms a link
to, and the name of the linked card.
1.4. Format for Card Text
The text area in a Gene card is by definition unformatted. However
there is some minimal amount of formatting needed within the database
file, so that Gene can distinguish between the text of one card and data
from subsequent cards. In particular, we allow the text area to contain
blank lines, and these must not be confused in the database file with
the blank lines used to separate cards.
For this reason, every line of text in the database file begins with
a colon. Text is thus distinguished from the fields in that the lines
for fields begin with the field name (field names are not allowed to
begin with a colon).
When Gene writes a database file it will place the text after the
fields. However it can read files in which text and fields are
mixed. The lines of text will occur in the card in the same order
in which they appear in the database file.
Gene represents pictures as a special type of card, together with some
extra information stored in the resource fork of
the database file. Pictures are distinguished from other cards in Gene
by having two special fields, of type PictID and
PictLink. The PictID gives the resource IDs
of the PICT or alis resources
defining the picture itself, and the PictLink stores the buttons linking
the picture to other cards. Because the fields of a picture can only be
manipulated indirectly with the Edit Picture dialog, a picture card
should have at most one more field, its name.
1.6. Example of Database Format
The following lines are taken from a database file and show three cards.
The first two (a person and a document) have names, while the third (a
marriage) is of an unnamed type. The first card has five lines of text,
while the others have empty text panes. In addition, the person and
marriage cards contain links to two other person cards, for Hannah
Brooks and Thomas Fox (1).
Document: Descendants of Thomas Fox
Author: Henry Fox
:Full title:
:Descendants of Thomas Fox, circa 1620, of Concord, Massachusetts
:
:Lots of information on the Fox family.
:The NYPL copy is in pretty bad condition, and some pages are missing.
Person: Thomas Fox (2)
Birthday: 26 February 1650
Mother: Hannah Brooks
Father: Thomas Fox (1)
Sex: male
Marriage:
Wife: Hannah Brooks
Husband: Thomas Fox (1)
Date: 13 December 1647
2. Resources for Defining Card Types
All the cards used by Gene were defined by including certain
resources (a form of structured information used in all Macintosh
files) as part of the Gene application. By including similar resources
in a Gene database file, you can define additional card types that can
be used in that file. For instance, in a database of royalty and
nobility, it might make sense to have cards describing the titles and
heraldry of each person. Here we define the format of those resources;
Gene also allows some other resources,
described later, to appear in database files.
2.1. Creating and Saving Files with Resource Definitions
Gene does not provide an interface for editing file resources; instead
you must use a general purpose resource editor such as ResEdit
(available from Apple). Always make a backup copy of your file before using ResEdit.
As a convenience for ResEdit users, Gene includes a TMPL
resource in its database files, which ResEdit uses to describe the
format of Gene's card definition resources.
If Gene reads a database file that contains definitions of new card
types, it will remember those definitions and write them out again when
it saves the file. If a Gene database includes resources that are not
meaningful to Gene, those resources may be lost when the file is saved.
When you use the Merge command to combine two Gene databases, and the
second file contains definitions of new card types, something mysterious
happens that is likely to produce a bogus file. It should probably be
ok if the first file is the only one with new card type definitions.
The resources used to define card types are the following;
the most important of these for new card definitions are "Card", "Link", and "Lnk#".
- The "Card" resource contains the actual card definitions.
- The "Enum" resource lists the possible values of
enumerated fields.
- The "Link" resource gives a mechanism for
construction of formatted strings, used in link panes, verbose trees,
and GEDCOM output.
- The "Lnk#" resource selects which Link
resource to apply in each situation.
- The "Redr" resource "redirects" card links
through a field on the card; e.g. adoptions are
redirected to make adopted children show up in trees.
"Card" Resources
The "Card" resource contains the actual definition of each new card type.
All other resources are less important (although some such as
Link resources are necessary for certain types of cards).
For each new card type, there should be a "Card" resource,
having the card type name as the resource name. The resource ID is unimportant.
Card resources have four values that are part of the overall definition,
and a sequence of values defining each field of the card. The four
values are:
- Named Field. This is the number of the name field (counting from zero,
i.e. the third field would be 2). For unnamed cards it is -1.
- Sort Field. This gives a date field for positioning the card in other cards'
link panes. The field number is given with the same zero-based convention
used for Named Field, or -1 if there is no date.
- Link String. This is the ID of the card's Lnk# resource.
For backward compatibility with older versions of Gene, if there is no Lnk# resource
Gene will look for a Link resource with the same ID.
- Name Sort. This describes how card names should be sorted
in the Names window. The possible choices are "Alphabetical",
"Case Sensitive", "Person", or "Place". The last two invoke code
in Gene which attempts to find surnames of people, and which
breaks place names into components separated by commas.
After the four general values in a card definition, the Card resource
includes a sequence of two values for each field: the field name (any
string not including a colon) and a string describing the field type.
The possible field types are:
- "Name X" where X is the name of the card type being
defined. There can be at most one name, which should also be specified by Named Field.
- "Date". If the card has date fields, one should be specified by Sort Field.
- "Link X" where X is a card type. This creates a link
to cards of type X, which must have a name field. Cards with link fields
should have a
Lnk# resource
describing how they should be shown in the links pane of the target of the link.
- "Enum X" where X is an enumerated type defined in an
Enum resource.
- "String". A field of this type is treated as unformatted text.
- "Number". A field of this type contains a sequence of decimal digits.
- "PictID". This is just a number representing
a picture's resource ID.
- "PictLink". This is used to represent the buttons in a picture.
We now go through an example of a card definition for the Person card in Gene.
The four general values of the "Person" card resource are:
Named Field: 0
Sort Field: 1
Link String: 128
Name Sort: Person
The six fields of a person have as names and field descriptors:
Name Name Person
Birthday Date
Birthplace Link Place
Mother Link Person
Father Link Person
Sex Enum Sex
"Enum" Resources
Each enumerated type (such as the sex of a person) is specified using an
"Enum" resource. The ID is unimportant, but the name of the resource is
used for the name of the corresponding field type (e.g. the Enum
resource named "Sex" corresponds to fields with type "Enum Sex").
The values of an Enum resource are simply a sequence of strings
specifying the values the corresponding fields can have.
Any field of the given type must either have one of the given strings as
its value, or may be blank.
For the "Sex" Enum resource in Gene, these strings are "male" and "female".
The "Link" resource is used by Gene to turn information from cards into
text strings, in several situations: when generating the information
about the card in the link pane of another card, when drawing verbose trees,
and when exporting Gene data to a GEDCOM file.
A given Link resource will handle one of these situations, for one
specific type of card; Lnk# resources are used
to tell which Link resource to use.
Each Link resource consists of a sequence of operations, and should
be thought of as a small computer program consisting of a sequence of
simple operations that translate the information on a card's
fields into a text string. When a Link resource is used, the
operations are performed one at a time, starting from the first one.
The order in which operations are defined in the Link resource is largely
irrelevant; after each operation is performed, information in that
operation tells Gene the next operation to perform. Six fields appear in
an individual operation, although most operations will leave one or more
of these fields blank.
- The "Type" should be a letter telling which
of various types of operation to perform.
- The "Fld Num" contains the number of a field on
the given card (fields are numbered starting from zero).
- The "Next" field points to the operation to perform after the current one.
- The "Yes" and "No" fields also give numbers of other operations
in this Link resource, and are used by conditional operations.
- The "String" resource is a text string used by the āSā operation.
The operations in a Link resource can be broadly classed into two types.
Some operations output a string from information on the given card,
and then always perform the next operation according to their "Next" field,
which should give the number of another operation in the Link resource.
The Next field of the last operation in the sequence should be set to zero.
Other link operations are "conditional" operations that do not output
anything, but instead control the order in which other operations are
performed. Each such operation tests a condition (such as whether some
field of the card is empty), and depending on whether the answer is yes
or no, performs next the operation with the number in the "Yes" or "No"
field of the conditional. If this number is zero, no operation is
performed. A conditional operation may still have a nonzero "Next" field,
just like an output operation; in this case the operation at that number
is performed after the operations from the "Yes" or "No" field.
The following Link operation types are currently available:
- The "C" operation is a conditional used when we are creating a
string for a link pane, and tests whether the "Fld Num" number of the operation
points to the field of the card that used to make the link.
- The "E" operation is a conditional that tests whether the field given by the field number is empty.
- The "F" operation outputs the information in a field of the card
specified by "Fld Num".
- The "G" operation is like the "F" operation, but modifies the field
for GEDCOM export. Dates are output according to the GEDCOM
specification, and card names are modified to put slashes around the
surname.
- The "I" operation is also used for GEDCOM export. It outputs a
number, the unique ID of the card pointed to by the link field given by
"Fld Num", computed as part of Gene's GEDCOM export code. If the
Field number points to the card's name, or is negative, the ID of the
card itself will be output.
- The "S" operation outputs a fixed string given in the "String" field
of the Link operation.
- The "T" operation is obsolete (its functionality has been
replaced by Lnk# resources) but is retained for backward
compatibility with older Gene databases; it is a conditional for testing
what situation is causing us to use this Link resource.
We give as an example a link resource used to create the link pane text
for "Death" cards.
We show side by side the actual sequence of operations, and a version of the same program expressed in slightly more intelligable pseudocode.
1. C Fld:0 Next:5 Y:2 N:3 If link pane is for person who died
2. S Str:"Died " output "Died "
3. F Fld:0 Next:4 else output name of person who died
4. S Str:" died " output " died "
5. F Fld:1 Next:6 Output date of death
6. C Fld:2 No:7 If link pane isn't for place of death
7. E Fld:2 No:8 and place of death isn't empty
8. S Next:9 Str:" at " output " at "
9. F Fld:2 output place of death
"Lnk#" Resources
The "LnkN" resource provides a mechanism for applying different Link resources
to produce strings for a given card type in different contexts.
Contexts such as creating link panes, tree drawings, and GEDCOM output,
are represented in Gene simply as small numbers, starting from zero.
Each Lnk# resource is simply an array of Link resource IDs.
The first ID in the list is used to create strings for context zero,
the second for context one, and so on.
If there is no need to create a string in a certain context, the ID in that position of the array should be zero. The Lnk# resource also supplies a "default" Link resource
id to be used when no context-specific resource is available.
The following contexts are currently in use by Gene.
- 0: Link pane text.
- 1: Person in verbose tree.
- 2: Link to person in verbose tree.
- 3,4: GEDCOM output (controlled by gOut resources).
For backward compatibility with older versions of Gene,
if no Lnk# resource has the resource number specified in the card definition,
Gene will instead look for a Link resource with that number,
and use it in all contexts.
"Redr" Resources
The "Redr" resource is used to "redirect" one card to point to another,
so that a reference to the first type of card becomes an indirect
reference to the second type of card. The resource has two fields: the
name of a card type, and the number of the field on that card that
should be used as the indirect reference. There can be at most one Redr
resource per card type.
For tree drawings, a Redr resource is used on adoption cards, pointing
to the person card of the adoptee. The tree drawing commands use this
information to include adopted children in trees.
It is intended that Redr resources can be used on named cards, causing
any links to those cards to automatically become links to the indirect
reference. This could only happen if both cards had names in the same
card name list, which is not yet possible.
Along with the resources used to define card types,
a Gene database file can contain the following other types of resource.
"alis" Resources
The "alis" resource is a Macintosh-standard way of storing aliases to files.
(An alias contains not only the file name but also some other information
that helps the Macintosh system to make sure that it is pointing to the correct
file and to find the file if it has moved.) Gene uses aliases for
storing pointers to PICT files, containing pictures that are stored
externally to the Gene database itself. Each alis resource should
correspond to a number in the PictID field of a
picture card. Since pictures can also be stored as
PICT resources within the database, the alis and PICT resource
ID's should be chosen to be distinct from each other.
This numbering is done automatically by Gene when a user creates a picture.
"Dflt" Resources
The "Dflt" resource lets Gene automatically fill in certain enum fields
when links are made. For instance, when the "Wife" field of a marriage
card is set, the sex of the wife is set to female if it was blank. The
name and ID of Dflt resources are unimportant. Each Dflt resource has
four values:
- "Link Field" is the field which when changed causes a default. In the
example above, this value is 0 ("Wife" is the first field of a marriage card).
- "Enum Field" is the field to be filled by default (5, for a person's
"Sex" field).
- "Card Type" is the name of the card type containing the link field ("Marriage").
The type of the enum field's card is determined by looking at the
definition of the link field.
- "Value" specifies the default value ("female").
"gOut" Resources
The "gOut" resource controls the translation of the Gene database to a GEDCOM
file in Gene's Export command.
A GEDCOM file consists of a sequence of lines.
Each line consists of a "level number", an optional name, a four-letter "tag",
and then some data which may be the name of another line or unstructured text.
In Gene, names of GEDCOM lines have the form "@Xid@" or "@Xid1:id2@"
where X is a letter and id, id1, and id2 are numbers formed using the 'I'
operation in a Link resource.
The level numbers represent a tree: the root of the tree is not specified
and should be thought of as being at level -1; lines with a level number of zero
are children of the root; lines with level number one are children
of the previous level-zero line; and so on.
Gene's output consists of some preamble lines, the data itself, and some postamble
lines. The data lines are structured as a sequence of level-0 subtrees,
each of which corresponds to either a single Gene card or a pair of cards.
Each subtree can contain lines from the corresponding card(s), and also
from the cards with links to those cards.
One particular type of GEDCOM line that needs to be dealt with specially
in Gene is the "NOTE" (the word "NOTE" appears as the tag of such lines).
NOTEs have a function analogous to the text fields
of Gene's cards. Text in note lines is limited to 80 characters,
but notes can be extended by two kinds of lines, each of which
must have a level number one greater than the "NOTE" line: "CONC"
adds more text to the previous NOTE line itself, and "CONT" continues
the note on a new line. As we discuss below, gOut resources
provide a mechanism for translating text into NOTE lines and their continuations.
Gene automatically breaks the note into shorter chunks using CONC and CONT lines.
Each gOut resource controls the lines output by some card type
in one of these subtrees. A gOut has five components:
- The "Link Type" gives a context number to be used by the
card's Lnk# resource to determine which
Link resource
to use to output the card's GEDCOM lines.
- The "Note Number" controls translation of the card's text into note lines,
after the lines created by the "Link" resource. This number gives the level
which the NOTE line should use; successive CONC and CONT lines use a number
one greater than this. If the given number is -1, no NOTE is created.
- "ID1" and "ID2" control which subtree of the GEDCOM file the output
should be included in. These give field numbers for the given card
type, which are typically link fields pointing to the card or pair of
cards corresponding to the subtree the output should be created. If the field number
is a name field, the card itself is used; if the number is -1, no card is used
(so in gOut's for subtrees corresponding to a single card, ID2 should be -1).
- The "Card Type" specifies the type of card supplying the data to be output.
For example, Gene has two gOut resources with card type "Person". The
first is used to create a subtree containing most of the information
about the person, which in Gene is stored in several different cards;
this first gOut has link type 3, ID1=0 and ID2=-1 (specifying that the information is
to be put in a subtree corresponding to the given person), and note
number 1. The second gOut is used to put a pointer to the person in a
subtree corresponding to a "family". Gene itself has no concept of families,
so it creates families in the GEDCOM file for any two people involved
in a birth or marriage. The second gOut has link type 4, ID1=2 and ID2=1,
telling Gene to put the GEDCOM text in a subtree indexed by the father and mother;
it has note number -1 since the person's text pane is handled by the other gOut.
"PICT" Resources
The "PICT" resource is a Macintosh-standard way of storing bitmap
images. Gene uses PICTS for storing pictures internally to the Gene
database. Each PICT resource should correspond to a number in the PictID field of a picture card. Since pictures can
also be stored as alis resources within the
database, the alis and PICT resource ID's should be chosen to be
distinct from each other.
"TMPL" Resources
The "TMPL" resource is used to describe the format of the other
resources, so that they can be edited easily within ResEdit.
When you save a database file, Gene will automatically create a set of
TMPL resources in that file, describing the other possible types of
resource. Even if a file uses only the standard types of cards, the
TMPL resources will still be created, so that you can use ResEdit to
define new card types. These TMPL resources are not used by Gene, but
they should not be changed.
"Tplt" Resources
The Tplt resource is used to define ways of creating new cards using
information from existing cards. Tplt resources consist of some general
values specifying what kind of card the template creates and when it
applies, followed by a list of fields to copy from the old card to the
new one.
Each Tplt resource corresponds to an entry in the Templates menu. The
name of the menu entry is taken from the name of the resource. Each
menu entry may come from several Tplt resources, for instance if the
template depends on the gender of a person. If no Tplt resource with a
given name applies, the menu entry will be inactive.
Each Tplt resource contains the following values:
- Command Key. This is a character used to abbreviate the
template command, or blank for templates with no abbreviation.
the command keys cmd-B, cmd-D, and cmd-M are reserved in Gene for use as templates,
so this key may consist of letters B, D, or M, or punctuation other
than the period.
- From Card. This specifies the type of card to which the template applies.
- To Card. This specifies the type of new card the template creates.
- Enum Value and Enum Field. If the template works for any card of
the From Card type, Enum Field should be -1. Otherwise, if the template
applies e.g. only to male people, Enum Field specify the field of the
From Card used to tell whether it applies (5, for Sex) and Enum Value
should be the value for which it applies ("male").
After the general values specified above, each Tplt resource contains a
list of the fields to copy from the previously existing card to the new
card created by the template. Each entry in the list consists of two
numbers, the first one specifying a number of a field in the existing
card, and the second one specifying the number of the field in the new
card to which the information should be copied. When Gene performs a
template command, it goes through this list and copies the field values
as strings. It is not necessary for the fields on the existing card and
the newly created card to have the same types; Gene will create a string
for the copied value and attempt to set the new card's field value to
that string, as if you had typed that string directly into the new card.
The following example gives the values describing the "Child"
template, active when the currently open card refers to a male person.
Command Key B Cmd-B is a shortcut for this template.
From Card Person It copies information from a person card
To Card Person to a new person card.
Enum Value male It is only active when the person is male
Enum Field 5 (there is a similar Tplt for female people).
From Field 0 The template copies the person's name
To Field 4 to the Father field of the new card
A template that copies more than one field (such as the Divorce template
for Marriage cards, which copies both the Husband and Wife fields)
would have more than one From Field and To Field pair,
one for each piece of information that is copied.
"Tree" Resources
Tree resources describe commands in the Tree menu. The name of each
command is taken from the corresponding resource name.
Each Tree resource consists of the following values.
- "Tree Type" can be one of "F", "G", "L", or "R", corresponding to
Ancestor, Ancestor Grid, Descendant, and Relation tree drawing styles respectively.
- The "Card Name" is the type of card appearing in the trees ("Person"
for the trees built in to Gene). The card must have a name field.
- After these two values, the resource contains a list of field
numbers. These numbers specify the fields by which connections can be
found (the Mother and Father fields for all the built in trees). These
fields must be links to cards of the same type specified in Card Name.
4. A Brief History of Gene
We divide the history of Gene's development into three parts:
- The past. We describe the previous versions of
Gene, and explain why Gene's present version number is 4.
- The present. We list some of the more
important recent changes to Gene.
- The future. We outline some features that we
plan to eventually add to Gene (perhaps not in exactly the same form as described).
4.1. Previous Versions of Gene
Gene 4.x is the fourth incarnation of a program that has been rewritten
in several languages, on several platforms, with several database
formats.
David Eppstein wrote Gene 1.0 in Pascal on a DECsystem-20 at Stanford
University. It used a standard DEC-20 command line interface and its
data format resembled the fields of the present person cards. He later
rewrote Gene 2.0 using Prof. Don Knuth's WEB system of "literate programming".
David then moved to Columbia University, and rewrote Gene 3.0 in C++
for Unix, keeping a similar command-line interface. To make up for the
lack of a text pane or other cards than people, he added complex
user-defined fields to the file format (making it more similar to GEDCOM
than Gene's present format).
After David and Diana married, moved to Irvine, and started using the
Macintosh, Diana rewrote Gene 4.0, with design input from David. The
tree drawing code was adapted from the previous version of Gene, and we
re-used some basic data structures such as the splay trees used to look
up people's names, but the data storage and user interface code is
completely new. Gene is now written in Symantec C++ for Macintosh, and
uses a simple but flexible card-based format (documented in this file)
and a point-and-click user interface (documented in the
Gene User Guide).
After the original release of Gene 4.0 in June 1994, three minor
releases 4.0.1, 4.0.2, and 4.0.3 were made. These mostly contained
bugfixes, but Gene 4.0.3 also included some changes of functionality: a
better user interface for setting tree drawing widths, templates for
children from marriage cards, and backward compatibility with 68000
machines and system 6.
This document describes Gene 4.1, the first major update to Gene.
We expect to officially release Gene 4.1 by the end of summer, 1995.
Along with more bugfixes, it adds the following features to Gene 4.0:
- Pictures, stored either within the Gene database
or as links to external files.
- Buttons linking pictures to other cards.
- Import and export of GEDCOM data.
- Preference dialogs including control of date formatting.
- Ranges of years in dates.
- Improved tree drawings including more font sizes, more styles,
new ancestor grid layout, menu to switch layout type within the tree dialog,
and ability to save PICT files.
- New card types: Event and Family Event.
- Better alphabetization of people without surnames.
(Previously Gene mixed their names among the surnames;
now they are grouped at the start of the name list.)
4.3. Gene Futures
Our top priorities for Gene are
- Report generation including options for creating
files in formats such as RTF, HTML, and Simpletext.
- A more general search command for finding cards with fields matching
user-defined patterns.
- Macintosh drag-and-drop support.
Other possible future features include
- More card types including aliases for people with several names.
- Pop-up menus to set fields (e.g. for links, this might be
a menu of open cards)
- Trees that go up and down (showing descendants of ancestors).
- More control of the text displayed in tree drawings.
- A smarter merge command including automatic detection of duplicate information.
- Searching the database for incomplete or inconsistent information.
- The ability to create and manipulate subsets of the database.
- A user interface for defining new card types.
- Scripting including scriptable actions when opening a file.
- A menu bar entry for selecting among open windows.
Copyright 1995, David and Diana Eppstein.