NCSA HDF Specifications
DRAFT
January 1993
University of Illinois at Urbana--Champaign
Introduction
Overview
The Hierarchical Data Format (HDF) was designed to make the sharing
of scientific data between different people, different projects,
and different types of computers easy and self-describing. An
extensible header, along with carefully crafted internal layers,
provides a system that can grow along with the software that NCSA
develops. This chapter provides a brief overview of HDF
capabilities and design.
Why HDF?
A fundamental requirement of scientific data management is the
ability to access as much information in as many ways and as
quickly and easily as possible. To make this possible, there needs
to be a data storage and retrieval system that facilitates these
capabilities. Specific needs of such a system include the
following.
* Support for scientific data and metadata. Scientific data is
characterized by a variety of different data types and
representations, data sets (including images) that can be
extremely large and complex, and the need to attach accompanying
attributes, parameters, notebooks, and other metadata.
* Support for a range of hardware platforms. Data can originate
on one machine, only to be used later on many different
machines. Scientists must be able to access data and metadata
on as many hardware platforms as possible.
* Support for a range of software tools. Scientists need a variety
of software tools and utilities for easily searching, analyzing,
archiving, and transporting the data and metadata. These tools
range from a library of routines for reading and writing data
and metadata to small utilities that simply display an image on
a console, to full-blown database retrieval systems that provide
multiple views of thousands of sets of data and metadata.
* Rapid data transfer. Both the size and the dispersion of
scientific data sets require mechanisms for moving the data from
place to place rapidly.
* Extendibility. As new types of information are generated and new
kinds of science are done, a means must be provided to support
them.
What is HDF?
The structure of HDF. HDF is a self-describing extensible file
format based on the use of tagged objects that have standard
meanings. The idea is to store both a known format description and
the data in the same file. HDF tags describe the format of the data
in the sense that each tag is assigned a specific meaning--one tag
is assigned to "Color Palette," another is assigned to "Raster
Image," and so on (see Figure 1). A program that has been written
to understand a certain list of tag types can scan the file for
those tag types and process the data. This program also can ignore
any data that is beyond its scope.
The set of available data objects encompasses both primary and
secondary data (metadata). Most HDF objects are machine- and
medium-independent, physical representations of data and metadata.
HDF Tags. HDF is designed with the assumption that we cannot know
a priori what types of data objects will be needed in the future,
nor can we know how scientists will want to view their data. As new
science is done, new types of data objects are needed, and new tags
must be created. In order to avoid unnecessary proliferation of
tags, and to ensure that all tags are available to potential users
who need to share data, a portable public domain library is
available that interprets all public tags. The library contains
user interfaces designed to provide views of the data that are most
natural for users. As we learn more about the way scientists need
to view their data, we can add user interfaces that reflect data
models consistent with those views.
Types of data and structures. HDF currently supports the most
common types of data and metadata that scientists use, including
multidimensional gridded data, 2D and 3D raster images, polygonal
mesh data, multivariate datasets, sparse matrices, finite-element
data, splines, non-Cartesian coordinate data, and text. In the
future there will almost certainly be a need to incorporate new
types of data, such as voice and video, some of which might
actually be stored on media other than the central file itself. In
this sense, it may be desirable to employ the concept of a "virtual
file," which functions like a file but doesn't fit our normal
notion of a file as a monolithic sequence of bits stored entirely
on a disk or tape somewhere.
HDF also makes it possible for the user to include annotations,
titles, and specific descriptions of the data in the file, so that
files can be archived with human-readable information about the
data and its origins.
One collection of HDF tags supports a hierarchical grouping
structure called vset that allows scientists to organize data
objects within HDF files to fit their views of how the objects go
together, much as a person in an office or laboratory organizes
information in folders, drawers, journal boxes, and on their
desktops.
*** INSERT FIGURE HERE ***
Backward and forward compatibility. An important goal of HDF is to
maximize backward and forward compatibility among its interfaces.
This is not always achievable, because changes sometimes have to
be made to the way data is organized in order to enhance
performance, to correct errors, or for other reasons. However,
whenever possible, HDF files should not become out of date. For
example, suppose a site falls far behind in the HDF standard, so
its users can only work with the portions of the specification that
are three years old. Users at this site might produce files with
their old HDF software, then read them with newer software designed
to work with more advanced data files. The newer software should
still be able to read the old files.
Conversely, if the site receives files that contain objects that
its HDF software does not understand, it should still be able to
list the types of data in the file, and it should still be able to
access all of the older types of data objects that it understands,
despite the fact that the older types of data objects are mixed in
with new kinds of data. In addition, if the more advanced site uses
the text annotation facilities of HDF effectively, the files will
arrive
Appendix A, "NCSA HDF Tags," presents a list of brief descriptions
of the tags assigned at NCSA for general use.
Appendix B, "Header Files," includes the general header files used
in compiling all HDF libraries.
Form of Presentation
The material in this manual is presented in text or screen
displays.
Text
In explaining various features and commands, this manual often
presents a word within a paragraph in italics to indicate that the
word is defined within the paragraph.
Portions of this manual refer to other portions that explain
related topics. These cross references usually give the title of
the section or chapter enclosed in quotation marks, for example:
See Chapter 1, "The Basic Structure of HDF Files."
Screen Displays
Screen displays in this manual are presented in Courier type.
long process of redesigning the lower layers of HDF began. As of
this writing, in Summer 1992, we are about to release the first
version of HDF that incorporates the new lower layers of HDF.
Use of This Manual
This manual is designed for software developers who are designing
applications or routines for use with HDF files and for users who
need detailed information about HDF. Users who are interested in
using HDF to store or manipulate their data do not normally need
the kind of detail presented in this manual. They should instead
consult a user manual, such as "HDF Calling Interfaces and
Utilities," "HDF Vset", or perhaps a manual having to do with
software that uses HDF.
Manual Contents
The manual is organized into the following chapters:
Chapter 1, "The Basic Structure of HDF Files," introduces and
describes the components and organization of Hierarchical Data
Format files.
Chapter 2, "HDF Software Overview," describes the organization of
the software layers that make up the basic HDF library.
Chapter 3, "The NCSA HDF General Purpose Interface," describes the
HDF modules that make up the general purpose HDF routines,
sometimes referred to as the lower layer of HDF.
Chapter 4, "Sets and Groups," explains the role of sets and groups
in an HDF file. It contains descriptions of raster image sets,
scientific datasets, and Vsets. Vsets are covered in more detail
in another chapter.
Chapter 5, "Annotations," explains how annotations are currently
organized in HDF files.
Chapter 6, "Number Conversion," describes the HDF module that is
used for number conversion.
Chapter 7, "Vsets," describes the structure and functioning of the
Vset module.
Chapter 8, "Portability," describes techniques and conventions used
in the HDF code to achieve portability.
Chapter 9, "HDF Conventions," presents guidelines regarding the use
of HDF that are not discussed elsewhere.
Table of Contents
Introduction
Overview
Why HDF
What Is HDF
Some History
Use of This Manual
Chapter 1 The Basic Structure of HDF Files
Chapter Overview
File Header
Data Object
Physical Organization of HDF Files
Sample HDF File
Chapter 2 Software Overview
Chapter Overview
Software Layers
Organization of HDF Software
Some HDF Conventions
Chapter 3 The NCSA HDF General Purpose Interface
Chapter Overview
Introduction
Overview of the Interface
Function Specifications
Chapter 4 Sets and Groups
Chapter Overview
Sets
Groups
Raster Image Sets
Scientific Datasets
Vsets and Vdatas
Appendix: The Raster-8 Set
Chapter 5 Annotations
Chapter Overview
Types of Annotations
File Annotations
Object Annotations
Getting Reference Numbers for Object Annotations
Chapter 6 Tag Specifications
Overview
The HDF Tag Space
Physical Storage Methods
Specifications for Supported Tags
Chapter 7 Making HDF Portable
Chapter Overview
The HDF Environment
Organization of Source Files
Passing Strings Between FORTRAN and C
Function Return Values between FORTRAN and C
Differences in Acceptable Routine Names
ANSI C vs. Old C
Type Differences
Access to Library Functions
Figures and Tables
Figure 0.1 Raster Image Sets in an HDF File
Figure 1.1 Three Data Objects
Figure 1.2 A Data Descriptor
Figure 1.3 Model of a Data Descriptor Block
Figure 1.4 Sample Data Descriptor Block
Figure 1.5 Physical Representation of Data Objects
Figure 2.1 HDF software layers
Figure 4.1 Physical organization of Sample RIG Groupings
Figure 5.1 Three SDS Tags with Their Ref Numbers
Figure 5.2 Displayed Example of SDS, Ref #, and Annotation
Figure 6.1 Description Record for a Linked Block Element
Figure 6.2 A Linked Block Table
Figure 6.3 A Data Block
Figure 6.4 Description Record for an External Element
Figure 7.1 Illustration of the sequence of actions involved when
a FORTRAN call includes a string as a parameter
Table 1.1 Parts of a Data Descriptor
Table 1.2 Summary of the Relationships among Parts of an HDF File
Table 1.3 Sample Data Objects in an HDF File
Table 2.1 HDF 3.2 source code modules
Table 4.1 Tags for Raster Image Sets
Table 4.2 Additional tags for Raster Image Sets
Table 4.3 Required tags for SDG
Table 4.4 Optional Tags for SDG
Table 4.5 Required tags for NDG
Table 4.6 Optional Tags for NDG
Table 4.7 Required Tags for NDG structure that is compatible
with SDG structure
Table 4.8 Tags for Raster-8 Sets
Table 5.1 HDF Annotation tags
Table 6.1 Number Type Values
Table 6.2 Possible Machine Types
Table 6.3 Possible Tag Types in an RIG
Table 6.4 Color Format String Values
Table 6.5 Possible Tag Types in an NDG
Table 6.6 Possible calibrated data types
Table 6.7 Possible Tag Types in an SDG
Table 6.9 Scientific Data Dimension Record Fields
Chapter 1 The Basic Structure of HDF Files
Chapter Overview
File Header
Data Object
Data Descriptor
DD Blocks
Data Element
Naming and Assigning Tags
Physical Organization of HDF Files
Sample HDF File
Chapter Overview
This chapter introduces and describes the components and
organization of Hierarchical Data Format (HDF) files.
File Header
The first component of an HDF file is the file header (FH), which
takes up the first four bytes in an HDF file. The file header is
a signature that indicates that the file is an HDF file.
Specifically, it is a 32-bit magic number with the hexadecimal
value 0e031301.
NOTE: HDF assumes big-endian order in reading and writing files.
On some machines the order of bytes in the file header might be
swapped when the header is written to an HDF file, causing these
bytes to be written in little-endian order. To maintain portability
of HDF files when developing software for such machines, you should
counteract this byte-swapping by making sure the bytes are
read and written in the exact order shown.
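The byte-order requirement in the note above can be illustrated with a short sketch. It is not part of the HDF library; the names HDF_MAGIC, HDF_MAGIC_BYTES, and is_hdf_header are hypothetical, and Python is used here only because its struct module makes the byte order explicit.

```python
import struct

# Magic number as given in the text. ">I" forces big-endian order,
# so the result is the same on any host machine.
HDF_MAGIC = 0x0E031301
HDF_MAGIC_BYTES = struct.pack(">I", HDF_MAGIC)  # bytes 0e 03 13 01


def is_hdf_header(first_bytes: bytes) -> bool:
    """Return True if the data starts with the four-byte HDF file header."""
    return first_bytes[:4] == HDF_MAGIC_BYTES
```

A header written in little-endian order (bytes 01 13 03 0e) fails this check, which is the situation the note warns about.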
Data Object
The basic building block in an HDF file is the data object, which
contains both data and information about the data. A data object
has two parts: a 12-byte data descriptor (DD) and a data element.
Figure 1.1 shows three examples of data objects.
As the names imply, the data descriptor gives information about the
data, and the data element is the data itself. In other words, all
data in an HDF file has attached to it information about itself.
In this sense, HDF files are examples of self-describing files.
ED. NOTE: Figures are not available in this plain text version
of the specification.
Figure 1.1 Three Data Objects
Data Descriptor (DD)
A data descriptor (DD) has four fields: a 16-bit tag, a 16-bit
reference number, a 32-bit data offset, and a 32-bit data length.
These parts of a DD are depicted in Figure 1.2 and are briefly
described in Table 1.1. Explanations of each part appear in the
paragraphs following Table 1.1.
*** INSERT FIGURE HERE ***
Table 1.1 Parts of a Data Descriptor
Part              Description
tag               designates the type of data in a data element
reference number  uniquely distinguishes corresponding data element
                  from others with the same tag
data identifier   tag/ref; uniquely identifies data element
offset            byte offset of corresponding data element
length            length of data element
Tag
A tag is the part of a data descriptor that tells what kind of data
is contained in the corresponding data element. A tag is actually
a 16-bit unsigned integer between 1 and 65535, but every tag is
also usually given a name that programs can refer to instead of the
number. If a DD has no corresponding data element, the value of its
tag is DFTAG_NULL, indicating that no data is present. A tag may
never be zero.
Tags are assigned by NCSA as part of the specification of HDF. The
following ranges are to be used to guide tag assignment:
00001 - 32767 reserved for NCSA use
32768 - 64999 user-definable
65000 - 65535 reserved for expansion of the format
Appendix A contains full specifications for all currently supported
NCSA HDF tags. Appendix B, "Assigned Tag Numbers," contains the
current number assignments. See the section "Some HDF Conventions"
in the chapter "Software Overview" for more information on
allocating tags.
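As an illustration of the tag ranges listed above, the following sketch maps a tag number to its range. The function name classify_tag is hypothetical and is not an HDF routine.

```python
def classify_tag(tag: int) -> str:
    """Return the assignment range that a tag number falls in."""
    # A tag is a 16-bit unsigned integer and may never be zero.
    if not 1 <= tag <= 65535:
        raise ValueError("tag must be between 1 and 65535")
    if tag <= 32767:
        return "reserved for NCSA use"
    if tag <= 64999:
        return "user-definable"
    return "reserved for expansion of the format"
```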
Reference Number
For each occurrence of a tag in an HDF file, a unique reference
number is stored with the tag in the data descriptor. Reference
numbers are 16-bit unsigned integers.
Reference numbers are not necessarily assigned consecutively, so
you cannot assume that the actual value of a reference number has
any meaning beyond providing a way of distinguishing among objects
with the same tag.
Data Identifier
The combination of a tag and its reference number uniquely
identifies the corresponding data object in the file. For this
reason, the tag/ref combination is sometimes referred to as a data
identifier.
Data Offset and Length
The data offset reflects the byte position of the corresponding
data element from the start of the file. The length gives the
number of bytes occupied by the data element. Offset and length are
both 32-bit unsigned integers.
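The four fields described above add up to 12 bytes. A minimal sketch of encoding and decoding one DD follows, assuming the big-endian order stated in the File Header note. The names DataDescriptor, pack_dd, and unpack_dd are hypothetical; the HDF library itself is written in C and does not define them.

```python
import struct
from typing import NamedTuple

# Big-endian: 16-bit tag, 16-bit ref, 32-bit offset, 32-bit length.
DD_FORMAT = ">HHII"
DD_SIZE = struct.calcsize(DD_FORMAT)  # 12 bytes


class DataDescriptor(NamedTuple):
    tag: int
    ref: int
    offset: int
    length: int


def pack_dd(dd: DataDescriptor) -> bytes:
    """Encode a DD as the 12 bytes stored in the file."""
    return struct.pack(DD_FORMAT, *dd)


def unpack_dd(raw: bytes) -> DataDescriptor:
    """Decode the first 12 bytes of raw as a DD."""
    return DataDescriptor(*struct.unpack(DD_FORMAT, raw[:DD_SIZE]))
```

The pair (dd.tag, dd.ref) then serves as the data identifier for the object.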
DD Blocks
Data descriptors are stored physically in a linked list of blocks
called data descriptor blocks, or DD blocks. The individual
components of a data descriptor block are depicted in Figure 1.3.
All of the DDs in a DD block are assumed to contain significant
data unless they have a tag that is equal to DFTAG_NULL (no data).
In addition to its DDs, each data descriptor block has a data
descriptor header (DDH). The DDH has two fields--a block size field
and a next block field. The block size field is a 16-bit unsigned
integer that indicates the number of DDs in the following DD block.
The next block field is a 32-bit unsigned integer giving the offset
of the next DD block, if there is one. The last DDH in the list
contains a 0 in its next block field.
*** INSERT FIGURE HERE ***
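The linked list of DD blocks can be walked as sketched below. This is an illustration only, not the library's implementation. It assumes the whole file is held in memory as bytes and that the first DD block starts at byte 4, directly after the file header. The numeric value of DFTAG_NULL is not given in this chapter, so the caller supplies it as null_tag. The function name read_dds is hypothetical.

```python
import struct

DDH_FORMAT = ">HI"    # number of DDs (16 bits), offset of next DD block (32 bits)
DD_FORMAT = ">HHII"   # tag, ref, offset, length
DDH_SIZE = struct.calcsize(DDH_FORMAT)  # 6 bytes
DD_SIZE = struct.calcsize(DD_FORMAT)    # 12 bytes
FIRST_DD_BLOCK = 4    # assumed: first DD block follows the 4-byte file header


def read_dds(data: bytes, null_tag: int):
    """Return (tag, ref, offset, length) for every DD whose tag is not null_tag."""
    dds = []
    visited = set()
    block = FIRST_DD_BLOCK
    while True:
        if block in visited:
            raise ValueError("DD block list contains a cycle")
        visited.add(block)
        count, next_block = struct.unpack_from(DDH_FORMAT, data, block)
        position = block + DDH_SIZE
        for _ in range(count):
            dd = struct.unpack_from(DD_FORMAT, data, position)
            if dd[0] != null_tag:
                dds.append(dd)
            position += DD_SIZE
        # The last DDH in the list contains 0 in its next block field.
        if next_block == 0:
            return dds
        block = next_block
```

The cycle check is a defensive addition for damaged files and is not something the specification requires.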
Data Element
A data element is the raw data part of a data object. Its basic
data type is determined by its tag, but other interpretive
information may be required before it can be processed properly.
Each data element is stored as a set of contiguous bytes starting
at the offset given in the corresponding DD (see Figure 1.4).(1)
*** INSERT FIGURE HERE ***
Physical Organization of HDF Files
Physically, the file header, DD blocks, and data elements are
organized as follows. The file header is followed by the first DD
block, which is followed by data elements and, if necessary, more
DD blocks. These relationships are summarized in Table 1.2.
There are no rules governing the distribution of DD blocks and data
elements within a file, except that the first DD block must follow
immediately after the file header. The pointers in the DD headers
connect the DD blocks in a linked list, and the offsets in the
individual DDs connect the DDs to the data elements. Beyond this
basic structure there is no assumed order among the objects in an
HDF file.
Table 1.2 Summary of the Relationships among Parts of an HDF File
Part      Constituents
HDF File  FH, DD-block, data, DD-block, data, DD-block, data ...
FH        0x0e031301 (32-bit magic number)
DD-block  DDH, DD, DD, DD ...
DDH       number-of-DDs (16 bits), offset-to-next-DD-block (32 bits)
DD        tag (16 bits), ref (16 bits), offset (32 bits),
          length (32 bits)
(1) Some HDF software provides the capability of storing objects
as a series of linked blocks or external elements, but this occurs
at a higher level. At the lowest level each object with a tag/ref
is stored contiguously.
Sample HDF File
Consider an HDF file that contains two 400-by-600 8-bit raster
images. Typically, such a file might contain the objects described
in Table 1.3.
Table 1.3 Sample Data Objects in an HDF File
Tag  Ref  Data
FID  1    file identifier: user-assigned title for file
FD   1    file descriptor: user-assigned block of text describing
          overall file contents
IP8  1    image palette (768 bytes)
ID8  1    x and y dimensions of the 2D arrays that contain the
          raster images (4 bytes)
RI8  1    first 2D array of raster image pixel data (x*y bytes)
RI8  2    second 2D array of pixel data (also x*y bytes)
Assuming, for example, that the size of a DD block is 10 DDs, the
physical organization of the contents of the file might be
described as shown in Figure 1.5.
Figure 1.5 Physical Representation of Data Objects
Offset  Contents
0       FH
4       DDH (10 0)
10      DD (FID 1 130 4)
22      DD (FD 1 134 41)
34      DD (IP8 1 175 768)
46      DD (ID8 1 943 4)
58      DD (RI8 1 947 240000)
70      DD (RI8 2 240947 240000)
82      DD (empty)
94      DD (empty)
106     DD (empty)
118     DD (empty)
130     "sw3"
134     "solar wind simulation: third try. 8/8/88"
175     <data for the image palette>
943     <data for the image dimensions>: 400, 600
947     <data for the first raster image>
240947  <data for the second raster image>
In this instance, the file contains two raster images. The two
images have the same dimensions and are to be used with the same
palette. So, the same data objects for the palette (IP8) and
dimension record (ID8) can be used with both images.
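The offsets in Figure 1.5 follow from the sizes given in Chapter 1: a 4-byte file header, a 6-byte DDH, and ten 12-byte DDs place the first data element at byte 130. The sketch below reproduces that arithmetic. It assumes a single DD block and data elements stored back to back in DD order, as in the figure. The function name layout is hypothetical.

```python
FILE_HEADER_SIZE = 4   # 32-bit magic number
DDH_SIZE = 6           # 16-bit DD count + 32-bit next-block offset
DD_SIZE = 12           # tag, ref, offset, length


def layout(lengths, dds_per_block=10):
    """Return the file offset of each data element, given element lengths."""
    if len(lengths) > dds_per_block:
        raise ValueError("this sketch handles a single DD block only")
    offset = FILE_HEADER_SIZE + DDH_SIZE + dds_per_block * DD_SIZE
    offsets = []
    for length in lengths:
        offsets.append(offset)
        offset += length
    return offsets


# Element lengths from Figure 1.5: FID, FD, IP8, ID8, RI8, RI8.
sample_offsets = layout([4, 41, 768, 4, 400 * 600, 400 * 600])
# sample_offsets == [130, 134, 175, 943, 947, 240947]
```

The lengths 4 and 41 for the two text elements are each one byte longer than the strings shown in the figure.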
Chapter 2 HDF Software Overview
Chapter Overview
Introduction
Software Layers
Organization of HDF Software
Versions and Release Numbers
ANSI C and Portability
Modules and Interfaces
Header Files
The HDF Test Suite and Examples
Some HDF Conventions
Naming and Assigning Tags
Using Reference Numbers to Organize Data Objects
Multiple References and File Compaction
Chapter Overview
This chapter contains a description of how HDF software is
organized. It also contains some guidelines on writing HDF
software.
HDF Software Layers
HDF-based software comes in four basic forms: an HDF interface
library, user programs that store and retrieve data in HDF files,
HDF command-line utilities, and HDF-based software tools.
The HDF interface library has two types of interfaces: (1) sets of
general purpose routines that form the basis of all higher-level
HDF development, and (2) application interfaces that support higher
level views of data.
User programs access HDF files via calls to the HDF library. User
programs are attached to the HDF library when they are compiled and
linked.
The HDF command-line utilities are a group of programs that are
distributed with the HDF library. The functionality of the
command-line utilities ranges from general purpose, such as listing
the contents of an HDF file, to special purpose, such as converting
data between different HDF data types (e.g., raster images to
scientific data sets). In general, the utilities perform data
management tasks.
In contrast, HDF-based software tools usually perform data analysis
tasks and have polished interactive user interfaces. They include
the NCSA Visualization Tool Suite and commercial software packages
that use HDF.
HDF software is implemented in layers, as illustrated in Figure
2.1. At the lowest level are the general purpose modules, which
perform basic I/O. At the next level are interfaces that reflect
commonly used objects such as 8-bit raster images (RIS8) and
multidimensional arrays (SDS). At the top layer are users'
programs, utilities, and software tools such as the NCSA
visualization software.
*** INSERT FIGURE HERE ***
The general purpose interfaces are described in detail in this
document. Descriptions of the applications interfaces and
command-line utilities can be found in the manual "HDF Calling
Interfaces and Utilities." Each HDF-based software tool should have
its own manual.
Since the NCSA user community writes programs primarily in C and
Fortran, all of the HDF application interfaces developed at NCSA
are callable from both C and Fortran programs. Since the general
purpose interface is primarily for program development, not for
applications, it provides C routines only.
Organization of Software
Versions and Release Numbers
Since HDF is under continual development, new releases are
periodically made available. An HDF version number looks like
"3.2r1," which means that it is major version 3, minor version 2,
release 1. The three parts of a version number have different
meanings:
* A new major version number implies that there is some fundamental
difference between this code and code with earlier major version
numbers. When a new major version is made available, HDF users
and developers are strongly encouraged to obtain the new source
code and documentation. There will likely be added functionality
in successive major versions of the library and possibly some
deletion of obsolete code, so some user code may have to be
modified to use the new library.
* The meaning of a new minor version number is somewhat less well
defined. It essentially means that there is some appreciable
difference in the new code which was not deemed drastic enough
to warrant a new major version, but is more substantial than a
new release number would indicate.
* A new release number implies some bug fixes or other small
modifications have been made to the code. Using a new release of
the same version of the library will not usually require
modification of existing user code.
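A version string of the form shown above can be split into its three parts as in the following sketch. The function name parse_version is hypothetical, and the pattern assumes that every version string follows the "3.2r1" form exactly.

```python
import re

# major "." minor "r" release, for example "3.2r1"
_VERSION_PATTERN = re.compile(r"(\d+)\.(\d+)r(\d+)")


def parse_version(text: str):
    """Return (major, minor, release) as integers."""
    match = _VERSION_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError("not a version string of the form 3.2r1: " + repr(text))
    major, minor, release = (int(part) for part in match.groups())
    return major, minor, release
```

Because the result is a tuple of integers, two versions can be compared directly: a higher major version outranks any minor version or release number.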
ANSI C and Portability
In order to provide for easy porting of HDF to new platforms, all
versions of the HDF source code from version 3.2 on will be written
in ANSI standard C, with special provisions made for non-ANSI
compilers. For more information about porting HDF and writing
portable HDF-based code, refer to the chapter "Making HDF
Portable."
Modules and Interfaces
The HDF distribution contains many source files or modules which
can be grouped into families according to their root name. For
example, dfp.c, dfpf.c and dfpff.f all share the root name "dfp"
and, therefore, all belong to the "dfp" family. In general, each
family of source modules represents one HDF applications interface.
Thus, the modules of the "dfp" family together represent the HDF Palette
Interface. There are a few exceptions to this rule which will be
discussed later in this section.
For each interface, there is necessarily one file that contains the
C code that provides the basic functionality of that interface. But
some interfaces may have one or two additional code modules that
provide Fortran callability for the interface. So there are three
possible family sizes:
1 file:
Modules of this sort are generally not calling interfaces
themselves, but rather provide useful support functions for
actual calling interfaces. Since they are not meant to be called
by any routine outside the HDF library itself, they do not need
to be callable from Fortran programs. An example of such a module
is hblocks.c.
2 files:
Although there are currently no examples of this situation, it
is conceivable (and desirable) that some future interface may
need only one extra source module to provide Fortran
compatibility. If this were to happen, there would only be two
source modules for the interface. For instance, dfnew.c and
dfnewf.c would make up the "New Interface."
3 files:
Most current implementations of Fortran-callable HDF interfaces
require the passing of character string arguments to some of
their functions. Due to differences in the way C and Fortran
represent strings, the passing of strings requires that there be
a small amount of special purpose Fortran code written for each
function that takes a string argument.
For this reason, most Fortran-callable HDF interfaces consist of
three source modules:
(1) the primary C module,
(2) a Fortran-callable C module, and
(3) a Fortran module.
For example, dfsd.c, dfsdf.c and dfsdff.f make up the Scientific
Data Set Interface. dfsd.c contains the basic functionality of
the interface, dfsdf.c provides the major part of Fortran
callability, and dfsdff.f contains the special purpose Fortran
code that allows the passing of character string arguments.
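The naming pattern behind the three family sizes can be summarized in a short sketch: the primary C module is the root name plus ".c", the Fortran-callable C module adds "f", and the Fortran module adds "ff" with a ".f" extension. The function name family_modules is hypothetical and only restates the convention described above.

```python
def family_modules(root: str, size: int = 3):
    """Return the module file names for a family of 1, 2, or 3 files."""
    if size not in (1, 2, 3):
        raise ValueError("a family has 1, 2, or 3 source modules")
    suffixes = [".c", "f.c", "ff.f"]  # primary C, Fortran-callable C, Fortran
    return [root + suffix for suffix in suffixes[:size]]
```

For example, family_modules("dfsd") gives the three Scientific Data Set Interface modules named above, and family_modules("hblocks", 1) gives the single support module hblocks.c.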
Header Files
In addition to the source code modules discussed above, some
interfaces also have C header files associated with them that are
meant to be included by C applications programmers with the
"#include" preprocessor directive. They contain some useful
constants and data structures for interaction with the interface
from C programs. The header files can be identified by the same
name as the root name for the rest of the family with the ".h"
extension added. For example, dfsd.h is the header file for the
Scientific Data Set Interface.
Of particular importance among the header files are hdf.h and
hdfi.h. hdf.h is the C header file that must be included by any
program that calls the HDF library. It contains all the symbolic
constants and public data structures that are needed to use HDF.
hdfi.h contains specific portability information about each
platform on which HDF is supported. It is automatically included
in programs when hdf.h is included, so programmers need not
explicitly include it. For more information on hdfi.h and other
portability issues, refer to the chapter "Making HDF Portable."
Table 2.1 shows all of the source code modules and header files
grouped into families for HDF 3.2.
Table 2.1 HDF 3.2 source code modules
general   general    grouping    utilities   Vsets     Old general
headers   purpose    (non-Vset)                        purpose
hdf.h     hfile.c    dfgroup.c   dfutil.c    vg.c      dfstubs.c
hdfi.h    hfilef.c   dfgroup.h   dfutilf.c   vgf.c     dff.c
hproto.h  hfileff.f              dfutilff.f  vgff.f    dfff.f
dfivms.h  hkit.c                 dfutil.h    vfp.c     df.h
          hblocks.c                          vgi.h     dfi.h
          hextelt.c                          vio.c     dfstubs.h
          herr.c                             vconv.c
          herrf.c                            vparse.c
          hfile.h                            vrw.c
          herr.h                             vsfld.c
                                             vg.h
                                             vproto.h
8/24 bit  general     palettes  scientific  annotations  special
raster    raster                data sets                FORTRAN
dfr8.c    dfgr.c      dfp.c     dfsd.c      dfan.c       constants.f
dfr8f.c   dfgr.h      dfpf.c    dfsdf.c     dfanf.c      functions.f
dfr8ff.f  dfcomp.c    dfpff.f   dfsdff.f    dfanff.f
df24.c    dfimcomp.c            dfsd.h      dfan.h
df24f.c   dfrig.h
df24ff.f
The HDF Test Suite and Examples
In addition to the source code for the HDF library, versions 3.2
and higher will have an available suite of test programs. There are
at least two test programs for most interfaces: one for the C
version and one for the Fortran-callable version. Some interfaces
have more than two test programs to test special features of that
interface and some have only one test program, since they only
provide C-callability.
Every effort will be made to ensure that the test programs provide
a thorough and accurate assessment of the health of the HDF
library. Although it is hoped that the test suite will greatly
improve the reliability of HDF code, it is almost inevitable that
some parts of the code will be untested. Therefore, no guarantees
can be made on the basis of test suite performance.
There is also a set of example programs to help users write HDF
programs. They illustrate some of the common ways in which users
program with HDF.
Some HDF Conventions
The specification of HDF described in the previous chapter is not
sufficient to guarantee its success. It is also important for users
to adhere to certain conventions in using HDF. Guidelines in the
use of HDF are implicit in many discussions in other sections of
this document, and others are presented in the manual "HDF Calling
Interfaces and Utilities." Guidelines not covered elsewhere are
introduced in this section.
Naming and Assigning Tags
Tags that are to be made available to a general population of HDF
users should be assigned and controlled by NCSA. Tags of this type
are given numbers in the range 1-32,767. If you have an application
that fits this criterion, contact NCSA at the address listed on the
README page at the beginning of this manual and specify the tags
you would like. For each tag, your specifications should include
a suggested name, information about the type and structure of the
data that the tag will refer to, and information about how the tag
will be used. Your specifications should be similar to those
contained in Appendix A. NCSA will assign you a set of tags for
your application and include your tag descriptions in its
documentation.
Tags in the range 32,768-64,999 are user-definable. That is, you
can assign them for any private application. Of course, if you use
tags in this range you need to be aware that they may conflict with
other people's private tags.
Using Reference Numbers to Organize Data Objects
The HDF library itself uses reference numbers solely for the
purpose of distinguishing between different objects with the same
tag. While application programmers may find it convenient to impart
some meaning to reference numbers, they should be forewarned that
the HDF library will be ignorant of any such meaning. In other
words, any meaning attached to reference numbers exists only at the
application program or software tool level.
Some users have used reference numbers to indicate how objects
should be grouped by considering all objects with the same
reference number to be part of the same group. This practice is not
recommended. Instead, if object grouping is desired it is
recommended that you use either the simple grouping procedures used
by the SDS, RIS8, and RIS24 applications (supported by the routines
in dfgroup.c), or the more general (and more complex) Vset
structures.
Another possible use of reference numbers is for keyed access to
HDF objects. An HDF data identifier (tag/ref) provides a unique
identifier for any HDF object within a file, and hence could be
used as a primary key for that object. One could keep a table of
data identifiers as a way of providing random access to HDF
objects.
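A table of data identifiers of the kind described above might look like the following sketch. Everything here is an application-level assumption: the index_entry layout, the label field, and the functions data_id_key and index_find are hypothetical and not provided by HDF. The sketch packs the 16-bit tag and 16-bit ref into one 32-bit key and searches the table linearly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of an application-level index (hypothetical layout). */
typedef struct {
    uint16_t tag;
    uint16_t ref;
    const char *label;   /* the application's own description */
} index_entry;

/* Pack a data identifier (tag/ref) into a single 32-bit key. */
uint32_t data_id_key(uint16_t tag, uint16_t ref)
{
    return ((uint32_t)tag << 16) | (uint32_t)ref;
}

/* Return the entry whose tag/ref matches, or NULL if there is none. */
const index_entry *index_find(const index_entry *table, size_t n,
                              uint16_t tag, uint16_t ref)
{
    uint32_t key = data_id_key(tag, ref);
    size_t i;

    assert(table != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (data_id_key(table[i].tag, table[i].ref) == key)
            return &table[i];
    }
    return NULL;
}
```

Because the key is built from both the tag and the ref, two objects that share a reference number but have different tags remain distinct entries.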
Reference numbers might also be used to impose an ordering on HDF
objects. Once again, because the assignment scheme for reference
numbers in HDF files does not guarantee any order, caution is
advised in this use of reference numbers.
Multiple References
Multiple references to a single data element are quite common in
HDF. The general purpose routine Hdupdd generates a new reference
to data that is already pointed to by another DD. If Hdupdd is used
several times, there could be several DDs that point to the same
data element.
It is important to note that when a multiply-referenced data
element is deleted or moved, the various DDs that previously
pointed to the data element are not automatically deleted or
adjusted to point to the data element in its new location.
Consequently, each DD to be deleted or moved should be checked for
multiple references and handled as the programmer sees fit.
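One way to perform the check suggested above is to count how many DDs point at the same data element before deleting or moving it. The sketch below works on an in-memory array and is not HDF library code: the dd_entry structure and count_references are hypothetical, and it assumes that a DD identifies its data element by a file offset and a length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified in-memory DD for this sketch (hypothetical layout). */
typedef struct {
    uint16_t tag;
    uint16_t ref;
    int32_t  offset;   /* where the data element starts in the file */
    int32_t  length;   /* size of the data element in bytes */
} dd_entry;

/* Count the DDs that refer to the element at offset/length. */
size_t count_references(const dd_entry *dds, size_t n,
                        int32_t offset, int32_t length)
{
    size_t i, count = 0;

    assert(dds != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (dds[i].offset == offset && dds[i].length == length)
            count++;
    }
    return count;
}
```

A count greater than one means that other DDs would be left pointing at stale data if the element were deleted or moved without further handling.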
Chapter 3 The NCSA HDF General Purpose Interface
Chapter Overview
Introduction
Overview of the Interface
Function Specifications
Opening and Closing Files
Finding Tags, Refs, and Element Lengths
Reading and Writing Entire Data Elements
Reading and Writing Part of a Data Element
Manipulating Data Descriptors (DDs)
Creating Special Data Elements
Development Routines
Error Reporting
Chapter Overview
This chapter contains a detailed description of the routines that
make up the general purpose HDF interface.
Introduction
NCSA supports interfaces for HDF users--both high level interfaces
to support certain application areas, such as image processing, and
low level general purpose interfaces for performing basic
operations on HDF files. These interfaces are written in C, but
most of their functions can also be called from Fortran.
The routines in the general purpose interface enable you to build
and manipulate HDF objects of any type, including those of your own
invention. All HDF applications developed at NCSA use these
routines as their basic building blocks.
The routines described in this chapter represent a second set of
general purpose routines. All HDF applications prior to HDF 3.2
(released in June 1992) used an earlier set of general purpose
routines. These low level general purpose routines have been
changed to allow for better functionality. Old routines will still
be emulated but at a cost of reduced functionality. Users are
strongly advised to use the new interface.
The new lower layer, first used with HDF Version 3.2, incorporates
the following improvements over its predecessor:
* More consistent data and function types.
* An error handling module that supports more meaningful and
extensive reporting of errors.
* Simplification of key lower level functions.
* Simplified techniques for facilitating portability.
* Support for alternate forms of physical storage, such as linked
blocks storage, and storage of the data portion of an object in
an external file.
* A version tag indicating which version of the HDF library last
changed an HDF file.
* Support for simultaneous access to multiple files.
* Support for simultaneous access to multiple objects within a
single file.
The previous lower layer is called the "DF layer", because all
routines began with the letters "DF", as in "DFopen" and "DFclose."
The new layer is called the "H layer" because all routines begin
with the letter "H" (Hopen, Hclose, Hwrite, etc.). The source
modules that implement these changes can be found in files that
begin with the letter "h".
Also, the number of basic source modules has changed, and now
includes:
hfile.c basic I/O
herr.c error-handling
hkit.c general purpose routines
hblocks.c to support linked block physical storage
hextelt.c to support external storage of HDF data
Overview of the Interface
Following is a listing of the public functions that can be found
in the general purpose interface. This section provides
specifications and descriptions of these routines.
Opening and Closing HDF Files
These calls are used to open and close HDF files.
Hopen Provides an access path to an HDF file. It also
reads into memory all of the DD blocks in the
file.
Hclose Closes the access path to a file.
Locating Elements for Access and Getting Information
These routines make it possible to locate elements or find out
other information. Except for Hendaccess, they initialize the
element that they locate and return an access id that is used in
later references to the data element. Calls to them can include
wild cards so that one can search for unknown tags and refs.
Hstartread Locates an existing data element with matching
tag/ref and returns an access id for reading it.
Hnextread Continues the search with the same access id.
Hstartwrite Allows writing to the object with the supplied
tag/ref. If the object exists, the object will be
modified, otherwise it is created.
Hendaccess Disposes of access id for tag/ref.
Hinquire Returns access information about a data element.
Hishdf Determines whether a file is an HDF file.
Hnumber Returns the number of occurrences of a specified
data identifier (tag/ref) in a file.
Hgetlibversion Returns version information for the current HDF
library.
Hgetfileversion Returns version information for an HDF file.
Reading and Writing Entire Data Elements
There are two sets of routines for reading and writing data
elements. The set of routines described here is used to store and
retrieve entire data elements. A second set of routines, described
in the next section, may be used if you wish to access only part
of a data element at a time.
Hputelement Adds or replaces elements in a file.
Hgetelement Obtains the data referred to by the tag/ref
combination that is passed to it.
Reading and Writing Part of a Data Element
The second set of routines for reading and writing data elements
makes it possible to read or write all or part of a data element,
in contrast to the routines described above which can only read or
write an entire element. One of the access routines Hstartread or
Hstartwrite must be called before calling these routines.
Hwrite Appends data to a data element. It starts at the
last position left by a Hwrite or Hseek command,
writes up to a specified number of bytes, then
leaves the access pointer at the end of the data
written.
Hread Reads a portion of a data element. It starts at
the last position left by a Hread or Hseek command
and reads any data that remains in the element up
to a specified number of bytes.
Hseek Sets the access pointer to an offset within a data
element. The next time Hread or Hwrite is called,
the access occurs from the new position. The
location to seek to can be specified as an offset
from the current location or from the start of the
element.
Manipulating Data Descriptors (DDs)
These routines perform operations on DDs without doing anything
with the data to which the DDs refer.
Hdupdd Is used to generate new references to data that
is already referenced from somewhere else.
Hdeldd Deletes a tag/ref from the list of DDs.
Hnewref Returns the next available reference number for
the HDF file.
Creating Special Data Elements
HDF 3.2 introduces two alternate methods of physical storage for
HDF objects. Previously, all of the objects in an HDF "file" had
to be in the same file and any given object had to be contiguous.
This last requirement caused many problems, especially with regard
to appending to existing objects. Objects needed to be deleted and
rewritten to the end of the file in order to append to them.
The two new storage methods are "linked blocks" and "external
elements". Linked blocks allow elements in a single HDF file to be
non-contiguous. External elements allow a single HDF object to be
stored in an external file. It is not currently possible to have
a single object (such as a very large data set) stored in multiple
files. Nor is it possible to have multiple objects stored in an
"external" file.
Special data elements can be accessed with the same routines as for
normal data elements once they are created. These routines create
special data elements.
HLcreate Creates a new linked block special data element.
HXcreate Creates a new external file special data element.
Both of these routines have two modes of operation. For example,
calling HLcreate with a tag and ref which do not exist in a file
will create a new element with the given tag and ref that will be
stored as linked blocks. On the other hand, if the tag/ref pair
already existed in the file, the referenced object is "promoted"
to being stored as linked blocks. All data which had been stored
in the object before the promotion is retained. HXcreate behaves
similarly.
Development Routines
The HDF library provides a number of "developer" level routines
that are meant to simplify the task of writing HDF applications.
Most of these routines mirror basic C library functions which are,
unfortunately, not always completely portable in their library
form.
HDgettagname Return a pointer to a text string describing a
given tag.
HDgetspace Allocate space.
HDfreespace Free space.
HDstrncpy Copy a string from one location to another up to
a given number of characters.
Error Reporting
The HDF library now provides a much more robust error reporting
scheme. Previously, only a single error value could be returned to
the user. There is now the notion of an error stack. This allows
for more of the context to be known when trying to decipher a
problem.
HEprint Print out all of the errors on the error stack to
a specified file.
HEclear Clear the error stack.
HERROR Macro to report an error. This will push the error
type, file name, line number and name of the
function reporting the error.
HEreport Add a text string to the description of the most
recently reported error. Only a single text string
may be supplied per error.
The only problem with the error module is that standard C does not
have any way for the code inside a function to know the name of the
function. Therefore, in order to use the macro HERROR to report
errors, there must exist a variable FUNC which points to a string
containing the name of the reporting function.
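The FUNC convention can be illustrated with a self-contained sketch of an error stack. None of the names below belong to the HDF library: err_record, err_push, REPORT_ERROR, err_clear, err_print, and failing_step are hypothetical stand-ins, and the stack capacity is arbitrary. The point is only that the reporting macro picks up whatever variable named FUNC is in scope where it is used.

```c
#include <assert.h>
#include <stdio.h>

#define ERR_STACK_MAX 10       /* hypothetical capacity */

typedef struct {
    int code;                  /* error type */
    const char *func;          /* name of the reporting function */
    const char *file;          /* source file name */
    int line;                  /* source line number */
} err_record;

err_record err_stack[ERR_STACK_MAX];
int err_top = 0;               /* number of records on the stack */

/* Push one record; it is dropped if the stack is already full. */
void err_push(int code, const char *func, const char *file, int line)
{
    assert(func != NULL && file != NULL);
    if (err_top < ERR_STACK_MAX) {
        err_stack[err_top].code = code;
        err_stack[err_top].func = func;
        err_stack[err_top].file = file;
        err_stack[err_top].line = line;
        err_top++;
    }
}

/* Requires a variable named FUNC to be in scope at the call site. */
#define REPORT_ERROR(code) err_push((code), FUNC, __FILE__, __LINE__)

void err_clear(void)
{
    err_top = 0;
}

/* Print every record, most recent first. */
void err_print(FILE *stream)
{
    int i;
    for (i = err_top - 1; i >= 0; i--)
        fprintf(stream, "error %d in %s (%s:%d)\n", err_stack[i].code,
                err_stack[i].func, err_stack[i].file, err_stack[i].line);
}

/* Example reporter: FUNC is declared before the macro is used. */
int failing_step(void)
{
    const char *FUNC = "failing_step";
    REPORT_ERROR(7);
    return -1;
}
```

If FUNC is not declared in a function that uses the macro, the code does not compile, which is the practical consequence of the limitation described above.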
Other
Hsync Synchronize stored version of HDF file with image
in memory.
Function Specifications
Opening and Closing Files
Hopen
int32 Hopen(char *path, int access, int16 ndds)
path IN: Name of file to be opened
access IN: access mode; one of the codes listed under
"Interpretations of access" below (for example DFACC_READ,
DFACC_RDWR, or DFACC_CREATE)
ndds IN: Number of dds in a block if this file needs to be
created
Purpose: Provides an access path to an HDF file. It also reads
into primary memory all of the DD blocks in the file.
Returns: File id on success, FAIL on failure.
Description: Opens an HDF file.
Interpretations of access:
HDF provides several constants for use as access
privilege codes. Below is a list of these codes and
their meanings. It is important to note that these
constants are NOT bitflags and should NOT be or'd
together to combine access modes. Doing so may cause odd
behavior and, in some cases, loss of data.
Recommended:
DFACC_READ: Open for read only. If file does not
exist, error.
DFACC_RDWR: Open for read/write. If file does not
exist, create it.
DFACC_CREATE: Force creation. If file exists, delete
it, then open a new file for
read/write. (in the spirit of UNIX
"clobber")
Others:
DFACC_ALL: Same as DFACC_RDWR.
DFACC_WRITE: Same as DFACC_RDWR.
On successful exit,
* File_rec members are filled in.
* File is opened with the relevant permission.
* Information about dd's is set up in memory.
For a new file, in addition,
* The file headers and initial information are set up.
Hclose
intn Hclose(int32 id)
id IN: the file id of the file to be closed
Purpose: Closes the access path to the file.
Returns: SUCCEED (0) if successful and FAIL (-1) if
failed.
Description: Id is first validated. If valid, the function
closes the access path to the file.
If there are still access elements attached to
the file, the error DFE_OPENAID is returned and the
file is not closed.
This is a fairly common error when developing
new interfaces. See the discussion of Hendaccess
below for hints on how to debug this problem.
Locating Elements for Access and Getting Information
Hstartread
int32 Hstartread(int fileid, int tag, int ref)
fileid IN: id of file to attach access element to
tag IN: tag to search for
ref IN: ref to search for
Purpose: Locate an existing data element with matching
tag/ref and return a descriptor for reading it.
Returns: Id of access element if successful, otherwise
FAIL (-1).
Description: Searches the DD's for a particular tag/ref
combination. Wildcards can be used for tag or
ref (DFTAG_WILDCARD, DFREF_WILDCARD) and they
match any values. Searching on wildcards begins
from the beginning of the DD list. If the search
is successful, the access element is positioned
to the start of that tag/ref, otherwise it is
an error. An access element is created and
attached to the file.
Hnextread
intn Hnextread(int32 access_id, int16 tag, int16 ref, int origin)
access_id IN: Id of a READ access elt
tag IN: the tag to search for
ref IN: ref to search for
origin IN: from where to start searching
Purpose: Locate and position a read access id on next
occurrence of tag/ref.
Returns: SUCCEED (0) if successful and FAIL (-1)
otherwise.
Description: Searches for the "next" DD that fits the
tag/ref. Wildcards apply. If origin is DF_START,
search from start of DD list, if origin is
DF_CURRENT, search from current position.
Searching from the end of the file via DF_END
is not yet implemented.
If the search is successful, then the access
element is positioned at the start of that
tag/ref, otherwise, the access_id is not
modified.
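The search rules shared by Hstartread and Hnextread can be sketched over a plain array of data identifiers. This is not the library's implementation: data_id, find_next, the wildcard values ANY_TAG and ANY_REF, and the FROM_START and FROM_CURRENT codes are all hypothetical. The sketch also assumes that a search from the current position begins with the entry after the current one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ANY_TAG 0   /* hypothetical wildcard values for this sketch */
#define ANY_REF 0

enum { FROM_START, FROM_CURRENT };

typedef struct {
    uint16_t tag;
    uint16_t ref;
} data_id;

/* Return the index of the next matching entry, or -1 if none.
   On -1 the caller keeps its current position unchanged. */
int find_next(const data_id *dds, int n, int current,
              uint16_t tag, uint16_t ref, int origin)
{
    int i;

    assert(dds != NULL || n == 0);
    if (origin == FROM_START)
        i = 0;
    else if (origin == FROM_CURRENT)
        i = current + 1;
    else
        return -1;               /* other origins are not supported */
    if (i < 0)
        i = 0;

    for (; i < n; i++) {
        int tag_ok = (tag == ANY_TAG) || (dds[i].tag == tag);
        int ref_ok = (ref == ANY_REF) || (dds[i].ref == ref);
        if (tag_ok && ref_ok)
            return i;
    }
    return -1;
}
```

Calling find_next repeatedly with FROM_CURRENT and the index it last returned walks through every match in list order, which mirrors how a wildcard search would visit all objects with a given tag.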
Hstartwrite
int32 Hstartwrite(int fileid, int tag, int ref, long len)
fileid IN: Id of file to write to
tag IN: tag to write to
ref IN: ref to write to
len IN: the length of the data element
Purpose: Create or replace a data element with matching
tag/ref.
Returns: Id of access element if successful and FAIL
otherwise.
Description: Set up an access element to write out a data
element. The DD list of the file is searched first.
If the tag/ref is found, the data element is NOT
replaced; rather, it is then possible to modify the
existing data. If an object with the
corresponding tag and ref does not exist, a new one
is created.
Hendaccess
int32 Hendaccess(int access_id)
access_id IN: id of access element to dispose of
Purpose: Disposes of descriptor for tag/ref.
Returns: Returns SUCCEED (0) if successful, FAIL (-1)
otherwise.
Description: Used to dispose of an access element. There is
only a finite number of access elements allowed
to be active at a time. Therefore, it is very
important to call Hendaccess whenever you are
done using an element.
When developing new interfaces, we have found
that a fairly common mistake is to not call
Hendaccess for all of the elements accessed.
When this happens, Hclose will return FAIL, and
the dump of the error stack (see HEprint, below)
will tell how many access elements are still
active.
This is a rather difficult problem to debug, as
the low-level routines of the HDF library have no
idea who opened an access element, or where, and
forgot to release it. It's tedious, but the most
effective means we have found to debug this
problem is to annotate the locations where the
`attached' count of a file record is changed
(there are a couple of places in hfile.c and a few
in hblocks.c and hextelt.c).
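The interaction between the attached count and closing a file can be modeled in a few lines. The sketch below is a simplified stand-in, not library code: file_record, access_start, access_end, and file_close are hypothetical, and the single attach counter is assumed to behave like the `attached' count mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified file record for this sketch (hypothetical layout). */
typedef struct {
    int attach;    /* number of access elements still attached */
    int is_open;   /* nonzero while the file is open */
} file_record;

/* Attach one access element to an open file. */
int access_start(file_record *f)
{
    assert(f != NULL);
    if (!f->is_open)
        return -1;
    f->attach++;
    return 0;
}

/* Release one access element. */
int access_end(file_record *f)
{
    assert(f != NULL);
    if (f->attach <= 0)
        return -1;
    f->attach--;
    return 0;
}

/* Refuse to close while any access element is still attached. */
int file_close(file_record *f)
{
    assert(f != NULL);
    if (!f->is_open || f->attach > 0)
        return -1;
    f->is_open = 0;
    return 0;
}
```

A debugging aid in the spirit of the advice above would be to log the caller each time attach changes, so that the unmatched access_start can be found.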
Hinquire
intn Hinquire(int access_id, int32 *pfile_id, uint16 *ptag, uint16
*pref, int32 *plength, int32 *poffset, int32 *pposn, int
*paccess, int *pspecial)
access_id IN: Id of an access elt
pfile_id OUT: file id
ptag OUT: tag of the element pointed to
pref OUT: ref of the element pointed to
plength OUT: length of the element pointed to
poffset OUT: offset of elt in the file
pposn OUT: position pointed to within the data elt
paccess OUT: the access type of this access elt
pspecial OUT: special code
Purpose: Returns access information of a data element.
Returns: Returns SUCCEED (0) if the access elt points to
some data element, otherwise FAIL (-1).
Description: Inquire statistics of the data element pointed
to by access element. If a piece of information
is not needed, it is possible to send NULL in
for that value. There is a set of convenience
macros for calls to Hinquire (HQueryposition,
HQuerylength, etc.) defined in hdf.h.
Hishdf
int32 Hishdf(char *path)
path IN: name of file
Purpose: Determine if a file is an HDF file.
Returns: Returns TRUE (non-zero) if file is HDF, FALSE
(0) otherwise.
Description: The decision of whether a file is an HDF file or
not is based solely on the magic number stored
in the first four bytes of an HDF file. It is
possible that Hishdf will identify a file as an
HDF file but Hopen will be unable to open the
file (for example if the DD list in the file is
corrupted).
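A magic-number test of this kind can be sketched as follows. The function looks_like_hdf is hypothetical and is not the library's Hishdf. The four byte values used (0x0e, 0x03, 0x13, 0x01, that is, control characters ^N ^C ^S ^A) are an assumption taken from the file format description elsewhere in this specification and should be checked against it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if buf starts with the assumed HDF magic number, else 0.
   Only the first four bytes are examined, so a positive answer says
   nothing about whether the rest of the file is intact. */
int looks_like_hdf(const unsigned char *buf, size_t len)
{
    /* Assumed magic number: ^N ^C ^S ^A */
    static const unsigned char magic[4] = { 0x0e, 0x03, 0x13, 0x01 };

    assert(buf != NULL || len == 0);
    if (len < sizeof magic)
        return 0;
    return memcmp(buf, magic, sizeof magic) == 0;
}
```

As the description notes, passing this test is necessary but not sufficient: a file can carry the right first four bytes and still fail to open.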
Hnumber
int Hnumber(int32 file_id, uint16 tag)
file_id IN: file id
tag IN: tag to be counted
Purpose: Find the number of occurrences of tag/ref in
file.
Returns: The number of instances of a tag in a file.
Hgetlibversion
Hgetlibversion--return version info for current HDF library
USAGE
Hgetlibversion(uint32 *majorv, uint32 *minorv, uint32 *release,
char string[])
majorv OUT: major version number
minorv OUT: minor version number
release OUT: release number
string OUT: informational text string (80 chars)
Purpose: Get version information for current HDF library.
Returns: Returns SUCCEED (0).
Description: Returns the version of the HDF library. The
version information is statically compiled
into the HDF library, so it is not necessary to
have any open files for this function to execute.
Hgetfileversion
Hgetfileversion--return version info for HDF file
USAGE
Hgetfileversion(uint32 file_id, uint32 *majorv, uint32 *minorv,
uint32 *release, char string[])
file_id IN: handle of file
majorv OUT: major version number
minorv OUT: minor version number
release OUT: release number
string OUT: Informational text string (80 chars)
Purpose: Get version information for an HDF file.
Returns: Returns SUCCEED (0) if successful and FAIL (-1)
if failed.
Description: Returns the HDF version number stored in the
given file. It is still an open question as to
what exactly the version number of a file should
mean, so we recommend that user code not call
this function.
Reading and Writing Entire Data Elements
Hputelement
int Hputelement(int fileid, int tag, int ref, char *data, long
length)
fileid IN: Id of file
tag IN: tag of data element to put
ref IN: ref of data element to put
data IN: pointer to buffer
length IN: length of data
Purpose: Add or replace element in a file.
Returns: Returns SUCCEED (0) if successful and FAIL (-1)
otherwise.
Description: Writes a new data element or replaces an existing
data element in an HDF file. Uses Hwrite and its
associated routines.
Hgetelement
int Hgetelement(int file_id, int tag, int ref, char *data)
file_id IN: Id of the file to read from
tag IN: tag of data element to read
ref IN: ref of data element to read
data OUT: buffer to read into
Purpose: Obtains the data referred to by the tag/ref
combination that is passed to it.
Returns: Returns SUCCEED (0) if successful, FAIL (-1)
otherwise.
Description: Reads a data element from an HDF file and puts
it into the buffer pointed to by data. The space
allocated for the buffer is assumed to be large
enough.
Reading and Writing Part of a Data Element
Hread
int32 Hread(int access_id, long length, char *data)
access_id IN: Id of READ access element
length IN: length of segment to read in
data OUT: pointer to data array to read to
Purpose: Read a portion of a data element.
Returns: Returns length of segment actually read in if
successful and FAIL otherwise.
Description: Read in the next segment in the data element
pointed to by the access element. It starts at
the last position left by a Hread or Hseek
command and reads any data that remains in the
element up to a specified number of bytes. If
the data element is too short then it only reads
to end of the data element.
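The short-read behavior described above can be modeled with an in-memory element. The sketch is not the library's Hread: mem_element and element_read are hypothetical, and the data element is assumed to be a simple byte array with a current position.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* In-memory stand-in for a data element being read (hypothetical). */
typedef struct {
    const char *data;   /* contents of the element */
    int32_t length;     /* total length in bytes */
    int32_t posn;       /* current read position */
} mem_element;

/* Read up to length bytes from the current position.
   Returns the number of bytes actually read, or -1 on a bad request. */
int32_t element_read(mem_element *e, int32_t length, char *out)
{
    int32_t remaining;

    assert(e != NULL && out != NULL);
    if (length < 0)
        return -1;
    remaining = e->length - e->posn;
    if (length > remaining)
        length = remaining;     /* element is too short: read to its end */
    memcpy(out, e->data + e->posn, (size_t)length);
    e->posn += length;          /* the next read continues from here */
    return length;
}
```

Because the return value is the count actually read, a caller can loop until it receives fewer bytes than it asked for to detect the end of the element.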
Hwrite
int32 Hwrite(int access_id, long len, char *data)
access_id IN: Id of WRITE access element
len IN: length of segment to write
data IN: pointer to data to write
Purpose: Write next data segment to data element.
Returns: Returns length of segment successfully written,
FAIL (-1) otherwise.
Description: Write the data to data element where the last
write or Hseek() stopped. It starts at the last
position left by a Hwrite command, writes up to
a specified number of bytes, then leaves the
write pointer at the end of the element. If the
space reserved is less than the length to write,
then only as much as can fit is written. It is
the responsibility of the user to ensure that
no two access elements are writing to the same
data element. It is possible to interlace writes
to more than one data element in the same file,
though.
Hseek
intn Hseek(int32 access_id, long offset, int origin)
access_id IN: Id of access element
offset IN: offset to seek to
origin IN: position to seek from by offset, 0: from
beginning; 1: current position; 2: end of
data element
Purpose: Set the access pointer to an offset within a
data element. The next time Hread or Hwrite is
called, the read or write occurs from the new
position.
Returns: Returns FAIL (-1) if fail, SUCCEED (0)
otherwise.
Description: Sets the position of an access element in a data
element so that the next Hread or Hwrite will start
from that position. origin determines the
position to which the offset is added.
This routine fails if the access element is not
associated with any data element or if the
requested position is outside of the data element.
Seeking from the end of a data element is not
currently supported.
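The origin rules can be sketched as a small position calculation. This is an illustration only: element_seek is hypothetical, the origin codes 0, 1, and 2 follow the parameter list above, and the sketch assumes that a position equal to the element length (just past the last byte) still counts as inside the element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compute a new position inside an element of elt_length bytes.
   Returns 0 and updates *posn on success; returns -1 and leaves
   *posn unchanged on failure. */
int element_seek(int32_t elt_length, int32_t *posn,
                 int32_t offset, int origin)
{
    int64_t target;   /* wide enough that posn + offset cannot overflow */

    assert(posn != NULL);
    if (origin == 0)
        target = offset;                    /* from the beginning */
    else if (origin == 1)
        target = (int64_t)*posn + offset;   /* from the current position */
    else
        return -1;   /* origin 2 (end of element) is not supported */

    if (target < 0 || target > elt_length)
        return -1;   /* outside of the data element */
    *posn = (int32_t)target;
    return 0;
}
```

Leaving the position untouched on failure means that a rejected seek does not disturb a sequence of reads or writes already in progress.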
Manipulating Data Descriptors
Hdupdd
int Hdupdd(int32 file_id, uint16 tag, uint16 ref, uint16 old_tag,
uint16 old_ref)
file_id IN: Id of file
tag IN: tag of new data descriptor
ref IN: ref of new data descriptor
old_tag IN: tag of data descriptor to duplicate
old_ref IN: ref of data descriptor to duplicate
Purpose: Generate new references to data that is already
referenced from somewhere else.
Returns: Returns SUCCEED (0) if successful, FAIL (-1)
otherwise.
Description: Duplicates a data descriptor so that the new
tag/ref points to the same data element pointed
to by the old tag/ref.
Hdeldd
int Hdeldd(int file_id, int tag, int ref)
file_id IN: Id of file
tag IN: tag of data descriptor to delete
ref IN: ref of data descriptor to delete
Purpose: Delete a tag/ref from the list of DDs.
Returns: Returns SUCCEED (0) if successful, FAIL (-1)
otherwise.
Description: Deletes a data descriptor of tag/ref from the
dd list of the file. This routine is unsafe and
may leave a file in a condition that is not
usable by some routines. Use with care.
Hnewref
uint16 Hnewref(int32 file_id)
file_id IN: id of file
Purpose: Return the next available ref for HDF file.
Returns: Returns the ref number if successful, 0 otherwise.
Description: Returns a ref number that can be used with any
tag to produce a unique tag/ref. Successive
calls to Hnewref will generate a strictly
increasing sequence until the highest possible
ref has been returned, then Hnewref will return
unused ref's starting from 1.
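The allocation order described above can be modeled as follows. The sketch is not the library's Hnewref: ref_state, next_ref, and release_ref are hypothetical, a ref_state must start out zero-filled, and the sketch assumes that refs are only ever marked as used through next_ref.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REF 65535u   /* highest value a 16-bit ref can hold */

/* Allocation state for one file; must start zero-initialized. */
typedef struct {
    unsigned char used[MAX_REF + 1u];   /* used[r] is 1 if ref r is taken */
    uint32_t last;                      /* highest ref handed out so far */
} ref_state;

/* Return the next available ref, or 0 if every ref is in use. */
uint16_t next_ref(ref_state *s)
{
    uint32_t r;

    assert(s != NULL);
    if (s->last < MAX_REF) {
        /* Strictly increasing phase. */
        r = ++s->last;
        s->used[r] = 1;
        return (uint16_t)r;
    }
    /* Highest ref already returned: reuse unused refs, lowest first. */
    for (r = 1; r <= MAX_REF; r++) {
        if (!s->used[r]) {
            s->used[r] = 1;
            return (uint16_t)r;
        }
    }
    return 0;
}

/* Mark a ref as free again, for example after its object is deleted. */
void release_ref(ref_state *s, uint16_t ref)
{
    assert(s != NULL);
    s->used[ref] = 0;
}
```

Since 0 is returned only when no ref is available, callers can treat 0 as the failure value, consistent with the Returns entry above.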
Creating Special Data Elements
HLcreate
int32 HLcreate(int32 file_id, uint16 tag, uint16 ref, int32
block_length, int32 number_blocks)
file_id IN: Id of file
tag IN: tag of new data descriptor
ref IN: ref of new data descriptor
block_length IN: length of blocks to be used
number_blocks IN: number of blocks to use per linked block
record
Purpose: Create a new linked block special data element.
Returns: Access Id for special data element if
successful, otherwise FAIL (-1).
Description: Appending to existing elements has been a
problem in HDF in the past, as HDF objects were
required to be stored contiguously. To append,
the HDF library forced the user to delete the
existing element and rewrite it at the end of
the file. HDF 3.2 adds the
concept of linked blocks, which allow unlimited
appending to existing elements without copying
over existing data.
Initially, a table is set up to accommodate
number_blocks linked blocks for this object. Each
block has size block_length bytes. If an
existing object is being promoted, block_length
does not have to be the same size as the
original element.
This routine can be used to either create an
object with the given tag/ref as a linked block
element, or promote an existing element to be
stored with linked blocks. This routine will
return an active access id with write permission
to the linked block element.
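To make the block arithmetic concrete, the sketch below maps a logical position within an element to a block number and an offset inside that block. It is not the layout code from hblocks.c: locate_in_blocks is hypothetical, and the sketch assumes that a promoted element keeps its original data as a first block of its own length, with every later block being block_length bytes. For a newly created element, first_len would simply equal block_length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Find which block holds logical position pos.
   first_len:    length of the first block (original data if promoted)
   block_length: length of every later block
   Returns 0 and fills *block and *offset, or -1 on bad arguments. */
int locate_in_blocks(int32_t first_len, int32_t block_length,
                     int32_t pos, int32_t *block, int32_t *offset)
{
    assert(block != NULL && offset != NULL);
    if (first_len <= 0 || block_length <= 0 || pos < 0)
        return -1;

    if (pos < first_len) {
        *block = 0;            /* still inside the first block */
        *offset = pos;
        return 0;
    }
    pos -= first_len;          /* position relative to the second block */
    *block = 1 + pos / block_length;
    *offset = pos % block_length;
    return 0;
}
```

Appending under this model only ever adds blocks at the end, which is why existing data does not have to be copied.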
HXcreate
int32 HXcreate(int32 file_id, uint16 tag, uint16 ref, char
*extern_file_name)
file_id IN: file record id
tag, ref IN: tag/ref of the special data element to
create
extern_file_name
IN: name of external file to use as data element
Purpose: Create a new external file special data element.
Returns: Access id for special data element if
successful, otherwise FAIL (-1).
Description: This routine is used to create a new element in
an external file or promote an existing element
to be in an external file. If an existing
element is to be promoted, it is deleted from
the original file and copied over into the new
external file.
Distributing a single object over multiple
external files is currently not supported. In
addition, it is not possible to place multiple
objects into the same external file. This
routine will return an active access id with
write permission to the external element.
Development Routines
HDgettagname
char *HDgettagname(uint16 tag)
tag IN: tag to look up
Purpose: Get a meaningful description of a tag.
Returns: A pointer to a string describing this tag or
NULL if the tag is unknown.
Description: To reduce the amount of duplicated code,
this routine can be used to map a tag to a
character string containing the name of the tag.
If the tag is unknown, NULL is returned, as
programs may have different ways of dealing with
unknown tags.
For formatting purposes, the string returned by
this routine is guaranteed to be 30 characters or
less.
HDgetspace
void *HDgetspace(uint32 qty)
qty IN: number of bytes to allocate
Purpose: Allocate space.
Returns: Pointer to space that was allocated.
Description: This routine is very platform-dependent. It uses
an appropriate allocation routine on the local
machine to get space.
HDfreespace
void *HDfreespace(void *ptr)
ptr IN: pointer to previously-allocated space to be
freed
Purpose: Free space.
Returns: NULL.
Description: It uses an appropriate routine on the local
machine to free space.
HDstrncpy
char *HDstrncpy(register char *dest, register char *source, int32
len)
dest OUT: pointer to area to copy string to
source IN: pointer to area to copy string from
len IN: maximum number of bytes to copy
Purpose: Copy a string with some maximum length.
Returns: Address of dest.
Description: This function creates a string in dest that is
at most `len' characters long. The `len'
characters include the NULL terminator, which
must be added for historical reasons. Hence, if
you have the string 'Foo\0' you must call this
copy function with len = 4.
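The length convention can be illustrated with a stand-alone copy routine. The function bounded_copy below is hypothetical and is not the library's HDstrncpy. It reflects one reading of the description: len counts the terminator, so at most len - 1 characters are copied and the result is always terminated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy source into dest, writing at most len bytes in total,
   terminator included. Returns dest. */
char *bounded_copy(char *dest, const char *source, int32_t len)
{
    int32_t i;

    assert(dest != NULL && source != NULL);
    if (len <= 0)
        return dest;           /* no room for even the terminator */
    for (i = 0; i < len - 1 && source[i] != '\0'; i++)
        dest[i] = source[i];
    dest[i] = '\0';            /* the terminator is always written */
    return dest;
}
```

Under this reading, copying "Foo" with len = 4 yields the whole string, while len = 3 yields only "Fo" because one of the three bytes is taken by the terminator.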
Error Reporting
HEprint
void HEprint(FILE *stream, int level)
stream IN: stream to print error messages on
level IN: level of the error stack to print
Purpose: Print out information on the error stack.
Returns: No return value.
Description: This routine will print out information on
reported errors. If level is zero all of the
errors currently on the error stack are printed.
Output of this function is sent to the file
pointed to by stream.
Information printed is: an ascii description of
the error, the reporting routine, its file name
and the line at which the error was reported.
In addition, if the programmer has supplied
extra information by means of HEreport, this
information is printed as well.
HEclear
void HEclear(void)
Purpose: Clear all information on reported errors off of
the error stack.
Returns: No return values.
Description: Clear all of the information off of the error
stack.
HERROR
void HERROR(int number)
number IN: error number
Purpose: Report an error.
Returns: No return value.
Description: HERROR can be used to report an error. Any
function which calls HERROR must have a variable
FUNC which points to a string containing the
name of the function.
HERROR is implemented as a macro.
HEreport
void HEreport(char *format, ... )
format IN: printf style format and arguments
Purpose: Provide extra information to the error reporting
routines.
Returns: No return value.
Description: This routine can be used to provide further
annotation to an error report. Only one such
annotation is remembered for each error report.
The arguments to this routine follow the style
of printf.
An example from hfile.c:
char *FUNC = "Hclose";
...
if (file_rec->attach > 0) {
    file_rec->refcount++;
    HERROR(DFE_OPENAID);
    HEreport("There are still %d active aids attached",
             file_rec->attach);
    return FAIL;
}
Other
Hsync
int Hsync(int32 file_id)
file_id IN: id of the file to sync
Purpose: Synchronize on-disk HDF file with image in
memory.
Returns: Returns SUCCEED.
Description: This routine is currently vacuous as the on-disk
representation of an HDF file is always the same
as its in-memory representation. However, future
releases of the HDF library may employ buffering
schemes, so this might not always be the case.
Hsync will be provided to force the two
representations to be consistent.
Chapter 4 Sets and Groups
Chapter Overview
Sets
Types of Sets
Calling Interfaces for Sets
Groups
Sample Groups
General Features of Groups
Raster Image Sets
Raster Image Groups
Tags for Raster Image Sets
Compression of Raster Images
Scientific Datasets
Required Tags
Optional Tags
Vsets and Vdatas
Chapter Appendix: Raster-8 Sets
Compatibility between Raster-8 and Raster Image Sets
Chapter Overview
This chapter describes raster image sets, scientific datasets and
Vsets, and explains the role of sets and groups in an HDF file. It
also discusses the programming interfaces available for the three
types of sets.
Sets
Sometimes tags are grouped into sets, where each set is designed
to serve a particular user requirement. For example, the raster
image set that is described in the following sections contains
several tags that are used for storing information about 8-bit
raster images.
Types of Sets
In the current implementation of HDF there are three kinds of sets:
* A raster image set contains a raster image, along with
descriptive information about the image, such as its dimensions
and (optionally) a color lookup table.
* A scientific data set contains a multidimensional array, along
with descriptive information about the data.
* A Vset is a general grouping structure that can contain any kind
of HDF object that a user wishes.
Each HDF set is defined in terms of a minimum collection of data
objects that must be present for the set to make sense when it is
used. For instance, every raster image set must contain at least
the following three data objects:
* an image dimension record, which gives the width and height of
the corresponding image;
* raster image data, which consists of the pixel values that make
up the image;
* a raster image group, which lists all of the members in the set.
In addition to the required objects, there are optional data
objects that may be included in a set. A raster image set, for
instance, often contains a palette, or color lookup table, which
gives the red, green, and blue values to be associated with each
pixel in the raster image data.
Calling Interfaces for Sets
NCSA provides calling interfaces for all the HDF sets that it
supports. The primary purpose of these calling interfaces is to
provide libraries of routines for reading and writing the data that
is associated with each set. The libraries currently supported at
NCSA are callable from either C or Fortran programs.
In addition to the libraries, a growing number of command-line
utilities are available for working with sets. For example, the
r8tohdf utility converts one or more raw raster images to HDF
8-bit raster image set format.
NCSA supports calling interfaces for the following machines: Cray
(UNICOS), Silicon Graphics (UNIX), Sun (UNIX), Macintosh (MacOS),
and IBM PC (MS-DOS). The calling interfaces that are currently
available are described in the manual NCSA HDF Calling Interfaces
and Utilities.
Groups
An HDF set is a collection of HDF data objects in a file. Unless
some mechanism is used to identify explicitly those objects that
belong to a set, there is often no way to tie them together. This
problem is solved in HDF by means of groups. A group is a data
object that explicitly identifies all of the data objects in a set.
Since a group is a type of data object, its structure is like that
of any other data object. A group data identifier (tag/ref) points
to a data element that consists of the collection of data
identifiers that make up the corresponding set. A group tag can be
defined for any set. For instance, raster image group (RIG) is the
group tag used to group members of raster image sets; RIG data
consists of a list of all data identifiers that belong to a
particular raster image set.
Groups provide a convenient mechanism for application programs to
locate all of the information that they need about a set.
Application programs that deal with RIGs, for instance, read all
of the elements in a RIG, using only those that they need for
their application and ignoring the others.
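As a rough illustration of this idea, the data element of a group can be read as a flat list of tag/ref pairs. The sketch below is written in Python rather than the C or Fortran of the NCSA libraries. It assumes that each data identifier is stored as two 16-bit big-endian integers (tag, then ref). The function name parse_group and the numeric tag values are illustrative assumptions and should be checked against the tag specifications.

```python
import struct

# Tag numbers used only for illustration; they are assumptions here and
# should be checked against the tag specifications.
TAG_IP8, TAG_ID, TAG_RI = 201, 300, 302


def parse_group(data: bytes) -> list:
    """Split a group's data element into (tag, ref) pairs.

    Assumes each data identifier is two 16-bit big-endian integers.
    """
    if len(data) % 4 != 0:
        raise ValueError("group data must be a whole number of tag/ref pairs")
    return [struct.unpack_from(">HH", data, i) for i in range(0, len(data), 4)]


# A 12-byte RIG element naming a palette, a dimension record, and an image.
rig = struct.pack(">HHHHHH", TAG_IP8, 1, TAG_ID, 1, TAG_RI, 1)
members = parse_group(rig)

# An application keeps the members it understands and ignores the rest.
wanted = {TAG_ID, TAG_RI}
used = [(tag, ref) for tag, ref in members if tag in wanted]
```

Here an application that has no use for palettes simply skips the IP8 entry; the group itself is unchanged.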
Sample Groups
Suppose that the two images shown in Figure 1.5 are organized into
two sets with group tags. Since they are images, they may be stored
as RIG groups. Figure 4.1 illustrates the type of organization that
incorporates RIG groupings of these images.
Figure 4.1 Physical Organization of Sample RIG Grouping
Offset    Contents
0         FH
4         DDH (10 0L)
10        DD (FID 1 130 4)
22        DD (FD 1 134 41)
34        DD (IP8 1 175 768)
46        DD (ID 1 943 4)
58        DD (RI 1 947 240000)
70        DD (ID 2 240947 4)
82        DD (RI 2 240951 240000)
94        DD (RIG 1 480951 12)
106       DD (RIG 2 480963 12)
118       DD (empty)
130       "sw3"
134       "solar wind simulation: third try. 8/8/88"
175       <data for image palette>
943       <data for 1st image dimension rec>: 400, 600
947       <data for 1st raster image>
240947    <data for 2nd image dimension rec>: 400, 600
240951    <data for 2nd raster image>
480951    tag/refs for 1st RIG: IP8/1, ID/1, RI/1
480963    tag/refs for 2nd RIG: IP8/1, ID/2, RI/2
The structure depicted in Figure 4.1 reflects the grouping of
raster image sets. This file contains the same raster image
information as the file in Figure 1.5, but the information is
organized into two sets and groups. Note that there is only one
palette (IP8/1) and it is included in both groups.
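The offsets in Figure 4.1 follow directly from the sizes of the objects that precede them. The short Python sketch below recomputes them. The header sizes are read off the figure itself (FH at offset 0, DDH at offset 4, the first DD at offset 10, and DDs 12 bytes apart); the function name layout is an illustrative assumption, and the sketch assumes the data elements are packed back to back in the order shown.

```python
# Sizes in bytes, as implied by the offsets in Figure 4.1.
FH_SIZE = 4    # file header: offsets 0 through 3
DDH_SIZE = 6   # DD block header: offsets 4 through 9
DD_SIZE = 12   # one data descriptor

# (tag, ref, length) for each data element, in file order.
elements = [
    ("FID", 1, 4),
    ("FD", 1, 41),
    ("IP8", 1, 768),
    ("ID", 1, 4),
    ("RI", 1, 400 * 600),
    ("ID", 2, 4),
    ("RI", 2, 400 * 600),
    ("RIG", 1, 12),   # three tag/refs of 4 bytes each
    ("RIG", 2, 12),
]
DD_SLOTS = 10  # the DD block has room for 10 descriptors; one stays empty


def layout(elements, dd_slots):
    """Return (tag, ref, offset, length) for each element, packed back to back."""
    offset = FH_SIZE + DDH_SIZE + dd_slots * DD_SIZE
    placed = []
    for tag, ref, length in elements:
        placed.append((tag, ref, offset, length))
        offset += length
    return placed


placed = layout(elements, DD_SLOTS)
for tag, ref, offset, length in placed:
    print(f"DD ({tag} {ref} {offset} {length})")
```

Each printed line has the same form as a DD entry in the figure, which makes the two easy to compare.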
General Features of Groups
Figure 4.1 also illustrates a number of important general features
of groups:
* The contents of each set are consistent with one another. Since
the palette (IP8) is designed for use with 8-bit images, the
image must be an 8-bit image, rather than a 24-bit, 12-bit, or
other image.
* An application program can easily process all of the images in
the file by accessing the groups in the file. The non-RIG
information contained in the file can be used or ignored,
depending on the needs and capabilities of the application
program.
* There is usually more than one way to group sets. For example,
an extra copy of the image palette (IP8) could have been stored
in the file, so that each grouping would have its own image
palette. But in this instance that is not necessary because the
same palette is to be used with both images. On the other hand,
in this example there are two image dimension records (one per
group), even though one would suffice.
* Group status does not alter the fundamental role of HDF objects.
They are still accessible as individual data objects, despite the
fact that they also belong to raster image sets. In a very real
sense, the individual data elements are in the file, whether or
not there are groups that contain them.
RIGs provide an index showing what sets exist and what their
members are. There is nothing to prevent the imposition of other
groupings (indexes) that provide a different view of the same
collection of data objects. In fact, HDF is designed to encourage
the addition of alternate views, when appropriate.
Raster Image Sets
The raster image set (RIS) provides a framework for storing images
and any number of optional image descriptors. It provides for a
description of the image data layout, with the optional presence
of color look-up tables, aspect ratio, color correction, associated
matte or other overlay information, or any other data related to
the display of the image.
Raster Image Groups (RIGs)
Tying everything together is the raster image group (RIG), examples
of which were given earlier (Figure 4.1). A RIG contains a list of
data identifiers that point in turn to the data objects that
describe and make up the image.
The number of entries in a RIG is variable and the presence of most
of the description information is optional. Complex applications
can store data identifiers of image-modifying data, such as the
color table and aspect ratio, in the RIG along with the reference
to the image data itself. Simple applications can use simple
application level calls and ignore specialized video production or
film color correction parameters.
NCSA currently supports two calling interfaces, RIS8 and RIS24,
defined for the easy storage and retrieval of raster images using
RIGs. These interfaces are documented in the manual NCSA HDF
Calling Interfaces and Utilities.
Tags for Raster Image Sets
The tags presented in Table 4.1 must be fully supported by any
raster image set implementation.
Table 4.1 Tags for Raster Image Sets
Tag    Contents of Data Element
RIG    raster image group
ID     image dimension record
RI     raster image data
With full support for the above tags, images can be stored and read
from HDF files at any bit depth, with several different component
ordering schemes. As illustrated in Fig. 4.1, the RIG tag points
to a collection of the tag/refs that make up the RIG. The ID data
element identifies the dimensions of the image, the number type of
the elements that make up its pixels, the number of elements per
pixel, the interlace scheme used and the compression scheme used,
if any. The RI data element contains the actual raster image data.
*** INSERT FIGURE HERE ***
In addition to the required tags that define an image dataset, the
tags listed in Table 4.2 define color properties and other image
features. These tags are described fully in Appendix A.
Table 4.2 Additional Tags for Raster Image Sets
Tag    Contents of Data Element
XYP    XY position of image
LD     look-up table dimension record
LUT    color look-up table for non-true-color images
MD     matte channel dimension record
MA     matte channel data
CCN    color correction factors
CFM    color format designation
AR     aspect ratio
MTO    machine-type override
Fig. 4.2 illustrates the storage of a RIS that contains an image
palette (IP8), in addition to the required tags.
*** INSERT FIGURE HERE ***
Compression of Raster Images
Tags for two types of compression have been defined for raster
images. They are run-length encoding (RLE) and IMCOMP aerial
averaging (IMC). Others may be added at any time. Each encoding tag
is documented under its specific tag type (see Appendix A). Support
for RIG and RI does not require that all of the compression tag
types be supported. If you find an unknown compression type,
provide a suitable error message to the user.
Scientific Datasets
The scientific dataset (SDS) provides a framework for storing
multidimensional arrays of data, together with descriptive
information about the data. Current specifications support the
following types of numbers in SDS arrays.
* 8-bit, 16-bit and 32-bit signed and unsigned integers
* 32-bit and 64-bit floating point numbers
SDS numbers can be stored either as IEEE Standard integers or
floats or in the format used by the machine from which they were
written ("native mode").
Rank and dimension sizes may vary. A user interface exists for
storing and retrieving SDS. See the NCSA HDF manual for details.
Internal structures
For reasons having to do with backward compatibility, the group
structure that HDF uses for SDS is complicated. HDF 3.1 and
previous versions only supported 32-bit IEEE floating-point numbers
and Cray floating point numbers in scientific data sets. HDF 3.2
and later releases support 8-bit, 16-bit, and 32-bit signed and
unsigned integers, and 32-bit and 64-bit floating-point numbers.
It also allows data sets to be written to HDF files in the local
machine format ("native mode"). Furthermore, it is anticipated that
later versions of HDF will support new number types and other
variations in the physical storage of scientific data, such as
compressed data.
The internal structure used to store SDS in HDF 3.1 and earlier
versions was not adequate to support the anticipated future changes
to SDS. A new structure had to be developed. At the same time, it
was important to try to retain compatibility with earlier versions
of the HDF library. Earlier versions of the library should be able
to read SDS written by HDF 3.2, if the SDS is "understandable" by
that earlier software, i.e. if the number type of the data is
32-bit IEEE floating point or Cray floating point. Likewise, new
libraries (HDF 3.2 and beyond) should be able to recognize SDS
written by earlier versions of the library.
This compatibility is achieved by examining every SDS that is
written to an HDF file. If the SDS is compatible with older
libraries, it is written to the file using the old structure used
to represent SDS, as well as the new structure. If it is not
compatible with older libraries, only the newer structure is used.
The old structure for storing SDS is called SDG ("scientific data
group"). The newer structure is called NDG ("numeric data group").
Hence, SDS user interfaces in HDF 3.2 and beyond handle three types
of numerical data groups:
1. SDG: created by old libraries and containing floating-point data.
2. NDG: created by the new library and containing non-floating-point
data. This data group should not be recognized by old libraries.
3. SDG-like NDG: created by the new library and containing IEEE
32-bit floating-point data only. The old libraries should be
able to recognize and interpret this kind of numerical data
group correctly.
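The decision made when an SDS is written can be sketched as follows. This is only an outline of the rule stated above, not the library's code: the function name groups_to_write and the number type labels such as "float32-ieee" are hypothetical.

```python
def groups_to_write(number_type: str) -> list:
    """Return the group tags written for an SDS of the given number type.

    `number_type` is a hypothetical label such as "float32-ieee" or "int16".
    """
    if number_type == "float32-ieee":
        # Readable by older libraries: write the old and the new structure.
        return ["SDG", "NDG"]
    # Anything else is written with the newer structure only.
    return ["NDG"]


print(groups_to_write("float32-ieee"))
print(groups_to_write("int16"))
```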
In the following sections, we describe the SDG and NDG grouping
structures.
SDG structure
Scientific datasets represented internally by the SDG tag must
always contain at least the data objects listed in Table 4.3.
Table 4.3 Required Tags for SDG
Tag    Contents of Data Element
SDG    scientific data group
SDD    scientific data dimension record for array-stored
       data. It includes the rank (number of dimensions),
       the size of each dimension, and the tag/refs
       representing the number types of the array-stored
       data and of each dimension. In the case of SDG, the
       number types are all 32-bit IEEE floating-point
       values.
SD     scientific data
The data objects presented in Table 4.4 are optional.
NCSA's SDS user interface supports these objects.
Table 4.4 Optional Tags for SDG
Tag    Contents of Data Element
SDS    scales along the different dimensions to be used
       when interpreting or displaying the data (must be
       of type float32).
SDL    labels for all dimensions and for the data. Each of
       the dimension labels can be interpreted as an
       independent variable, and the data label as the
       dependent variable.
SDU    units for all dimensions and for the data.
SDF    format specifications to be used when displaying
       values of the data.
SDM    maximum and minimum values of the data (must be of
       type float32).
SDC    coordinate system to be used when interpreting or
       displaying the data.
As illustrated in Fig. 4.3, the SDG tag points to a collection of
the tag/refs that make up the SDG.
*** INSERT FIGURE HERE ***
NDG structure
SDS represented internally by the NDG tag must always contain at
least the data objects listed in Table 4.5.
Table 4.5 Required Tags for NDG
Tag    Contents of Data Element
NDG    Numerical data group
SDD    Scientific data dimension record for array-stored
       data. It includes the rank (number of dimensions),
       the size of each dimension, and the tag/refs
       representing the number types of the array-stored
       data and of each dimension. In HDF 3.2, the number
       types of dimension scales are forced to be the same
       as the array-stored data, but in later
       implementations each dimension scale will be
       allowed its own type.
SD     Scientific data.
NT     Number type of the data set. Default of NT is the
       value most recently set by DFSDsetNT(). If no
       DFSDsetNT() was called previously, the default will
       be set as floating-point.
The data objects presented in Table 4.6 are optional. NCSA's SDS
user interface in HDF 3.2 and later versions supports these
objects. Other optional objects can be added at any time.
Table 4.6 Optional Tags for NDG, HDF 3.2
Tag    Contents of Data Element
SDS    scales along the different dimensions to be used
       when interpreting or displaying the data.
SDL    labels for all dimensions and for the data. Each of
       the dimension labels can be interpreted as an
       independent variable, and the data label as the
       dependent variable.
SDU    units for all dimensions and for the data.
SDF    format specifications to be used when displaying
       values of the data.
SDM    maximum and minimum values of the data.
SDC    coordinate system to be used when interpreting or
       displaying the data.
As illustrated in Fig. 4.4, the NDG is identical to the SDG, except
that the NDG tag is different. This ensures that older (pre-HDF
3.2) software cannot recognize this form of SDS.
*** INSERT FIGURE HERE ***
SDG-like NDG structure
An SDS written by HDF 3.2 or later that is compatible with earlier
SDS is represented internally by both an SDG and an NDG. Table 4.7
lists the objects that this group must always contain.
Table 4.7 Required Tags for NDG structure that is compatible with
SDG structure
Tag      Contents of Data Element
NDG      Numerical data group
SDG      Scientific data group
SDLNK    The NDG and SDG linked to the scientific data set
         in this group.
SDD      Scientific data dimension record for array-stored
         data. It includes the rank (number of dimensions),
         the size of each dimension, and the tag/refs
         representing the number types of the array-stored
         data and of each dimension. In an SDG-like NDG the
         number types are all 32-bit IEEE floating-point
         values.
SD       Scientific data
*** INSERT FIGURE HERE ***
Compatibility with future NDG structures
It is likely that future versions of SDS will support optional
features that are not supported by the current version. These
features fall into two general categories:
* Optional-compatible features: optional features that are
compatible with older versions of HDF even though they may not
be supported by older versions of HDF.
For example, suppose a new attribute, such as a time stamp, is
added to SDS. Such an attribute would not be "understood" by
older libraries, but it would not render the SDS data unreadable
by the older libraries.
* Optional-incompatible features: optional new features that might
not be compatible with older versions of HDF in the sense that
they could render the data unreadable by older HDF libraries.
For example, suppose compression is added to SDS. Since some
older HDF libraries contain no compression routines, they would
not be able to read the compressed data correctly.
The scheme that has been developed to address this problem involves
numbering conventions for tags. The following conventions are used:
* Required tags. These tags are described in Tables 4.3 and 4.5.
All SDS must contain all of the tags in at least one of these
sets.
* Optional-compatible tags. These tags can have any valid tag
number except those in the other two categories.
* Optional-incompatible tags. A range of tags is defined for SDS
features that might render the dataset unreadable by older
versions of the library. This range has been specified as tag
numbers 780-799.
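One way a reader might apply these conventions is sketched below in Python. The text does not spell out the exact reader behavior, so this is one plausible interpretation: an SDS is treated as unreadable only if it contains a tag that the reader does not know and that falls in the 780-799 range. The function name can_read_sds and the tag numbers in the example are illustrative assumptions.

```python
INCOMPATIBLE_RANGE = range(780, 800)  # tag numbers 780 through 799


def can_read_sds(member_tags, known_tags) -> bool:
    """Return False if an unknown member tag is optional-incompatible.

    Unknown tags outside the range are optional-compatible: the reader
    skips them and can still read the data.
    """
    for tag in member_tags:
        if tag in known_tags:
            continue
        if tag in INCOMPATIBLE_RANGE:
            return False
    return True


# Hypothetical tag numbers standing in for the tags a library supports.
known = {700, 701, 702}
print(can_read_sds([700, 701, 702, 40000], known))  # unknown but harmless
print(can_read_sds([700, 701, 702, 785], known))    # unknown and in the range
```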
Vsets and vdatas
An HDF Vset is a logical grouping of HDF data objects within an
HDF file. Data organization within the file resembles the UNIX file
system in that it is basically hierarchical in structure and also
allows cross-linking of data objects. Unlike Scientific Data Sets
and Raster Image Sets, Vsets have no prespecified content or
structure. Users can use them to create structural relationships
among HDF objects according to their needs. Figure 4.6 illustrates
a Vset.
*** INSERT FIGURE HERE ***
A Vset is represented by a vgroup, an HDF object that contains
information about the members of the Vset. The vgroup tag is
VGDESCTAG. The VGDESCTAG record contains a list of the data
identifiers of its members, an optional user-specified name, an
optional user-specified class, and some fields that enable it to
be extended to contain more information. The VGDESCTAG is described
fully in Appendix A. A full treatment of Vsets can be found in the
manual "NCSA HDF Vset, Version 2.0".
An HDF object that is often used in connection with Vsets is the
vdata. A vdata is a table. The data in a vdata is organized into
fields. Each field is identified by a unique fieldname. The type
of each field may be any of the data types supported by the SDS
interface: 8-, 16-, and 32-bit integers (signed or unsigned), and
32- and 64-bit floats. Several fields of different types may exist
within a vdata. Appendix A contains full descriptions of the vdata
tags (VSDESCTAG and VSDATATAG). A full treatment of vdatas can be
found in the manual "NCSA HDF Vset, Version 2.0".
Chapter Appendix: The Raster-8 Set
The raster image set (RIS), as described above, is the set
currently supported by HDF for managing raster images. Before the
RIS was added to HDF, a simpler, less flexible set called the
raster-8 set was used for storing 8-bit raster images. This set is
no longer supported in the HDF software, although it may turn up
in some older HDF files. In fact, during the first three years that
RIS was used, the HDF software stored raster images in both RIS
and raster-8 sets.
Raster-8 Sets
The raster-8 set is a set of tags that provide the basic
information necessary to store 8-bit raster images in a data file
and display them accurately without prompting the user to supply
dimensions or color information. The raster-8 set consists of the
tags presented in Table 4.8.
Table 4.8 Tags for Raster-8 Sets
Tag    Contents of Data Element
RI8    eight-bit raster image data
CI8    eight-bit raster image data compressed with
       run-length encoding
II8    IMCOMP compressed image data
ID8    image dimension record
IP8    image palette data
If you develop software for processing raster-8 sets, it must
support RI8, ID8, and IP8. If you do not implement CI8 or II8, then
be sure to provide appropriate error indicators to higher layers
that might expect to find these tags.
Compatibility between Raster-8 and Raster Image Sets
In order to maintain backward compatibility with raster-8 sets,
the raster image set interface has stored tag/refs for both types
of sets in HDF raster image files. For example, if an image was
stored as part of a raster image set, one copy each of the image
dimension data, image data, and palette data was stored, but two
sets of tag/refs pointed to each data element, one from each
set. The image data, for instance, was associated with both tag
RI8 and tag RI.
NOTE: Although this policy is continued in the current release (HDF
3.2), future plans call for phasing out the use of the raster-8
structure. Therefore, future software should not expect to find
both raster-8 and RIS structures supporting 8-bit raster images.
Eventually, only RIS structures will be used.
Chapter 5 Annotations
Chapter Overview
Types of Annotations
File Annotations
Object Annotations
Getting Reference Numbers for Object Annotations
Chapter Overview
This chapter introduces and describes HDF objects that can be used
to annotate HDF files and HDF objects.
Types of Annotations
It is often useful to record information about an HDF file and its
data contents in text form, and to keep that information in the
same file that contains the data. HDF provides this capability in
the form of annotations. An HDF annotation is a sequence of ASCII
characters that is associated with one of three types of objects:
(1) the file itself, (2) the individual HDF data objects in the
file, or (3) the tags that identify the data elements. The current
annotation interface supports only the first two types of
annotation. This interface is described in detail in the manual
NCSA HDF Calling Interfaces and Utilities.
Annotations are optionally supplied by a creator or user of an HDF
file or data object. Annotations come in two forms: labels, which
normally consist of short strings of characters, and descriptions,
which can be long and complex bodies of text.
Table 5.1 shows the types of annotations currently defined for HDF
files and their tag names.
Table 5.1 HDF Annotation Tags
                      Label    Description
File Annotations      FID      FD
Object Annotations    DIL      DIA
Tag Annotations       TID      TD
File Annotations
Any HDF file can have labels (FID) and descriptions (FD) stored in
it. There are routines in the annotations interface specifically
designed for reading and writing file IDs and file descriptions.
Specifications for the tags FID and FD are given in Appendix A.
Object Annotations
The annotation of HDF data objects is complicated by the fact that
you have to uniquely identify the objects being annotated. Since
a data identifier (tag/ref) for a data object uniquely identifies
that object, the data object that a particular annotation refers
to can be identified by storing the object's tag and reference
number together with the annotation.
Note that an HDF annotation is itself a data object, so it has its
own DD. This DD has a tag and a reference number, and it points to the
"data" that constitutes the annotation. The "data" that goes with
an annotation consists of three things: (1) the tag of the object
that it is an annotation for, (2) the ref of the object that it is
an annotation for, and (3) the annotation itself.
For example, suppose you have an HDF file that contains three
scientific datasets (SDS). Each SDS has its own DD consisting of
the SDS tag DFTAG_SDG, and a unique reference number as illustrated
in Figure 5.1.
*** INSERT FIGURE HERE ***
Suppose you wish to annotate the second SDS by storing the
following annotation with it in the file: "Data from black hole
experiment 8/18/87." This text would be stored in an HDF file as
an annotation, and it would have stored with it the tag DFTAG_SDG
and reference number 4. Figure 5.2 illustrates how the annotation
would look in the file.
*** INSERT FIGURE HERE ***
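The three parts of an annotation's data can be sketched in Python as follows. The sketch assumes that the object's tag and ref are each stored as a 16-bit big-endian integer ahead of the text, and that DFTAG_SDG has the numeric value 700; both points should be confirmed against the tag specifications. The function names are illustrative and are not part of the annotation interface.

```python
import struct

DFTAG_SDG = 700  # assumed numeric value; confirm against the tag listing


def pack_annotation(obj_tag: int, obj_ref: int, text: str) -> bytes:
    """Build annotation data: object tag, object ref, then the text."""
    return struct.pack(">HH", obj_tag, obj_ref) + text.encode("ascii")


def unpack_annotation(data: bytes):
    """Return (obj_tag, obj_ref, text) from an annotation data element."""
    obj_tag, obj_ref = struct.unpack_from(">HH", data, 0)
    return obj_tag, obj_ref, data[4:].decode("ascii")


text = "Data from black hole experiment 8/18/87."
data = pack_annotation(DFTAG_SDG, 4, text)
print(unpack_annotation(data))
```

The annotation's own DD (its tag and reference number) is separate from these bytes; only the data portion is shown here.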
Getting Reference Numbers for Object Annotations
Note that in order to use annotation routines, you need to know
the tags and reference numbers of the objects you wish to annotate.
Special routines are available for obtaining the reference numbers
of certain tags, including tags for SDSs, Raster Image Sets,
palettes, and annotations. These are: DFSDlastref, DFR8lastref,
DFPlastref, and DFANlastref. They return the most recent reference
number used in either reading or writing the corresponding data
object. Reference numbers for objects other than these can be
obtained with the routine Hfindnextref, a general purpose HDF
routine that searches through an HDF file for reference numbers
that go with a given tag. These routines are described and
illustrated in the manual "NCSA HDF Calling Interfaces and
Utilities."
Chapter 6 NCSA HDF Tags
Chapter Overview
The HDF Tag Space
Physical Storage Methods
Specifications of Supported Tags
Chapter Overview
This chapter addresses issues related to HDF tags and the data they
represent. The first section discusses some general information
about tags and their interpretation. The remainder of the chapter
contains a complete list of HDF tags that have been assigned by
NCSA as of version 3.2 of the library and a detailed discussion of
their specifications.
The HDF Tag Space
As discussed in the chapter entitled "The Basic Structure of HDF
Files," there are 16 bits allotted to an HDF tag number, providing
for 65535 possible tags ranging from 1 to 65535, with zero (0)
unused. This tag space is broken down into three ranges as shown
below.
1--32767 reserved for NCSA-supported tags
32768--64999 user-definable
65000--65535 reserved for expansion of the format
No restrictions are placed on the user-definable tags, but it
should be noted that tags from this range cannot be guaranteed to
be unique across all user-developed HDF applications. The rest of
this chapter will be devoted to the NCSA-supported tags in the
range 1 to 32767.
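The three ranges can be expressed as a small classification function. This Python sketch simply restates the table above; the function name tag_range and the returned labels are illustrative.

```python
def tag_range(tag: int) -> str:
    """Name the part of the HDF tag space that a tag number falls in."""
    if not 0 <= tag <= 0xFFFF:
        raise ValueError("an HDF tag is a 16-bit number")
    if tag == 0:
        return "unused"
    if tag <= 32767:
        return "NCSA-supported"
    if tag <= 64999:
        return "user-definable"
    return "reserved for expansion"


for tag in (1, 32767, 32768, 64999, 65000, 65535):
    print(tag, tag_range(tag))
```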
Physical Storage Methods
In previous versions of HDF, each data element was required to
occupy one contiguous block of space in a single file. But,
beginning with HDF Version 3.2, a mechanism was added to support
different methods of physical storage of data elements. The new
mechanism is called the "extended tag."
Any of the NCSA standard tags can take advantage of the new
features of the extended tags. Extended tags are automatically
recognized by the library and interpreted according to a
description record. The description record is a complete data
element unto itself which identifies the type of extended element
and provides the relevant parameters for retrieval of that element.
Currently, there are two types of extended tags, both of which
offer alternate methods of physical storage: linked block elements
and external elements.
Linked Block Elements
Linked block elements provide a convenient way of adding data to
a pre-existing element. They consist of a series of blocks of data
chained together in a linked list (similar to the DD list). In
general, the data blocks are of a uniform size. However, the first
block is considered a special case and is allowed to have a
different size from the rest of the blocks.
The description record for a linked block element begins with the
constant EXT_LINKED, which identifies the linked block storage
method. It also contains information about the organization of the
linked block element as a whole. Figure 6.1 shows a diagram of a
description record for a linked block element.
*** INSERT FIGURE HERE ***
<extended tag> any NCSA standard tag converted to an
extended tag (16-bit integer)
<ref no> reference number (16-bit integer)
EXT_LINKED constant identifying this as a linked
block description record (32-bit integer)
<length> length of entire element (32-bit integer)
<first len> length of the first data block (32-bit
integer)
<blk len> length of successive data blocks (32-bit
integer)
<num blk> number of blocks per block table (32-bit
integer)
<link ref> reference number of first block table
(16-bit integer)
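The field list above can be turned into a small decoder. The Python sketch below follows the field widths as listed and assumes big-endian byte order. The value given to EXT_LINKED is a placeholder, since the text names the constant without stating its value, and the function names are illustrative. The block arithmetic is an interpretation of the field descriptions: one first block, followed by as many uniform blocks as the rest of the element requires.

```python
import struct

EXT_LINKED = 1  # placeholder value; the text names the constant only

# EXT_LINKED, <length>, <first len>, <blk len>, <num blk>: 32-bit integers;
# <link ref>: 16-bit integer. Big-endian byte order is assumed.
LINKED_DESC = struct.Struct(">iiiiiH")


def parse_linked_desc(data: bytes) -> dict:
    """Decode the data portion of a linked block description record."""
    kind, length, first_len, blk_len, num_blk, link_ref = LINKED_DESC.unpack_from(data)
    if kind != EXT_LINKED:
        raise ValueError("not a linked block description record")
    return {
        "length": length, "first_len": first_len, "blk_len": blk_len,
        "num_blk": num_blk, "link_ref": link_ref,
    }


def blocks_needed(desc: dict) -> int:
    """Data blocks needed: the first block plus uniform blocks for the rest."""
    rest = max(0, desc["length"] - desc["first_len"])
    return 1 + -(-rest // desc["blk_len"])  # ceiling division


def tables_needed(desc: dict) -> int:
    """Block tables needed when each table holds <num blk> block refs."""
    return -(-blocks_needed(desc) // desc["num_blk"])


record = LINKED_DESC.pack(EXT_LINKED, 10000, 4096, 1024, 4, 7)
desc = parse_linked_desc(record)
print(desc, blocks_needed(desc), tables_needed(desc))
```

With these sample numbers, 5904 bytes remain after the first block, which takes six further blocks of 1024 bytes, so seven blocks spread over two tables of four entries.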
The <link ref> field of the description record gives the reference
number of the first linked block table for the element. This table
is identified by the tag DFTAG_LINKED and contains <num blk>
entries. There may be any number of linked block tables chained
together to describe a linked block element. Figure 6.2 shows a
diagram of a linked block table.
*** INSERT FIGURE HERE ***
<link ref> reference number for this block table
(16-bit integer)
<next ref> reference number for next block table
(16-bit integer)
<blk ref n> reference number for data block (16-bit
integer)
The <next ref> field contains the reference number of the next
linked block table. A value of zero (0) in this field indicates
that there are no additional linked block tables associated with
this linked block element.
The <blk ref n> fields of each linked block table contain reference
numbers for the individual data blocks that make up the data
portion of the linked block element. These data blocks are also
identified by the tag DFTAG_LINKED as shown in Figure 6.3. Although
it may seem ambiguous to use the same tag to refer to two different
objects, this ambiguity is alleviated by the context in which the
tags appear.
*** INSERT FIGURE HERE ***
<blk ref n> reference number for this data block
(16-bit integer)
<data block> block of actual data (size given by
<first len> or <blk len> from the
description record)
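Reading a linked block element amounts to walking the chain of block tables and concatenating the data blocks they name. The Python sketch below uses plain dictionaries in place of real lookups of DFTAG_LINKED objects, so it shows only the traversal logic. Treating a block reference of zero as an unused table slot is an assumption that the text does not state, and the function name is illustrative.

```python
def read_linked_element(first_table_ref, tables, blocks, length) -> bytes:
    """Concatenate data blocks by following the chain of block tables.

    `tables` maps a table ref to (next_ref, [blk refs]); `blocks` maps a
    block ref to its bytes. Both stand in for DFTAG_LINKED lookups.
    """
    out = bytearray()
    ref = first_table_ref
    while ref != 0:                  # a <next ref> of zero ends the chain
        next_ref, blk_refs = tables[ref]
        for blk_ref in blk_refs:
            if blk_ref != 0:         # assumption: zero marks an unused slot
                out += blocks[blk_ref]
        ref = next_ref
    return bytes(out[:length])       # trim to the element's stated length


# Two chained tables with two entries each; the last slot is unused.
tables = {2: (3, [10, 11]), 3: (0, [12, 0])}
blocks = {10: b"abcd", 11: b"ef", 12: b"gh"}
element = read_linked_element(2, tables, blocks, 7)
print(element)
```

Because the same tag is used for tables and data blocks, the sketch keeps them in separate dictionaries; in a file the distinction comes from how each reference was reached.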
Linked block elements can be created using the function HLcreate(),
which is discussed in detail in the chapter "The NCSA HDF General
Purpose Interface."
External Elements
External elements allow the data portion of an HDF element to
reside in a separate file. The potential of external data elements
is largely unexplored in the HDF context, although other file
formats (most notably CDF) have used external data elements
apparently to great advantage.
Because there has been little discussion of external elements
within the HDF user community, the structure of these elements is
still not completely defined. Figure 6.4 shows a diagram of the
proposed structure for an external element.
*** INSERT FIGURE HERE ***
<extended tag> any NCSA standard tag converted to
an extended tag (16-bit integer)
<ref no> reference number (16-bit integer)
EXT_EXTERN constant identifying this as an external
element description record (16-bit
integer)
<offset> location of the data within the external
file (32-bit integer)
<length> length in bytes of the data in the
external file (32-bit integer)
<filename> non-null terminated ASCII string
containing the name of the external file
in which the data resides (any length)
The description record for an external element begins with the
constant EXT_EXTERN, which identifies the external storage method.
It also contains information about how to find the element.
External elements can be created using the function HXcreate(),
which is discussed in detail in the chapter "The NCSA HDF General
Purpose Interface."
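Since the structure is still only proposed, the following Python sketch should be read as an illustration of the fields listed above and not as a stable layout. It assumes big-endian byte order; the value given to EXT_EXTERN is a placeholder, and the function names and the file name "wind.dat" are made up for the example. An in-memory buffer stands in for the external file.

```python
import io
import struct

EXT_EXTERN = 2  # placeholder value; the text names the constant only


def parse_extern_desc(data: bytes) -> dict:
    """Decode EXT_EXTERN (16-bit), <offset> and <length> (32-bit), <filename>."""
    kind, offset, length = struct.unpack_from(">Hii", data, 0)
    if kind != EXT_EXTERN:
        raise ValueError("not an external element description record")
    # The file name is not null-terminated; it runs to the end of the record.
    filename = data[10:].decode("ascii")
    return {"offset": offset, "length": length, "filename": filename}


def read_external(desc: dict, external_file) -> bytes:
    """Read the element's data from an already opened external file."""
    external_file.seek(desc["offset"])
    return external_file.read(desc["length"])


record = struct.pack(">Hii", EXT_EXTERN, 8, 5) + b"wind.dat"
desc = parse_extern_desc(record)
payload = read_external(desc, io.BytesIO(b"........HELLO...."))
print(desc, payload)
```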
Specifications of Supported Tags
The following pages contain the specifications of all the tags that
are officially supported as of HDF version 3.2. Each entry is to
be interpreted as follows:
* The word in capital letters on the left is the tag name.
* The three short lines at the beginning of each description
uniquely identify the tag:
The first line is the full name of the tag.
The second line describes the type and (where possible) the
amount of data in the corresponding data element. When the data
element is a variable-sized data structure (such as text, a
string, or a variable-sized array), the amount of data cannot be
specified exactly. Where possible, a formula is given for
estimating the amount of data. If the second line is "? bytes,"
it means that neither the size nor the structure of the data
element can be specified.
The third line gives the tag number in decimal and (hexadecimal).
* Next is a diagram showing, as nearly as possible, the structure
of the tag and its associated data.
* Finally, a full specification of the tag is presented, including
a description of the data element and a discussion of its
intended use.
These listings are grouped approximately according to the roles
that the tags play under the headings Utility Tags, Annotation
Tags, Raster Image Tags, and so forth. These groupings imply a
general context for the use of each tag, but are not meant to
restrict the use of the tags to any particular context.
Please note that the subsection under the heading Obsolete Tags
contains the specifications for tags that have fallen out of use
with the continuing development of HDF. These tags are still
recognized by the HDF library, but it is not recommended that users
write out new objects using these tags, since some of them may
eventually be dropped from the HDF specification.
Utility Tags
DFTAG_NULL
No data
0 bytes
1 (0x0001)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer; always 0)
This tag is used for place holding and to fill empty portions of
the data description block. The length and offset fields (not
shown) of a NULL DD must be equal to zero.
DFTAG_VERSION
Library version number
12 bytes plus the length of a string
30 (0x001E)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
<major> major version number (32-bit integer)
<minor> minor version number (32-bit integer)
<release> release number (32-bit integer)
<string> non-null terminated ASCII string (any length)
The data portion of this tag gives the complete version number and
a descriptive string for the latest version of the HDF library to
write to the file.
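As a rough illustration of the layout above, the following Python sketch packs and unpacks a DFTAG_VERSION data element. It assumes big-endian byte order for the three 32-bit integers, which this entry does not state. The function names are hypothetical and are not part of the HDF library.

```python
import struct

def pack_version(major, minor, release, text):
    """Build the data element: three 32-bit integers, then the string."""
    # Assumption: integers are stored big-endian. No NUL terminator is written.
    return struct.pack(">III", major, minor, release) + text.encode("ascii")

def unpack_version(data):
    """Split a DFTAG_VERSION data element into its four fields."""
    major, minor, release = struct.unpack(">III", data[:12])
    # The string is not null terminated; it runs to the end of the element.
    return major, minor, release, data[12:].decode("ascii")
```

The element length is 12 bytes plus the length of the string, matching the size line of this entry.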
DFTAG_NT
Number type
4 bytes
106 (0x006A)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
<version> version number of NT information (8-bit integer)
<type> unsigned int, signed int, unsigned char, char, float,
double (8-bit code)
<width> number of bits (assumed all significant) (8-bit code)
<class> a generic value, with different interpretations depending
on type: floating-point, integer, or character (8-bit
code)
Some possible values that may be included for each of the three
types in the field CLASS are listed in Table 6.1.
Table 6.1 Number Type Values
Type Possible Values
floats DFNTF_NONE, DFNTF_IEEE, DFNTF_VAX, DFNTF_CRAY, DFNTF_PC, DFNTF_CONVEX
ints DFNTI_MBO, DFNTI_IBO, DFNTI_VBO
chars ASCII, EBCDIC, BYTE
The number type flag is used by any other element in the file to
indicate specifically what a numeric value looks like. Other tag
types should contain a reference number pointer to a DFTAG_NT
instead of containing their own number type definitions.
The version field allows expansion of the number type information,
in case some future number types cannot be described using the
fields currently defined. Successive versions of the DFTAG_NT may
be substantially different from the current definition; however,
backward compatibility will be maintained. The current DFTAG_NT
version number is 1.
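The four 8-bit fields of a DFTAG_NT can be read with a few lines of code. The sketch below is only an illustration: the names `NumberType` and `parse_nt` are hypothetical, and the numeric codes for the type and class fields are defined elsewhere (not in this section), so they are treated here as plain integers.

```python
import struct
from collections import namedtuple

# Field order follows the listing above: version, type, width, class.
NumberType = namedtuple("NumberType", ["version", "type", "width", "nt_class"])

def parse_nt(data):
    """Decode the 4-byte DFTAG_NT data element into its four 8-bit fields."""
    if len(data) != 4:
        raise ValueError("a DFTAG_NT data element is exactly 4 bytes")
    return NumberType(*struct.unpack("4B", data))
```

Because every field is a single byte, byte order does not matter for this element.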
DFTAG_MT
Machine type
0 bytes
107 (0x006B)
*** INSERT FIGURE HERE ***
<double> specifies method of encoding double precision floating
point (4-bit code)
<float> specifies method of encoding single precision floating
point (4-bit code)
<int> specifies method of encoding integers (4-bit code)
<char> specifies method of encoding characters (4-bit code)
The DFTAG_MT specifies that all unconstrained or partially
constrained values in this HDF file are of the default type for
that hardware. When the DFTAG_MT is set to VAX, for example, all
integers will be assumed to be in VAX byte order unless
specifically defined otherwise with a DFTAG_NT. Note that all of
the headers and many tags, the whole raster image set for example,
are defined with bit-wise precision and will not be overridden by
the DFTAG_MT setting.
For DFTAG_MT, the reference field itself is the encoding of the
DFTAG_MT information. The reference field is 16 bits, taken as four
groups of four bits, specifying the types for double, float, int
and char respectively. This allows 16 generic specifications for
each type.
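The packing of four 4-bit codes into the 16-bit reference field might be sketched as follows. This assumes that the first group (double) occupies the most significant four bits, which the text implies but does not state outright. The function names are hypothetical.

```python
def decode_mt(ref):
    """Split the 16-bit reference field into four 4-bit codes."""
    if not 0 <= ref <= 0xFFFF:
        raise ValueError("the reference field is 16 bits wide")
    # Assumption: groups are ordered double, float, int, char from the
    # most significant nibble down to the least significant one.
    return {
        "double": (ref >> 12) & 0xF,
        "float": (ref >> 8) & 0xF,
        "int": (ref >> 4) & 0xF,
        "char": ref & 0xF,
    }

def encode_mt(double, float_, int_, char):
    """Combine four 4-bit codes into a 16-bit reference field."""
    codes = (double, float_, int_, char)
    if any(not 0 <= c <= 0xF for c in codes):
        raise ValueError("each code must fit in four bits")
    return (double << 12) | (float_ << 8) | (int_ << 4) | char
```

Four bits per group is what gives the 16 generic specifications per type mentioned above.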
To the user, these will be defined constants in the header file
hdf.h, specifying the proper descriptive numbers for Sun, VAX,
Cray, Convex, and other computer systems. If there is no DFTAG_MT
in a file, the application may assume that the data in the file has
been written on the local machine--assuming any portability
problems are taken care of by the user. For this reason, we
recommend that all HDF files contain a DFTAG_MT for maximum
portability.
Possible data encodings are shown in Table 6.2.
Table 6.2 Possible Machine Types
Type Possible Encodings
double IEEE64, VAX64, CRAY128
floats IEEE32, VAX32, CRAY64
ints VAX32, Intel16, Intel32, Motorola32, CRAY64
chars ASCII, EBCDIC
New encodings can be added for each data type, as the need arises.
DFTAG_FID
File identifier
string
100 (0x0064)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
<character string> non-null terminated ASCII text (any length)
This tag points to a string which the user wants to associate with
this file. The string is not null terminated. The string is
intended to be a user-supplied title for the file.
DFTAG_FD
File description
text
101 (0x0065)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
<text block> non-null terminated ASCII text (any length)
This tag points to a block of text describing the overall file
contents. The text can be any length. The block is not null
terminated. The text is intended to be user-supplied comments about
the file.
DFTAG_TID
Tag identifier
string
102 (0x0066)
*** INSERT FIGURE HERE ***
<tag> tag number to which this tag refers (16-bit
integer)
<character string> non-null terminated ASCII text (any length)
The data for this tag is a string that identifies the functionality
of the tag indicated in the space normally used for the reference
number. For example, the tag identifier for DFTAG_TID might point
to data that reads "tag identifier."
Many tags are identified in the HDF specification, so it is usually
unnecessary to include their identifiers in the HDF file. But with
user-defined tags or special-purpose tags, the only way for a human
reader to diagnose what kind of data is stored in a file is to read
tag identifiers. Use tag descriptions to define even more detail
about your user-defined tags.
Note that with this tag you may make use of the user-defined tags
to check for consistency. Although two persons may use the same
user-defined tag, they probably will not use the same tag
identifier.
DFTAG_TD
Tag description
text
103 (0x0067)
*** INSERT FIGURE HERE ***
<tag> tag number to which this tag refers (16-
bit integer)
<text block> non-null terminated ASCII text (any length)
The data for this tag is a text block which describes in relative
detail the functionality and format of the tag which is indicated
in the space normally occupied by the reference number. This tag
is mainly intended to be used with user-defined tags and provides
a medium for users to exchange files that include human-readable
descriptions of the data.
It is important to provide everything that a programmer might need
to know to read the data from your user-defined tag. At the
minimum, you should specify everything you would need to know in
order to retrieve your data at a later date if the original program
were lost.
DFTAG_DIL
Data identifier label
string
104 (0x0068)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
<obj tag> tag number of the data to which this label
applies (16-bit integer)
<obj ref no> reference of the data to which this label
applies (16-bit integer)
<character string> non-null terminated ASCII text (any length)
The data for this tag is a data identifier, made up of a tag and
reference number, followed by a string that the user wants to place
in the file. The purpose of this tag is to associate the string
with the data identifier as a label for whatever that data
identifier refers to in turn.
By including DFTAG_DILs, you can give a data object a label for
future reference. For example, DFTAG_DIL is often used to give
titles to images.
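A label element is simply a data identifier followed by the label text. The sketch below illustrates that layout under the assumption that the two 16-bit integers are stored big-endian. The helper names are hypothetical. The same layout applies to DFTAG_DIA, with a text block in place of the string.

```python
import struct

def pack_label(obj_tag, obj_ref, label):
    """Build a DFTAG_DIL data element: tag, ref, then the label string."""
    # Assumption: big-endian 16-bit integers. No NUL terminator is written.
    return struct.pack(">HH", obj_tag, obj_ref) + label.encode("ascii")

def unpack_label(data):
    """Return (obj_tag, obj_ref, label) from a DFTAG_DIL data element."""
    obj_tag, obj_ref = struct.unpack(">HH", data[:4])
    return obj_tag, obj_ref, data[4:].decode("ascii")
```

For example, a label for a raster image group would carry tag 306 (DFTAG_RIG) and the reference number of that group.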
DFTAG_DIA
Data identifier annotation
text
105 (0x0069)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
<obj tag> tag number of the data to which this annotation
applies (16-bit integer)
<obj ref no> reference of the data to which this annotation
applies (16-bit integer)
<text block> non-null terminated ASCII text (any length)
The data for this tag is a data identifier, which is made up of a
tag and a reference number, followed by a text block that the user
wants to place in the file. Its purpose is to associate the text
block with the data identifier as an annotation for whatever that
data identifier points to in turn.
With DFTAG_DIA, any data object can have a lengthy, user-written
description of why that data is in the file. This can be used to
include user comments about images, datasets, source code, and so
forth.
Compression Tags
DFTAG_RLE
Run length encoded data
0 bytes
11 (0x000B)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
This tag is used in the compression field of a DFTAG_ID and other
places to indicate that an image or section of data is encoded with
a run-length encoding scheme. The RLE method used is byte-wise.
Each run is preceded by a count byte. The low seven bits of the
count byte indicate the number of bytes (n). The high bit of the
count byte indicates whether the next byte should be replicated n
times (high bit=1), or whether the next n bytes should be included
as is (high bit=0).
See also: DFTAG_ID (General Raster Image Tags)
DFTAG_NDG (Scientific Dataset Tags)
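The run-length scheme described in the DFTAG_RLE entry can be decoded with a short loop. The following is a minimal sketch of a decoder written directly from that description; the function name is hypothetical and this is not the HDF library's own routine.

```python
def rle_decode(data):
    """Decode bytes compressed with the byte-wise RLE scheme described above."""
    out = bytearray()
    i = 0
    while i < len(data):
        count = data[i]
        n = count & 0x7F          # low seven bits: number of bytes
        i += 1
        if count & 0x80:          # high bit = 1: replicate the next byte n times
            if i >= len(data):
                raise ValueError("truncated run")
            out.extend(data[i:i + 1] * n)
            i += 1
        else:                     # high bit = 0: copy the next n bytes as is
            if i + n > len(data):
                raise ValueError("truncated literal sequence")
            out.extend(data[i:i + n])
            i += n
    return bytes(out)
```

Since the count occupies seven bits, a single run or literal sequence covers at most 127 bytes.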
DFTAG_IMC
IMCOMP compressed data
0 bytes
12 (0x000C)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
This tag is used in the ID compression field and other places to
indicate that an image or section of data is encoded with an IMCOMP
encoding scheme. This scheme is a 4:1 areal averaging method which
is easy to decompress. It counts color frequencies in 4x4 squares
to optimize color sampling.
See also: DFTAG_ID (General Raster Image Tags)
DFTAG_NDG (Scientific Dataset Tags)
DFTAG_JPEG
24-bit JPEG compression information
? bytes
13 (0x000D)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
This tag points to header information for 24-bit JPEG compressed
images. The data in this tag is identical to the data stored in a
JFIF (JPEG File Interchange Format) file up to the Start-of-Frame
parameter (see the JFIF format document for further details). The
Start-of-Frame parameter and all further data for the JPEG image
are stored in the associated DFTAG_CI data element, which is the
companion to the DFTAG_JPEG element.
DFTAG_GREYJPEG
8-bit JPEG compression information
? bytes
14 (0x000E)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
This tag points to header information for 8-bit JPEG compressed
images. The data in this tag is identical to the data stored in a
JFIF (JPEG File Interchange Format) file up to the Start-of-Frame
parameter (see the JFIF format document for further details). The
Start-of-Frame parameter and all further data for the JPEG image
are stored in the associated DFTAG_CI data element, which is the
companion to the DFTAG_GREYJPEG element.
General Raster Image Tags
DFTAG_RIG
Raster image group
n*4 bytes (where n is the number of data objects in the group.)
306 (0x0132)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
<tag n> tag number for nth member of the group (16-bit
integer)
<ref n> reference number for nth member of the group
(16-bit integer)
The raster image group (RIG) data is a list of data identifiers
(tag/ref) that describe a raster image. All of the members of the
group are required in order to display the image correctly.
Application programs that deal with RIGs should read all the
elements of a RIG and process those identifiers which they can
display correctly. Even if the application cannot process all of
the tags, the tags that it can process will be usable.
Tag types that may appear in a RIG are listed in Table 6.3.
Table 6.3 Possible Tag Types in an RIG
Tag Description
DFTAG_ID image dimension
DFTAG_RI raster image
DFTAG_XYP X-Y position
DFTAG_LD LUT dimension
DFTAG_LUT color lookup table
DFTAG_MD matte channel dimension
DFTAG_MA matte channel
DFTAG_CCN color correction
DFTAG_CFM color format
DFTAG_AR aspect ratio
Example
ID, RI, LD, LUT
An image dimension record, the raster image, an LUT dimension and
the LUT go together. The application reads the image dimensions,
then reads the image with those dimensions. It also reads the
lookup table according to its dimensions and displays the
corresponding image.
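A group element such as a RIG is a flat list of tag/ref pairs, four bytes per member. The sketch below shows one way to read such a list, assuming big-endian 16-bit integers. The function name is hypothetical. The tag numbers in the usage note are the ones given in this chapter.

```python
import struct

def parse_group(data):
    """Return the (tag, ref) pairs stored in a group data element."""
    if len(data) % 4:
        raise ValueError("a group element holds 4 bytes per member")
    # Assumption: each tag and ref is a big-endian 16-bit integer.
    return [struct.unpack_from(">HH", data, off) for off in range(0, len(data), 4)]
```

For the ID, RI, LD, LUT example above, the element would hold four pairs with tags 300, 302, 307, and 301, for a total of 16 bytes. An application can skip any pair whose tag it does not recognize.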
DFTAG_ID, DFTAG_LD, DFTAG_MD
Image dimension, LUT dimension, matte dimension
20 bytes, 20 bytes, 20 bytes
300 (0x012C), 307 (0x0133), 308 (0x0134)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
<x dim> length of x (horizontal) dimension (32-bit integer)
<y dim> length of y (vertical) dimension (32-bit integer)
<NT ref> reference number of number type information for
associated object
<elements> number of elements that comprise one entry (16-bit
integer)
<interlace> defines type of interlacing used (16-bit integer)
<comp tag> tag which tells the type of compression used and any
associated parameters (16-bit integer)
<comp ref> reference number of compression tag (16-bit integer)
The three dimension records have exactly the same format. They
define the dimensions of the 2D array to which they refer. The
diagram above pictures a DFTAG_ID for illustration. A DFTAG_ID
specifies the dimensions of a DFTAG_RI, DFTAG_LD specifies the
dimensions of a DFTAG_LUT, and DFTAG_MD specifies the dimensions
of a DFTAG_MA.
For example, a 512x256 row-wise 24-bit raster image with each pixel
stored as RGB bytes would have the following values:
<x dim>: 512
<y dim>: 256
<NT ref>: UINT8
<elements>: 3 (3 elements per pixel: e.g., R, G, and B)
<interlace>: 0 (RGB values not separated)
<comp tag>: 0 (no compression is used)
DFTAG_RI
Raster image
xdim*ydim*elements*NTsize bytes (xdim, ydim, elements, and NTsize
are given by the corresponding DFTAG_ID)
302 (0x012E)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
This tag points to raster image data. It is stored in row-major
order and must be interpreted as specified in a DFTAG_ID:
<interlace>=0 means the components of each pixel are together.
<interlace>=1 means color elements are grouped by scan lines.
<interlace>=2 means color elements are grouped by planes.
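To make the three interlace schemes concrete, the sketch below computes where one component of one pixel sits in the data. It reflects one reading of the descriptions above (components together, grouped by scan line, grouped by plane) and is not taken from the HDF library. The function name is hypothetical, and the result is an index in elements, so it must be multiplied by NTsize to obtain a byte offset.

```python
def element_index(x, y, e, xdim, ydim, elements, interlace):
    """Index (in elements, not bytes) of component e of pixel (x, y)."""
    if interlace == 0:
        # Components of each pixel are stored together: RGB RGB RGB ...
        return (y * xdim + x) * elements + e
    if interlace == 1:
        # Within each scan line, all of component 0, then component 1, ...
        return (y * elements + e) * xdim + x
    if interlace == 2:
        # One complete plane per component.
        return (e * ydim + y) * xdim + x
    raise ValueError("interlace must be 0, 1, or 2")
```

For a 512x256 image with three elements per pixel, all three schemes address the same 512*256*3 = 393216 elements; only the ordering differs.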
DFTAG_LUT
Lookup table
xdim*ydim*elements*NTsize bytes (xdim, ydim, elements, and NTsize
are given by the corresponding DFTAG_LD)
301 (0x012D)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
<Pn m> Mth value of parameter n (size is given by the
DFTAG_NT in the corresponding DFTAG_LD)
The DFTAG_LUT, sometimes called a palette, is used by many kinds
of hardware to assign colors to data values. When a raster image
consists of data values which are going to be interpreted through
hardware with a LUT capability, the DFTAG_LUT should be loaded
along with the image.
The most common lookup table is the RGB lookup table which will
have X dimension=256 and Y dimension=1 with three elements per
entry, one each for red, green, and blue. The interlace will be
either 0, where the LUT values are given RGB, RGB, RGB ..., or 1,
where the LUT values are given as 256 reds, 256 greens, 256 blues.
DFTAG_MA
Matte channel
xdim*ydim*elements*NTsize bytes (xdim, ydim, elements, and NTsize
are given by the corresponding DFTAG_MD)
309 (0x0135)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
The DFTAG_MA contains transparency data which can be used to
facilitate the overlaying of images. The data consist of a
two-dimensional array of unsigned 8-bit integers ranging from 0 to
255. Each point in a DFTAG_MA indicates the transparency of the
corresponding point in a raster image of the same dimensions. A
value of 0 indicates that the data at that point is to be
considered totally transparent, while a value of 255 indicates that
the data at that point is totally opaque. It is assumed that a
linear scale applies to the transparency values, but users may opt
to interpret the data in any way they wish.
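One possible use of the matte channel under the assumed linear scale is a straightforward blend of an upper image over a lower one. The sketch below is only an illustration of that interpretation for single-component 8-bit data. The function name is hypothetical, and users remain free to interpret the values differently.

```python
def composite(top, bottom, matte):
    """Blend two 8-bit images of equal size using a linear matte.

    A matte value of 0 leaves only the bottom image visible (top is
    totally transparent); 255 leaves only the top image visible.
    """
    if not len(top) == len(bottom) == len(matte):
        raise ValueError("images and matte must have the same size")
    # Integer arithmetic with rounding to the nearest value.
    return bytes((t * m + b * (255 - m) + 127) // 255
                 for t, b, m in zip(top, bottom, matte))
```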
DFTAG_CCN
Color correction
52 bytes (usually)
310 (0x0136)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
<gamma> gamma parameter (32-bit IEEE float)
<red x/y/z> red x/y/z correction factors (32-bit IEEE floats)
<green x/y/z> green x/y/z correction factors (32-bit IEEE floats)
<blue x/y/z> blue x/y/z correction factors (32-bit IEEE floats)
<white x/y/z> white x/y/z correction factors (32-bit IEEE floats)
Color correction specifies the Gamma correction for the image and
color primaries for the generation of the image.
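The usual 52-byte size follows from the listing above: one gamma value plus four x/y/z triples gives 13 IEEE floats of 4 bytes each. A reading sketch, assuming big-endian storage and the field order shown in the listing, might look like this. The function name is hypothetical.

```python
import struct

def parse_ccn(data):
    """Decode the first 52 bytes of a DFTAG_CCN data element."""
    # Assumption: 13 big-endian 32-bit IEEE floats in the listed order.
    values = struct.unpack_from(">13f", data)
    return {
        "gamma": values[0],
        "red": values[1:4],
        "green": values[4:7],
        "blue": values[7:10],
        "white": values[10:13],
    }
```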
DFTAG_CFM
Color format
string
311 (0x0137)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
<character string> non-null terminated ASCII string (any length)
The color format is a clue to how each element of each pixel in a
raster image can be interpreted. It is defined to be a string which
is in all caps, and is one of the values shown in Table 6.4.
Table 6.4 Color Format String Values
String Description
VALUE pseudo-color, or just a value associated with the pixel
RGB red, green, blue model
XYZ color-space model
HSV hue, saturation, value model
HSI hue, saturation, intensity
SPECTRAL spectral sampling method
DFTAG_AR
Aspect ratio
4 bytes
312 (0x0138)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
<ratio> ratio of width to height (32-bit IEEE float)
The data for this tag is the visual aspect ratio for this image.
The image should be visually correct if displayed on a screen with
this aspect ratio. The data consists of one floating-point number
which represents width divided by height. An aspect ratio of 1.0
indicates a display with perfectly square pixels; 1.33 is a
standard aspect ratio used by many monitors.
Composite Image Tags
DFTAG_DRAW
Draw
n*4 bytes (where n is the number of data objects that comprise the
composite image.)
400 (0x0190)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
<tag n> tag number of the nth member of the draw list
(16-bit integer)
<ref n> reference number of the nth member of the draw list
(16-bit integer)
The data for this tag is a list of data identifiers (tag/ref pairs)
which define a composite image. Each member of the DFTAG_DRAW data
should be displayed, in order, on the screen. This can be used to
indicate several RIGs which should be displayed simultaneously, or
even include vector overlays, like DFTAG_T14, which should be
placed on top of a RIG.
Some of the elements in a DRAW list may be instructions about how
images are to be composited (XOR, source put, anti-aliasing, etc.).
These are defined as individual tags.
DFTAG_XYP
XY position
8 bytes
500 (0x01F4)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
<x> x-coordinate (32-bit integer)
<y> y-coordinate (32-bit integer)
A DFTAG_XYP is used in composites and other groups to indicate an
XY position on the screen. For this, (0,0) is the lower left, X is
the number of pixels to the right along the horizontal axis and Y
is the number of pixels on the vertical axis. The X and Y pixel
dimensions are given as two 32-bit integers.
For example, if DFTAG_XYP is present inside a DFTAG_RIG, the
DFTAG_XYP refers to the position of the lower left corner of the
raster image on the screen.
See also: DFTAG_DRAW (this section)
Vector Image Tags
DFTAG_T14
Tektronix 4014
? bytes
602 (0x025A)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
This tag points to a Tektronix 4014 data stream. The bytes in the
data field, when read and sent to a Tektronix 4014 terminal, will
display a vector image. Only the lower seven bits of each byte are
significant. There are no record markings or non-Tektronix codes
in the data.
DFTAG_T105
Tektronix 4105
? bytes
603 (0x025B)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
This tag points to a Tektronix 4105 data stream. The bytes in the
data field, when read and sent to a Tektronix 4105 terminal, will
be displayed as a vector image. Only the lower seven bits of each
byte are significant. Some terminal emulators will not correctly
interpret every feature of the Tektronix 4105 terminal, so you may
wish to use only a subset of the possible Tektronix 4105 vector
commands.
Scientific Dataset Tags
DFTAG_NDG
Numeric data group
n*4 bytes (where n is the number of data objects in the group.)
720 (0x02D0)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
<tag n> tag number of nth member of the group (16-bit
integer)
<ref n> reference number of nth member of the group (16-bit
integer)
The numeric data group (NDG) data is a list of data identifiers
(tag/ref pairs) that describe a scientific dataset. It supersedes
the old DFTAG_SDG, which has been obsoleted as of version 3.2 of
the HDF library. A more complete explanation of the relationship
between DFTAG_NDG and DFTAG_SDG can be found in the chapter
entitled "Sets and Groups."
All of the members of the group provide information for correctly
interpreting and displaying the data. Application programs that
deal with NDGs should read all of the elements of an NDG and process
those identifiers which they can use. Even if an application cannot
process all of the tags, the tags that it can understand will be
usable.
Tag types that may appear in a DFTAG_NDG are listed in Table 6.5.
Table 6.5 Possible Tag Types in an NDG
Tag Description
DFTAG_SDD scientific data dimension record (rank and dimensions)
DFTAG_SD scientific data
DFTAG_SDS scales
DFTAG_SDL labels
DFTAG_SDU units
DFTAG_SDF formats
DFTAG_SDM maximum and minimum values
DFTAG_SDC coordinate system
DFTAG_CAL calibration information
DFTAG_FV fill value
DFTAG_LUT color lookup table
DFTAG_LD lookup table dimension record
DFTAG_SDLNK link to old-style DFTAG_SDG (See Sets and Groups)
Example
DFTAG_SDD, DFTAG_SD, DFTAG_SDM
A dimension record, the scientific data, and the maximum and
minimum values of the data go together. The application reads the
rank and dimensions from the dimension record, then reads the data
array with those dimensions. If it needs maximum and minimum, it
also reads them.
See also: Sets and Groups
DFTAG_SDD
Scientific data dimension record
6 + 8*rank bytes
701 (0x02BD)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
<rank> number of dimensions (16-bit integer)
<dim n> number of values along the nth dimension
(32-bit integer)
<data NT ref> reference number of DFTAG_NT for data
(16-bit integer)
<scale NT ref n> reference number for DFTAG_NT for the
scale for the nth dimension (16-bit
integer)
This record defines the rank and dimensions of the array in the
scientific dataset. For example, a DFTAG_SDD for a 500x600x3 array
of floating-point numbers would have the following values and
components.
Rank: 3
Dimensions: 500, 600, and 3.
One data NT
Three scale NTs
DFTAG_SD
Scientific data
NTsize*x*y*z* ... bytes (where NTsize is the size of the data NT
given by the corresponding DFTAG_SDD and x, y, z, etc. are the
dimension sizes)
702 (0x02BE)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
This tag points to an array of scientific data. The type of the
data may be specified by a DFTAG_NT included with the SDG. If
there is no DFTAG_NT, the type of the data is floating-point in
standard IEEE 32-bit format. The rank and dimensions must be stored
as specified in the corresponding DFTAG_SDD. The diagram above
shows a three-dimensional data array.
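The size formulas given for DFTAG_SDD and DFTAG_SD are easy to check with a little arithmetic. The following sketch evaluates them for the 500 x 600 x 3 floating-point example used in the DFTAG_SDD entry, taking NTsize as 4 bytes for 32-bit floats. The function names are hypothetical.

```python
from math import prod

def sdd_size(rank):
    """Size in bytes of a DFTAG_SDD record: 6 + 8*rank."""
    return 6 + 8 * rank

def sd_size(dims, nt_size):
    """Size in bytes of a DFTAG_SD array: NTsize times every dimension."""
    return nt_size * prod(dims)
```

With rank 3, the dimension record takes 30 bytes, and the data array itself takes 4 * 500 * 600 * 3 = 3,600,000 bytes.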
DFTAG_SDS
Scientific data scales
rank + NTsize0*x + NTsize1*y + NTsize2*z + ... bytes (where rank is
the number of dimensions, x, y, z, etc. are the dimension sizes,
and NTsize# are the sizes of each scale NT from the corresponding
DFTAG_SDD.)
703 (0x02BF)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
<is n> tells whether a scale exists for the nth dimension
(8-bit integer; 0 or 1)
<scale n> list of scale values for the nth dimension (type is
given by corresponding DFTAG_SDD)
This tag points to the scales for the dataset. The first n bytes
indicate whether there is a scale for the corresponding dimension
(1=yes, 0=no). This is followed by the scale values for each
dimension. The scale consists of a simple series of values, where
the number of values and their types are given by the corresponding
DFTAG_SDD.
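Reading a scales element amounts to consuming the flag bytes and then, for each flagged dimension, as many values as that dimension is long. The sketch below is a simplified illustration: it assumes every scale uses the same number type (a big-endian 32-bit float by default), whereas a real reader would take each scale's type from the corresponding DFTAG_SDD. The function name and the `fmt` parameter are hypothetical.

```python
import struct

def parse_scales(data, dims, fmt="f"):
    """Return one tuple of scale values (or None) per dimension."""
    rank = len(dims)
    flags = data[:rank]           # one flag byte per dimension: 1=yes, 0=no
    offset = rank
    scales = []
    for has_scale, n in zip(flags, dims):
        if not has_scale:
            scales.append(None)   # no scale stored for this dimension
            continue
        # Assumption: values are big-endian and all share the type `fmt`.
        layout = ">%d%s" % (n, fmt)
        scales.append(struct.unpack_from(layout, data, offset))
        offset += struct.calcsize(layout)
    return scales
```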
DFTAG_SDL
Scientific data labels
? bytes
704 (0x02C0)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
<label n> null terminated ASCII string (any length)
This tag points to a list of labels for the data and each dimension
of the dataset. Each label is a string terminated by a null byte
(0).
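Because each label ends with a null byte, the element can be split on that byte. The short sketch below shows the idea; the function name is hypothetical. The same approach would apply to the unit and format lists (DFTAG_SDU and DFTAG_SDF), which use the same null-terminated layout.

```python
def split_strings(data):
    """Split a block of null-terminated ASCII strings into a list."""
    parts = data.split(b"\x00")
    # The final terminator leaves one empty piece at the end; drop it.
    if parts and parts[-1] == b"":
        parts.pop()
    return [p.decode("ascii") for p in parts]
```

An empty label is preserved as an empty string, so the position of each label in the list still matches the data or dimension it belongs to.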
DFTAG_SDU
Scientific data units
? bytes
705 (0x02C1)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
<unit n> null terminated ASCII string (any length)
This tag points to a list of strings specifying the units for the
data and each dimension of the dataset. Each unit's string is
terminated by a null byte (0).
DFTAG_SDF
Scientific data format
? bytes
706 (0x02C2)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
<format n> null terminated ASCII string (any length)
This tag points to a list of strings specifying an output format
for the data and each dimension of the dataset. Each format string
is terminated by a null byte (0).
DFTAG_SDM
Scientific data max/min
8 bytes
707 (0x02C3)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
<max> maximum value (type is given by the data NT in the
corresponding DFTAG_SDD)
<min> minimum value (type is given by the data NT in the
corresponding DFTAG_SDD)
This record contains the maximum and minimum data values in the
dataset. The types of <max> and <min> are given by the data NT of
the corresponding DFTAG_SDD.
DFTAG_SDC
Scientific data coordinates
? bytes
708 (0x02C4)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
<string> null terminated ASCII string (any length)
This tag points to a string specifying the coordinate system for
the dataset. The string is terminated by a null byte.
DFTAG_SDLNK
Scientific dataset link
8 bytes
710 (0x02C6)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
DFTAG_NDG NDG tag (16-bit integer)
<NDG ref> reference number of NDG (16-bit integer)
DFTAG_SDG SDG tag (16-bit integer)
<SDG ref> reference number of SDG (16-bit integer)
The purpose of this tag is to link together an old-style DFTAG_SDG
and a DFTAG_NDG in cases where the NDG contains 32-bit floating
point data and is, therefore, equivalent to an old SDG. A complete
description of the use of this tag can be found in the chapter
entitled "Sets and Groups."
See also: Sets and Groups
DFTAG_CAL
Calibration information
36 bytes
731 (0x02DB)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
<cal> calibration factor (64-bit IEEE float)
<cal err> error in calibration factor (64-bit IEEE float)
<off> calibration offset (64-bit IEEE float)
<off err> error in calibration offset (64-bit IEEE float)
<data type> constant representing the effective data type of the
calibrated data (32-bit integer)
This tag points to a calibration record for the associated
DFTAG_SD. The data can be calibrated by first multiplying by the
<cal> factor, then adding the <off> value. Also included in the
record are errors for the calibration factor and offset and a
constant indicating the effective data type of the calibrated data.
Possible values of <data type> are shown in Table 6.6.
Table 6.6 Possible calibrated data types
Data Type Description
INT8 signed 8-bit integer
UINT8 unsigned 8-bit integer
INT16 signed 16-bit integer
UINT16 unsigned 16-bit integer
INT32 signed 32-bit integer
UINT32 unsigned 32-bit integer
FLOAT32 32-bit float
FLOAT64 64-bit float
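The calibration record and the calibration rule (multiply by <cal>, then add <off>) can be sketched as follows. The code assumes big-endian storage. The numeric values of the <data type> constants are not listed in this section, so the sketch returns that field as an uninterpreted integer. The function names are hypothetical.

```python
import struct

def parse_cal(data):
    """Decode the 36-byte DFTAG_CAL record."""
    # Assumption: four big-endian 64-bit floats, then one 32-bit integer.
    cal, cal_err, off, off_err, data_type = struct.unpack(">4di", data)
    return {"cal": cal, "cal_err": cal_err, "off": off,
            "off_err": off_err, "data_type": data_type}

def calibrate(raw, cal, off):
    """Apply the calibration: multiply by the factor, then add the offset."""
    return raw * cal + off
```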
DFTAG_FV
Fill value
? bytes (size given by size of data NT in corresponding DFTAG_SDD)
732 (0x02DC)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
<fill value> value representing unset data in the corresponding
DFTAG_SD (size given by size of data NT in
corresponding DFTAG_SDD)
This tag points to a value which has been used to indicate unset
values in the associated DFTAG_SD. The number type of the value
(and, therefore, its size) is given in the corresponding DFTAG_SDD.
Vset Tags
DFTAG_VG
Vgroup
14 + 4*nelt + namelen + classlen bytes (where nelt, namelen, and
classlen are given below)
1965 (0x07AD)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
<nelt> number of elements in the vgroup (16-bit integer)
<tag n> tag of the nth member of the vgroup (16-bit integer)
<ref n> reference number of the nth member of the vgroup
(16-bit integer)
<namelen> length of the name field (16-bit integer)
<name> non-null terminated ASCII string (length given by
<namelen>)
<classlen> length of the class field (16-bit integer)
<class> non-null terminated ASCII string (length given by
<classlen>)
<extag> extension tag (16-bit integer)
<exref> extension reference number (16-bit integer)
<version> version number of DFTAG_VG information (16-bit
integer)
<more> unused (2 zero bytes)
The DFTAG_VG provides a general-purpose grouping structure which
can be used to impose a hierarchical structure on the tags in the
group. Any HDF tag may be incorporated into a vgroup (including
other DFTAG_VGs).
For more information about Vsets, see the chapter entitled "HDF
Vsets."
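A vgroup element might be read along the following lines. The listing above does not say whether the tags and reference numbers are interleaved or stored as two separate runs. This sketch assumes that all nelt tags come first, followed by all nelt reference numbers, and that integers are big-endian. Both points are assumptions to verify against real files. The function name is hypothetical.

```python
import struct

def parse_vgroup(data):
    """Decode a DFTAG_VG data element into a dictionary."""
    (nelt,) = struct.unpack_from(">H", data, 0)
    pos = 2
    # Assumption: nelt tags, then nelt refs (not interleaved pairs).
    tags = struct.unpack_from(">%dH" % nelt, data, pos)
    pos += 2 * nelt
    refs = struct.unpack_from(">%dH" % nelt, data, pos)
    pos += 2 * nelt
    (namelen,) = struct.unpack_from(">H", data, pos)
    pos += 2
    name = data[pos:pos + namelen].decode("ascii")
    pos += namelen
    (classlen,) = struct.unpack_from(">H", data, pos)
    pos += 2
    vclass = data[pos:pos + classlen].decode("ascii")
    pos += classlen
    extag, exref, version = struct.unpack_from(">3H", data, pos)
    # The final two bytes (<more>) are unused and are skipped here.
    return {"members": list(zip(tags, refs)), "name": name, "class": vclass,
            "extag": extag, "exref": exref, "version": version}
```

The fixed fields account for the constant 14 in the size formula: seven 16-bit fields (nelt, namelen, classlen, extag, exref, version, and the unused trailer).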
DFTAG_VH
Vdata description
22 + 10*nfields + sum(fldnmlen n) + namelen + classlen bytes (where
the sum is taken over all fields, and nfields, fldnmlen n, namelen,
and classlen are given below)
1962 (0x07AA)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
<interlace> constant indicating interlace scheme used (16-bit
integer)
<nvert> number of entries in vdata (32-bit integer)
<ivsize> size of one vdata entry (16-bit integer)
<nfields> number of fields per entry in the vdata (16-bit
integer)
<type n> constant indicating the data type of the nth field
of the vdata (16-bit integer)
<isize n> size in bytes of the nth field of the vdata (16-bit
integer)
<offset n> offset of the nth field within the vdata (16-bit
integer)
<order n> ??? of the nth field of the vdata (16-bit integer)
<fldnmlen n> length of the nth field name string (16-bit integer)
<fldnm n> non-null terminated ASCII string (length given by
corresponding <fldnmlen>)
<namelen> length of the name field (16-bit integer)
<name> non-null terminated ASCII string (length given by
<namelen>)
<classlen> length of the class field (16-bit integer)
<class> non-null terminated ASCII string (length given by
<classlen>)
<extag> extension tag (16-bit integer)
<exref> extension reference number (16-bit integer)
<version> version number of DFTAG_VH information (16-bit
integer)
<more> unused (2 zero bytes)
DFTAG_VH provides all the information necessary to process a
DFTAG_VS.
For more information on Vsets, see the chapter entitled "HDF
Vsets."
See also: DFTAG_VS (this section)
DFTAG_VS
Vdata
nvert * sum(isize n) bytes (where the sum is taken over all fields,
and nvert and isize n are given by the corresponding DFTAG_VH)
1963 (0x07AB)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
<vdata> data block interpreted according to the
corresponding DFTAG_VH (nvert * sum(isize n) bytes,
where nvert and isize n are given by the
corresponding DFTAG_VH)
DFTAG_VS contains a block of data which is to be interpreted
according to the information in the corresponding DFTAG_VH.
For more information on Vsets, see the chapter entitled "HDF
Vsets."
See also: DFTAG_VH (this section)
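The size formulas for the vdata description and the vdata itself can be checked with a small calculation. The sketch below evaluates both for a made-up vdata of 100 entries with three 4-byte fields; the field names, vdata name, and class used in the usage note are purely illustrative, and the function names are hypothetical.

```python
def vh_size(field_names, name, vclass):
    """Size in bytes of a vdata description element.

    22 fixed bytes, 10 bytes per field (type, isize, offset, order,
    fldnmlen), plus the lengths of all the strings.
    """
    return (22 + 10 * len(field_names)
            + sum(len(f) for f in field_names)
            + len(name) + len(vclass))

def vs_size(nvert, isizes):
    """Size in bytes of the vdata: entries times the summed field sizes."""
    return nvert * sum(isizes)
```

For fields "PX", "PY", and "PZ" in a vdata named "points" of class "mesh", the description takes 22 + 30 + 6 + 6 + 4 = 68 bytes, and 100 entries of three 4-byte fields take 1200 bytes of data.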
Obsolete Tags
DFTAG_ID8
Image dimension-8
4 bytes
200 (0x00C8)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
<x dim> length of x dimension (16-bit integer)
<y dim> length of y dimension (16-bit integer)
The data for this tag consists of two 16-bit integers representing
the width and height of an 8-bit raster image in bytes.
This tag has been superseded by DFTAG_ID.
DFTAG_IP8
Image palette-8
768 bytes
201 (0x00C9)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
table entries 256 triples of 8-bit integers.
The data for this tag can be thought of as a table of 256 entries,
each containing one value for red, green, and blue. The first
triple is palette entry 0 and the last is palette entry 255.
This tag has been superseded by DFTAG_LUT.
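The 768-byte palette described for DFTAG_IP8 is 256 consecutive red, green, blue triples, so it can be unpacked as shown in this short sketch. The function name is hypothetical.

```python
def parse_palette(data):
    """Return the 256 (red, green, blue) entries of a DFTAG_IP8 palette."""
    if len(data) != 768:
        raise ValueError("a DFTAG_IP8 palette is exactly 768 bytes")
    # Entry 0 is the first triple and entry 255 is the last.
    return [tuple(data[i:i + 3]) for i in range(0, 768, 3)]
```

A pixel value v in the associated 8-bit image then maps to the color stored in entry v of the returned list.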
DFTAG_RI8
Raster image-8
xdim*ydim bytes (where xdim and ydim are the dimensions given by
the corresponding DFTAG_ID8.)
202 (0x00CA)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
image data 2-d array of 8-bit integers
The data for this tag is a row-wise representation of the
elementary 8-bit image data. The data is stored width-first (hence
row-wise) and is 8 bits per pixel. The first byte of data
represents the pixel in the upper-left hand corner of the image.
This tag has been superseded by DFTAG_RI.
DFTAG_CI8
Compressed image-8
? bytes
203 (0x00CB)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
<compressed image> series of run-length encoded bytes
The data for this tag is a row-wise representation of the
elementary 8-bit image data. Each row is compressed using the
following run-length encoding, in which n is the lower seven bits
of a control byte. The high bit of the control byte indicates
whether the following n bytes are to be reproduced exactly (high
bit=0) or whether the following byte is to be reproduced n times
(high bit=1). Since DFTAG_CI8 and DFTAG_RI8 are basically
interchangeable, it is suggested that you not have a DFTAG_CI8 and
a DFTAG_RI8 that have the same reference number.
This tag has been superseded by DFTAG_RLE.
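The run-length scheme described above can be sketched in C as
follows. This is a minimal illustration, not the HDF library's own
decoder: the function name ci8_decode, its error convention, and
its handling of a zero count are assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical decoder for the run-length scheme described above.
 * Returns the number of bytes written to out, or (size_t)-1 if the
 * input is truncated or the output buffer is too small. */
size_t ci8_decode(const unsigned char *in, size_t in_len,
                  unsigned char *out, size_t out_cap)
{
    size_t i = 0, o = 0;

    while (i < in_len) {
        unsigned char control = in[i++];
        size_t n = control & 0x7F;      /* lower seven bits: count */
        size_t k;

        if (o + n > out_cap)
            return (size_t)-1;
        if (control & 0x80) {           /* high bit 1: repeat one byte */
            if (i >= in_len)
                return (size_t)-1;
            for (k = 0; k < n; k++)
                out[o++] = in[i];
            i++;
        } else {                        /* high bit 0: copy n bytes */
            if (i + n > in_len)
                return (size_t)-1;
            for (k = 0; k < n; k++)
                out[o++] = in[i++];
        }
    }
    return o;
}
```

For example, the six input bytes 0x03 'a' 'b' 'c' 0x84 'z' expand
to the seven bytes "abczzzz": three bytes copied exactly, then one
byte repeated four times.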
DFTAG_II8
IMCOMP image-8
? bytes
204 (0x00CC)
*** INSERT FIGURE HERE ***
The data for this tag is a 4:1 compressed 8-bit image, using the
IMCOMP compression scheme.
This tag has been superseded by DFTAG_IMC.
DFTAG_SDG
Scientific data group
n*4 bytes (where n is the number of data objects in the group.)
700 (0x02BC)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
<tag n> tag number of nth member of the group (16-bit
integer)
<ref n> reference number of nth member of the group (16-bit
integer)
The scientific data group (SDG) data is a list of data identifiers
(tag/ref pairs) that describe a scientific dataset. All of the
members of the group provide information for correctly interpreting
and displaying the data. Application programs that deal with SDGs
should read all of the elements of an SDG and process those
identifiers which they can use. Even if an application cannot process
all of the tags, the tags that it can understand will be usable.
Tag types that may appear in a DFTAG_SDG are listed in Table 6.7.
Table 6.7 Possible Tag Types in an SDG
Tag          Description
DFTAG_SDD    scientific data dimension record (rank and dimensions)
DFTAG_SD     scientific data
DFTAG_SDS    scales
DFTAG_SDL    labels
DFTAG_SDU    units
DFTAG_SDF    formats
DFTAG_SDM    maximum and minimum values
DFTAG_SDC    coordinate system
DFTAG_SDT    transposition (obsolete)
DFTAG_SDLNK  link to new DFTAG_NDG (see Sets and Groups)
Example
DFTAG_SDD, DFTAG_SD, DFTAG_SDM
A dimension record, the scientific data, and the maximum and
minimum values of the data go together. The application reads the
rank and dimensions from the dimension record, then reads the data
array with those dimensions. If it needs maximum and minimum, it
also reads them.
This tag has been superseded by DFTAG_NDG.
See also: Sets and Groups
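The following C sketch shows one way an application might split the
n*4 bytes of group data into tag/ref pairs before deciding which
members it can process. The struct and function names are
hypothetical, and the sketch assumes that each 16-bit integer is
stored with its most significant byte first.

```c
#include <stddef.h>

/* Hypothetical in-memory form of one group member. */
struct tagref {
    unsigned int tag;
    unsigned int ref;
};

/* Split group data into tag/ref pairs. Assumes 16-bit fields are
 * stored most significant byte first. Returns the number of pairs
 * decoded, which is at most max_pairs. */
size_t sdg_parse(const unsigned char *data, size_t nbytes,
                 struct tagref *out, size_t max_pairs)
{
    size_t n = nbytes / 4;
    size_t i;

    if (n > max_pairs)
        n = max_pairs;
    for (i = 0; i < n; i++) {
        const unsigned char *p = data + 4 * i;
        out[i].tag = ((unsigned int)p[0] << 8) | p[1];
        out[i].ref = ((unsigned int)p[2] << 8) | p[3];
    }
    return n;
}
```

An application would then walk the resulting array, acting on the
tags it recognizes and skipping the rest.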
DFTAG_SDT
Scientific data transpose
0 bytes
709 (0x02C5)
*** INSERT FIGURE HERE ***
<ref no> reference number (16-bit integer)
The presence of this tag in a group indicates that the data pointed
to by the corresponding DFTAG_SD is in column-major order, instead
of the default row-major order. No data is associated with this
tag.
This tag will no longer be written by the HDF library, but if it
is encountered in an old file it will be interpreted as originally
intended.
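For a two-dimensional array, the difference between the two orders
can be illustrated with the following C sketch. The function names
are hypothetical; they compute the position of element (row, col)
in an array of nrows by ncols elements under each ordering.

```c
#include <stddef.h>

/* Row-major (the default): elements of one row are adjacent. */
size_t offset_row_major(size_t row, size_t col,
                        size_t nrows, size_t ncols)
{
    (void)nrows;                /* not needed for this ordering */
    return row * ncols + col;
}

/* Column-major (indicated by DFTAG_SDT): elements of one column
 * are adjacent. */
size_t offset_col_major(size_t row, size_t col,
                        size_t nrows, size_t ncols)
{
    (void)ncols;                /* not needed for this ordering */
    return col * nrows + row;
}
```

A reader that finds DFTAG_SDT in a group would use the second
computation when locating elements in the DFTAG_SD data.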
Chapter 7 Making HDF Portable
Chapter Overview
The HDF Environment
Machines Supported
Language Standards
Organization of Source Files
Header Files
Source Code Files
Passing Strings Between FORTRAN and C
Passing Strings from FORTRAN to C
Passing Strings from C to FORTRAN
Function Return Values between FORTRAN and C
Differences in Acceptable Routine Names
Case Sensitivity
How HDF Deals with "All-Upper Case" Compilers
Appended Underscore
How HDF Specifies the Appended (and Prepended) Underscore
Short Names vs. Long Names
ANSI C vs. Old C
Type Differences
Size Differences
Number Representation
Byte-order and Structure Representations
Access to Library Functions
Chapter Overview
The NCSA implementation of HDF is accessible to both C and FORTRAN
programs and is implemented on many different machines and several
operating systems. There are important differences between C and
FORTRAN, as well as between different implementations of each
language, especially FORTRAN. There are also important differences
between the different machines and operating systems that HDF
supports. This chapter describes many of these differences,
problems and issues associated with them, and methods employed in
the HDF source code to deal with them.
The HDF Environment
The list of machines and operating systems on which HDF is
implemented is steadily growing. For reasons that should soon be
clear, the number of platforms on which HDF is officially supported
is growing slowly. Every time a new platform is added to the list
of those that HDF supports, additional code must be written that
takes into account the way memory is organized, the way the
operating system works, the way numbers are represented, the way
the file system works, and the way FORTRAN and C work on that
system.
Machines Supported
As of this writing, the following platforms are supported by NCSA's
HDF group:
Cray X-MP and Cray 2 (UNICOS)
Sun Systems' Sun 3, Sun 386, and Sparcstation (Unix)
Convex (Unix)
Macintosh (MPW Shell)
IBM PC (MS-DOS)
Silicon Graphics (Unix)
Vax (VMS)
HP 9000 (HPUX)
DecStation (Ultrix)
IBM RT (Unix)
In addition to these platforms, HDF has been ported to many other
platforms for which support cannot currently be provided. These
include Alliant, Apollo (Domain), HP 3000, Stellar, Amiga,
Symbolics, NeXT, and IBM 3090 (MVS).
Language Standards
Unfortunately, not all compilers are the same. FORTRAN compilers
often differ in the ways they pass parameters, in the identifier
naming conventions they employ, and in the number types that they
support. Similarly, though generally not as drastically, C compilers
differ in the number types that they support and in their adherence
to the ANSI C standard.
In order to keep these differences to a minimum, the primary
dialects used for the source code in the NCSA implementation of HDF
are FORTRAN 77, ANSI C, and "old style C"(1), hereafter referred to as
"old C". There are very few platforms whose C and FORTRAN compilers
do not adhere to at least one of these standards. When time and
resources permit, attempts are also made to support features or
variations in other dialects of C and FORTRAN, particularly on
those platforms that are important to NCSA users. Much of the
remainder of this chapter speaks to these differences.
Follow these guidelines
To all future HDF developers, we cannot overstress the importance
of following the guidelines outlined in this chapter. It may take
longer to write code, and it may be considerably more difficult to
adapt your coding style to that given here, but the long-term
benefits in terms of portability and maintenance costs are well
worth the effort.
Organization of Source Files
There are three types of files in the HDF source code directory:
header files, source code files, and a makefile. Header files and
source code files are organized by application area. All of the
functions that apply to a particular application area are stored
in three source files, and all definitions and declarations that
apply are stored in a corresponding header file. The makefile
describes the dependencies among the source and header files, and
also provides the commands required to compile the corresponding
libraries and utilities.
Header Files
There is one header file for each application area. The HDF Raster
Image Set interface, for example, has the header file dfr8.h. It
contains definitions and declarations that are unique to the
interface.
(1) "old style C" refers to the version of C described in the first
edition of The C Programming Language, by Brian Kernighan and
Dennis Ritchie, published by Prentice-Hall.
Other header files include:
hdf.h
hdfi.h
hproto.h
constants.f
functions.f
hdf.h and hdfi.h.(1) The file hdf.h contains declarations and
definitions for the common data structures used throughout HDF,
definitions of the HDF tags, definitions of error numbers, and
definitions and declarations specific to the general purpose
interface. Since hdf.h depends on hdfi.h, it includes (via
#include) hdfi.h.
The file hdfi.h contains a large amount of information specific to
the various computing environments supported by HDF. Those
environmental parameters that need to be set to particular values
when compiling the HDF library are contained in hdfi.h. Machine
dependent definitions of such things as number types and macros
for reading and writing numbers are also included in hdfi.h.
When porting HDF to a new system, only hdfi.h and the Makefile need
to be modified.
Normally it is a good idea to include hdf.h (and therefore
indirectly hdfi.h) in user programs, though users usually need not
be aware of their contents.
hproto.h. This file contains ANSI C prototypes for all HDF C
routines, and must be included in ANSI-conforming C programs that
make calls to HDF routines.
constants.f. This file is for use in FORTRAN programs. It contains
important constants, such as tag values, that are defined in hdf.h.
Systems that have FORTRAN preprocessors might be able to include
this file via #include statements or their equivalent.
functions.f. This file is for use in FORTRAN programs. It contains
declarations of all HDF FORTRAN-callable functions. Systems that
have FORTRAN preprocessors might be able to include this file via
#include statements or their equivalent.
Source Code Files
All HDF operations are performed by routines written in C. Hence,
even FORTRAN calls to HDF result in calls to the corresponding C
routines. However, because of the problems described below, the
relationships between the C routines and the corresponding FORTRAN
routines can be very confusing. Before looking at the specific
problems, we first describe the C and FORTRAN source file
organization.
(1) In earlier implementations of HDF, these files were called df.h
and dfi.h. Starting with HDF 3.2 the general purpose layer of HDF
was completely rewritten, and all routine names changed from "df
... " to "h ...".
Each HDF interface typically has four files associated with it. The
HDF Raster Image Set interface, for example, has four associated
source files: dfr8.h, dfr8.c, dfr8f.c, dfr8ff.f. The suffixes on
the filenames indicate their functions, as we describe next.
The ".h" file is the header file. The other three files, which
contain the C and FORTRAN functions, are:
(1) The "normal" C routines. These routines do all of the actual
HDF work. The others have the job of transferring control and
data from a FORTRAN environment to a C environment.
These routines are stored in files whose names end with ".c",
as in "dfr8.c". Every call to HDF, whether it is a C call or a
FORTRAN call, ultimately results in a call to one of these
routines.
(2) C routines that are compatible with FORTRAN and therefore
directly callable from FORTRAN. The primary function of these
routines is to provide recognizable function names to the
linker. They may also perform operations on data they receive
from the FORTRAN routines that call them, such as transferring
a FORTRAN string to a local C data area. Examples of how they
perform these operations are given below.
These routines are stored in files whose names end with "f.c",
as in "dfr8f.c" for the raster image interface. The "f" means
that the routines are meant to be called from FORTRAN; the "c"
means that they are C source code.
(3) FORTRAN routines that perform some operation on the parameters
that C is unable to perform, before and/or after calling the
corresponding C routine. These routines are required, for
example, when one of the parameters is a string. The
corresponding C routine has no way of knowing the length of
the string unless it is explicitly given the length by the
FORTRAN routine.
These routines are stored in files whose names end with "ff.f",
as in "dfr8ff.f" for the raster image interface. The first "f"
means that the routines are to be called from FORTRAN; the second
"f" means that they perform some FORTRAN operation that C cannot
perform; the ".f" extension means that they are FORTRAN source
code.
The roles of these different types of source file types will become
clearer as we look at some of the problems that arise in
interfacing C and many different implementations of FORTRAN.
File naming conventions
The naming conventions for HDF library source code files are
complicated by several factors. Because of the wide variety of
platforms which HDF must accommodate, all files that will compile
to object modules in the HDF library must have names that are
unique in the first 8 characters, ignoring case. The difficulties
involved in maintaining a Fortran-callable interface to a library
that is primarily written in C further complicate the naming of
source code files.
Passing Strings between FORTRAN and C
One of the most important differences between FORTRAN and C
compilers is in the way strings are represented. Different
compilers use different data structures for strings, and supply
string length information in different ways.
Passing Strings from FORTRAN to C
When strings are passed between FORTRAN and C routines, they may
need to be converted from one representation to the other. C
compilers store strings in an array of type char, terminated by a
NULL byte ('\0'). The name of a string variable acts as a pointer
to the first character in the string. FORTRAN compilers are
not consistent in the ways that they store strings.
Two pieces of information are needed in order to pass a string from
FORTRAN to C: its length and its address.
The first problem is solved by invoking the standard FORTRAN
function len(), which returns the length of a string. Since C
expects a '\0' (NULL) byte at the end of strings, care must be
taken that this NULL byte does not overwrite useful information in
the FORTRAN string.
The second problem is more difficult because of the different ways
that different FORTRANs store strings.
To solve this, a macro _fcdtocp ("FORTRAN character descriptor to
C pointer") is used. _fcdtocp is defined differently, depending on
the machine on which it is compiled. There are three different ways
that a FORTRAN string's address can be passed to C:
* UNICOS FORTRAN stores strings in a structure called "_fcd"
(FORTRAN character descriptor). "_fcdtocp" is a built-in function
in UNICOS that returns the address of the string.
* VMS FORTRAN stores strings by means of a string descriptor
structure that provides information about where the string is
stored and its length. When compiled under VMS, the function
_fcdtocp extracts the string's address and returns that value.
* Most other FORTRAN compilers supported by HDF store strings just
as C does, in character arrays with the array name identifying
the array's address. For these compilers nothing special need be
done in passing a string from FORTRAN to C.
In HDF, a FORTRAN call that involves passing a string results in
the following sequence of actions:
(1) A FORTRAN "stub" determines the length and address in memory
of the string. Since this is a FORTRAN routine, it can be found
in the " ... ff.f" file.
(2) The FORTRAN stub then calls a C routine, to which it passes all
parameters from the initial call, plus one extra parameter: the
string's length.
(3) The C routine converts the FORTRAN string to a C string by
copying it to a C array of type char, and appending a '\0'
byte. Since this C routine serves as a link between a FORTRAN
stub and the corresponding C interface call, it can be found
in the " ... f.c" file.
(4) This C routine then calls the HDF C routine that performs the
actual function.
This process is illustrated in Figure 7.1.
*** INSERT FIGURE HERE ***
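Step (3) of this sequence can be sketched in C as follows. The
function name fstr_to_cstr is hypothetical, and the sketch assumes
the simplest case, in which the FORTRAN string's address is already
available as an ordinary character pointer and its length arrives
as the extra parameter.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build a '\0'-terminated C copy of a FORTRAN
 * string, given its address and length. The FORTRAN string itself
 * is left untouched, so no useful data in it is overwritten.
 * Returns NULL if memory cannot be allocated; the caller frees the
 * result. */
char *fstr_to_cstr(const char *fstr, size_t len)
{
    char *cstr = (char *)malloc(len + 1);

    if (cstr == NULL)
        return NULL;
    memcpy(cstr, fstr, len);
    cstr[len] = '\0';
    return cstr;
}
```

Copying into a separate C array, rather than writing the '\0' into
the FORTRAN string, avoids the overwriting problem mentioned above.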
Passing Strings from C to FORTRAN
When strings are passed from C to FORTRAN, the reverse procedure
is followed. First, a string pointer is obtained within the FORTRAN
routine's data area. (It is assumed that the space pointed to has
already been allocated, and is sufficiently large to hold the
string.) The string is then copied from the C data area to the
FORTRAN data area. Finally, if necessary the FORTRAN string's data
area is padded with blanks.
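The reverse procedure can be sketched in C as follows. The function
name cstr_to_fstr is hypothetical; flen is assumed to be the
declared length of the FORTRAN character variable, and truncation
of an over-long C string is a choice made for this example.

```c
#include <string.h>

/* Hypothetical helper: copy a C string into a fixed-length FORTRAN
 * character buffer and pad the remainder with blanks. A C string
 * longer than flen is truncated. No '\0' is stored. */
void cstr_to_fstr(const char *cstr, char *fstr, size_t flen)
{
    size_t n = strlen(cstr);

    if (n > flen)
        n = flen;
    memcpy(fstr, cstr, n);
    memset(fstr + n, ' ', flen - n);
}
```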
Function Return Values between FORTRAN and C
When a FORTRAN routine calls a C function, it always expects a
return value from that function. Unfortunately, the form in which
C functions return values is not always compatible with the form
in which FORTRAN expects them.
To solve this problem, some C compilers offer the option of
controlling the form of the return value from a function. For
example, Language Systems FORTRAN for the Macintosh requires that
all C function declarations be prepended by the word "pascal" so
that the return value can be recognized by a FORTRAN routine that
calls it, as in:
pascal int dsgrang(void *pmax, void *pmin)
Since C always expects return values to be passed "by value" rather
than, say, "by reference," it is important to coerce FORTRAN
functions to do the same. This is accomplished by defining a macro
FRETVAL that is prepended to the declaration of every FORTRAN-
callable C function. For example:
FRETVAL (int)
dsgrang(void *pmax, void *pmin)
If Language Systems FORTRAN is to be used, then FRETVAL is defined
(in hdfi.h) as follows:
#if defined(MAC) /* with LS FORTRAN */
#   define FRETVAL(x) pascal x
#endif
Differences in Acceptable Routine Names
Different FORTRAN compilers impose different restrictions on the
length, character set, and form of identifiers. In general, HDF
uses C conventions for naming routines, and this means that
measures must be taken to accommodate those compilers which have
different conventions than C.
The method used in HDF is to name routines differently, depending
on the particular conventions of the FORTRAN compiler being used.
This is done by defining certain flags for the preprocessor via
#define statements in the hdfi.h file. Then conditional
compilation--via #ifdef statements in the source code files--is
used to compile the routines that are called from FORTRAN with
names that that particular FORTRAN can understand.
Case Sensitivity
C compilers are case sensitive. That is, upper and lower case
letters are different. Many FORTRAN compilers allow users to use
upper and lower case letters in naming routines, but the symbol
table names that they produce in object modules are all in upper
case or all in lower case. These compilers are not case sensitive.
If routines compiled by a case-sensitive compiler are to be linked
with routines compiled by a compiler that is not case sensitive,
they might not recognize one another's routines.
For example, the UNICOS FORTRAN compiler allows you to name
routines without regard to case, but produces object modules with
all routine names converted to upper case. UNICOS C, on the other
hand, performs no such conversion. Consider how the HDF routine
Hopen is treated by the two compilers.
Hopen is written in C, so the HDF library has the name 'Hopen', a
mixed-case name, in its symbol table. Suppose you make the
following call in your UNICOS FORTRAN program:
file_id = Hopen('myfile', ... )
The FORTRAN compiler will create an object module with the routine
name "HOPEN" (all upper case) in its symbol table. When you link
it to the HDF library, it will find "Hopen", but not "HOPEN", and
will generate an "unsatisfied external reference" error.
So far there are three FORTRAN compilers supported by HDF that
convert names to upper case in the symbol table:
VMS FORTRAN
UNICOS FORTRAN
Language Systems FORTRAN.
How HDF Deals with "All-Upper Case" Compilers
The solution to this problem is to name C functions entirely in
upper case whenever they are called by all-upper case FORTRAN
routines. This is done as follows: For FORTRAN compilers that
produce all upper case symbol table entries a flag "DF_CAPFNAMES"
is defined via a #define in hdfi.h. Then conditional compilation
is used in the source code files to compile the routines that are
called from FORTRAN with all-upper case names.
For example, since UNICOS FORTRAN produces all-upper case symbol
table entries, the UNICOS section of hdfi.h contains the
following line:
#define DF_CAPFNAMES
Correspondingly, there are conditional compilations in the " ... f.c"
files that produce all-upper case routine names. For example, the
function name "Fun" can be redefined as "FUN" as follows:
#ifdef DF_CAPFNAMES
#   define Fun FUN
#endif /* DF_CAPFNAMES */
Appended Underscore
A similar problem occurs with respect to the underscore character.
When compilers generate object module symbol tables from source
code, they commonly prepend an underscore ('_') to all external
symbols. C generally does this. Then, when linking occurs, the
linker looks for external symbols in the symbol table with the
prefix.
Unfortunately, many FORTRAN compilers also append an underscore to
identify external symbols. Since C does not generally do this,
external references in FORTRAN-generated object modules will not
recognize externals with the same names in C-generated modules.
For example, the FORTRAN compiler on the CONVEX places an
underscore at the end of routine names, while the C compiler only
places an underscore at the front. Consider how a C function called
FUN would be treated in this context.
Since FUN is a C function, the object module containing FUN has it
stored under the name "_FUN". Suppose you make the following call
in a FORTRAN program:
x = FUN (y)
The FORTRAN compiler creates an object module with the routine name
"_FUN_" in its symbol table. When you link it to the C module, it
will find "_FUN", but not "_FUN_", and will generate an
"unsatisfied external reference" error.
How HDF Specifies Appended (and Prepended) Underscores
The solution to this problem is to name C functions with an
appended underscore whenever one is expected by FORTRAN calling
routines. For instance, if the name of FUN had been "FUN_" in the
example, its name in the C object module would have been "_FUN_",
which is exactly what FORTRAN put into its symbol table.
This is done as follows: For every machine whose FORTRAN compiler
requires appended underscores, a flag "FNAME_POST_UNDERSCORE" is
defined via a #define in hdfi.h in the section associated with that
machine. Similarly, for those that require a prepended underscore
a flag "FNAME_PRE_UNDERSCORE" is defined. Then, in a section of
code in hdfi.h, conditional compilation is used to define a macro
called "FNAME" that appends and/or prepends underscores as
required.
In the modules in which routines are actually defined (including
in hproto.h), the FNAME macro is then applied to each routine,
causing the appropriate underscores to be added.
Hence, in the example above in which "Fun" was caused to be
uppercase, the actual definition would be as follows:
#ifdef DF_CAPFNAMES
#   define Fun FNAME(FUN)
#endif /* DF_CAPFNAMES */
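The definition of FNAME itself is not reproduced in this chapter.
The following is a minimal sketch of how such a macro might be
written, assuming an ANSI C preprocessor with the "##" token-pasting
operator. The routine "fun" and the choice to define
FNAME_POST_UNDERSCORE here are made up for the example; in HDF the
flags are set in the machine-specific sections of hdfi.h.

```c
/* Assumption for this example: the FORTRAN compiler appends an
 * underscore to external names. */
#define FNAME_POST_UNDERSCORE

#if defined(FNAME_PRE_UNDERSCORE) && defined(FNAME_POST_UNDERSCORE)
#   define FNAME(x) _##x##_
#elif defined(FNAME_PRE_UNDERSCORE)
#   define FNAME(x) _##x
#elif defined(FNAME_POST_UNDERSCORE)
#   define FNAME(x) x##_
#else
#   define FNAME(x) x
#endif

/* Hypothetical FORTRAN-callable routine. With the flag above, the
 * C compiler sees the name "fun_". FORTRAN compilers generally pass
 * arguments by reference, so the parameter is a pointer. */
int FNAME(fun)(int *y)
{
    return *y + 1;
}
```

With this configuration a FORTRAN call to FUN or fun (subject to
the case conventions discussed earlier) and the C definition agree
on the symbol name once the compilers have applied their own
underscore conventions.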
Short Names vs. Long Names
In the C implementations supported by HDF, identifiers may be any
length, with at least the first 31 characters having significance.
FORTRAN compilers differ in the maximum lengths of identifiers that
they allow, but all of those supported by HDF allow identifiers to
have at least seven characters.
To deal with the discrepancies between identifier lengths allowed
by C and those allowed by the various FORTRAN compilers, a set of
equivalent short names has been devised that can be used when
programming in FORTRAN. For all HDF routines with names that are
more than seven characters long, there is an identical routine
whose name is seven or fewer characters long.
For example, for the routine "DFSDgetdims" in the file dfsd.c there
is a corresponding routine "dsgdims" in the file dfsdff.f with
exactly the same functionality.
ANSI C vs. Old C
Both ANSI and old C compilers are supported in the current
implementation of HDF (HDF 3.2). ANSI C is preferred, because it
has many features that help ensure portability, but unfortunately
many important platforms do not support full ANSI C. The HDF code
determines whether or not ANSI C is available from the flag
__STDC__. If ANSI C is available, then __STDC__ is defined.(1)
The most noticeable difference between ANSI and old C is in the way
functions are declared. For example, in ANSI C the function
DFSDsetdims() is declared with
int DFSDsetdims(intn rank, int32 dimsizes[])
(1) Some C compilers are not entirely ANSI-conforming, yet they
conform well enough that the HDF implementation can treat them as
if they were. In such cases, it is considered permissible to
"#define" __STDC__ when compiling.
In old C the same function is declared with
int DFSDsetdims(rank, dimsizes)
intn rank;
int32 dimsizes[];
The NCSA implementation of HDF accommodates these differences by
defining in hdfi.h a flag called PROTOTYPE, which is used for every
function declaration, as in the following example.
#ifdef PROTOTYPE
int DFSDsetdims(intn rank, int32 dimsizes[])
#else
int DFSDsetdims(rank, dimsizes)
intn rank;
int32 dimsizes[];
#endif /* PROTOTYPE */
Another big difference between old C and ANSI C is that ANSI C
allows the use of function prototypes that include arguments, which
helps enormously in detecting errors in the number and types of
arguments. Old C also allows functions to be declared, but
without the argument list. This difference is handled by means of
a macro called PROTO, whose definition depends on whether
PROTOTYPE is defined:
#ifdef PROTOTYPE
#define PROTO(x) x
#else
#define PROTO(x) ()
#endif
This macro is applied as in the following example:
extern int32 Hopen
PROTO((char *path, intn access, int16 ndds));
When PROTOTYPE is defined, PROTO causes the argument list to stay
as it is. When PROTOTYPE is not defined, PROTO causes the argument
list to disappear.
Type Differences
Different machines and compilers differ in the sizes of numbers
that they assign to different data types, in their representations
of different number types, and in the way they organize aggregates
of numbers (especially structures).
Size Differences
The same number type can be different sizes on different machines.
Type int, for example, is 16 bits to many IBM PC compilers, 48 bits
to some supercomputer compilers, and 32 bits on most others. These
differences can cause insidious problems in code like the HDF code
that depends in so many places on numbers being the right size.
This problem is handled in HDF by insisting in the code that all
variables and functions must use a typedef'ed type which fully
defines their type, including the number of bits that they occupy.
This includes all parameters, members of structures, and static,
automatic, and external variables.
Hence, the data types used in HDF include the following. (The
prefix "u" stands for "unsigned".)
int8
uint8
int16
uint16
int32
uint32
float32
float64
intn
uintn
So, for example, on Sun's C compiler int32 is defined with
typedef long int int32;
Hence, for each machine, typedefs are declared that map all of the
data types used into the best available types.
Unfortunately, it is not always possible to find a local data type
that maps exactly to one of these types. For example, the Cray
UNICOS C compiler does not support a 16-bit data type. In such
instances, we do the best we can and try to be on the lookout for
potential problems with number sizes.
The data types "intn" and "uintn" are to be used whenever it can be
determined that number type size is of no consequence, and that a
16-bit integer is large enough to hold any value the number can
have. In such cases, the native int (or unsigned int) type of the
host machine is used. Experience has shown that substantial
performance gains can be achieved by using intn or uintn in certain
circumstances.
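The per-machine typedefs can be sketched as follows. This is not
the contents of hdfi.h, which selects types by machine; the sketch
instead assumes an ANSI C compiler and chooses types by testing the
limits in <limits.h>. Stopping with #error when no exact match
exists is a choice made for the example; as noted above, HDF
substitutes the best available type in such cases.

```c
#include <limits.h>

typedef signed char    int8;
typedef unsigned char  uint8;

#if SHRT_MAX == 32767
typedef short          int16;
typedef unsigned short uint16;
#else
#   error "no 16-bit type; a wider type would have to be substituted"
#endif

#if INT_MAX == 2147483647
typedef int            int32;
typedef unsigned int   uint32;
#elif LONG_MAX == 2147483647L
typedef long           int32;
typedef unsigned long  uint32;
#else
#   error "no 32-bit type; a wider type would have to be substituted"
#endif

/* Native integers, for values known to fit in 16 bits. */
typedef int            intn;
typedef unsigned int   uintn;
```

Code written against these names does not change from machine to
machine; only the typedefs do.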
Number Representation
One of the keys to producing a portable file format is ensuring
that numbers that are represented differently on different machines
somehow get converted correctly when moved from machine to machine.
The approach taken to this in the NCSA implementation is to provide
conversion routines to convert between local representations and
a standard representation that is stored in HDF files. Details of
this process will be included in a later edition of this manual.
Byte-order and Structure Representations
Even when the basic bit-representation of constants or aggregates
like structures is the same between machines, the ways that the
bits are packed into a word, and the order in which the bits are
laid out, can differ among machines. For example, Digital machines
and Intel-based machines generally order bytes differently from
most others. And the C compiler on a Cray, whose word size is 64
bits, packs structures differently from one on machines whose word
size is 32 bits.
Differences in byte order among machines are handled in two ways.
When the data to be written (or read) consists of non-integer data
and/or a large array of any type of data, a conversion routine
(mentioned in the previous section, "Number Representation") is
invoked. When an individual integer is to be written (or read), an
"ENCODE" or "DECODE" macro is used.
There are ENCODE and DECODE macros for 16-bit and 32-bit integers:
INT16ENCODE
UINT16ENCODE
INT32ENCODE
UINT32ENCODE
INT16DECODE
UINT16DECODE
INT32DECODE
UINT32DECODE
The ENCODE macros are written in such a way that they write
integers to an HDF file in a standard way, no matter what the
word size and byte order of the host machine are. Likewise, the
DECODE macros are written in such a way that they read integers
stored in a standard way in an HDF file and store the integers in
the required byte order and word size on the host machine.
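The definitions of the ENCODE and DECODE macros are not reproduced
in this chapter. The idea behind them can be sketched with the
following C functions, whose names are hypothetical. The sketch
assumes that the standard file representation stores the most
significant byte first. Because each byte is produced by shifting
and masking, the result does not depend on the host's byte order or
on whether its integer types are wider than needed.

```c
/* Hypothetical helper: write the low 16 bits of v to p[0..1],
 * most significant byte first. */
void put_uint16(unsigned char *p, unsigned int v)
{
    p[0] = (unsigned char)((v >> 8) & 0xFF);
    p[1] = (unsigned char)(v & 0xFF);
}

/* Hypothetical helper: read a 16-bit value written by put_uint16. */
unsigned int get_uint16(const unsigned char *p)
{
    return ((unsigned int)p[0] << 8) | (unsigned int)p[1];
}

/* Hypothetical helper: write the low 32 bits of v to p[0..3],
 * most significant byte first. */
void put_uint32(unsigned char *p, unsigned long v)
{
    p[0] = (unsigned char)((v >> 24) & 0xFF);
    p[1] = (unsigned char)((v >> 16) & 0xFF);
    p[2] = (unsigned char)((v >> 8) & 0xFF);
    p[3] = (unsigned char)(v & 0xFF);
}

/* Hypothetical helper: read a 32-bit value written by put_uint32. */
unsigned long get_uint32(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16)
         | ((unsigned long)p[2] << 8)  |  (unsigned long)p[3];
}
```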
Since the ENCODE and DECODE macros deal with both byte order and
word size, they are also used to handle the reading and writing of
record-like structures. For example, the structure for an HDF data
descriptor consists of two 16-bit fields, followed by two 32-bit
fields, as implied by the following C declaration:
struct {
uint16 tag;
uint16 ref;
uint32 offset;
uint32 length;
};
In an HDF file this structure must occupy exactly 12 bytes. On one
computer it might occupy 12 bytes of storage, and on another, such
as the Cray, it might occupy 32 bytes. Furthermore some machines
might represent the numbers internally in different byte orders
than others. By using the ENCODE and DECODE macros we are able to
ensure that these values are represented correctly in all machines
and in HDF files.
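As an illustration, the following C sketch converts a data
descriptor between an in-memory structure and its 12-byte file
form. All names are hypothetical, and the sketch assumes that the
file form stores the most significant byte of each field first.
The in-memory structure may be padded or use wider types; only the
byte array is ever written to or read from the file.

```c
#define DD_FILE_SIZE 12     /* 2 + 2 + 4 + 4 bytes in the file */

/* Hypothetical in-memory form; its size may differ by machine. */
struct dd {
    unsigned int  tag;      /* 16 bits in the file */
    unsigned int  ref;      /* 16 bits in the file */
    unsigned long offset;   /* 32 bits in the file */
    unsigned long length;   /* 32 bits in the file */
};

static void put16(unsigned char *p, unsigned int v)
{
    p[0] = (unsigned char)((v >> 8) & 0xFF);
    p[1] = (unsigned char)(v & 0xFF);
}

static void put32(unsigned char *p, unsigned long v)
{
    put16(p, (unsigned int)((v >> 16) & 0xFFFF));
    put16(p + 2, (unsigned int)(v & 0xFFFF));
}

static unsigned int get16(const unsigned char *p)
{
    return ((unsigned int)p[0] << 8) | (unsigned int)p[1];
}

static unsigned long get32(const unsigned char *p)
{
    return ((unsigned long)get16(p) << 16) | (unsigned long)get16(p + 2);
}

/* Pack a descriptor into its file form. */
void dd_encode(const struct dd *d, unsigned char out[DD_FILE_SIZE])
{
    put16(out, d->tag);
    put16(out + 2, d->ref);
    put32(out + 4, d->offset);
    put32(out + 8, d->length);
}

/* Unpack a descriptor from its file form. */
void dd_decode(const unsigned char in[DD_FILE_SIZE], struct dd *d)
{
    d->tag = get16(in);
    d->ref = get16(in + 2);
    d->offset = get32(in + 4);
    d->length = get32(in + 8);
}
```

Writing the structure field by field in this way, rather than
writing the structure's memory image, is what keeps the file form
at exactly 12 bytes on every machine.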
Access to Library Functions
Despite efforts to standardize them, function libraries often
differ in significant ways. There are at least three types of
functions that need special treatment in the HDF implementation:
(1) All file I/O access. Both the stream and system level functions
need this (i.e. the functions associated with the fopen() call,
and the functions associated with the open() call). This is
generally a 16-bit vs. 32-bit problem, because some machines
use 16-bit values for the size of and the number of elements
to write/read, and others use 32-bit values.
(2) All memory allocation and releasing. There are two different
problems associated with this. The first is that on a 16-bit
machine, a 16-bit value is used for the number of bytes to
allocate at one time. The second is that certain operating
systems (notably MS-Windows and MAC/OS) don't have malloc() and
free() calls. These operating systems use handles for
allocating memory and require different function calls for
memory allocation.
(3) Memory and string manipulation. These functions (such as
memcpy(), memcmp(), strcpy(), strlen(), etc.) require slightly
different function names under different memory models in
MS-DOS and under MS-Windows than on most other systems.
These differences are dealt with by defining macros for the
relevant functions, and defining them appropriately in the
machine-specific sections of hdfi.h.
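The approach can be sketched as follows. The macro names XMALLOC,
XFREE, XMEMCPY, and XSTRLEN, the flag EXAMPLE_HANDLE_BASED_OS, and
the routine dup_string are all hypothetical; the names actually
used by HDF are defined in hdfi.h and are not listed in this
chapter.

```c
#include <stdlib.h>
#include <string.h>

#if defined(EXAMPLE_HANDLE_BASED_OS)
    /* A system without malloc() and free() would map the wrappers
     * to its own allocation calls here. */
#   error "supply platform-specific definitions"
#else
#   define XMALLOC(n)        malloc((size_t)(n))
#   define XFREE(p)          free(p)
#   define XMEMCPY(d, s, n)  memcpy((d), (s), (size_t)(n))
#   define XSTRLEN(s)        strlen(s)
#endif

/* Library code calls only the wrappers, never the underlying
 * functions, so it compiles unchanged on every platform. */
char *dup_string(const char *s)
{
    size_t n = XSTRLEN(s) + 1;
    char *copy = (char *)XMALLOC(n);

    if (copy != NULL)
        XMEMCPY(copy, s, n);
    return copy;
}
```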