Encyclopedia of Graphics File Formats Companion

Text File | 1994-06-01 | 181KB | 4,809 lines

NCSA HDF Specifications
DRAFT January 1993
University of Illinois at Urbana--Champaign

Introduction

Overview

The Hierarchical Data Format (HDF) was designed to make the sharing of scientific data between different people, different projects, and different types of computers easy and self-describing. An extensible header, along with carefully crafted internal layers, provides a system that can grow along with the software that NCSA develops. This chapter provides a brief overview of HDF capabilities and design.

Why HDF?

A fundamental requirement of scientific data management is the ability to access as much information in as many ways and as quickly and easily as possible. To make this possible, there needs to be a data storage and retrieval system that facilitates these capabilities. Specific needs of such a system include the following.

* Support for scientific data and metadata. Scientific data is characterized by a variety of different data types and representations, data sets (including images) that can be extremely large and complex, and the need to attach accompanying attributes, parameters, notebooks, and other metadata.

* Support for a range of hardware platforms. Data can originate on one machine, only to be used later on many different machines. Scientists must be able to access data and metadata on as many hardware platforms as possible.

* Support for a range of software tools. Scientists need a variety of software tools and utilities for easily searching, analyzing, archiving, and transporting the data and metadata. These tools range from a library of routines for reading and writing data and metadata, to small utilities that simply display an image on a console, to full-blown database retrieval systems that provide multiple views of thousands of sets of data and metadata.

* Rapid data transfer. Both the size and the dispersion of scientific data sets require that mechanisms exist to get the data from place to place rapidly.

* Extendibility.
As new types of information are generated and new kinds of science are done, a means must be provided to support them.

What is HDF?

The structure of HDF. HDF is a self-describing extensible file format based on the use of tagged objects that have standard meanings. The idea is to store both a known format description and the data in the same file. HDF tags describe the format of the data in the sense that each tag is assigned a specific meaning--one tag is assigned to "Color Palette," another is assigned to "Raster Image," and so on (see Figure 1). A program that has been written to understand a certain list of tag types can scan the file for those tag types and process the data. This program can also ignore any data that is beyond its scope.

The set of available data objects encompasses both primary and secondary data (metadata). Most HDF objects are machine- and medium-independent physical representations of data and metadata.

HDF Tags. HDF is designed with the assumption that we cannot know a priori what types of data objects will be needed in the future, nor can we know how scientists will want to view their data. As new science is done, new types of data objects are needed, and new tags must be created. In order to avoid unnecessary proliferation of tags, and to ensure that all tags are available to potential users who need to share data, a portable public domain library is available that interprets all public tags. The library contains user interfaces designed to provide views of the data that are most natural for users. As we learn more about the way scientists need to view their data, we can add user interfaces that reflect data models consistent with those views.

Types of data and structures.
HDF currently supports the most common types of data and metadata that scientists use, including multidimensional gridded data, 2D and 3D raster images, polygonal mesh data, multivariate datasets, sparse matrices, finite-element data, splines, non-Cartesian coordinate data, and text.

In the future there will almost certainly be a need to incorporate new types of data, such as voice and video, some of which might actually be stored on other media than the central file itself. In this sense, it may be desirable to employ the concept of a "virtual file", which functions like a file, but doesn't fit our normal notion of a file as a monolithic sequence of bits stored entirely on a disk or tape somewhere.

HDF also makes it possible for the user to include annotations, titles, and specific descriptions of the data in the file, so that files can be archived with human-readable information about the data and its origins.

One collection of HDF tags supports a hierarchical grouping structure called vset that allows scientists to organize data objects within HDF files to fit their views of how the objects go together, much as a person in an office or laboratory organizes information in folders, drawers, journal boxes, and on their desktops.

*** INSERT FIGURE HERE ***

Backward and forward compatibility. An important goal of HDF is to maximize backward and forward compatibility among its interfaces. This is not always achievable, because changes sometimes have to be made to the way data is organized in order to enhance performance, to correct errors, or for other reasons. However, whenever possible, HDF files should not become out of date.

For example, suppose a site falls far behind in the HDF standard, so its users can only work with the portions of the specification that are three years old. Users at this site might produce files with their old HDF software, then read them with newer software designed to work with more advanced data files.
The newer software should still be able to read the old files. Conversely, if the site receives files that contain objects that its HDF software does not understand, it should still be able to list the types of data in the file, and it should still be able to access all of the older types of data objects that it understands, despite the fact that the older types of data objects are mixed in with new kinds of data. In addition, if the more advanced site uses the text annotation facilities of HDF effectively, the files will arrive

Appendix A, "NCSA HDF Tags," presents a list of brief descriptions of the tags assigned at NCSA for general use.

Appendix B, "Header Files," includes the general header files used in compiling all HDF libraries.

Form of Presentation

The material in this manual is presented in text or screen displays.

Text

In explaining various features and commands, this manual often presents a word within a paragraph in italics to indicate that the word is defined within the paragraph.

Portions of this manual refer to other portions of the manual that explain related topics. These cross references usually mention the title of sections or chapters enclosed in quotation marks, such as, See Chapter 1, "The Basic Structure of HDF Files."

Screen Displays

Screen displays in this manual are presented in Courier type.

long process of redesigning the lower layers of HDF began. As of this writing, in Summer 1992, we are about to release the first version of HDF that incorporates the new lower layers of HDF.

Use of This Manual

This manual is designed for software developers who are designing applications or routines for use with HDF files and for users who need detailed information about HDF. Users who are interested in using HDF to store or manipulate their data do not normally need the kind of detail presented in this manual.
They should instead consult a user manual, such as "HDF Calling Interfaces and Utilities," "HDF Vset," or perhaps a manual having to do with software that uses HDF.

Manual Contents

The manual is organized into the following chapters:

Chapter 1, "The Basic Structure of HDF Files," introduces and describes the components and organization of Hierarchical Data Format files.

Chapter 2, "HDF Software Overview," describes the organization of the software layers that make up the basic HDF library.

Chapter 3, "The NCSA HDF General Purpose Interface," describes the HDF modules that make up the general purpose HDF routines, sometimes referred to as the lower layer of HDF.

Chapter 4, "Sets and Groups," explains the role of sets and groups in an HDF file. It contains descriptions of raster image sets, scientific datasets, and Vsets. Vsets are covered in more detail in another chapter.

Chapter 5, "Annotations," explains how annotations are currently organized in HDF files.

Chapter 6, "Number Conversion," describes the HDF module that is used for number conversion.

Chapter 7, "Vsets," describes the structure and functioning of the Vset module.

Chapter 8, "Portability," describes techniques and conventions used in the HDF code to achieve portability.

Chapter 9, "HDF Conventions," presents guidelines regarding the use of HDF that are not discussed elsewhere.
Table of Contents

Introduction
  Overview
  Why HDF
  What Is HDF
  Some History
  Use of This Manual

Chapter 1 The Basic Structure of HDF Files
  Chapter Overview
  File Header
  Data Object
  Physical Organization of HDF Files
  Sample HDF File

Chapter 2 Software Overview
  Chapter Overview
  Software Layers
  Organization of HDF Software
  Some HDF Conventions

Chapter 3 The NCSA HDF General Purpose Interface
  Chapter Overview
  Introduction
  Overview of the interface
  Function Specifications

Chapter 4 Sets and Groups
  Chapter Overview
  Sets
  Groups
  Raster Image Sets
  Scientific Datasets
  Vsets and Vdatas
  Appendix: The Raster-8 Set

Chapter 5 Annotations
  Chapter Overview
  Types of Annotations
  File Annotations
  Object Annotations
  Getting Reference Numbers for Object Annotations

Chapter 6 Tag Specifications
  Overview
  The HDF Tag Space
  Physical Storage Methods
  Specifications for Supported Tags

Chapter 7 Making HDF Portable
  Chapter Overview
  The HDF Environment
  Organization of Source Files
  Passing Strings Between FORTRAN and C
  Function Return Values between FORTRAN and C
  Differences in Acceptable Routine Names
  ANSI C vs. Old C
  Type Differences
  Access to Library Functions

Figures and Tables

  Figure 0.1 Raster Image Sets in an HDF File
  Figure 1.1 Three Data Objects
  Figure 1.2 A Data Descriptor
  Figure 1.3 Model of a Data Descriptor Block
  Figure 1.4 Sample Data Descriptor Block
  Figure 1.5 Physical Representation of Data Objects
  Figure 2.1 HDF software layers
  Figure 4.1 Physical organization of Sample RIG Groupings
  Figure 5.1 Three SDS Tags with Their Ref Numbers
  Figure 5.2 Displayed Example of SDS, Ref #, and Annotation
  Figure 6.1 Description Record for a Linked Block Element
  Figure 6.2 A Linked Block Table
  Figure 6.3 A Data Block
  Figure 6.4 Description Record for an External Element
  Figure 7.1 Illustration of the sequence of actions Involved when a FORTRAN call includes a string as a parameter
  Table 1.1 Parts of a Data Descriptor
  Table 1.2 Summary of the Relationships among Parts of an HDF File
  Table 1.3 Sample Data Objects in an HDF File
  Table 2.1 HDF 3.2 source code modules
  Table 4.1 Tags for Raster Image Sets
  Table 4.2 Additional tags for Raster Image Sets
  Table 4.3 Required tags for SDG
  Table 4.4 Optional Tags for SDG
  Table 4.5 Required tags for NDG
  Table 4.6 Optional Tags for NDG
  Table 4.7 Required Tags for NDG structure that is compatible with SDG structure
  Table 4.8 Tags for Raster-8 Sets
  Table 5.1 HDF Annotation tags
  Table 6.1 Number Type Values
  Table 6.2 Possible Machine Types
  Table 6.3 Possible Tag Types in an RIG
  Table 6.4 Color Format String Values
  Table 6.5 Possible Tag Types in an NDG
  Table 6.6 Possible calibrated data types
  Table 6.7 Possible Tag Types in an SDG
  Table 6.9 Scientific Data Dimension Record Fields

Chapter 1 The Basic Structure of HDF Files

  Chapter Overview
  File Header
  Data Object
    Data Descriptor
    DD Blocks
    Data Element
    Naming and Assigning Tags
  Physical Organization of HDF Files
  Sample HDF File

Chapter Overview

This chapter introduces and describes the components and organization of Hierarchical Data Format (HDF) files.

File Header

The first component of an HDF file is the file header (FH), which takes up the first four bytes in an HDF file. The file header is a signature that indicates that the file is an HDF file. Specifically, it is a 32-bit magic number with the hexadecimal value 0e031301.

NOTE: HDF assumes big-endian order in reading and writing files. On some machines the order of bytes in the file header might be swapped when the header is written to an HDF file, causing these characters to be written in little-endian order. To maintain portability of HDF files when developing software for such machines, you should counteract this byte-swapping by making sure the characters are read and written in the exact order shown.

Data Object

The basic building block in an HDF file is the data object, which contains both data and information about the data. A data object has two parts: a 12-byte data descriptor (DD) and a data element. Figure 1.1 shows three examples of data objects. As the names imply, the data descriptor gives information about the data, and the data element is the data itself.
In other words, all data in an HDF file has attached to it information about itself. In this sense, HDF files are examples of self-describing files.

ED. NOTE: Figures are not available in this plain text version of the specification.

Figure 1.1 Three Data Objects

Data Descriptor (DD)

A data descriptor (DD) has four fields: a 16-bit tag, a 16-bit reference number, a 32-bit data offset, and a 32-bit data length. These parts of a DD are depicted in Figure 1.2 and are briefly described in Table 1.1. Explanations of each part appear in the paragraphs following Table 1.1.

*** INSERT FIGURE HERE ***

Table 1.1 Parts of a Data Descriptor

  Part              Description
  tag               designates the type of data in a data element
  reference number  uniquely distinguishes corresponding data element from others with the same tag
  data identifier   tag/ref; uniquely identifies data element
  offset            byte offset of corresponding data element
  length            length of data element

Tag

A tag is the part of a data descriptor that tells what kind of data is contained in the corresponding data element. A tag is actually a 16-bit unsigned integer between 1 and 65535, but every tag is also usually given a name that programs can refer to instead of the number. If a DD has no corresponding data element, the value of its tag is DFTAG_NULL, indicating that no data is present. A tag may never be zero.

Tags are assigned by NCSA as part of the specification of HDF. The following ranges are to be used to guide tag assignment:

  00001 - 32767   reserved for NCSA use
  32768 - 64999   user-definable
  65000 - 65535   reserved for expansion of the format

Appendix A contains full specifications for all currently supported NCSA HDF tags. Appendix B, "Assigned Tag Numbers," contains the current number assignments. See the section "Some HDF Conventions" in the chapter "Software Overview" for more information on allocating tags.
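
The DD layout and the tag ranges above can be illustrated with a short sketch. It is written in Python purely for illustration and does not use the HDF library; the names pack_dd, unpack_dd, and tag_range are hypothetical helpers, not HDF routines. It assumes only what the text states: four big-endian fields (16-bit tag, 16-bit reference number, 32-bit offset, 32-bit length) and the three tag ranges listed.

```python
import struct

# Big-endian layout of one data descriptor, as described above:
# 16-bit tag, 16-bit reference number, 32-bit offset, 32-bit length.
DD_FORMAT = ">HHII"
DD_SIZE = struct.calcsize(DD_FORMAT)  # 2 + 2 + 4 + 4 = 12 bytes


def pack_dd(tag, ref, offset, length):
    """Encode one DD as 12 big-endian bytes (hypothetical helper)."""
    if not 1 <= tag <= 65535:
        raise ValueError("a tag may never be zero and must fit in 16 bits")
    return struct.pack(DD_FORMAT, tag, ref, offset, length)


def unpack_dd(raw):
    """Decode 12 bytes into a (tag, ref, offset, length) tuple."""
    return struct.unpack(DD_FORMAT, raw)


def tag_range(tag):
    """Name the assignment range that a tag number falls in."""
    if 1 <= tag <= 32767:
        return "reserved for NCSA use"
    if 32768 <= tag <= 64999:
        return "user-definable"
    if 65000 <= tag <= 65535:
        return "reserved for expansion of the format"
    raise ValueError("tag must be between 1 and 65535")
```

Using an explicit big-endian format string, rather than the machine's native byte order, is one way to follow the NOTE above about reading and writing bytes in the exact order shown.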
Reference Number

For each occurrence of a tag in an HDF file, a unique reference number is stored with the tag in the data descriptor. Reference numbers are 16-bit unsigned integers.

Reference numbers are not necessarily assigned consecutively, so you cannot assume that the actual value of a reference number has any meaning beyond providing a way of distinguishing among objects with the same tag.

Data Identifier

The combination of a tag and its reference number uniquely identifies the corresponding data object in the file. For this reason, the tag/ref combination is sometimes referred to as a data identifier.

Data Offset and Length

The data offset reflects the byte position of the corresponding data element from the start of the file. The length gives the number of bytes occupied by the data element. Offset and length are both 32-bit unsigned integers.

DD Blocks

Data descriptors are stored physically in a linked list of blocks called data descriptor blocks, or DD blocks. The individual components of a data descriptor block are depicted in Figure 1.3. All of the DDs in a DD block are assumed to contain significant data unless they have a tag that is equal to DFTAG_NULL (no data).

In addition to its DDs, each data descriptor block has a data descriptor header (DDH). The DDH has two fields--a block size field and a next block field. The block size field is a 16-bit unsigned integer that indicates the number of DDs in the following DD block. The next block field is a 32-bit unsigned integer giving the offset of the next DD block, if there is one. The last DDH in the list contains a 0 in its next block field.

*** INSERT FIGURE HERE ***

Data Element

A data element is the raw data part of a data object. Its basic data type is determined by its tag, but other interpretive information may be required before it can be processed properly.
Each data element is stored as a set of contiguous bytes starting at the offset given in the corresponding DD (see Figure 1.4).(1)

*** INSERT FIGURE HERE ***

Physical Organization of HDF Files

Physically, the file header, DD blocks, and data elements are organized as follows. The file header is followed by the first DD block, which is followed by data elements and, if necessary, more DD blocks. These relationships are summarized in Table 1.2.

There are no rules governing the distribution of DD blocks and data elements within a file, except that the first DD block must follow immediately after the file header. The pointers in the DD headers connect the DD blocks in a linked list, and the offsets in the individual DDs connect the DDs to the data elements. Beyond this basic structure there is no assumed order among the objects in an HDF file.

Table 1.2 Summary of the Relationships among Parts of an HDF File

  Part      Constituents
  HDF File  FH, DD-block, data, DD-block, data, DD-block, data ...
  FH        0x0e031301 (32-bit magic number)
  DD-block  DDH, DD, DD, DD ...
  DDH       number-of-DDs (16 bits), offset-to-next-DD-block (32 bits)
  DD        tag (16 bits), ref (16 bits), offset (32 bits), length (32 bits)

(1) Some HDF software provides the capability of storing objects as a series of linked blocks or external elements, but this occurs at a higher level. At the lowest level each object with a tag/ref is stored contiguously.

Sample HDF File

Consider an HDF file that contains two 400-by-600 8-bit raster images. Typically, such a file might contain the objects described in Table 1.3.
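
As an aside on the physical organization summarized in Table 1.2, the following sketch walks the linked list of DD blocks in a file image held in memory. It is written in Python for illustration only and is not the HDF library's own code; read_dds is a hypothetical name. It assumes just what the text states: a four-byte file header, a 6-byte DDH (16-bit count, 32-bit next-block offset), 12-byte DDs, big-endian byte order, and a 0 in the last next block field. It does no validation beyond the file header check and returns every DD, including any marked DFTAG_NULL.

```python
import struct

MAGIC = bytes([0x0E, 0x03, 0x13, 0x01])  # file header bytes, in file order
DDH = struct.Struct(">HI")    # number of DDs, offset of next DD block
DD = struct.Struct(">HHII")   # tag, ref, offset, length


def read_dds(data):
    """Return every DD in an HDF file image as (tag, ref, offset, length)."""
    if data[:4] != MAGIC:
        raise ValueError("not an HDF file: bad file header")
    dds = []
    block = 4  # the first DD block follows the file header immediately
    while True:
        count, next_block = DDH.unpack_from(data, block)
        first_dd = block + DDH.size
        for i in range(count):
            dds.append(DD.unpack_from(data, first_dd + i * DD.size))
        if next_block == 0:  # the last DDH holds 0 in its next block field
            return dds
        block = next_block
```

Each returned offset and length can then be used to slice the corresponding data element out of the same byte string, since at this level every element is stored contiguously.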
Table 1.3 Sample Data Objects in an HDF File

  Tag  Ref  Data
  FID  1    file identifier: user-assigned title for file
  FD   1    file descriptor: user-assigned block of text describing overall file contents
  IP8  1    image palette (768 bytes)
  ID8  1    x and y dimensions of the 2D arrays that contain the raster images (4 bytes)
  RI8  1    first 2D array of raster image pixel data (x*y bytes)
  RI8  2    second 2D array of pixel data (also x*y bytes)

Assuming, for example, that the size of a DD block is 10 DDs, the physical organization of the contents of the file might be described as shown in Figure 1.5.

Figure 1.5 Physical Representation of Data Objects

  Offset  Contents
  0       FH
  4       DDH (10 0)
  10      DD (FID 1 130 4)
  22      DD (FD 1 134 41)
  34      DD (IP8 1 175 768)
  46      DD (ID8 1 943 4)
  58      DD (RI8 1 947 240000)
  70      DD (RI8 2 240947 240000)
  82      DD (empty)
  94      DD (empty)
  106     DD (empty)
  118     DD (empty)
  130     "sw3"
  134     "solar wind simulation: third try. 8/8/88"
  175     <data for the image palette>
  943     <data for the image dimensions>: 400, 600
  947     <data for the first raster image>
  240947  <data for the second raster image>

In this instance, the file contains two raster images. The two images have the same dimensions and are to be used with the same palette. So, the same data objects for the palette (IP8) and dimension record (ID8) can be used with both images.

Chapter 2 HDF Software Overview

  Chapter Overview
  Introduction
  Software Layers
  Organization of HDF Software
    Versions and Release Numbers
    ANSI C and Portability
    Modules and Interfaces
    Header Files
    The HDF Test Suite and Examples
  Some HDF Conventions
    Naming and Assigning Tags
    Using Reference Numbers to Organize Data Objects
    Multiple References and File Compaction

Chapter Overview

This chapter contains a description of how HDF software is organized. It also contains some guidelines on writing HDF software.
HDF Software Layers

HDF-based software comes in four basic forms: an HDF interface library, user programs that store and retrieve data in HDF files, HDF command-line utilities, and HDF-based software tools.

The HDF interface library has two types of interfaces: (1) sets of general purpose routines that form the basis of all higher-level HDF development, and (2) application interfaces that support higher level views of data.

User programs access HDF files via calls to the HDF library. User programs are attached to the HDF library when they are compiled and linked.

The HDF command-line utilities are a group of programs that are distributed with the HDF library. The functionality of the command-line utilities ranges from general purpose, such as listing the contents of an HDF file, to special purpose, such as converting data between different HDF data types (e.g., raster images to scientific data sets). In general, the utilities perform data management tasks.

In contrast, HDF-based software tools usually perform data analysis tasks and have polished interactive user interfaces. They include the NCSA Visualization Tool Suite and commercial software packages that use HDF.

HDF software is implemented in layers, as illustrated in Figure 2.1. At the lowest level are the general purpose modules, which perform basic I/O. At the next level are interfaces that reflect commonly used objects such as 8-bit raster images (RIS8) and multidimensional arrays (SDS). At the top layer are users' programs, utilities, and software tools such as the NCSA visualization software.

*** INSERT FIGURE HERE ***

The general purpose interfaces are described in detail in this document. Descriptions of the applications interfaces and command-line utilities can be found in the manual "HDF Calling Interfaces and Utilities." Each HDF-based software tool should have its own manual.
Since the NCSA user community writes programs primarily in C and Fortran, all of the HDF application interfaces developed at NCSA are callable from both C and Fortran programs. Since the general purpose interface is primarily for program development, not for applications, it provides C routines only.

Organization of Software

Versions and Release Numbers

Since HDF is under continual development, new releases are periodically made available. An HDF version number looks like "3.2r1", which means that it is major version 3, minor version 2, release 1. The three parts of a version number have different meanings:

* A new major version number implies that there is some fundamental difference between this code and code with earlier major version numbers. When a new major version is made available, HDF users and developers are strongly encouraged to obtain the new source code and documentation. There will likely be added functionality in successive major versions of the library and possibly some deletion of obsolete code, so some user code may have to be modified to use the new library.

* The meaning of a new minor version number is somewhat less well defined. It essentially means that there is some appreciable difference in the new code which was not deemed drastic enough to warrant a new major version, but is more substantial than a new release number would indicate.

* A new release number implies some bug fixes or other small modifications have been made to the code. Using a new release of the same version of the library will not usually require modification of existing user code.

ANSI C and Portability

In order to provide for easy porting of HDF to new platforms, all versions of the HDF source code from version 3.2 on will be written in ANSI standard C, with special provisions made for non-ANSI compilers. For more information about porting HDF and writing portable HDF-based code, refer to the chapter "Making HDF Portable."
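
The version numbering scheme described under "Versions and Release Numbers" can be sketched as a small parser. This is an illustration in Python, not part of the HDF distribution; parse_hdf_version is a hypothetical name, and the sketch assumes every version string has exactly the form of the "3.2r1" example (major, a period, minor, the letter r, release).

```python
import re

# Assumed form, taken from the "3.2r1" example: <major>.<minor>r<release>
_VERSION = re.compile(r"^(\d+)\.(\d+)r(\d+)$")


def parse_hdf_version(text):
    """Split a version string into a (major, minor, release) tuple of ints."""
    match = _VERSION.match(text)
    if match is None:
        raise ValueError("unrecognized HDF version string: %r" % text)
    return tuple(int(part) for part in match.groups())
```

Because the result is a tuple of integers, two versions compare in the order the text implies: major version first, then minor version, then release.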
Modules and Interfaces

The HDF distribution contains many source files or modules which can be grouped into families according to their root name. For example, dfp.c, dfpf.c and dfpff.f all share the root name "dfp" and, therefore, all belong to the "dfp" family.

In general, each family of source modules represents one HDF applications interface. Thus, the "dfp" family represents the HDF Palette Interface. There are a few exceptions to this rule which will be discussed later in this section.

For each interface, there is necessarily one file that contains the C code that provides the basic functionality of that interface. But some interfaces may have one or two additional code modules that provide Fortran callability for the interface. So there are three possible family sizes:

1 file: Modules of this sort are generally not calling interfaces themselves, but rather provide useful support functions for actual calling interfaces. Since they are not meant to be called by any routine outside the HDF library itself, they do not need to be callable from Fortran programs. An example of such a module is hblocks.c.

2 files: Although there are currently no examples of this situation, it is conceivable (and desirable) that some future interface may need only one extra source module to provide Fortran compatibility. If this were to happen, there would only be two source modules for the interface. For instance, dfnew.c and dfnewf.c would make up the "New Interface."

3 files: Most current implementations of Fortran-callable HDF interfaces require the passing of character string arguments to some of their functions. Due to differences in the way C and Fortran represent strings, the passing of strings requires that there be a small amount of special purpose Fortran code written for each function that takes a string argument.
For this reason, most Fortran-callable HDF interfaces consist of three source modules: (1) the primary C module, (2) a Fortran-callable C module, and (3) a Fortran module. For example, dfsd.c, dfsdf.c and dfsdff.f make up the Scientific Data Set Interface. dfsd.c contains the basic functionality of the interface, dfsdf.c provides the major part of Fortran callability, and dfsdff.f contains the special purpose Fortran code that allows the passing of character string arguments.

Header Files

In addition to the source code modules discussed above, some interfaces also have C header files associated with them that are meant to be included by C applications programmers with the "#include" preprocessor directive. They contain some useful constants and data structures for interaction with the interface from C programs. The header files can be identified by the same name as the root name for the rest of the family with the ".h" extension added. For example, dfsd.h is the header file for the Scientific Data Set Interface.

Of particular importance among the header files are hdf.h and hdfi.h. hdf.h is the C header file that must be included by any program that calls the HDF library. It contains all the symbolic constants and public data structures that are needed to use HDF. hdfi.h contains specific portability information about each platform on which HDF is supported. It is automatically included in programs when hdf.h is included, so programmers need not explicitly include it. For more information on hdfi.h and other portability issues, refer to the chapter "Making HDF Portable."

Table 2.1 shows all of the source code modules and header files grouped into families for HDF 3.2.
Table 2.1 HDF 3.2 source code modules

  general headers:       hdf.h, hdfi.h, hproto.h, dfivms.h
  general purpose:       hfile.c, hfilef.c, hfileff.f, hkit.c, hblocks.c, hextelt.c, herr.c, herrf.c, hfile.h, herr.h
  grouping (non-Vset):   dfgroup.c, dfgroup.h
  utilities:             dfutil.c, dfutilf.c, dfutilff.f, dfutil.h
  Vsets:                 vg.c, vgf.c, vgff.f, vfp.c, vgi.h, vio.c, vconv.c, vparse.c, vrw.c, vsfld.c, vg.h, vproto.h
  Old general purpose:   dfstubs.c, dff.c, dfff.f, df.h, dfi.h, dfstubs.h
  8/24 bit raster:       dfr8.c, dfr8f.c, dfr8ff.f, df24.c, df24f.c, df24ff.f
  general raster:        dfgr.c, dfgr.h, dfcomp.c, dfimcomp.c, dfrig.h
  palettes:              dfp.c, dfpf.c, dfpff.f
  scientific data sets:  dfsd.c, dfsdf.c, dfsdff.f, dfsd.h
  annotations:           dfan.c, dfanf.c, dfanff.f, dfan.h
  special FORTRAN:       constants.f, functions.f

The HDF Test Suite and Examples

In addition to the source code for the HDF library, versions 3.2 and higher will have an available suite of test programs. There are at least two test programs for most interfaces: one for the C version and one for the Fortran-callable version. Some interfaces have more than two test programs to test special features of that interface and some have only one test program, since they only provide C-callability.

Every effort will be made to ensure that the test programs provide a thorough and accurate assessment of the health of the HDF library. Although it is hoped that the test suite will greatly improve the reliability of HDF code, it is almost inevitable that some parts of the code will be untested. Therefore, no guarantees can be made on the basis of test suite performance.

There is also a set of example programs to help users write HDF programs. They illustrate some of the common ways in which users program with HDF.

Some HDF Conventions

The specification of HDF described in the previous chapter is not sufficient to guarantee its success. It is also important for users to adhere to certain conventions in using HDF.
Guidelines in the use of HDF are implicit in many discussions in other sections of this document, and others are presented in the manual "HDF Calling Interfaces and Utilities." Guidelines not covered elsewhere are introduced in this section.

Naming and Assigning Tags

Tags that are to be made available to a general population of HDF users should be assigned and controlled by NCSA. Tags of this type are given numbers in the range 1-32,767. If you have an application that fits this criterion, contact NCSA at the address listed on the README page at the beginning of this manual and specify the tags you would like. For each tag, your specifications should include a suggested name, information about the type and structure of the data that the tag will refer to, and information about how the tag will be used. Your specifications should be similar to those contained in Appendix A. NCSA will assign you a set of tags for your application and include your tag descriptions in its documentation.

Tags in the range 32,768-64,999 are user-definable. That is, you can assign them for any private application. Of course, if you use tags in this range you need to be aware that they may conflict with other people's private tags.

Using Reference Numbers to Organize Data Objects

The HDF library itself uses reference numbers solely for the purpose of distinguishing between different objects with the same tag. While application programmers may find it convenient to impart some meaning to reference numbers, they should be forewarned that the HDF library will be ignorant of any such meaning. In other words, any meaning attached to reference numbers exists only at the application program or software tool level.

Some users have used reference numbers to indicate how objects should be grouped by considering all objects with the same reference number to be part of the same group. This practice is not recommended.
Instead, if object grouping is desired it is recommended that you use either the simple grouping procedures used by the SDS, RIS8, and RIS24 applications (supported by the routines in dfgroup.c), or the more general (and more complex) Vset structures.

Another possible use of reference numbers is for keyed access to HDF objects. An HDF data identifier (tag/ref) provides a unique identifier for any HDF object within a file, and hence could be used as a primary key for that object. One could keep a table of data identifiers as a way of providing random access to HDF objects.

Reference numbers might also be used to impose an ordering on HDF objects. Once again, because the assignment scheme for reference numbers in HDF files does not guarantee any order, caution is advised in this use of reference numbers.

Multiple References

Multiple references to a single data element are quite common in HDF. The general purpose routine Hdupdd generates a new reference to data that is already pointed to by another DD. If Hdupdd is used several times, there could be several DDs that point to the same data element.

It is important to note that when a multiply-referenced data element is deleted or moved, the various DDs that previously pointed to the data element are not automatically deleted or adjusted to point to the data element in its new location. Consequently, each DD to be deleted or moved should be checked for multiple references and handled as the programmer sees fit.

Chapter 3
The NCSA HDF General Purpose Interface

Chapter Overview
Introduction
Overview of the Interface
Function Specifications
Opening and Closing Files
Finding Tags, Refs, and Element Lengths
Reading and Writing Entire Data Elements
Reading and Writing Part of a Data Element
Manipulating Data Descriptors (DDs)
Creating Special Data Elements
Development Routines
Error Reporting

Chapter Overview

This chapter contains a detailed description of the routines that make up the general purpose HDF interface.
Introduction

NCSA supports two kinds of interfaces for HDF users: high level interfaces to support certain application areas, such as image processing, and low level general purpose interfaces for performing basic operations on HDF files. These interfaces are written in C only, but most functions are typically accessible from Fortran.

The routines in the general purpose interface enable you to build and manipulate HDF objects of any type, including those of your own invention. All HDF applications developed at NCSA use these routines as their basic building blocks.

The routines described in this chapter represent a second set of general purpose routines. All HDF applications prior to HDF 3.2 (released in June 1992) used an earlier set of general purpose routines. These low level general purpose routines have been changed to allow for better functionality. Old routines will still be emulated, but at a cost of reduced functionality. Users are strongly advised to use the new interface.

The new lower layer, first used with HDF Version 3.2, incorporates the following improvements over its predecessor:

* More consistent data and function types.
* An error handling module that supports more meaningful and extensive reporting of errors.
* Simplification of key lower level functions.
* Simplified techniques for facilitating portability.
* Support for alternate forms of physical storage, such as linked blocks storage, and storage of the data portion of an object in an external file.
* A version tag indicating which version of the HDF library last changed an HDF file.
* Support for simultaneous access to multiple files.
* Support for simultaneous access to multiple objects within a single file.

The previous lower layer is called the "DF layer", because all routines began with the letters "DF", as in "DFopen" and "DFclose." The new layer is called the "H layer" because all routines begin with the letter "H" (Hopen, Hclose, Hwrite, etc.).
The source modules that implement these changes can be found in files that begin with the letter "h". Also, the number of basic source modules has changed, and now includes:

hfile.c      basic I/O
herr.c       error-handling
hkit.c       general purpose routines
hblocks.c    to support linked block physical storage
hextelt.c    to support external storage of HDF data

Overview of the Interface

Following is a listing of the public functions that can be found in the general purpose interface. This section provides specifications and descriptions of these routines.

Opening and Closing HDF Files

These calls are used to open and close HDF files.

Hopen     Provides an access path to an HDF file. It also reads into memory all of the DD blocks in the file.
Hclose    Closes the access path to a file.

Locating Elements for Access and Getting Information

These routines make it possible to locate elements or find out other information. Except for Hendaccess, they initialize the element that they locate and return an access id that is used in later references to the data element. Calls to them can include wild cards so that one can search for unknown tags and refs.

Hstartread        Locates an existing data element with matching tag/ref and returns an access id for reading it.
Hnextread         Continues the search with the same access id.
Hstartwrite       Allows writing to the object with the supplied tag/ref. If the object exists, it will be modified; otherwise it is created.
Hendaccess        Disposes of access id for tag/ref.
Hinquire          Returns access information about a data element.
Hishdf            Determines whether a file is an HDF file.
Hnumber           Returns the number of occurrences of a specified data identifier (tag/ref) in a file.
Hgetlibversion    Returns version information for the current HDF library.
Hgetfileversion   Returns version information for an HDF file.

Reading and Writing Entire Data Elements

There are two sets of routines for reading and writing data elements.
The set of routines described here is used to store and retrieve entire data elements. A second set of routines, described in the next section, may be used if you wish to access only part of a data element at a time.

Hputelement   Adds or replaces elements in a file.
Hgetelement   Obtains the data referred to by the tag/ref combination that is passed to it.

Reading and Writing Part of a Data Element

The second set of routines for reading and writing data elements makes it possible to read or write all or part of a data element, in contrast to the routines described above, which can only read or write an entire element. One of the access routines Hstartread or Hstartwrite must be called before calling these routines.

Hwrite   Appends data to a data element. It starts at the last position left by a Hwrite or Hseek command, writes up to a specified number of bytes, then leaves the access pointer at the end of the data written.
Hread    Reads a portion of a data element. It starts at the last position left by a Hread or Hseek command and reads any data that remains in the element up to a specified number of bytes.
Hseek    Sets the access pointer to an offset within a data element. The next time Hread or Hwrite is called, the access occurs from the new position. The location to seek to can be specified as an offset from the current location or from the start of the element.

Manipulating Data Descriptors (DDs)

These routines perform operations on DDs without doing anything with the data to which the DDs refer.

Hdupdd    Is used to generate new references to data that is already referenced from somewhere else.
Hdeldd    Deletes a tag/ref from the list of DDs.
Hnewref   Returns the next available reference number for the HDF file.

Creating Special Data Elements

HDF 3.2 introduces two alternate methods of physical storage for HDF objects. Previously, all of the objects in an HDF "file" had to be in the same file and any given object had to be contiguous.
This last requirement caused many problems, especially with regard to appending to existing objects. Objects needed to be deleted and rewritten to the end of the file in order to append to them.

The two new storage methods are "linked blocks" and "external elements". Linked blocks allow elements in a single HDF file to be non-contiguous. External elements allow a single HDF object to be stored in an external file. It is not currently possible to have a single object (such as a very large data set) stored in multiple files. Nor is it possible to have multiple objects stored in an "external" file.

Special data elements can be accessed with the same routines as for normal data elements once they are created. These routines create special data elements.

HLcreate   Creates a new linked block special data element.
HXcreate   Creates a new external file special data element.

Both of these routines have two modes of operation. For example, calling HLcreate with a tag and ref which do not exist in a file will create a new element with the given tag and ref that will be stored as linked blocks. On the other hand, if the tag/ref pair already exists in the file, the referenced object is "promoted" to being stored as linked blocks. All data which had been stored in the object before the promotion is retained. HXcreate behaves similarly.

Development Routines

The HDF library provides a number of "developer" level routines that are meant to simplify the task of writing HDF applications. Most of these routines mirror basic C library functions which are, unfortunately, not always completely portable in their library form.

HDgettagname   Return a pointer to a text string describing a given tag.
HDgetspace     Allocate space.
HDfreespace    Free space.
HDstrncpy      Copy a string from one location to another up to a given number of characters.

Error Reporting

The HDF library now provides a much more robust error reporting scheme. Previously, only a single error value could be returned to the user.
There is now the notion of an error stack. This allows for more of the context to be known when trying to decipher a problem.

HEprint    Print out all of the errors on the error stack to a specified file.
HEclear    Clear the error stack.
HERROR     Macro to report an error. This will push the error type, file name, line number and name of the function reporting the error.
HEreport   Add a text string to the description of the most recently reported error. Only a single text string may be supplied per error.

The only problem with the error module is that standard C does not have any way for the code inside a function to know the name of the function. Therefore, in order to use the macro HERROR to report errors, there must exist a variable FUNC which points to a string containing the name of the reporting function.

Other

Hsync   Synchronize stored version of HDF file with image in memory.

Function Specifications

Opening and Closing Files

Hopen

int32 Hopen(char *path, int access, int16 ndds)

path     IN:  Name of file to be opened
access   IN:  One of the access codes described below (DFACC_READ, DFACC_RDWR, DFACC_CREATE, ...)
ndds     IN:  Number of dds in a block if this file needs to be created

Purpose: Provides an access path to an HDF file. It also reads into primary memory all of the DD blocks in the file.

Returns: On success returns file id, on failure returns FAIL.

Description: Opens an HDF file.

Interpretations of access: HDF provides several constants for use as access privilege codes. Below is a list of these codes and their meanings. It is important to note that these constants are NOT bitflags and should NOT be or'd together to combine access modes. Doing so may cause odd behavior and, in some cases, loss of data.

Recommended:
DFACC_READ:     Open for read only. If file does not exist, error.
DFACC_RDWR:     Open for read/write. If file does not exist, create it.
DFACC_CREATE:   Force creation. If file exists, delete it, then open a new file for read/write.
                (in the spirit of UNIX "clobber")

Others:
DFACC_ALL:      Same as DFACC_RDWR.
DFACC_WRITE:    Same as DFACC_RDWR.

On successful exit,
* File_rec members are filled in.
* File is opened with the relevant permission.
* Information about dd's is set up in memory.

For a new file, in addition,
* The file headers and initial information are set up.

Hclose

intn Hclose(int32 id)

id   IN:  the file id of the file to be closed

Purpose: Closes the access path to the file.

Returns: SUCCEED (0) if successful and FAIL (-1) if failed.

Description: Id is first validated. If valid, the function closes the access path to the file. If there are still access elements attached to the file, the error DFE_OPENAID is returned and the file is not closed. This is a fairly common error when developing new interfaces. See the discussion of Hendaccess below for hints on how to debug this problem.

Locating Elements for Access and Getting Information

Hstartread

int32 Hstartread(int fileid, int tag, int ref)

fileid   IN:  id of file to attach access element to
tag      IN:  tag to search for
ref      IN:  ref to search for

Purpose: Locate an existing data element with matching tag/ref and return a descriptor for reading it.

Returns: Returns the id of the access element if successful, otherwise FAIL (-1).

Description: Searches the DD's for a particular tag/ref combination. Wildcards can be used for tag or ref (DFTAG_WILDCARD, DFREF_WILDCARD) and they match any values. Searching on wildcards begins from the beginning of the DD list. If the search is successful, the access element is positioned to the start of that tag/ref, otherwise it is an error. An access element is created and attached to the file.

Hnextread

intn Hnextread(int32 access_id, int16 tag, int16 ref, int origin)

access_id   IN:  Id of a READ access elt
tag         IN:  the tag to search for
ref         IN:  ref to search for
origin      IN:  from where to start searching

Purpose: Locate and position a read access id on next occurrence of tag/ref.
Returns: SUCCEED (0) if successful and FAIL (-1) otherwise.

Description: Searches for the "next" DD that fits the tag/ref. Wildcards apply. If origin is DF_START, search from start of DD list; if origin is DF_CURRENT, search from current position. Searching from the end of the file via DF_END is not yet implemented. If the search is successful, then the access element is positioned at the start of that tag/ref; otherwise, the access_id is not modified.

Hstartwrite

int32 Hstartwrite(int fileid, int tag, int ref, long len)

fileid   IN:  Id of file to write to
tag      IN:  tag to write to
ref      IN:  ref to write to
len      IN:  the length of the data element

Purpose: Create or replace data element with matching tag/ref.

Returns: Id of access element if successful and FAIL otherwise.

Description: Set up an access element to write out a data element. The DD list of the file is searched first. If the tag/ref is found, the data element is NOT replaced; rather, it is then possible to modify the existing data. If an object with the corresponding tag and ref does not exist, a new one is created.

Hendaccess

int32 Hendaccess(int access_id)

access_id   IN:  id of access element to dispose of

Purpose: Disposes of descriptor for tag/ref.

Returns: Returns SUCCEED (0) if successful, FAIL (-1) otherwise.

Description: Used to dispose of an access element. Only a finite number of access elements may be active at a time. Therefore, it is very important to call Hendaccess whenever you are done using an element.

When developing new interfaces, we have found that a fairly common mistake is to not call Hendaccess for all of the elements accessed. When this happens, Hclose will return FAIL, and the dump of the error stack (see HEprint, below) will tell how many access elements are still active. This is a rather difficult problem to debug, as the low level routines of the HDF library have no record of who opened an access element, or where, and forgot to release it.
It's tedious, but the most effective means we have found to debug this problem is to annotate the locations where the `attached' count of a file record is changed (there are a couple of places in hfile.c and a few in hblocks.c and hextelt.c).

Hinquire

intn Hinquire(int access_id, int32 *pfile_id, uint16 *ptag, uint16 *pref, int32 *plength, int32 *poffset, int32 *pposn, int *paccess, int *pspecial)

access_id   IN:   Id of an access elt
pfile_id    OUT:  file id
ptag        OUT:  tag of the element pointed to
pref        OUT:  ref of the element pointed to
plength     OUT:  length of the element pointed to
poffset     OUT:  offset of elt in the file
pposn       OUT:  position pointed to within the data elt
paccess     OUT:  the access type of this access elt
pspecial    OUT:  special code

Purpose: Returns access information of a data element.

Returns: Returns SUCCEED (0) if the access elt points to some data element, otherwise FAIL (-1).

Description: Inquire statistics of the data element pointed to by access element. If a piece of information is not needed, it is possible to send NULL in for that value. There is a set of convenience macros for calls to Hinquire (HQueryposition, HQuerylength, etc.) defined in hdf.h.

Hishdf

int32 Hishdf(char *path)

path   IN:  name of file

Purpose: Determine if a file is an HDF file.

Returns: Returns TRUE (non-zero) if file is HDF, FALSE (0) otherwise.

Description: The decision of whether a file is an HDF file or not is based solely on the magic number stored in the first four bytes of an HDF file. It is possible that Hishdf will identify a file as an HDF file but Hopen will be unable to open the file (for example, if the DD list in the file is corrupted).

Hnumber

int Hnumber(int32 file_id, uint16 tag)

file_id   IN:  file id
tag       IN:  tag to be counted

Purpose: Find the number of occurrences of tag/ref in file.

Returns: The number of instances of a tag in a file.
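
The counting that Hnumber performs can be pictured as a scan over the DD list held in memory. The following is a minimal sketch of that idea, not the library's implementation: the dd_t structure, the count_tag function, and the use of 0 as the wildcard value are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of one data descriptor (DD). */
typedef struct {
    uint16_t tag;
    uint16_t ref;
    int32_t  offset;
    int32_t  length;
} dd_t;

/* Wildcard value assumed for this sketch only. */
#define SKETCH_TAG_WILDCARD 0

/* Count the entries in `list` whose tag equals `tag`.
 * The wildcard matches every entry in the list. */
int count_tag(const dd_t *list, size_t n, uint16_t tag)
{
    int count = 0;

    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tag == SKETCH_TAG_WILDCARD || list[i].tag == tag)
            count++;
    }
    return count;
}
```

A caller would pass the array of descriptors it has read from a file together with the tag of interest; the sketch says nothing about how empty descriptors are treated by the real library.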
Hgetlibversion

Hgetlibversion(uint32 *majorv, uint32 *minorv, uint32 *release, char string[])

majorv    OUT:  major version number
minorv    OUT:  minor version number
release   OUT:  release number
string    OUT:  informational text string (80 chars)

Purpose: Get version information for current HDF library.

Returns: Returns SUCCEED (0).

Description: Returns the version of the HDF library. The version information is statically compiled into the HDF library, so it is not necessary to have any open files for this function to execute.

Hgetfileversion

Hgetfileversion(uint32 file_id, uint32 *majorv, uint32 *minorv, uint32 *release, char string[])

file_id   IN:   handle of file
majorv    OUT:  major version number
minorv    OUT:  minor version number
release   OUT:  release number
string    OUT:  informational text string (80 chars)

Purpose: Get version information for an HDF file.

Returns: Returns SUCCEED (0) if successful and FAIL (-1) if failed.

Description: Returns the HDF version number stored in the given file. It is still an open question as to what exactly the version number of a file should mean, so we recommend that user code not call this function.

Reading and Writing Entire Data Elements

Hputelement

int Hputelement(int fileid, int tag, int ref, char *data, long length)

fileid   IN:  Id of file
tag      IN:  tag of data element to put
ref      IN:  ref of data element to put
data     IN:  pointer to buffer
length   IN:  length of data

Purpose: Add or replace element in a file.

Returns: Returns SUCCEED (0) if successful and FAIL (-1) otherwise.

Description: Writes a data element or replaces an existing data element in an HDF file. Uses Hwrite and its associated routines.
Hgetelement

int Hgetelement(int file_id, int tag, int ref, char *data)

file_id   IN:   Id of the file to read from
tag       IN:   tag of data element to read
ref       IN:   ref of data element to read
data      OUT:  buffer to read into

Purpose: Obtains the data referred to by the tag/ref combination that is passed to it.

Returns: Returns SUCCEED (0) if successful, FAIL (-1) otherwise.

Description: Reads in a data element from an HDF file and puts it into the buffer pointed to by data. The space allocated for the buffer is assumed to be large enough.

Reading and Writing Part of a Data Element

Hread

int32 Hread(int access_id, long length, char *data)

access_id   IN:   Id of READ access element
length      IN:   length of segment to read in
data        OUT:  pointer to data array to read to

Purpose: Read a portion of a data element.

Returns: Returns length of segment actually read in if successful and FAIL otherwise.

Description: Read in the next segment in the data element pointed to by the access element. It starts at the last position left by a Hread or Hseek command and reads any data that remains in the element up to a specified number of bytes. If the data element is too short, then it only reads to the end of the data element.

Hwrite

int32 Hwrite(int access_id, long len, char *data)

access_id   IN:  Id of WRITE access element
len         IN:  length of segment to write
data        IN:  pointer to data to write

Purpose: Write next data segment to data element.

Returns: Returns length of segment successfully written, FAIL (-1) otherwise.

Description: Write the data to the data element where the last write or Hseek() stopped. It starts at the last position left by a Hwrite command, writes up to a specified number of bytes, then leaves the write pointer at the end of the data written. If the space reserved is less than the length to write, then only as much as can fit is written.

It is the responsibility of the user to ensure that no two access elements are writing to the same data element.
It is possible to interlace writes to more than one data element in the same file, though.

Hseek

intn Hseek(int32 access_id, long offset, int origin)

access_id   IN:  Id of access element
offset      IN:  offset to seek to
origin      IN:  position to seek from by offset, 0: from beginning; 1: current position; 2: end of data element

Purpose: Set the access pointer to an offset within a data element. The next time Hread or Hwrite is called, the read or write occurs from the new position.

Returns: Returns FAIL (-1) if fail, SUCCEED (0) otherwise.

Description: Sets the position of an access element in a data element so that the next Hread or Hwrite will start from that position. origin determines the position to which the offset is added. This routine fails if the access element is not associated with any data element or if the requested position is outside of the data element. Seeking from the end of a data element is not currently supported.

Manipulating Data Descriptors

Hdupdd

int Hdupdd(int32 file_id, uint16 tag, uint16 ref, uint16 old_tag, uint16 old_ref)

file_id   IN:  Id of file
tag       IN:  tag of new data descriptor
ref       IN:  ref of new data descriptor
old_tag   IN:  tag of data descriptor to duplicate
old_ref   IN:  ref of data descriptor to duplicate

Purpose: Generate new references to data that is already referenced from somewhere else.

Returns: Returns SUCCEED (0) if successful, FAIL (-1) otherwise.

Description: Duplicates a data descriptor so that the new tag/ref points to the same data element pointed to by the old tag/ref.

Hdeldd

int Hdeldd(int file_id, int tag, int ref)

file_id   IN:  Id of file
tag       IN:  tag of data descriptor to delete
ref       IN:  ref of data descriptor to delete

Purpose: Delete a tag/ref from the list of DDs.

Returns: Returns SUCCEED (0) if successful, FAIL (-1) otherwise.

Description: Deletes a data descriptor of tag/ref from the dd list of the file. This routine is unsafe and may leave a file in a condition that is not usable by some routines. Use with care.
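
Because Hdupdd can leave several DDs pointing at one data element, it can help to check for other references before a DD is deleted or its data is moved. The following is a minimal sketch of such a check, assuming a hypothetical dd_t structure that mirrors the tag, ref, offset, and length fields of a DD and a caller-supplied copy of the DD list; other_references is an illustrative helper and not an HDF routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of one data descriptor (DD). */
typedef struct {
    uint16_t tag;
    uint16_t ref;
    int32_t  offset;
    int32_t  length;
} dd_t;

/* Return how many DDs other than tag/ref point at the same data element
 * (same offset and length), or -1 if tag/ref is not in the list. */
int other_references(const dd_t *list, size_t n, uint16_t tag, uint16_t ref)
{
    const dd_t *target = NULL;
    int others = 0;

    assert(list != NULL || n == 0);

    /* Find the descriptor the caller intends to delete or move. */
    for (size_t i = 0; i < n; i++) {
        if (list[i].tag == tag && list[i].ref == ref) {
            target = &list[i];
            break;
        }
    }
    if (target == NULL)
        return -1;

    /* Count every other descriptor that shares its data element. */
    for (size_t i = 0; i < n; i++) {
        if (&list[i] != target &&
            list[i].offset == target->offset &&
            list[i].length == target->length)
            others++;
    }
    return others;
}
```

A result greater than zero means the data element is still in use by another DD, so the application must decide whether to leave the data in place or update the remaining DDs itself.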
Hnewref

uint16 Hnewref(int32 file_id)

file_id   IN:  id of file

Purpose: Return the next available ref for HDF file.

Returns: Returns the ref number if successful, 0 otherwise.

Description: Returns a ref number that can be used with any tag to produce a unique tag/ref. Successive calls to Hnewref will generate a strictly increasing sequence until the highest possible ref has been returned; then Hnewref will return unused refs starting from 1.

Creating Special Data Elements

HLcreate

int32 HLcreate(int32 file_id, uint16 tag, uint16 ref, int32 block_length, int32 number_blocks)

file_id         IN:  Id of file
tag             IN:  tag of new data descriptor
ref             IN:  ref of new data descriptor
block_length    IN:  length of blocks to be used
number_blocks   IN:  number of blocks to use per linked block record

Purpose: Create a new linked block special data element.

Returns: Access id for special data element if successful, otherwise FAIL (-1).

Description: Appending to existing elements has been a problem in HDF in the past, as HDF objects were required to be stored contiguously. When appending, the HDF library forced the user to delete the existing element and move it to the end. With HDF 3.2 we have added the concept of linked blocks, which allow unlimited appending to existing elements without copying over existing data.

Initially, a table is set up to accommodate number_blocks linked blocks for this object. Each block has size block_length bytes. If an existing object is being promoted, block_length does not have to be the same size as the original element.

This routine can be used either to create an object with the given tag and ref as a linked block element, or to promote an existing element to be stored with linked blocks. This routine will return an active access id with write permission to the linked block element.
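
The block_length and number_blocks arguments of HLcreate trade table size against block size. The arithmetic below is a rough sketch for estimating how many blocks and block tables an element of a given length would occupy. It assumes that data fills fixed-size blocks in order and that, once one table of number_blocks entries is full, a further table of the same size is added; that growth rule and both function names are assumptions for illustration, not statements about the library's internals.

```c
#include <assert.h>
#include <stdint.h>

/* Number of blocks of block_length bytes needed to hold data_length bytes. */
int32_t blocks_needed(int32_t data_length, int32_t block_length)
{
    assert(data_length >= 0 && block_length > 0);
    /* Integer ceiling written so that it cannot overflow. */
    return data_length / block_length + (data_length % block_length != 0);
}

/* Number of block tables needed when each table lists number_blocks blocks. */
int32_t tables_needed(int32_t blocks, int32_t number_blocks)
{
    assert(blocks >= 0 && number_blocks > 0);
    return blocks / number_blocks + (blocks % number_blocks != 0);
}
```

For example, under these assumptions a 240000-byte raster image stored in 4096-byte blocks would need 59 blocks, and with 16 blocks per table those blocks would be listed in 4 tables.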
HXcreate

int32 HXcreate(int32 file_id, uint16 tag, uint16 ref, char *extern_file_name)

file_id            IN:  file record id
tag, ref           IN:  tag/ref of the special data element to create
extern_file_name   IN:  name of external file to use as data element

Purpose: Create a new external file special data element.

Returns: Access id for special data element if successful, otherwise FAIL (-1).

Description: This routine is used to create a new element in an external file or promote an existing element to be in an external file. If an existing element is to be promoted, it is deleted from the original file and copied over into the new external file. Distributing a single object over multiple external files is currently not supported. In addition, it is not possible to place multiple objects into the same external file. This routine will return an active access id with write permission to the external element.

Development Routines

HDgettagname

char *HDgettagname(uint16 tag)

tag   IN:  tag to look up

Purpose: Get a meaningful description of a tag.

Returns: A pointer to a string describing this tag, or NULL if the tag is unknown.

Description: To reduce the amount of duplicated code, this routine can be used to map a tag to a character string containing the name of the tag. If the tag is unknown, NULL is returned, as programs may have different ways of dealing with unknown tags. For formatting purposes, the string returned by this routine is guaranteed to be 30 characters or less.

HDgetspace

void *HDgetspace(uint32 qty)

qty   IN:  number of bytes to allocate

Purpose: Allocate space.

Returns: Pointer to space that was allocated.

Description: This routine is very platform-dependent. It uses an appropriate allocation routine on the local machine to get space.

HDfreespace

void *HDfreespace(void *ptr)

ptr   IN:  pointer to previously-allocated space to be freed

Purpose: Free space.

Returns: NULL.

Description: It uses an appropriate routine on the local machine to free space.
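
The contract described for HDgettagname (a short name for known tags, NULL for unknown ones, never more than 30 characters) can be illustrated with a simple table lookup. This is a sketch only: sketch_tagname, the table, and the tag numbers and names in it are assumptions chosen for illustration and should be checked against the tag descriptions in Appendix A rather than taken as the library's own table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One row of a hypothetical tag-to-name table. */
typedef struct {
    uint16_t    tag;
    const char *name;
} tag_name_t;

/* Illustrative entries; the numeric values are assumptions. */
static const tag_name_t sketch_tag_names[] = {
    {100, "File Identifier"},
    {101, "File Descriptor"},
    {201, "Image Palette-8"},
    {300, "Image Dimension"},
    {302, "Raster Image"},
    {306, "Raster Image Group"}
};

/* Return a short description of `tag`, or NULL if the tag is not in the table. */
const char *sketch_tagname(uint16_t tag)
{
    size_t n = sizeof sketch_tag_names / sizeof sketch_tag_names[0];

    for (size_t i = 0; i < n; i++) {
        if (sketch_tag_names[i].tag == tag) {
            /* Mirror the documented guarantee of 30 characters or less. */
            assert(strlen(sketch_tag_names[i].name) <= 30);
            return sketch_tag_names[i].name;
        }
    }
    return NULL; /* the caller decides how to present unknown tags */
}
```

Returning NULL instead of a placeholder string keeps the decision about unknown tags with the calling program, which is the reason the specification gives for that behavior.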
HDstrncpy

char *HDstrncpy(register char *dest, register char *source, int32 len)

dest     OUT:  pointer to area to copy string to
source   IN:   pointer to area to copy string from
len      IN:   maximum number of bytes to copy

Purpose: Copy a string with some maximum length.

Returns: Address of dest.

Description: This function creates a string in dest that is at most `len' characters long. The `len' characters include the NULL terminator, which must be added for historical reasons. Hence, if you have the string 'Foo\0' you must call this copy function with len = 4.

Error Reporting

HEprint

void HEprint(FILE *stream, int level)

stream   IN:  stream to print error messages on
level    IN:  level of the error stack to print

Purpose: Print out information on the error stack.

Returns: No return value.

Description: This routine will print out information on reported errors. If level is zero, all of the errors currently on the error stack are printed. Output of this function is sent to the file pointed to by stream.

Information printed is: an ASCII description of the error, the reporting routine, its file name, and the line at which the error was reported. In addition, if the programmer has supplied extra information by means of HEreport, this information is printed as well.

HEclear

void HEclear(void)

Purpose: Clear all information on reported errors off of the error stack.

Returns: No return value.

Description: Clear all of the information off of the error stack.

HERROR

void HERROR(int number)

number   IN:  error number

Purpose: Report an error.

Returns: No return value.

Description: HERROR can be used to report an error. Any function which calls HERROR must have a variable FUNC which points to a string containing the name of the function. HERROR is implemented as a macro.

HEreport

void HEreport(char *format, ...)

format   IN:  printf style format and arguments

Purpose: Provide extra information to the error reporting routines.

Returns: No return value.
Description: This routine can be used to provide further annotation to an error report. Only one such annotation is remembered for each error report. The arguments to this routine follow the style of printf.

An example from hfile.c:

    char *FUNC = "Hclose";
    ...
    if (file_rec->attach > 0) {
        file_rec->refcount++;
        HERROR(DFE_OPENAID);
        HEreport("There are still %d active aids attached",
                 file_rec->attach);
        return FAIL;
    }

Other

Hsync

int Hsync(int32 file_id)

file_id   IN:  id of the file to sync

Purpose: Synchronize on-disk HDF file with image in memory.

Returns: Returns SUCCEED.

Description: This routine is currently vacuous, as the on-disk representation of an HDF file is always the same as its in-memory representation. However, future releases of the HDF library may employ buffering schemes, so this might not always be the case. Hsync will be provided to force the two representations to be consistent.

Chapter 4
Sets and Groups

Chapter Overview
Sets
Types of Sets
Calling Interfaces for Sets
Groups
Sample Groups
General Features of Groups
Raster Image Sets
Raster Image Groups
Tags for Raster Image Sets
Compression of Raster Images
Scientific Datasets
Required Tags
Optional Tags
Vsets and Vdatas
Chapter Appendix: Raster-8 Sets
Compatibility between Raster-8 and Raster Image Sets

Chapter Overview

This chapter describes raster image sets, scientific datasets and Vsets, and explains the role of sets and groups in an HDF file. It also discusses the programming interfaces available for the three types of sets.

Sets

Sometimes tags are grouped into sets, where each set is designed to serve a particular user requirement. For example, the raster image set that is described in the following sections contains several tags that are used for storing information about 8-bit raster images.
Types of Sets

In the current implementation of HDF there are three kinds of sets:

* A raster image set contains a raster image, along with descriptive information about the image, such as its dimensions and (optionally) a color lookup table.
* A scientific data set contains a multidimensional array, along with descriptive information about the data.
* A Vset is a general grouping structure that can contain any kind of HDF objects that a user wishes.

Each HDF set is defined in terms of a minimum collection of data objects that must be present for the set to make sense when it is used. For instance, every raster image set must contain at least the following three data objects:

* an image dimension record, which gives the width and height of the corresponding image;
* raster image data, which consists of the pixel values that make up the image;
* a raster image group, which lists all of the members in the set.

In addition to the required objects, there are optional data objects that may be included in a set. A raster image set, for instance, often contains a palette, or color lookup table, which gives the red, green, and blue values to be associated with each pixel in the raster image data.

Calling Interfaces for Sets

NCSA provides calling interfaces for all the HDF sets that it supports. The primary purpose of these calling interfaces is to provide libraries of routines for reading and writing the data that is associated with each set. The libraries currently supported at NCSA are callable from either C or Fortran programs.

In addition to the libraries, a growing number of command-line utility routines are available for working with sets. For example, a utility called r8tohdf is an HDF command that converts one or more raw raster images to HDF 8-bit raster image set format.

NCSA supports calling interfaces for the following machines: Cray (UNICOS), Silicon Graphics (UNIX), Sun (UNIX), Macintosh (MacOS), and IBM PC (MS-DOS).
The calling interfaces that are currently available are described in the manual NCSA HDF Calling Interfaces and Utilities.

Groups

An HDF set is a collection of HDF data objects in a file. Unless some mechanism is used to identify explicitly those objects that belong to a set, there is often no way to tie them together. This problem is solved in HDF by means of groups. A group is a data object that explicitly identifies all of the data objects in a set.

Since a group is a type of data object, its structure is like that of any other data object. A group data identifier (tag/ref) points to a data element that consists of the collection of data identifiers that make up the corresponding set.

A group tag can be defined for any set. For instance, raster image group (RIG) is the group tag used to group members of raster image sets; RIG data consists of a list of all data identifiers that belong to a particular raster image set.

Groups provide a convenient mechanism for application programs to locate all of the information that they need about a set. Application programs that deal with RIGs, for instance, read all of the elements in a RIG, using only those that they need for their application and ignoring the others.

Sample Groups

Suppose that the two images shown in Figure 1.5 are organized into two sets with group tags. Since they are images, they may be stored as RIGs. Figure 4.1 illustrates the type of organization that incorporates RIG groupings of these images.

Figure 4.1  Physical Organization of Sample RIG Grouping

Offset    Contents
0         FH
4         DDH  (10  0L)
10        DD   (FID  1  130  4)
22        DD   (FD   1  134  41)
34        DD   (IP8  1  175  768)
46        DD   (ID   1  943  4)
58        DD   (RI   1  947  240000)
70        DD   (ID   2  240947  4)
82        DD   (RI   2  240951  240000)
94        DD   (RIG  1  480951  12)
106       DD   (RIG  2  480963  12)
118       DD   (empty)
130       "sw3"
134       "solar wind simulation: third try. 8/8/88"
175       <data for image palette>
943       <data for 1st image dimension rec>: 400, 600
947       <data for 1st raster image>
240947    <data for 2nd image dimension rec>: 400, 600
240951    <data for 2nd raster image>
480951    tag/refs for 1st RIG: IP8/1, ID/1, RI/1
480963    tag/refs for 2nd RIG: IP8/1, ID/2, RI/2

The structure depicted in Figure 4.1 reflects the grouping of raster image sets. This file contains the same raster image information as the file in Figure 1.5, but the information is organized into two sets and groups. Note that there is only one palette (IP8/1) and it is included in both groups.

General Features of Groups

Figure 4.1 also illustrates a number of important general features of groups:

* The contents of each set are consistent with one another. Since the palette (IP8) is designed for use with 8-bit images, the image must be an 8-bit image, rather than a 24-bit, 12-bit, or other image.
* An application program can easily process all of the images in the file by accessing the groups in the file. The non-RIG information contained in the file can be used or ignored, depending on the needs and capabilities of the application program.
* There is usually more than one way to group sets. For example, an extra copy of the image palette (IP8) could have been stored in the file, so that each grouping would have its own image palette. In this instance that is not necessary, because the same palette is to be used with both images. On the other hand, in this example there are two image dimension records (one per group), even though one would suffice.
* Group status does not alter the fundamental role of HDF objects. They are still accessible as individual data objects, despite the fact that they also belong to raster image sets.

In a very real sense, the individual data elements are in the file, whether or not there are groups that contain them. RIGs provide an index showing what sets exist and what their members are.
There is nothing to prevent the imposition of other groupings (indexes) that provide a different view of the same collection of data objects. In fact, HDF is designed to encourage the addition of alternate views, when appropriate.

Raster Image Sets

The raster image set (RIS) provides a framework for storing images and any number of optional image descriptors. It provides for a description of the image data layout, with the optional presence of color look-up tables, aspect ratio, color correction, associated matte or other overlay information, or any other data related to the display of the image.

Raster Image Groups (RIGs)

Tying everything together is the raster image group (RIG), examples of which were given earlier (Figure 4.1). A RIG contains a list of data identifiers that point in turn to the data objects that describe and make up the image. The number of entries in a RIG is variable, and the presence of most of the description information is optional.

Complex applications can store data identifiers of image-modifying data, such as the color table and aspect ratio, in the RIG along with the reference to the image data itself. Simple applications can use simple application-level calls and ignore specialized video production or film color correction parameters.

NCSA currently supports two calling interfaces, RIS8 and RIS24, defined for the easy storage and retrieval of raster images using RIGs. These interfaces are documented in the manual NCSA HDF Calling Interfaces and Utilities.

Tags for Raster Image Sets

The tags presented in Table 4.1 must be fully supported by any raster image set implementation.

Table 4.1  Tags for Raster Image Sets

Tag    Contents of Data Element
RIG    raster image group
ID     image dimension record
RI     raster image data

With full support for the above tags, images can be stored and read from HDF files at any bit depth, with several different component ordering schemes.

As illustrated in Fig. 4.1, the RIG tag points to a collection of the tag/refs that make up the RIG. The ID data element identifies the dimensions of the image, the number type of the elements that make up its pixels, the number of elements per pixel, the interlace scheme used, and the compression scheme used, if any. The RI data element contains the actual raster image data.

*** INSERT FIGURE HERE ***

In addition to the required tags that define an image dataset, the tags listed in Table 4.2 define color properties and other image features. These tags are described fully in Appendix A.

Table 4.2  Additional Tags for Raster Image Sets

Tag    Contents of Data Element
XYP    XY position of image
LD     look-up table dimension record
LUT    color look-up table for non-true-color images
MD     matte channel dimension record
MA     matte channel data
CCN    color correction factors
CFM    color format designation
AR     aspect ratio
MTO    machine-type override

Fig. 4.2 illustrates the storage of a RIS that contains an image palette (IP8), in addition to the required tags.

*** INSERT FIGURE HERE ***

Compression of Raster Images

Tags for two types of compression have been defined for raster images: run-length encoding (RLE) and IMCOMP aerial averaging (IMC). Others may be added at any time. Each encoding tag is documented under its specific tag type (see Appendix A). Support for RIG and RI does not require that all of the compression tag types be supported. If your software encounters an unknown compression type, it should provide a suitable error message to the user.

Scientific Datasets

The scientific dataset (SDS) provides a framework for storing multidimensional arrays of data, together with descriptive information about the data. Current specifications support the following types of numbers in SDS arrays.
* 8-bit, 16-bit, and 32-bit signed and unsigned integers
* 32-bit and 64-bit floating-point numbers

SDS numbers can be stored either as IEEE Standard integers or floats or in the format used by the machine from which they were written ("native mode"). Rank and dimension sizes may vary.

A user interface exists for storing and retrieving SDS. See the NCSA HDF manual for details.

Internal structures

For reasons having to do with backward compatibility, the group structure that HDF uses for SDS is complicated.

HDF 3.1 and previous versions supported only 32-bit IEEE floating-point numbers and Cray floating-point numbers in scientific data sets. HDF 3.2 and later releases support 8-bit, 16-bit, and 32-bit signed and unsigned integers, and 32-bit and 64-bit floating-point numbers. They also allow data sets to be written to HDF files in the local machine format ("native mode"). Furthermore, it is anticipated that later versions of HDF will support new number types and other variations in the physical storage of scientific data, such as compressed data.

The internal structure used to store SDS in HDF 3.1 and earlier versions was not adequate to support the anticipated future changes to SDS. A new structure had to be developed. At the same time, it was important to try to retain compatibility with earlier versions of the HDF library. Earlier versions of the library should be able to read SDS written by HDF 3.2, if the SDS is "understandable" by that earlier software, i.e. if the number type of the data is 32-bit IEEE floating point or Cray floating point. Likewise, new libraries (HDF 3.2 and beyond) should be able to recognize SDS written by earlier versions of the library.

This compatibility is achieved by examining every SDS that is written to an HDF file. If the SDS is compatible with older libraries, it is written to the file using the old structure used to represent SDS, as well as the new structure.
If it is not compatible with older libraries, only the newer structure is used.

The old structure for storing SDS is called SDG ("scientific data group"). The newer structure is called NDG ("numeric data group"). Hence, SDS user interfaces in HDF 3.2 and beyond handle three types of numerical data groups:

1. SDG: created by old libraries and containing floating-point data.
2. NDG: created by the new library and containing non-floating-point data. This data group should not be recognized by old libraries.
3. SDG-like NDG: created by the new library and containing IEEE 32-bit floating-point data only. The old libraries should be able to recognize and interpret this kind of numerical data group correctly.

In the following sections, we describe the SDG and NDG grouping structures.

SDG structure

Scientific datasets represented internally by the SDG tag must always contain at least the data objects listed in Table 4.3.

Table 4.3  Required Tags for SDG

Tag    Contents of Data Element
SDG    scientific data group
SDD    scientific data dimension record for array-stored data. It includes the rank (number of dimensions), the size of each dimension, and the tag/refs representing the number types of the array-stored data and of each dimension. In the case of SDG, the number types are all 32-bit IEEE floating-point values.
SD     scientific data

The data objects presented in Table 4.4 are optional. NCSA's SDS user interface supports these objects.

Table 4.4  Optional Tags for SDG

Tag    Contents of Data Element
SDS    scales along the different dimensions to be used when interpreting or displaying the data (must be of type float32).
SDL    labels for all dimensions and for the data. Each of the dimension labels can be interpreted as an independent variable, and the data label as the dependent variable.
SDU    units for all dimensions and for the data.
SDF    format specifications to be used when displaying values of the data.
SDM    maximum and minimum values of the data (must be of type float32).
SDC    coordinate system to be used when interpreting or displaying the data.

As illustrated in Fig. 4.3, the SDG tag points to a collection of the tag/refs that make up the SDG.

*** INSERT FIGURE HERE ***

NDG structure

SDS represented internally by the NDG tag must always contain at least the data objects listed in Table 4.5.

Table 4.5  Required Tags for NDG

Tag    Contents of Data Element
NDG    numerical data group
SDD    scientific data dimension record for array-stored data. It includes the rank (number of dimensions), the size of each dimension, and the tag/refs representing the number types of the array-stored data and of each dimension. In HDF 3.2, the number types of dimension scales are forced to be the same as that of the array-stored data, but in later implementations each dimension scale will be allowed its own type.
SD     scientific data.
NT     number type of the data set. The default NT is the value most recently set by DFSDsetNT(). If DFSDsetNT() has not been called, the default is floating-point.

The data objects presented in Table 4.6 are optional. NCSA's SDS user interface in HDF 3.2 and later versions supports these objects. Other optional objects can be added at any time.

Table 4.6  Optional Tags for NDG, HDF 3.2

Tag    Contents of Data Element
SDS    scales along the different dimensions to be used when interpreting or displaying the data.
SDL    labels for all dimensions and for the data. Each of the dimension labels can be interpreted as an independent variable, and the data label as the dependent variable.
SDU    units for all dimensions and for the data.
SDF    format specifications to be used when displaying values of the data.
SDM    maximum and minimum values of the data.
SDC    coordinate system to be used when interpreting or displaying the data.

As illustrated in Fig. 4.4, the NDG is identical to the SDG, except that the NDG tag is different. This ensures that older (pre-HDF 3.2) software cannot recognize this form of SDS.
*** INSERT FIGURE HERE ***

SDG-like NDG structure

An SDS written by HDF 3.2 or later that is compatible with earlier SDS is represented internally by both an SDG and an NDG. Table 4.7 lists the objects that this group must always contain.

Table 4.7  Required Tags for NDG structure that is compatible with SDG structure

Tag      Contents of Data Element
NDG      numerical data group
SDG      scientific data group
SDLNK    the NDG and SDG linked to the scientific data set in this group.
SDD      scientific data dimension record for array-stored data. It includes the rank (number of dimensions), the size of each dimension, and the tag/refs representing the number types of the array-stored data and of each dimension. In an SDG-like NDG the number types are all 32-bit IEEE floating-point values.
SD       scientific data

*** INSERT FIGURE HERE ***

Compatibility with future NDG structures

It is likely that future versions of SDS will support optional features that are not supported by the current version. These features fall into two general categories:

* Optional-compatible features: optional features that are compatible with older versions of HDF even though they may not be supported by older versions of HDF. For example, suppose a new attribute, such as a time stamp, is added to SDS. Such an attribute would not be "understood" by older libraries, but it would not render the SDS data unreadable by the older libraries.
* Optional-incompatible features: optional new features that might not be compatible with older versions of HDF, in the sense that they could render the data unreadable by older HDF libraries. For example, suppose compression is added to SDS. Since some older HDF libraries contain no compression routines, they would not be able to read the compressed data correctly.

The scheme that has been developed to address this problem involves numbering conventions for tags. The following conventions are used:

* Required tags. These tags are described in Tables 4.3 and 4.5. All SDS must contain all of the tags in at least one of these sets.
* Optional-compatible tags. These tags can have any valid tag number except those in the other two categories.
* Optional-incompatible tags. A range of tags is defined for SDS features that might render the dataset unreadable by older versions of the library. This range has been specified as tag numbers 780-799.

Vsets and vdatas

An HDF Vset is a logical grouping of HDF data objects within an HDF file. Data organization within the file resembles the UNIX file system in that it is basically hierarchical in structure and also allows cross-linking of data objects.

Unlike Scientific Data Sets and Raster Image Sets, Vsets have no prespecified content or structure. Users can use them to create structural relationships among HDF objects according to their needs. Figure 4.6 illustrates a Vset.

*** INSERT FIGURE HERE ***

A Vset is represented by a vgroup, an HDF object that contains information about the members of the Vset. The vgroup tag is VGDESCTAG. The VGDESCTAG record contains a list of the data identifiers of its members, an optional user-specified name, an optional user-specified class, and some fields that enable it to be extended to contain more information. The VGDESCTAG is described fully in Appendix A. A full treatment of Vsets can be found in the manual "NCSA HDF Vset, Version 2.0".

An HDF object that is often used in connection with Vsets is the vdata. A vdata is a table. The data in a vdata is organized into fields. Each field is identified by a unique fieldname. The type of each field may be any of the data types supported by the SDS interface: 8-, 16-, and 32-bit integers (signed or unsigned), and 32- and 64-bit floats. Several fields of different types may exist within a vdata. Appendix A contains full descriptions of the vdata tags (VSDESCTAG and VSDATATAG). A full treatment of vdatas can be found in the manual "NCSA HDF Vset, Version 2.0".
Chapter Appendix: The Raster-8 Set

The raster image set (RIS), as described above, is the set currently supported by HDF for managing raster images. Before the RIS was added to HDF, a simpler, less flexible set called the raster-8 set was used for storing 8-bit raster images. This set is no longer supported in the HDF software, although it may turn up in some older HDF files. In fact, during the first three years that RIS was used, the HDF software stored raster images in both RIS and raster-8 sets.

Raster-8 Sets

The raster-8 set is a set of tags that provide the basic information necessary to store 8-bit raster images in a data file and display them accurately without prompting the user to supply dimensions or color information. The raster-8 set consists of the tags presented in Table 4.8.

Table 4.8  Tags for Raster-8 Sets

Tag    Contents of Data Element
RI8    eight-bit raster image data
CI8    eight-bit raster image data compressed with run-length encoding
II8    IMCOMP compressed image data
ID8    image dimension record
IP8    image palette data

If you develop software for processing raster-8 sets, it must support RI8, ID8, and IP8. If you do not implement CI8 or II8, then be sure to provide appropriate error indicators to higher layers that might expect to find these tags.

Compatibility between Raster-8 and Raster Image Sets

In order to maintain backward compatibility with raster-8 sets, the raster image set interface has stored tag/refs for both types of sets in HDF raster image files. For example, if an image is stored as part of a raster image set, there is one copy each of the image dimension data, image data, and palette data, but there are two sets of tag/refs pointing to each data element, one from each set. The image data, for instance, is associated with both tag RI8 and tag RI.

NOTE: Although this policy is continued in the current release (HDF 3.2), future plans call for phasing out the use of the raster-8 structure.
Therefore, future software should not expect to find both raster-8 and RIS structures supporting 8-bit raster images. Eventually, RIS structures will be used exclusively.

Chapter 5  Annotations

Chapter Overview
Types of Annotations
File Annotations
Object Annotations
Getting Reference Numbers for Object Annotations

Chapter Overview

This chapter introduces and describes HDF objects that can be used to annotate HDF files and HDF objects.

Types of Annotations

It is often useful to associate textual information with an HDF file and its data contents, and to keep that information in the same file that contains the data. HDF provides this capability in the form of annotations. An HDF annotation is a sequence of ASCII characters that is associated with one of three types of objects: (1) the file itself, (2) the individual HDF data objects in the file, or (3) the tags that identify the data elements.

The current annotation interface supports only the first two types of annotation. This interface is described in detail in the manual NCSA HDF Calling Interfaces and Utilities.

Annotations are optionally supplied by a creator or user of an HDF file or data object. Annotations come in two forms: labels, which normally consist of short strings of characters, and descriptions, which can be long and complex bodies of text.

Table 5.1 shows the types of annotations currently defined for HDF files and their tag names.

Table 5.1  HDF Annotation Tags

                      "Label"    "Description"
File Annotations      FID        FD
Object Annotations    DIL        DIA
Tag Annotations       TID        TD

File Annotations

Any HDF file can have labels (FID) and descriptions (FD) stored in it. There are routines in the annotations interface specifically designed for reading and writing file IDs and file descriptions. Specifications for the tags FID and FD are given in Appendix A.

Object Annotations

The annotation of HDF data objects is complicated by the fact that you have to uniquely identify the objects being annotated.
Since a data identifier (tag/ref) for a data object uniquely identifies that object, the data object that a particular annotation refers to can be identified by storing the object's tag and reference number together with the annotation.

Note that an HDF annotation is itself a data object, so it has its own DD. This DD has a tag and a reference number, and it points to the "data" that constitutes the annotation. The "data" that goes with an annotation consists of three things: (1) the tag of the object that it annotates, (2) the reference number of the object that it annotates, and (3) the annotation itself.

For example, suppose you have an HDF file that contains three scientific datasets (SDS). Each SDS has its own DD consisting of the SDS tag DFTAG_SDG and a unique reference number, as illustrated in Figure 5.1.

*** INSERT FIGURE HERE ***

Suppose you wish to annotate the second SDS by storing the following annotation with it in the file:

"Data from black hole experiment 8/18/87."

This text would be stored in an HDF file as an annotation, and it would have stored with it the tag DFTAG_SDG and reference number 4. Figure 5.2 illustrates how the annotation would look in the file.

*** INSERT FIGURE HERE ***

Getting Reference Numbers for Object Annotations

Note that in order to use annotation routines, you need to know the tags and reference numbers of the objects you wish to annotate. Special routines are available for obtaining the reference numbers of certain tags, including tags for SDSs, Raster Image Sets, palettes, and annotations. These are DFSDlastref, DFR8lastref, DFPlastref, and DFANlastref. They return the most recent reference number used in either reading or writing the corresponding data object.

Reference numbers for objects other than these can be obtained with the routine Hfindnextref, a general-purpose HDF routine that searches through an HDF file for reference numbers that go with a given tag.
These routines are described and illustrated in the manual "NCSA HDF Calling Interfaces and Utilities."

Chapter 6  NCSA HDF Tags

Chapter Overview
The HDF Tag Space
Physical Storage Methods
Specifications of Supported Tags

Chapter Overview

This chapter addresses issues related to HDF tags and the data they represent. The first section discusses some general information about tags and their interpretation. The remainder of the chapter contains a complete list of HDF tags that have been assigned by NCSA as of version 3.2 of the library and a detailed discussion of their specifications.

The HDF Tag Space

As discussed in the chapter entitled "The Basic Structure of HDF Files," there are 16 bits allotted to an HDF tag number, providing for 65535 possible tags ranging from 1 to 65535, with zero (0) unused. This tag space is broken down into three ranges as shown below.

1-32767        reserved for NCSA-supported tags
32768-64999    user-definable
65000-65535    reserved for expansion of the format

No restrictions are placed on the user-definable tags, but it should be noted that tags from this range cannot be guaranteed to be unique across all user-developed HDF applications. The rest of this chapter is devoted to the NCSA-supported tags in the range 1 to 32767.

Physical Storage Methods

In previous versions of HDF, each data element was required to occupy one contiguous block of space in a single file. Beginning with HDF Version 3.2, a mechanism was added to support different methods of physical storage of data elements. The new mechanism is called the "extended tag." Any of the NCSA standard tags can take advantage of the new features of the extended tags.

Extended tags are automatically recognized by the library and interpreted according to a description record. The description record is a complete data element unto itself which identifies the type of extended element and provides the relevant parameters for retrieval of that element.
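As an aside on the ranges listed under "The HDF Tag Space" above, the following is a minimal sketch of how an application might classify a tag number. The names tag_class, classify_tag, and is_sds_incompatible_tag are hypothetical and are not part of the HDF library; the second helper applies the same kind of range test to the optional-incompatible SDS tags (780-799) mentioned in Chapter 4.

```c
#include <stdint.h>

/* Hypothetical names for illustration; not part of the HDF library. */
typedef enum {
    TAG_UNUSED,     /* 0: not a valid tag */
    TAG_NCSA,       /* 1-32767: reserved for NCSA-supported tags */
    TAG_USER,       /* 32768-64999: user-definable */
    TAG_EXPANSION   /* 65000-65535: reserved for expansion of the format */
} tag_class;

/* Map a 16-bit tag number onto the three ranges of the tag space. */
tag_class classify_tag(uint16_t tag)
{
    if (tag == 0)
        return TAG_UNUSED;
    if (tag <= 32767)
        return TAG_NCSA;
    if (tag <= 64999)
        return TAG_USER;
    return TAG_EXPANSION;
}

/* Nonzero if the tag falls in the range set aside for SDS features that
   may make a dataset unreadable by older libraries (780-799). */
int is_sds_incompatible_tag(uint16_t tag)
{
    return tag >= 780 && tag <= 799;
}
```

A reader that meets an unfamiliar tag inside an SDS group could use the second test to decide whether to skip the object or to refuse the dataset.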
Currently, there are two types of extended tags, both of which offer alternate methods of physical storage: linked block elements and external elements.

Linked Block Elements

Linked block elements provide a convenient way of adding data to a pre-existing element. They consist of a series of blocks of data chained together in a linked list (similar to the DD list). In general, the data blocks are of a uniform size. However, the first block is considered a special case and is allowed to have a different size from the rest of the blocks.

The description record for a linked block element begins with the constant EXT_LINKED, which identifies the linked block storage method. It also contains information about the organization of the linked block element as a whole. Figure 6.1 shows a diagram of a description record for a linked block element.

*** INSERT FIGURE HERE ***

<extended tag>    any NCSA standard tag converted to an extended tag (16-bit integer)
<ref no>          reference number (16-bit integer)
EXT_LINKED        constant identifying this as a linked block description record (32-bit integer)
<length>          length of entire element (32-bit integer)
<first len>       length of the first data block (32-bit integer)
<blk len>         length of successive data blocks (32-bit integer)
<num blk>         number of blocks per block table (32-bit integer)
<link ref>        reference number of first block table (16-bit integer)

The <link ref> field of the description record gives the reference number of the first linked block table for the element. This table is identified by the tag DFTAG_LINKED and contains <num blk> entries. There may be any number of linked block tables chained together to describe a linked block element. Figure 6.2 shows a diagram of a linked block table.
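The block layout described above (a first block of <first len> bytes, later blocks of <blk len> bytes, and <num blk> entries per block table) implies simple arithmetic for finding where a given byte of the element lives. The following is a minimal sketch of that arithmetic only. The type lb_pos and the function lb_locate are hypothetical, and the sketch assumes that every table in the chain is full except possibly the last; the library's own lookup may be organized differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: where a byte offset falls in a linked block element. */
typedef struct {
    uint32_t block;      /* 0-based index of the data block */
    uint32_t block_off;  /* byte offset inside that block */
    uint32_t table;      /* 0-based index of the block table in the chain */
    uint32_t slot;       /* entry within that table (0 .. num_blk - 1) */
} lb_pos;

/* Locate a byte offset given <first len>, <blk len>, and <num blk>.
   Returns 0 on success, -1 if the parameters are unusable. */
int lb_locate(uint32_t first_len, uint32_t blk_len, uint32_t num_blk,
              uint32_t offset, lb_pos *out)
{
    if (blk_len == 0 || num_blk == 0 || out == NULL)
        return -1;

    if (offset < first_len) {
        /* The first block is the special case with its own length. */
        out->block = 0;
        out->block_off = offset;
    } else {
        /* All later blocks share the uniform length blk_len. */
        uint32_t rest = offset - first_len;
        out->block = 1 + rest / blk_len;
        out->block_off = rest % blk_len;
    }

    /* Each block table holds num_blk block references. */
    out->table = out->block / num_blk;
    out->slot = out->block % num_blk;
    return 0;
}
```

For example, with a first block of 100 bytes, later blocks of 50 bytes, and 4 entries per table, offset 260 falls at byte 10 of block 4, which is entry 0 of the second table in the chain.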
*** INSERT FIGURE HERE ***

<link ref>     reference number for this block table (16-bit integer)
<next ref>     reference number for next block table (16-bit integer)
<blk ref n>    reference number for data block (16-bit integer)

The <next ref> field contains the reference number of the next linked block table. A value of zero (0) in this field indicates that there are no additional linked block tables associated with this linked block element.

The <blk ref n> fields of each linked block table contain reference numbers for the individual data blocks that make up the data portion of the linked block element. These data blocks are also identified by the tag DFTAG_LINKED, as shown in Figure 6.3. Although it may seem ambiguous to use the same tag to refer to two different objects, this ambiguity is resolved by the context in which the tags appear.

*** INSERT FIGURE HERE ***

<blk ref n>     reference number for this data block (16-bit integer)
<data block>    block of actual data (size given by <first len> or <blk len> from the description record)

Linked block elements can be created using the function HLcreate(), which is discussed in detail in the chapter "The NCSA HDF General Purpose Interface."

External Elements

External elements allow the data portion of an HDF element to reside in a separate file. The potential of external data elements is largely unexplored in the HDF context, although other file formats (most notably CDF) have used external data elements apparently to great advantage.

Because there has been little discussion of external elements within the HDF user community, the structure of these elements is still not completely defined. Figure 6.4 shows a diagram of the proposed structure for an external element.
*** INSERT FIGURE HERE ***

<extended tag>    any NCSA standard tag converted to an extended tag (16-bit integer)
<ref no>          reference number (16-bit integer)
EXT_EXTERN        constant identifying this as an external element description record (16-bit integer)
<offset>          location of the data within the external file (32-bit integer)
<length>          length in bytes of the data in the external file (32-bit integer)
<filename>        non-null-terminated ASCII string containing the name of the external file in which the data resides (any length)

The description record for an external element begins with the constant EXT_EXTERN, which identifies the external storage method. It also contains information about how to find the element.

External elements can be created using the function HXcreate(), which is discussed in detail in the chapter "The NCSA HDF General Purpose Interface."

Specifications of Supported Tags

The following pages contain the specifications of all the tags that are officially supported as of HDF version 3.2. Each entry is to be interpreted as follows:

* The word in capital letters on the left is the tag name.
* The three short lines at the beginning of each description uniquely identify the tag. The first line is the full name of the tag. The second line describes the type and (where possible) the amount of data in the corresponding data element. When the data element is a variable-sized data structure (such as text, a string, or a variable-sized array), the amount of data cannot be specified exactly. Where possible, a formula is given for estimating the amount of data. If the second line is "? bytes", it means that neither the size nor the structure of the data element can be specified. The third line gives the tag number in decimal and (hexadecimal).
* Next is a diagram showing, as nearly as possible, the structure of the tag and its associated data.
* Finally, a full specification of the tag is presented, including a description of the data element and a discussion of its intended use.

These listings are grouped approximately according to the roles that the tags play, under the headings Utility Tags, Annotation Tags, Raster Image Tags, and so forth. These groupings imply a general context for the use of each tag, but are not meant to restrict the use of the tags to any particular context.

Please note that the subsection under the heading Obsolete Tags contains the specifications for tags that have fallen out of use with the continuing development of HDF. These tags are still recognized by the HDF library, but it is not recommended that users write out new objects using these tags, since some of them may eventually be dropped from the HDF specification.

Utility Tags

DFTAG_NULL
    No data
    0 bytes
    1 (0x0001)

*** INSERT FIGURE HERE ***

<ref no>    reference number (16-bit integer; always 0)

This tag is used for place holding and to fill empty portions of the data description block. The length and offset fields (not shown) of a NULL DD must be equal to zero.

DFTAG_VERSION
    Library version number
    12 bytes plus the length of a string
    30 (0x001E)

*** INSERT FIGURE HERE ***

<ref no>     reference number (16-bit integer)
<major>      major version number (32-bit integer)
<minor>      minor version number (32-bit integer)
<release>    release number (32-bit integer)
<string>     non-null-terminated ASCII string (any length)

The data portion of this tag gives the complete version number and a descriptive string for the latest version of the HDF library to write to the file.
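The DFTAG_VERSION layout above (three 32-bit integers followed by a string that is not null terminated) can be decoded with a few lines of C. The following is a minimal sketch, not library code: the names hdf_version, be32, and parse_version are hypothetical, the 80-character cap on the string is arbitrary, and the integers are assumed to be stored most-significant byte first, which this section does not state.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for a decoded DFTAG_VERSION data element. */
typedef struct {
    uint32_t ver_major;
    uint32_t ver_minor;
    uint32_t ver_release;
    char     text[81];   /* arbitrary cap; the stored string may be longer */
} hdf_version;

/* Read a 32-bit integer, assuming most-significant byte first. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Decode the data element; len is the element length taken from its DD.
   Returns 0 on success, -1 if the element is too short or an argument is NULL. */
int parse_version(const unsigned char *data, size_t len, hdf_version *out)
{
    size_t n;

    if (data == NULL || out == NULL || len < 12)
        return -1;

    out->ver_major   = be32(data);
    out->ver_minor   = be32(data + 4);
    out->ver_release = be32(data + 8);

    /* The string is not null terminated in the file, so its length is
       whatever remains of the element; terminate the local copy. */
    n = len - 12;
    if (n > sizeof out->text - 1)
        n = sizeof out->text - 1;
    memcpy(out->text, data + 12, n);
    out->text[n] = '\0';
    return 0;
}
```

The point of the sketch is that the string length comes from the element length in the DD rather than from a terminator, so a reader must add its own terminator before treating the text as a C string.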
DFTAG_NT
Number type
4 bytes
106 (0x006A)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)
<version>  version number of NT information (8-bit integer)
<type>  unsigned int, signed int, unsigned char, char, float, double (8-bit code)
<width>  number of bits (assumed all significant) (8-bit code)
<class>  a generic value, with different interpretations depending on type: floating-point, integer, or character (8-bit code)

Some possible values that may be included for each of the three types in the class field are listed in Table 6.1.

Table 6.1  Number Type Values

Type    Possible Values
floats  DFNTF_NONE, DFNTF_IEEE, DFNTF_VAX, DFNTF_CRAY, DFNTF_PC, DFNTF_CONVEX
ints    DFNTI_MBO, DFNTI_IBO, DFNTI_VBO
chars   ASCII, EBCDIC, BYTE

The number type flag is used by any other element in the file to indicate specifically what a numeric value looks like. Other tag types should contain a reference number pointer to a DFTAG_NT instead of containing their own number type definitions.

The version field allows expansion of the number type information, in case some future number types cannot be described using the fields currently defined. Successive versions of the DFTAG_NT may be substantially different from the current definition; however, backward compatibility will be maintained. The current DFTAG_NT version number is 1.

DFTAG_MT
Machine type
0 bytes
107 (0x006B)

*** INSERT FIGURE HERE ***

<double>  specifies method of encoding double precision floating point (4-bit code)
<float>  specifies method of encoding single precision floating point (4-bit code)
<int>  specifies method of encoding integers (4-bit code)
<char>  specifies method of encoding characters (4-bit code)

The DFTAG_MT specifies that all unconstrained or partially constrained values in this HDF file are of the default type for that hardware. When the DFTAG_MT is set to VAX, for example, all integers will be assumed to be in VAX byte order unless specifically defined otherwise with a DFTAG_NT.
Note that all of the headers and many tags, the whole raster image set for example, are defined with bit-wise precision and will not be overridden by the DFTAG_MT setting.

For DFTAG_MT, the reference field itself is the encoding of the DFTAG_MT information. The reference field is 16 bits, taken as four groups of four bits, specifying the types for double, float, int and char respectively. This allows 16 generic specifications for each type. To the user, these will be defined constants in the header file hdf.h, specifying the proper descriptive numbers for Sun, VAX, Cray, Convex, and other computer systems.

If there is no DFTAG_MT in a file, the application may assume that the data in the file has been written on the local machine; any portability problems are then left to the user. For this reason, we recommend that all HDF files contain a DFTAG_MT for maximum portability.

Possible data encodings are shown in Table 6.2.

Table 6.2  Possible Machine Types

Type    Possible Encodings
double  IEEE64, VAX64, CRAY128
floats  IEEE32, VAX32, CRAY64
ints    VAX32, Intel16, Intel32, Motorola32, CRAY64
chars   ASCII, EBCDIC

New encodings can be added for each data type, as the need arises.

DFTAG_FID
File identifier
string
100 (0x0064)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)
<character string>  non-null terminated ASCII text (any length)

This tag points to a string which the user wants to associate with this file. The string is not null terminated. The string is intended to be a user-supplied title for the file.

DFTAG_FD
File description
text
101 (0x0065)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)
<text block>  non-null terminated ASCII text (any length)

This tag points to a block of text describing the overall file contents. The text can be any length. The block is not null terminated. The text is intended to be user-supplied comments about the file.
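The packing of the DFTAG_MT reference field described earlier in this section can be sketched as follows. The text gives the order of the four groups (double, float, int, char) but not which end of the 16-bit field comes first; this sketch assumes the double code occupies the most significant four bits. The function names are hypothetical, and the numeric codes themselves are the constants defined in hdf.h, which are not reproduced here.

```python
FIELDS = ("double", "float", "int", "char")


def unpack_machine_type(ref: int) -> dict:
    """Split a 16-bit DFTAG_MT reference field into four 4-bit codes."""
    if not 0 <= ref <= 0xFFFF:
        raise ValueError("the reference field is a 16-bit value")
    # Assumption: the first group (double) sits in the top four bits.
    shifts = (12, 8, 4, 0)
    return {name: (ref >> shift) & 0xF for name, shift in zip(FIELDS, shifts)}


def pack_machine_type(double: int, float_: int, int_: int, char: int) -> int:
    """Combine four 4-bit codes into one 16-bit reference field."""
    codes = (double, float_, int_, char)
    if any(not 0 <= code <= 0xF for code in codes):
        raise ValueError("each code must fit in four bits (0..15)")
    return (double << 12) | (float_ << 8) | (int_ << 4) | char


# The codes 1, 2, 3, 4 are placeholders, not real hdf.h constants.
print(hex(pack_machine_type(1, 2, 3, 4)))
print(unpack_machine_type(0x1234))
```

Four bits per group is what yields the 16 generic specifications per type mentioned in the text.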
DFTAG_TID
Tag identifier
string
102 (0x0066)

*** INSERT FIGURE HERE ***

<tag>  tag number to which this tag refers (16-bit integer)
<character string>  non-null terminated ASCII text (any length)

The data for this tag is a string that identifies the functionality of the tag indicated in the space normally used for the reference number. For example, the tag identifier for DFTAG_TID might point to data that reads "tag identifier."

Many tags are identified in the HDF specification, so it is usually unnecessary to include their identifiers in the HDF file. But with user-defined tags or special-purpose tags, the only way for a human reader to diagnose what kind of data is stored in a file is to read tag identifiers. Use tag descriptions to define even more detail about your user-defined tags.

Note that you may also use this tag to check user-defined tags for consistency. Although two persons may use the same user-defined tag, they probably will not use the same tag identifier.

DFTAG_TD
Tag description
text
103 (0x0067)

*** INSERT FIGURE HERE ***

<tag>  tag number to which this tag refers (16-bit integer)
<text block>  non-null terminated ASCII text (any length)

The data for this tag is a text block which describes in relative detail the functionality and format of the tag which is indicated in the space normally occupied by the reference number. This tag is mainly intended to be used with user-defined tags and provides a medium for users to exchange files that include human-readable descriptions of the data.

It is important to provide everything that a programmer might need to know to read the data from your user-defined tag. At the minimum, you should specify everything you would need to know in order to retrieve your data at a later date if the original program were lost.
DFTAG_DIL
Data identifier label
string
104 (0x0068)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)
<obj tag>  tag number of the data to which this label applies (16-bit integer)
<obj ref no>  reference of the data to which this label applies (16-bit integer)
<character string>  non-null terminated ASCII text (any length)

The data for this tag is a data identifier, made up of a tag and reference number, followed by a string that the user wants to place in the file. The purpose of this tag is to associate the string with the data identifier as a label for whatever that data identifier refers to in turn. By including DFTAG_DILs, you can give a data object a label for future reference. For example, DFTAG_DIL is often used to give titles to images.

DFTAG_DIA
Data identifier annotation
text
105 (0x0069)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)
<obj tag>  tag number of the data to which this annotation applies (16-bit integer)
<obj ref no>  reference of the data to which this annotation applies (16-bit integer)
<text block>  non-null terminated ASCII text (any length)

The data for this tag is a data identifier, which is made up of a tag and a reference number, followed by a text block that the user wants to place in the file. Its purpose is to associate the text block with the data identifier as an annotation for whatever that data identifier points to in turn. With DFTAG_DIA, any data object can have a lengthy, user-written description of why that data is in the file. This can be used to include user comments about images, datasets, source code, and so forth.

Compression Tags

DFTAG_RLE
Run length encoded data
0 bytes
11 (0x000B)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)

This tag is used in the compression field of a DFTAG_ID and other places to indicate that an image or section of data is encoded with a run-length encoding scheme. The RLE method used is byte-wise.
Each run is preceded by a count byte. The low seven bits of the count byte indicate the number of bytes (n). The high bit of the count byte indicates whether the next byte should be replicated n times (high bit=1), or whether the next n bytes should be included as is (high bit=0).

See also:
DFTAG_ID (General Raster Image Tags)
DFTAG_NDG (Scientific Dataset Tags)

DFTAG_IMC
IMCOMP compressed data
0 bytes
12 (0x000C)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)

This tag is used in the compression field of a DFTAG_ID and other places to indicate that an image or section of data is encoded with an IMCOMP encoding scheme. This scheme is a 4:1 areal averaging method which is easy to decompress. It counts color frequencies in 4x4 squares to optimize color sampling.

See also:
DFTAG_ID (General Raster Image Tags)
DFTAG_NDG (Scientific Dataset Tags)

DFTAG_JPEG
24-bit JPEG compression information
? bytes
13 (0x000D)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)

This tag points to header information for 24-bit JPEG compressed images. The data in this tag is identical to the data stored in a JFIF (JPEG File Interchange Format) file up to the Start-of-Frame parameter (see the JFIF format document for further details). The Start-of-Frame parameter and all further data for the JPEG image are stored in the associated DFTAG_CI data element, which is the companion to the DFTAG_JPEG element.

DFTAG_GREYJPEG
8-bit JPEG compression information
? bytes
14 (0x000E)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)

This tag points to header information for 8-bit JPEG compressed images. The data in this tag is identical to the data stored in a JFIF (JPEG File Interchange Format) file up to the Start-of-Frame parameter (see the JFIF format document for further details). The Start-of-Frame parameter and all further data for the JPEG image are stored in the associated DFTAG_CI data element, which is the companion to the DFTAG_GREYJPEG element.
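The byte-wise run-length scheme described under DFTAG_RLE earlier in this section can be sketched as a short decoder. This is an illustration only: the function name rle_decode is hypothetical, and the handling of truncated input (raising an error) and of a zero count (producing no output) are assumptions of the sketch, since the text does not cover those cases.

```python
def rle_decode(encoded: bytes) -> bytes:
    """Expand data encoded with the count-byte scheme described for DFTAG_RLE."""
    out = bytearray()
    pos = 0
    while pos < len(encoded):
        count = encoded[pos]
        n = count & 0x7F           # low seven bits: number of bytes
        pos += 1
        if count & 0x80:           # high bit = 1: replicate the next byte n times
            if pos >= len(encoded):
                raise ValueError("run is missing its data byte")
            out.extend(encoded[pos:pos + 1] * n)
            pos += 1
        else:                      # high bit = 0: copy the next n bytes as is
            if pos + n > len(encoded):
                raise ValueError("literal block is truncated")
            out.extend(encoded[pos:pos + n])
            pos += n
    return bytes(out)


# 0x83 = replicate the next byte 3 times; 0x02 = copy the next 2 bytes.
print(rle_decode(bytes([0x83, 0x41, 0x02, 0x42, 0x43])))
```

Since only seven bits are available for the count, a single run or literal block covers at most 127 bytes; longer stretches need several count bytes.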
General Raster Image Tags

DFTAG_RIG
Raster image group
n*4 bytes (where n is the number of data objects in the group)
306 (0x0132)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)
<tag n>  tag number for nth member of the group (16-bit integer)
<ref n>  reference number for nth member of the group (16-bit integer)

The raster image group (RIG) data is a list of data identifiers (tag/ref) that describe a raster image. All of the members of the group are required in order to display the image correctly. Application programs that deal with RIGs should read all the elements of a RIG and process those identifiers which they can display correctly. Even if the application cannot process all of the tags, the tags that it can process will be usable.

Tag types that may appear in a RIG are listed in Table 6.3.

Table 6.3  Possible Tag Types in an RIG

Tag        Description
DFTAG_ID   image dimension
DFTAG_RI   raster image
DFTAG_XYP  X-Y position
DFTAG_LD   LUT dimension
DFTAG_LUT  color lookup table
DFTAG_MD   matte channel dimension
DFTAG_MA   matte channel
DFTAG_CCN  color correction
DFTAG_CFM  color format
DFTAG_AR   aspect ratio

Example

ID, RI, LD, LUT

An image dimension record, the raster image, an LUT dimension and the LUT go together. The application reads the image dimensions, then reads the image with those dimensions. It also reads the lookup table according to its dimensions and displays the corresponding image.
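Because a raster image group is a flat list of tag/ref pairs, an application can walk the list and keep only the members it knows how to handle. The sketch below illustrates this for the ID, RI, LD, LUT example above. It assumes big-endian 16-bit integers (the byte order is not stated in this section); the names parse_group, usable_members, and KNOWN_TAGS are hypothetical, and the tag numbers are the ones listed in this chapter.

```python
import struct

# Tag numbers from this chapter for the members this hypothetical reader handles.
KNOWN_TAGS = {300: "DFTAG_ID", 301: "DFTAG_LUT", 302: "DFTAG_RI", 307: "DFTAG_LD"}


def parse_group(data: bytes):
    """Return the (tag, ref) pairs stored in a group data element."""
    if len(data) % 4 != 0:
        raise ValueError("group data must be a whole number of 4-byte pairs")
    # Each member is a 16-bit tag followed by a 16-bit reference number.
    return [struct.unpack_from(">HH", data, pos) for pos in range(0, len(data), 4)]


def usable_members(pairs):
    """Keep only the members whose tags this reader understands."""
    return [(KNOWN_TAGS[tag], ref) for tag, ref in pairs if tag in KNOWN_TAGS]


# A group holding ID, RI, LD and LUT members, all with reference number 1.
group = struct.pack(">8H", 300, 1, 302, 1, 307, 1, 301, 1)
for name, ref in usable_members(parse_group(group)):
    print(name, ref)
```

Skipping unknown tags rather than failing matches the guidance that the tags an application can process remain usable even when others are not.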
DFTAG_ID, DFTAG_LD, DFTAG_MD

Tag       Full name        Size      Tag number
DFTAG_ID  Image dimension  20 bytes  300 (0x012C)
DFTAG_LD  LUT dimension    20 bytes  307 (0x0133)
DFTAG_MD  Matte dimension  20 bytes  308 (0x0134)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)
<x dim>  length of x (horizontal) dimension (32-bit integer)
<y dim>  length of y (vertical) dimension (32-bit integer)
<NT ref>  reference number of number type information for associated object
<elements>  number of elements that comprise one entry (16-bit integer)
<interlace>  defines type of interlacing used (16-bit integer)
<comp tag>  tag which tells the type of compression used and any associated parameters (16-bit integer)
<comp ref>  reference number of compression tag (16-bit integer)

The three dimension records have exactly the same format. They define the dimensions of the 2D array to which they refer. The diagram above pictures a DFTAG_ID for illustration. A DFTAG_ID specifies the dimensions of a DFTAG_RI, DFTAG_LD specifies the dimensions of a DFTAG_LUT, and DFTAG_MD specifies the dimensions of a DFTAG_MA.

For example, a 512x256 row-wise 24-bit raster image with each pixel stored as RGB bytes would have the following values:

<x dim>: 512
<y dim>: 256
<NT ref>: UINT8
<elements>: 3 (3 elements per pixel: e.g., R, G and B)
<interlace>: 0 (RGB values not separated)
<comp tag>: 0 (no compression is used)

DFTAG_RI
Raster image
xdim*ydim*elements*NTsize bytes (xdim, ydim, elements, and NTsize are given by the corresponding DFTAG_ID)
302 (0x012E)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)

This tag points to raster image data. It is stored in row-major order and must be interpreted as specified in a DFTAG_ID:

<interlace>=0 means the components of each pixel are together.
<interlace>=1 means color elements are grouped by scan lines.
<interlace>=2 means color elements are grouped by planes.
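The size formula and the three interlace settings above can be made concrete with a small sketch. The byte-offset arithmetic is one reading of the descriptions (pixel, scan-line, and plane grouping in row-major order) and should be treated as an assumption; the function names are hypothetical.

```python
def raster_size(xdim: int, ydim: int, elements: int, nt_size: int) -> int:
    """Number of bytes in a DFTAG_RI: xdim*ydim*elements*NTsize."""
    return xdim * ydim * elements * nt_size


def component_offset(x, y, c, xdim, ydim, elements, nt_size, interlace):
    """Byte offset of component c of pixel (x, y) for a given interlace value."""
    if not (0 <= x < xdim and 0 <= y < ydim and 0 <= c < elements):
        raise ValueError("pixel or component index out of range")
    if interlace == 0:      # components of each pixel stored together
        index = (y * xdim + x) * elements + c
    elif interlace == 1:    # components grouped within each scan line
        index = (y * elements + c) * xdim + x
    elif interlace == 2:    # components grouped into whole planes
        index = (c * ydim + y) * xdim + x
    else:
        raise ValueError("interlace must be 0, 1 or 2")
    return index * nt_size


# The 512x256 RGB example from the text, one byte per component.
print(raster_size(512, 256, 3, 1))
# Offset of the green component (c=1) of the first pixel in a 4x2 image.
for mode in (0, 1, 2):
    print(mode, component_offset(0, 0, 1, 4, 2, 3, 1, mode))
```

The total size is the same for all three interlace settings; only the position of each component within the data changes.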
DFTAG_LUT
Lookup table
xdim*ydim*elements*NTsize bytes (xdim, ydim, elements, and NTsize are given by the corresponding DFTAG_LD)
301 (0x012D)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)
<Pn m>  mth value of parameter n (size is given by the DFTAG_NT in the corresponding DFTAG_LD)

The DFTAG_LUT, sometimes called a palette, is used by many kinds of hardware to assign colors to data values. When a raster image consists of data values which are going to be interpreted through hardware with a LUT capability, the DFTAG_LUT should be loaded along with the image.

The most common lookup table is the RGB lookup table, which will have X dimension=256 and Y dimension=1 with three elements per entry, one each for red, green, and blue. The interlace will be either 0, where the LUT values are given as RGB, RGB, RGB, ..., or 1, where the LUT values are given as 256 reds, 256 greens, 256 blues.

DFTAG_MA
Matte channel
xdim*ydim*elements*NTsize bytes (xdim, ydim, elements, and NTsize are given by the corresponding DFTAG_MD)
309 (0x0135)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)

The DFTAG_MA contains transparency data which can be used to facilitate the overlaying of images. The data consist of a two-dimensional array of unsigned 8-bit integers ranging from 0 to 255. Each point in a DFTAG_MA indicates the transparency of the corresponding point in a raster image of the same dimensions. A value of 0 indicates that the data at that point is to be considered totally transparent, while a value of 255 indicates that the data at that point is totally opaque. It is assumed that a linear scale applies to the transparency values, but users may opt to interpret the data in any way they wish.
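One way to apply the linear transparency scale described for DFTAG_MA is an ordinary weighted blend of an overlaid image with the image beneath it. The sketch below assumes single-component 8-bit images of identical dimensions, flattened to byte strings; the function name composite and the rounding rule are choices made for this illustration, not something the specification prescribes.

```python
def composite(foreground: bytes, background: bytes, matte: bytes) -> bytes:
    """Blend two 8-bit images through a matte channel (0 = transparent)."""
    if not (len(foreground) == len(background) == len(matte)):
        raise ValueError("all three arrays must have the same dimensions")
    out = bytearray()
    for fg, bg, m in zip(foreground, background, matte):
        # m = 0 keeps the background; m = 255 keeps the foreground.
        # Adding 127 before the integer division rounds to the nearest value.
        out.append((m * fg + (255 - m) * bg + 127) // 255)
    return bytes(out)


# Three pixels: fully transparent, roughly half, and fully opaque.
result = composite(bytes([200, 200, 200]), bytes([100, 100, 100]), bytes([0, 128, 255]))
print(list(result))
```

For multi-component images the same blend would be applied to each component of a pixel using that pixel's single matte value.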
DFTAG_CCN
Color correction
52 bytes (usually)
310 (0x0136)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)
<gamma>  gamma parameter (32-bit IEEE float)
<red x/y/z>  red x/y/z correction factors (32-bit IEEE floats)
<green x/y/z>  green x/y/z correction factors (32-bit IEEE floats)
<blue x/y/z>  blue x/y/z correction factors (32-bit IEEE floats)
<white x/y/z>  white x/y/z correction factors (32-bit IEEE floats)

Color correction specifies the gamma correction for the image and color primaries for the generation of the image.

DFTAG_CFM
Color format
string
311 (0x0137)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)
<character string>  non-null terminated ASCII string (any length)

The color format is a clue to how each element of each pixel in a raster image can be interpreted. It is defined to be a string which is in all caps, and is one of the values shown in Table 6.4.

Table 6.4  Color Format String Values

String    Description
VALUE     pseudo-color, or just a value associated with the pixel
RGB       red, green, blue model
XYZ       color-space model
HSV       hue, saturation, value model
HSI       hue, saturation, intensity
SPECTRAL  spectral sampling method

DFTAG_AR
Aspect ratio
4 bytes
312 (0x0138)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)
<ratio>  ratio of width to height (32-bit IEEE float)

The data for this tag is the visual aspect ratio for this image. The image should be visually correct if displayed on a screen with this aspect ratio. The data consists of one floating-point number which represents width divided by height. An aspect ratio of 1.0 indicates a display with perfectly square pixels; 1.33 is a standard aspect ratio used by many monitors.

Composite Image Tags

DFTAG_DRAW
Draw
n*4 bytes (where n is the number of data objects that comprise the composite image)
400 (0x0190)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)
<tag n>  tag number of the nth member of the draw list (16-bit integer)
<ref n>  reference number of the nth member of the draw list (16-bit integer)

The data for this tag is a list of data identifiers (tag/ref pairs) which define a composite image. Each member of the DFTAG_DRAW data should be displayed, in order, on the screen. This can be used to indicate several RIGs which should be displayed simultaneously, or even include vector overlays, like DFTAG_T14, which should be placed on top of a RIG.

Some of the elements in a DRAW list may be instructions about how images are to be composited (XOR, source put, anti-aliasing, etc.). These are defined as individual tags.

DFTAG_XYP
XY position
8 bytes
500 (0x01F4)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)
<x>  x-coordinate (32-bit integer)
<y>  y-coordinate (32-bit integer)

A DFTAG_XYP is used in composites and other groups to indicate an XY position on the screen. For this, (0,0) is the lower left, X is the number of pixels to the right along the horizontal axis and Y is the number of pixels up the vertical axis. The X and Y pixel positions are given as two 32-bit integers.

For example, if DFTAG_XYP is present inside a DFTAG_RIG, the DFTAG_XYP refers to the position of the lower left corner of the raster image on the screen.

See also:
DFTAG_DRAW (this section)

Vector Image Tags

DFTAG_T14
Tektronix 4014
? bytes
602 (0x025A)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)

This tag points to a Tektronix 4014 data stream. The bytes in the data field, when read and sent to a Tektronix 4014 terminal, will display a vector image. Only the lower seven bits of each byte are significant. There are no record markings or non-Tektronix codes in the data.

DFTAG_T105
Tektronix 4105
? bytes
603 (0x025B)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)

This tag points to a Tektronix 4105 data stream. The bytes in the data field, when read and sent to a Tektronix 4105 terminal, will be displayed as a vector image. Only the lower seven bits of each byte are significant. Some terminal emulators will not correctly interpret every feature of the Tektronix 4105 terminal, so you may wish to use only a subset of the possible Tektronix 4105 vector commands.

Scientific Dataset Tags

DFTAG_NDG
Numeric data group
n*4 bytes (where n is the number of data objects in the group)
720 (0x02D0)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)
<tag n>  tag number of nth member of the group (16-bit integer)
<ref n>  reference number of nth member of the group (16-bit integer)

The numeric data group (NDG) data is a list of data identifiers (tag/ref pairs) that describe a scientific dataset. It supersedes the old DFTAG_SDG, which became obsolete as of version 3.2 of the HDF library. A more complete explanation of the relationship between DFTAG_NDG and DFTAG_SDG can be found in the chapter entitled "Sets and Groups."

All of the members of the group provide information for correctly interpreting and displaying the data. Application programs that deal with NDGs should read all of the elements of an NDG and process those identifiers which they can use. Even if an application cannot process all of the tags, the tags that it can understand will be usable.

Tag types that may appear in a DFTAG_NDG are listed in Table 6.5.
Table 6.5  Possible Tag Types in an NDG

Tag          Description
DFTAG_SDD    scientific data dimension record (rank and dimensions)
DFTAG_SD     scientific data
DFTAG_SDS    scales
DFTAG_SDL    labels
DFTAG_SDU    units
DFTAG_SDF    formats
DFTAG_SDM    maximum and minimum values
DFTAG_SDC    coordinate system
DFTAG_CAL    calibration information
DFTAG_FV     fill value
DFTAG_LUT    color lookup table
DFTAG_LD     lookup table dimension record
DFTAG_SDLNK  link to old-style DFTAG_SDG (see Sets and Groups)

Example

DFTAG_SDD, DFTAG_SD, DFTAG_SDM

A dimension record, the scientific data, and the maximum and minimum values of the data go together. The application reads the rank and dimensions from the dimension record, then reads the data array with those dimensions. If it needs maximum and minimum, it also reads them.

See also:
Sets and Groups

DFTAG_SDD
Scientific data dimension record
6 + 8*rank bytes
701 (0x02BD)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)
<rank>  number of dimensions (16-bit integer)
<dim n>  number of values along the nth dimension (32-bit integer)
<data NT ref>  reference number of DFTAG_NT for data (16-bit integer)
<scale NT ref n>  reference number for DFTAG_NT for the scale for the nth dimension (16-bit integer)

This record defines the rank and dimensions of the array in the scientific dataset. For example, a DFTAG_SDD for a 500x600x3 array of floating-point numbers would have the following values and components.

Rank: 3
Dimensions: 500, 600, and 3
One data NT
Three scale NTs

DFTAG_SD
Scientific data
NTsize*x*y*z* ... bytes (where NTsize is the size of the data NT given by the corresponding DFTAG_SDD and x, y, z, etc. are the dimension sizes)
702 (0x02BE)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)

This tag points to an array of scientific data. The type of the data may be specified by a DFTAG_NT included with the SDG. If there is no DFTAG_NT, the type of the data is floating-point in standard IEEE 32-bit format.
The rank and dimensions must be stored as specified in the corresponding DFTAG_SDD. The diagram above shows a three-dimensional data array.

DFTAG_SDS
Scientific data scales
rank + NTsize0*x + NTsize1*y + NTsize2*z + ... bytes (where rank is the number of dimensions, x, y, z, etc. are the dimension sizes, and NTsize# are the sizes of each scale NT from the corresponding DFTAG_SDD)
703 (0x02BF)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)
<is n>  tells whether a scale exists for the nth dimension (8-bit integer; 0 or 1)
<scale n>  list of scale values for the nth dimension (type is given by corresponding DFTAG_SDD)

This tag points to the scales for the dataset. The first n bytes, where n is the rank, indicate whether there is a scale for the corresponding dimension (1=yes, 0=no). This is followed by the scale values for each dimension. The scale consists of a simple series of values, where the number of values and their types are given by the corresponding DFTAG_SDD.

DFTAG_SDL
Scientific data labels
? bytes
704 (0x02C0)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)
<label n>  null terminated ASCII string (any length)

This tag points to a list of labels for the data and each dimension of the dataset. Each label is a string terminated by a null byte (0).

DFTAG_SDU
Scientific data units
? bytes
705 (0x02C1)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)
<unit n>  null terminated ASCII string (any length)

This tag points to a list of strings specifying the units for the data and each dimension of the dataset. Each unit's string is terminated by a null byte (0).

DFTAG_SDF
Scientific data format
? bytes
706 (0x02C2)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)
<format n>  null terminated ASCII string (any length)

This tag points to a list of strings specifying an output format for the data and each dimension of the dataset. Each format string is terminated by a null byte (0).
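DFTAG_SDL, DFTAG_SDU, and DFTAG_SDF all store a sequence of null-terminated strings, and DFTAG_SDS has a size that follows directly from the rank and the scale number types. The sketch below illustrates both points. The function names are hypothetical, the sample strings are invented, and scales_size assumes that a scale is present for every dimension, as the size formula above does.

```python
def split_strings(data: bytes):
    """Split an SDL, SDU or SDF data element into its null-terminated strings."""
    if data and not data.endswith(b"\0"):
        raise ValueError("the last string is missing its terminating null byte")
    # Splitting on the null byte leaves one empty piece after the final
    # terminator, which is dropped here.
    return [piece.decode("ascii") for piece in data.split(b"\0")[:-1]]


def scales_size(dims, scale_nt_sizes):
    """Size in bytes of a DFTAG_SDS: rank + sum of NTsize_n * dim_n."""
    if len(dims) != len(scale_nt_sizes):
        raise ValueError("one scale NT size is needed per dimension")
    return len(dims) + sum(n * size for n, size in zip(dims, scale_nt_sizes))


# Invented labels: one for the data, then one per dimension of a 2D dataset.
print(split_strings(b"pressure\0x\0y\0"))
# Scales for a 500x600x3 dataset with 4-byte scale values on every axis.
print(scales_size([500, 600, 3], [4, 4, 4]))
```

An empty string is still a valid entry, so two consecutive null bytes yield an empty label rather than ending the list.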
DFTAG_SDM
Scientific data max/min
8 bytes
707 (0x02C3)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)
<max>  maximum value (type is given by the data NT in the corresponding DFTAG_SDD)
<min>  minimum value (type is given by the data NT in the corresponding DFTAG_SDD)

This record contains the maximum and minimum data values in the dataset. The types of <max> and <min> are given by the data NT of the corresponding DFTAG_SDD.

DFTAG_SDC
Scientific data coordinates
? bytes
708 (0x02C4)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)
<string>  null terminated ASCII string (any length)

This tag points to a string specifying the coordinate system for the dataset. The string is terminated by a null byte.

DFTAG_SDLNK
Scientific dataset link
8 bytes
710 (0x02C6)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)
DFTAG_NDG  NDG tag (16-bit integer)
<NDG ref>  reference number of NDG (16-bit integer)
DFTAG_SDG  SDG tag (16-bit integer)
<SDG ref>  reference number of SDG (16-bit integer)

The purpose of this tag is to link together an old-style DFTAG_SDG and a DFTAG_NDG in cases where the NDG contains 32-bit floating point data and is, therefore, equivalent to an old SDG. A complete description of the use of this tag can be found in the chapter entitled "Sets and Groups."

See also:
Sets and Groups

DFTAG_CAL
Calibration information
36 bytes
731 (0x02DB)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)
<cal>  calibration factor (64-bit IEEE float)
<cal err>  error in calibration factor (64-bit IEEE float)
<off>  calibration offset (64-bit IEEE float)
<off err>  error in calibration offset (64-bit IEEE float)
<data type>  constant representing the effective data type of the calibrated data (32-bit integer)

This tag points to a calibration record for the associated DFTAG_SD. The data can be calibrated by first multiplying by the <cal> factor, then adding the <off> value.
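The calibration rule just stated (multiply by <cal>, then add <off>) and the 36-byte record layout listed above can be sketched together. This is an illustration under stated assumptions: big-endian byte order and a signed <data type> field are guesses, since this section specifies neither, and the function names and sample values are hypothetical.

```python
import struct


def parse_calibration(data: bytes) -> dict:
    """Unpack a 36-byte DFTAG_CAL record: four 64-bit floats and one 32-bit integer."""
    if len(data) != 36:
        raise ValueError("a DFTAG_CAL record is 36 bytes long")
    # ">" selects big-endian with no padding: 4 * 8 + 4 = 36 bytes.
    cal, cal_err, off, off_err, data_type = struct.unpack(">ddddi", data)
    return {"cal": cal, "cal_err": cal_err, "off": off,
            "off_err": off_err, "data_type": data_type}


def calibrate(values, cal: float, off: float):
    """Apply the rule from the text: multiply by cal, then add off."""
    return [value * cal + off for value in values]


# Invented record: factor 0.5, offset 10.0, and a placeholder data type code.
record = struct.pack(">ddddi", 0.5, 0.01, 10.0, 0.1, 5)
info = parse_calibration(record)
print(calibrate([0, 2, 4], info["cal"], info["off"]))
```

The error fields are carried through unchanged; how they are propagated into the calibrated values is left to the application.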
Also included in the record are errors for the calibration factor and offset and a constant indicating the effective data type of the calibrated data. Possible values of <data type> are shown in Table 6.6.

Table 6.6  Possible Calibrated Data Types

Data Type  Description
INT8       signed 8-bit integer
UINT8      unsigned 8-bit integer
INT16      signed 16-bit integer
UINT16     unsigned 16-bit integer
INT32      signed 32-bit integer
UINT32     unsigned 32-bit integer
FLOAT32    32-bit float
FLOAT64    64-bit float

DFTAG_FV
Fill value
? bytes (size given by size of data NT in corresponding DFTAG_SDD)
732 (0x02DC)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)
<fill value>  value representing unset data in the corresponding DFTAG_SD (size given by size of data NT in corresponding DFTAG_SDD)

This tag points to a value which has been used to indicate unset values in the associated DFTAG_SD. The number type of the value (and, therefore, its size) is given in the corresponding DFTAG_SDD.

Vset

DFTAG_VG
Vgroup
14 + 4*nelt + namelen + classlen bytes (where nelt, namelen, and classlen are given below)
1965 (0x07AD)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)
<nelt>  number of elements in the vgroup (16-bit integer)
<tag n>  tag of the nth member of the vgroup (16-bit integer)
<ref n>  reference number of the nth member of the vgroup (16-bit integer)
<namelen>  length of the name field (16-bit integer)
<name>  non-null terminated ASCII string (length given by <namelen>)
<classlen>  length of the class field (16-bit integer)
<class>  non-null terminated ASCII string (length given by <classlen>)
<extag>  extension tag (16-bit integer)
<exref>  extension reference number (16-bit integer)
<version>  version number of DFTAG_VG information (16-bit integer)
<more>  unused (2 zero bytes)

The DFTAG_VG provides a general-purpose grouping structure which can be used to impose a hierarchical structure on the tags in the group.
Any HDF tag may be incorporated into a vgroup (including other DFTAG_VGs). For more information about Vsets, see the chapter entitled "HDF Vsets."

DFTAG_VH
Vdata description
22 + 10*nfields + Σ(fldnmlen n) + namelen + classlen bytes (where nfields, fldnmlen n, namelen, and classlen are given below)
1962 (0x07AA)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)
<interlace>  constant indicating interlace scheme used (16-bit integer)
<nvert>  number of entries in vdata (32-bit integer)
<ivsize>  size of one vdata entry (16-bit integer)
<nfields>  number of fields per entry in the vdata (16-bit integer)
<type n>  constant indicating the data type of the nth field of the vdata (16-bit integer)
<isize n>  size in bytes of the nth field of the vdata (16-bit integer)
<offset n>  offset of the nth field within the vdata (16-bit integer)
<order n>  ??? of the nth field of the vdata (16-bit integer)
<fldnmlen n>  length of the nth field name string (16-bit integer)
<fldnm n>  non-null terminated ASCII string (length given by corresponding <fldnmlen>)
<namelen>  length of the name field (16-bit integer)
<name>  non-null terminated ASCII string (length given by <namelen>)
<classlen>  length of the class field (16-bit integer)
<class>  non-null terminated ASCII string (length given by <classlen>)
<extag>  extension tag (16-bit integer)
<exref>  extension reference number (16-bit integer)
<version>  version number of DFTAG_VH information (16-bit integer)
<more>  unused (2 zero bytes)

DFTAG_VH provides all the information necessary to process a DFTAG_VS. For more information on Vsets, see the chapter entitled "HDF Vsets."
See also:
DFTAG_VS (this section)

DFTAG_VS
Vdata
nvert * Σ(isize n) bytes (where nvert and isize n are given by the corresponding DFTAG_VH)
1963 (0x07AB)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)
<vdata>  data block interpreted according to the corresponding DFTAG_VH (nvert * Σ(isize n) bytes, where nvert and isize n are given by the corresponding DFTAG_VH)

DFTAG_VS contains a block of data which is to be interpreted according to the information in the corresponding DFTAG_VH. For more information on Vsets, see the chapter entitled "HDF Vsets."

See also:
DFTAG_VH (this section)

Obsolete Tags

DFTAG_ID8
Image dimension-8
4 bytes
200 (0x00C8)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)
<x dim>  length of x dimension (16-bit integer)
<y dim>  length of y dimension (16-bit integer)

The data for this tag consists of two 16-bit integers representing the width and height of an 8-bit raster image in bytes. This tag has been superseded by DFTAG_ID.

DFTAG_IP8
Image palette-8
768 bytes
201 (0x00C9)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)
table entries  256 triples of 8-bit integers

The data for this tag can be thought of as a table of 256 entries, each containing one value for red, green, and blue. The first triple is palette entry 0 and the last is palette entry 255. This tag has been superseded by DFTAG_LUT.

DFTAG_RI8
Raster image-8
xdim*ydim bytes (where xdim and ydim are the dimensions given by the corresponding DFTAG_ID8)
202 (0x00CA)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)
image data  2-d array of 8-bit integers

The data for this tag is a row-wise representation of the elementary 8-bit image data. The data is stored width-first (hence row-wise) and is 8 bits per pixel. The first byte of data represents the pixel in the upper-left hand corner of the image. This tag has been superseded by DFTAG_RI.

DFTAG_CI8
Compressed image-8
? bytes
203 (0x00CB)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)
<compressed image>  series of run-length encoded bytes

The data for this tag is a row-wise representation of the elementary 8-bit image data. Each row is compressed using the following run-length encoding, where n is the lower seven bits of the count byte. The high bit indicates whether the following n characters will be reproduced exactly (high bit=0) or whether the following character will be reproduced n times (high bit=1). Since DFTAG_CI8 and DFTAG_RI8 are basically interchangeable, it is suggested that you not have a DFTAG_CI8 and a DFTAG_RI8 that have the same reference number. This tag has been superseded by DFTAG_RLE.

DFTAG_II8
IMCOMP image-8
? bytes
204 (0x00CC)

*** INSERT FIGURE HERE ***

The data for this tag is a 4:1 compressed 8-bit image, using the IMCOMP compression scheme. This tag has been superseded by DFTAG_IMC.

DFTAG_SDG
Scientific data group
n*4 bytes (where n is the number of data objects in the group)
700 (0x02BC)

*** INSERT FIGURE HERE ***

<ref no>  reference number (16-bit integer)
<tag n>  tag number of nth member of the group (16-bit integer)
<ref n>  reference number of nth member of the group (16-bit integer)

The scientific data group (SDG) data is a list of data identifiers (tag/ref pairs) that describe a scientific dataset. All of the members of the group provide information for correctly interpreting and displaying the data. Application programs that deal with SDGs should read all of the elements of an SDG and process those identifiers which they can use. Even if an application cannot process all of the tags, the tags that it can understand will be usable.

Tag types that may appear in a DFTAG_SDG are listed in Table 6.7.
Table 6.7   Possible Tag Types in an SDG

Tag           Description
DFTAG_SDD     scientific data dimension record (rank and dimensions)
DFTAG_SD      scientific data
DFTAG_SDS     scales
DFTAG_SDL     labels
DFTAG_SDU     units
DFTAG_SDF     formats
DFTAG_SDM     maximum and minimum values
DFTAG_SDC     coordinate system
DFTAG_SDT     transposition (obsolete)
DFTAG_SDLNK   link to new DFTAG_NDG (see Sets and Groups)

Example

DFTAG_SDD, DFTAG_SD, DFTAG_SDM

A dimension record, the scientific data, and the maximum and minimum values of the data go together. The application reads the rank and dimensions from the dimension record, then reads the data array with those dimensions. If it needs the maximum and minimum values, it also reads them.

This tag has been superseded by DFTAG_NDG.

See also: Sets and Groups

DFTAG_SDT   Scientific data transpose   0 bytes   709 (0x02C5)

*** INSERT FIGURE HERE ***

<ref no>   reference number (16-bit integer)

The presence of this tag in a group indicates that the data pointed to by the corresponding DFTAG_SD is in column-major order, instead of the default row-major order. No data is associated with this tag. This tag will no longer be written by the HDF library, but if it is encountered in an old file it will be interpreted as originally intended.

Chapter 7   Making HDF Portable

Chapter Overview
The HDF Environment
Machines Supported
Language Standards
Organization of Source Files
Header Files
Source Code Files
Passing Strings Between FORTRAN and C
Passing Strings from FORTRAN to C
Passing Strings from C to FORTRAN
Function Return Values between FORTRAN and C
Differences in Acceptable Routine Names
Case Sensitivity
How HDF Deals with "All-Upper Case" Compilers
Appended Underscore
How HDF Specifies the Appended (and Prepended) Underscore
Short Names vs. Long Names
ANSI C vs.
Old C
Type Differences
Size Differences
Number Representation
Byte-order and Structure Representations
Access to Library Functions

Chapter Overview

The NCSA implementation of HDF is accessible to both C and FORTRAN programs and is implemented on many different machines and several operating systems. There are important differences between C and FORTRAN, as well as between different implementations of each language, especially FORTRAN. There are also important differences between the different machines and operating systems that HDF supports.

This chapter describes many of these differences, the problems and issues associated with them, and the methods employed in the HDF source code to deal with them.

The HDF Environment

The list of machines and operating systems on which HDF is implemented is steadily growing. For reasons that should soon be clear, the number of platforms on which HDF is officially supported is growing slowly. Every time a new platform is added to the list of those that HDF supports, additional code must be written that takes into account the way memory is organized, the way the operating system works, the way numbers are represented, the way the file system works, and the way FORTRAN and C work on that system.

Machines Supported

As of this writing, the following platforms are supported by NCSA's HDF group:

    Cray X-MP and Cray 2 (UNICOS)
    Sun Systems' Sun 3, Sun 386, and Sparcstation (Unix)
    Convex (Unix)
    Macintosh (MPW Shell)
    IBM PC (MS-DOS)
    Silicon Graphics (Unix)
    Vax (VMS)
    HP 9000 (HPUX)
    DecStation (Ultrix)
    IBM RT (Unix)

In addition to these platforms, HDF has been ported to many other platforms for which support cannot currently be provided. These include Alliant, Apollo (Domain), HP 3000, Stellar, Amiga, Symbolics, NeXT, and IBM 3090 (MVS).

Language Standards

Unfortunately, not all compilers are the same.
FORTRAN compilers often differ in the ways they pass parameters, in the identifier naming conventions they employ, and in the number types that they support. Similarly, though generally not as drastically, C compilers differ in the number types that they support and in their adherence to the ANSI C standard.

In order to keep these differences to a minimum, the primary dialects used for the source code in the NCSA implementation of HDF are FORTRAN 77, ANSI C, and "old style C"(1), hereafter referred to as "old C". There are very few platforms whose C and FORTRAN compilers do not adhere to at least one of these standards.

When time and resources permit, attempts are also made to support features or variations in other dialects of C and FORTRAN, particularly on those platforms that are important to NCSA users. Much of the remainder of this chapter speaks to these differences.

Follow these guidelines

To all future HDF developers, we cannot overstress the importance of following the guidelines outlined in this chapter. It may take longer to write code, and it may be considerably more difficult to adapt your coding style to that given here, but the long-term benefits in terms of portability and maintenance costs are well worth the effort.

Organization of Source Files

There are three types of files in the HDF source code directory: header files, source code files, and a makefile. Header files and source code files are organized by application area. All of the functions that apply to a particular application area are stored in three source files, and all definitions and declarations that apply are stored in a corresponding header file. The makefile describes the dependencies among the source and header files, and also provides the commands required to compile the corresponding libraries and utilities.

Header Files

There is one header file for each application area. The HDF Raster Image Set interface, for example, has the header file dfr8.h.
It contains definitions and declarations that are unique to the interface.

(1) "old style C" refers to the version of C described in the first edition of The C Programming Language, by Brian Kernighan and Dennis Ritchie, published by Prentice-Hall.

Other header files include:

    hdf.h
    hdfi.h
    hproto.h
    constants.f
    functions.f

hdf.h and hdfi.h.(1) The file hdf.h contains declarations and definitions for the common data structures used throughout HDF, definitions of the HDF tags, definitions of error numbers, and definitions and declarations specific to the general purpose interface. Since hdf.h depends on hdfi.h, it includes (via #include) hdfi.h.

The file hdfi.h contains a large amount of information specific to the various computing environments supported by HDF. Those environmental parameters that need to be set to particular values when compiling the HDF library are contained in hdfi.h. Machine-dependent definitions of such things as number types and macros for reading and writing numbers are also included in hdfi.h. When porting HDF to a new system, only hdfi.h and the makefile need to be modified.

Normally it is a good idea to include hdf.h (and therefore indirectly hdfi.h) in user programs, though users usually need not be aware of their contents.

hproto.h. This file contains ANSI C prototypes for all HDF C routines, and must be included in ANSI-conforming C programs that make calls to HDF routines.

constants.f. This file is for use in FORTRAN programs. It contains important constants, such as tag values, that are defined in hdf.h. Systems that have FORTRAN preprocessors might be able to include this file via #include statements or their equivalent.

functions.f. This file is for use in FORTRAN programs. It contains declarations of all HDF FORTRAN-callable functions. Systems that have FORTRAN preprocessors might be able to include this file via #include statements or their equivalent.

Source Code Files

All HDF operations are performed by routines written in C.
Hence, even FORTRAN calls to HDF result in calls to the corresponding C routines. However, because of the problems described below, the relationships between the C routines and the corresponding FORTRAN routines can be very confusing. Before looking at the specific problems, we first describe the C and FORTRAN source file organization.

(1) In earlier implementations of HDF, these files were called df.h and dfi.h. Starting with HDF 3.2 the general purpose layer of HDF was completely rewritten, and all routine names changed from "df ... " to "h ...".

Each HDF interface typically has four files associated with it. The HDF Raster Image Set interface, for example, has four associated source files: dfr8.h, dfr8.c, dfr8f.c, and dfr8ff.f. The suffixes on the filenames indicate their functions, as we describe next.

The ".h" file is the header file. The other three files, which contain the C and FORTRAN functions, are:

(1) The "normal" C routines. These routines do all of the actual HDF work. The others have the job of transferring control and data from a FORTRAN environment to a C environment. These routines are stored in files whose names end with ".c", as in "dfr8.c". Every call to HDF, whether it is a C call or a FORTRAN call, ultimately results in a call to one of these routines.

(2) C routines that are compatible with FORTRAN and therefore directly callable from FORTRAN. The primary function of these routines is to provide recognizable function names to the linker. They may also perform operations on data they receive from the FORTRAN routines that call them, such as transferring a FORTRAN string to a local C data area. Examples of how they perform these operations are given below. These routines are stored in files whose names end with "f.c", as in "dfr8f.c" for the raster image interface. The "f" means that the routines are meant to be called from FORTRAN; the "c" means that they are C source code.
(3) FORTRAN routines that perform some operation on the parameters that C is unable to perform, before and/or after calling the corresponding C routine. These routines are required, for example, when one of the parameters is a string. The corresponding C routine has no way of knowing the length of the string unless it is explicitly given the length by the FORTRAN routine. These routines are stored in files whose names end with "ff.f", as in "dfr8ff.f" for the raster image interface. The first "f" means that the routines are to be called from FORTRAN; the second "f" means that they perform some FORTRAN operation that C cannot perform; the final "f" (the extension) means that they are FORTRAN source code.

The roles of these different types of source files will become clearer as we look at some of the problems that arise in interfacing C and many different implementations of FORTRAN.

File naming conventions

The naming conventions for HDF library source code files are complicated by several factors. Because of the wide variety of platforms which HDF must accommodate, all files that will compile to object modules in the HDF library must have names that are unique in the first 8 characters, ignoring case. The difficulties involved in maintaining a FORTRAN-callable interface to a library that is primarily written in C further complicate the naming of source code files.

Passing Strings between FORTRAN and C

One of the most important differences between FORTRAN and C compilers is in the way strings are represented. Different compilers use different data structures for strings, and supply string length information in different ways.

Passing Strings from FORTRAN to C

When strings are passed between FORTRAN and C routines, they may need to be converted from one representation to the other. C compilers store strings in an array of type char, terminated by a null byte ('\0'). The name of a string variable evaluates to the address of the first character in the string.
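The difference between a C string and a string held in some other representation can be sketched in C. The following is only an illustration under stated assumptions, not code from the HDF library: it assumes the non-C string arrives as the address of a character buffer with no terminating null byte, together with a separately supplied length. The helper name fixed_to_cstring is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of the HDF library.
 * Assumes the caller supplies the address and length of a character
 * buffer that has no terminating '\0'.
 * Returns a newly allocated C string, or NULL if allocation fails.
 * Trailing blanks are kept; the caller releases the result with free(). */
char *fixed_to_cstring(const char *buf, size_t len)
{
    char *s;

    assert(buf != NULL);
    s = (char *)malloc(len + 1);   /* one extra byte for the terminator */
    if (s == NULL)
        return NULL;
    memcpy(s, buf, len);           /* copy exactly len characters */
    s[len] = '\0';                 /* C expects a null byte at the end */
    return s;
}
```

Because the terminator is written into the copy rather than into the caller's buffer, nothing in the original string is overwritten. Whether trailing blanks should be removed is a separate decision that this sketch leaves to the caller.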
FORTRAN compilers are not consistent in the ways that they store strings.

Two pieces of information are needed in order to pass a string from FORTRAN to C: its length and its address. The first problem is solved by invoking the standard FORTRAN function len(), which returns the length of a string. Since C expects a '\0' (null) byte at the end of strings, care must be taken that this null byte does not overwrite useful information in the FORTRAN string.

The second problem is more difficult because of the different ways that different FORTRANs store strings. To solve this, a macro _fcdtocp ("FORTRAN character descriptor to C pointer") is used. _fcdtocp is defined differently, depending on the machine on which it is compiled. There are three different ways that a FORTRAN string's address can be passed to C, and _fcdtocp works accordingly:

* UNICOS FORTRAN stores strings in a structure called "_fcd" (FORTRAN character descriptor). "_fcdtocp" is a built-in function in UNICOS that returns the address of the string.

* VMS FORTRAN stores strings by means of a string descriptor structure that provides information about where the string is stored and its length. When compiled under VMS, the function _fcdtocp extracts the string's address and returns that value.

* Most other FORTRAN compilers supported by HDF store strings just as C does, in character arrays with the array name identifying the array's address. For these compilers nothing special need be done in passing a string from FORTRAN to C.

In HDF, a FORTRAN call that involves passing a string results in the following sequence of actions:

(1) A FORTRAN "stub" determines the length and address in memory of the string. Since this is a FORTRAN routine, it can be found in the "...ff.f" file.

(2) The FORTRAN stub then calls a C routine, to which it passes all parameters from the initial call, plus one extra parameter: the string's length.
(3) The C routine converts the FORTRAN string to a C string by copying it to a C array of type char, and appending a '\0' byte. Since this C routine serves as a link between a FORTRAN stub and the corresponding C interface call, it can be found in the " ... f.c" file.

(4) This C routine then calls the HDF C routine that performs the actual function.

This process is illustrated in Figure 7.1.

*** INSERT FIGURE HERE ***

Passing Strings from C to FORTRAN

When strings are passed from C to FORTRAN, the reverse procedure is followed. First, a string pointer is obtained within the FORTRAN routine's data area. (It is assumed that the space pointed to has already been allocated, and is sufficiently large to hold the string.) The string is then copied from the C data area to the FORTRAN data area. Finally, if necessary, the FORTRAN string's data area is padded with blanks.

Function Return Values between FORTRAN and C

When a FORTRAN routine calls a C function, it always expects a return value from that function. Unfortunately, the form in which C functions return values is not always compatible with the form in which FORTRAN expects them. To solve this problem, some C compilers offer the option of controlling the form of the return value from a function. For example, Language Systems FORTRAN for the Macintosh requires that all C function declarations be prepended by the word "pascal" so that the return value can be recognized by a FORTRAN routine that calls it, as in:

    pascal int dsgrang(void *pmax, void *pmin)

Since C always expects return values to be passed "by value" rather than, say, "by reference," it is important to coerce FORTRAN functions to do the same. This is accomplished by defining a macro FRETVAL that is prepended to the declaration of every FORTRAN-callable C function.
For example:

    FRETVAL(int) dsgrang(void *pmax, void *pmin)

If Language Systems FORTRAN is to be used, then FRETVAL is defined (in hdfi.h) as follows:

    #if defined(MAC)    /* with LS FORTRAN */
    #   define FRETVAL(x)   pascal x
    #endif

Differences in Acceptable Routine Names

Different FORTRAN compilers impose different restrictions on the length, character set, and form of identifiers. In general, HDF uses C conventions for naming routines, and this means that measures must be taken to accommodate those compilers which have different conventions than C.

The method used in HDF is to name routines differently, depending on the particular conventions of the FORTRAN compiler being used. This is done by defining certain flags for the preprocessor via #define statements in the hdfi.h file. Then conditional compilation (via #ifdef statements in the source code files) is used to compile the routines that are called from FORTRAN with names that the particular FORTRAN compiler can understand.

Case Sensitivity

C compilers are case sensitive. That is, upper and lower case letters are different. Many FORTRAN compilers allow users to use upper and lower case letters in naming routines, but the symbol table names that they produce in object modules are all in upper case or all in lower case. These compilers are not case sensitive. If routines compiled by a case-sensitive compiler are to be linked with routines compiled by a compiler that is not case sensitive, they might not recognize one another's routines.

For example, the UNICOS FORTRAN compiler allows you to name routines without regard to case, but produces object modules with all routine names converted to upper case. UNICOS C, on the other hand, performs no such conversion. Consider how the HDF routine Hopen is treated by the two compilers. Hopen is written in C, so the HDF library has the name "Hopen", a mixed-case name, in its symbol table. Suppose you make the following call in your UNICOS FORTRAN program:

    file_id = Hopen('myfile', ...
    )

The FORTRAN compiler will create an object module with the routine name "HOPEN" (all upper case) in its symbol table. When you link it to the HDF library, the linker will find "Hopen", but not "HOPEN", and will generate an "unsatisfied external reference" error.

So far there are three FORTRAN compilers supported by HDF that convert names to upper case in the symbol table:

    VMS FORTRAN
    UNICOS FORTRAN
    Language Systems FORTRAN

How HDF Deals with "All-Upper Case" Compilers

The solution to this problem is to name C functions entirely in upper case whenever they are called by all-upper case FORTRAN routines. This is done as follows: for FORTRAN compilers that produce all-upper case symbol table entries, a flag "DF_CAPFNAMES" is defined via a #define in hdfi.h. Then conditional compilation is used in the source code files to compile the routines that are called from FORTRAN with all-upper case names.

For example, since UNICOS FORTRAN produces all-upper case symbol table entries, the UNICOS section of hdfi.h contains the following line:

    #define DF_CAPFNAMES

Correspondingly, there are conditional compilations in the "..f.c" files that produce all-upper case routine names. For example, the function name "Fun" can be redefined as "FUN" as follows:

    #ifdef DF_CAPFNAMES
    #   define Fun FUN
    #endif /* DF_CAPFNAMES */

Appended Underscore

A similar problem occurs with respect to the underscore character. When compilers generate object module symbol tables from source code, they commonly prepend an underscore ('_') to all external symbols. C generally does this. Then, when linking occurs, the linker looks for external symbols in the symbol table with that prefix. Unfortunately, many FORTRAN compilers also append an underscore to identify external symbols. Since C does not generally do this, external references in FORTRAN-generated object modules will not recognize externals with the same names in C-generated modules.
For example, the FORTRAN compiler on the CONVEX places an underscore at the end of routine names, while the C compiler only places an underscore at the front. Consider how a C function called FUN would be treated in this context. Since FUN is a C function, the object module containing FUN has it stored under the name "_FUN". Suppose you make the following call in a FORTRAN program:

    x = FUN (y)

The FORTRAN compiler creates an object module with the routine name "_FUN_" in its symbol table. When you link it to the C module, the linker will find "_FUN", but not "_FUN_", and will generate an "unsatisfied external reference" error.

How HDF Specifies Appended (and Prepended) Underscores

The solution to this problem is to name C functions with an appended underscore whenever one is expected by FORTRAN calling routines. For instance, if the name of FUN had been "FUN_" in the example, its name in the C object module would have been "_FUN_", which is exactly what FORTRAN put into its symbol table. This is done as follows: for every machine whose FORTRAN compiler requires appended underscores, a flag "FNAME_POST_UNDERSCORE" is defined via a #define in hdfi.h in the section associated with that machine. Similarly, for those that require a prepended underscore a flag "FNAME_PRE_UNDERSCORE" is defined.

Then, in a section of code in hdfi.h, conditional compilation is used to define a macro called "FNAME" that appends and/or prepends underscores as required. In the modules in which routines are actually defined (including hproto.h), the FNAME macro is then applied to each routine, causing the appropriate underscores to be added. Hence, in the example above in which "Fun" was caused to be upper case, the actual definition would be as follows:

    #ifdef DF_CAPFNAMES
    #   define Fun FNAME(FUN)
    #endif /* DF_CAPFNAMES */

Short Names vs. Long Names

In the C implementations supported by HDF, identifiers may be any length, with at least the first 31 characters having significance.
FORTRAN compilers differ in the maximum lengths of identifiers that they allow, but all of those supported by HDF allow identifiers to have at least seven characters.

To deal with the discrepancies between identifier lengths allowed by C and those allowed by the various FORTRAN compilers, a set of equivalent short names has been devised that can be used when programming in FORTRAN. For all HDF routines with names that are more than seven characters long, there is an identical routine whose name is seven or fewer characters long. For example, for the routine "DFSDgetdims" in the file dfsd.c there is a corresponding routine "dsgdims" in the file dfsdff.f with exactly the same functionality.

ANSI C vs. Old C

Both ANSI and old C compilers are supported in the current implementation of HDF (HDF 3.2). ANSI C is preferred, because it has many features that help ensure portability, but unfortunately many important platforms do not support full ANSI C. The HDF code determines whether or not ANSI C is available from the flag __STDC__. If ANSI C is available, then __STDC__ is defined.(1)

The most noticeable difference between ANSI and old C is in the way functions are declared. For example, in ANSI C the function DFSDsetdims() is declared with

    int DFSDsetdims(intn rank, int32 dimsizes[])

(1) Some C compilers are not entirely ANSI-conforming, yet they conform well enough that the HDF implementation can treat them as if they were. In such cases, it is considered permissible to "#define" __STDC__ when compiling.

In old C the same function is declared with

    int DFSDsetdims(rank, dimsizes)
    intn rank;
    int32 dimsizes[];

The NCSA implementation of HDF accommodates these differences by defining in hdfi.h a flag called PROTOTYPE, which is used for every function declaration, as in the following example.
    #ifdef PROTOTYPE
    int DFSDsetdims(intn rank, int32 dimsizes[])
    #else
    int DFSDsetdims(rank, dimsizes)
    intn rank;
    int32 dimsizes[];
    #endif /* PROTOTYPE */

Another big difference between old C and ANSI C is that ANSI C allows the use of function prototypes that include arguments, which helps enormously in detecting errors in the number and types of arguments. Old C also allows functions to be declared in advance, but without the argument list. This difference is handled by means of a macro called PROTO, whose definition depends on whether PROTOTYPE is defined:

    #ifdef PROTOTYPE
    #define PROTO(x) x
    #else
    #define PROTO(x) ()
    #endif

This macro is applied as in the following example:

    extern int32 Hopen PROTO((char *path, intn access, int16 ndds));

When PROTOTYPE is defined, PROTO causes the argument list to stay as it is. When PROTOTYPE is not defined, PROTO replaces the argument list with an empty pair of parentheses.

Type Differences

Different machines and compilers differ in the sizes of numbers that they assign to different data types, in their representations of different number types, and in the way they organize aggregates of numbers (especially structures).

Size differences

The same number type can be different sizes on different machines. Type int, for example, is 16 bits to many IBM PC compilers, 48 bits to some supercomputer compilers, and 32 bits on most others. These differences can cause insidious problems in code like the HDF code that depends in so many places on numbers being the right size.

This problem is handled in HDF by insisting in the code that all variables and functions must use a typedef'ed type which fully defines their type, including the number of bits that they occupy. This includes all parameters, members of structures, and static, automatic, and external variables. Hence, the data types used in HDF include the following. (The prefix "u" stands for "unsigned".)
    int8      uint8
    int16     uint16
    int32     uint32
    float32   float64
    intn      uintn

So, for example, on Sun's C compiler int32 is defined with

    typedef long int int32;

Hence, for each machine, typedefs are declared that map all of the data types used into the best available types.

Unfortunately, it is not always possible to find a local data type that maps exactly to one of these types. For example, the Cray UNICOS C compiler does not support a 16-bit data type. In such instances, we do the best we can and try to be on the lookout for potential problems with number sizes.

The data types "intn" and "uintn" are to be used whenever it can be determined that number type size is of no consequence, and that a 16-bit integer is large enough to hold any value the number can have. In such cases, the native int (or unsigned int) type of the host machine is used. Experience has shown that substantial performance gains can be achieved by using intn or uintn in certain circumstances.

Number Representation

One of the keys to producing a portable file format is ensuring that numbers that are represented differently on different machines somehow get converted correctly when moved from machine to machine. The approach taken to this in the NCSA implementation is to provide conversion routines to convert between local representations and a standard representation that is stored in HDF files. Details of this process will be included in a later edition of this manual.

Byte-order and Structure Representations

Even when the basic bit-representation of constants or aggregates like structures is the same between machines, the ways that the bits are packed into a word, and the order in which the bits are laid out, can differ among machines. For example, Digital machines and Intel-based machines generally order bytes differently from most others. And the C compiler on a Cray, whose word size is 64 bits, packs structures differently from one on machines whose word size is 32 bits.
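The effect of byte order can be sketched in C. The following is a minimal illustration, not code from the HDF library: the function names are hypothetical, and the choice of most-significant-byte-first as the stored order is an assumption made only for this example. Because the bytes are produced by shifting rather than by copying memory, the stored form is the same regardless of how the host orders bytes within a word.

```c
#include <assert.h>

/* Hypothetical helpers for illustration; not part of the HDF library. */

/* Returns 1 if the host stores the least significant byte first. */
int host_is_little_endian(void)
{
    unsigned int one = 1;
    return *(unsigned char *)&one == 1;
}

/* Write the low 32 bits of v into out[0..3], most significant byte
 * first (the stored byte order is an assumption of this sketch). */
void put_uint32(unsigned char *out, unsigned long v)
{
    assert(out != 0);
    out[0] = (unsigned char)((v >> 24) & 0xFF);
    out[1] = (unsigned char)((v >> 16) & 0xFF);
    out[2] = (unsigned char)((v >> 8) & 0xFF);
    out[3] = (unsigned char)(v & 0xFF);
}

/* Rebuild the value from four bytes stored most significant byte first. */
unsigned long get_uint32(const unsigned char *in)
{
    assert(in != 0);
    return ((unsigned long)in[0] << 24) |
           ((unsigned long)in[1] << 16) |
           ((unsigned long)in[2] << 8)  |
            (unsigned long)in[3];
}
```

The same two functions also sidestep word-size differences: a host whose unsigned long is wider than 32 bits still reads and writes exactly four bytes.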
Differences in byte order among machines are handled in two ways. When the data to be written (or read) consists of non-integer data and/or a large array of any type of data, a conversion routine (mentioned in the previous section, "Number Representation") is invoked. When an individual integer is to be written (or read), an "ENCODE" or "DECODE" macro is used. There are ENCODE and DECODE macros for 16-bit and 32-bit integers:

    INT16ENCODE   UINT16ENCODE   INT32ENCODE   UINT32ENCODE
    INT16DECODE   UINT16DECODE   INT32DECODE   UINT32DECODE

The ENCODE macros are written in such a way that they write integers to an HDF file in a standard way, no matter what the word size and byte order of the host machine are. Likewise, the DECODE macros are written in such a way that they read integers stored in a standard way in an HDF file and store the integers in the required byte order and word size on the host machine.

Since the ENCODE and DECODE macros deal with both byte order and word size, they are also used to handle the reading and writing of record-like structures. For example, the structure for an HDF data descriptor consists of two 16-bit fields, followed by two 32-bit fields, as implied by the following C declaration:

    struct {
        uint16 tag;
        uint16 ref;
        uint32 offset;
        uint32 length;
    }

In an HDF file this structure must occupy exactly 12 bytes. On one computer it might occupy 12 bytes of storage, and on another, such as the Cray, it might occupy 32 bytes. Furthermore, some machines might represent the numbers internally in different byte orders than others. By using the ENCODE and DECODE macros we are able to ensure that these values are represented correctly in all machines and in HDF files.

Access to Library Functions

Despite efforts to standardize them, function libraries often differ in significant ways. There are at least three types of functions that need special treatment in the HDF implementation:

(1) All file I/O access.
Both the stream and system level functions need this (i.e., the functions associated with the fopen() call, and the functions associated with the open() call). This is generally a 16-bit vs. 32-bit problem, because some machines use 16-bit values for the size and the number of elements to read or write, and others use 32-bit values.

(2) All memory allocation and releasing. There are two different problems associated with this. The first is that on a 16-bit machine, a 16-bit value is used for the number of bytes to allocate at one time. The second is that certain operating systems (notably MS-Windows and Mac OS) don't have malloc() and free() calls. These operating systems use handles for allocating memory and require different function calls for memory allocation.

(3) Memory and string manipulation. These functions (such as memcpy(), memcmp(), strcpy(), strlen(), etc.) require slightly different function names under different memory models in MS-DOS and under MS-Windows than on most other systems.

These differences are dealt with by defining macros for the relevant functions, and defining them appropriately in the machine-specific sections of hdfi.h.
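The macro approach can be sketched as follows. This is a minimal illustration under stated assumptions rather than the contents of hdfi.h: the macro names (PORT_MEMCPY and so on), the flag EXAMPLE_FAR_MODEL, and the function port_strdup are all hypothetical, and _fmemcpy() and _fstrlen() stand in for the far-pointer variants that some MS-DOS compilers provide.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Machine-specific section: hypothetical names, chosen for this sketch. */
#if defined(EXAMPLE_FAR_MODEL)          /* hypothetical flag */
#   define PORT_MEMCPY(d, s, n)  _fmemcpy((d), (s), (n))
#   define PORT_STRLEN(s)        _fstrlen((s))
#else
#   define PORT_MEMCPY(d, s, n)  memcpy((d), (s), (n))
#   define PORT_STRLEN(s)        strlen((s))
#endif
#define PORT_MALLOC(n)           malloc((n))
#define PORT_FREE(p)             free((p))

/* Portable code uses only the macros, never the library names directly.
 * Returns a newly allocated copy of s, or NULL if allocation fails. */
char *port_strdup(const char *s)
{
    size_t n;
    char *copy;

    assert(s != NULL);
    n = PORT_STRLEN(s) + 1;             /* include the terminating '\0' */
    copy = (char *)PORT_MALLOC(n);
    if (copy != NULL)
        PORT_MEMCPY(copy, s, n);
    return copy;
}
```

With this arrangement, porting to a system with different library names means editing only the conditional section at the top; port_strdup() and any other code written against the macros is left unchanged.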