- Path: senator-bedfellow.mit.edu!bloom-beacon.mit.edu!eru.mt.luth.se!www.nntp.primenet.com!nntp.primenet.com!news1.mpcs.com!hammer.uoregon.edu!arclight.uoregon.edu!su-news-hub1.bbnplanet.com!cam-news-hub1.bbnplanet.com!news.bbnplanet.com!cpk-news-hub1.bbnplanet.com!cam-news-feed2.bbnplanet.com!amber.ora.com!not-for-mail
- From: jdm@ora.com
- Newsgroups: comp.graphics.misc,comp.answers,news.answers
- Subject: Graphics File Formats FAQ (Part 4 of 4): Tips and Tricks of the Trade
- Supersedes: <graphics/fileformats-faq-4-849730784@ora.com>
- Followup-To: poster
- Date: 20 Jan 1997 00:13:12 -0800
- Organization: O'Reilly & Associates, Inc.
- Lines: 550
- Sender: jdm@ruby.ora.com
- Approved: news-answers-request@MIT.EDU
- Distribution: world
- Expires: 02/24/97 00:13:00
- Message-ID: <graphics/fileformats-faq-4-853747980@ora.com>
- References: <graphics/fileformats-faq-1-853747980@ora.com>
- Reply-To: jdm@ora.com (James D. Murray)
- NNTP-Posting-Host: ruby.ora.com
- Summary: This document answers many of the most frequently asked
- questions about graphics file formats on Usenet.
- Keywords: FAQ, GRAPHICS, FORMAT, IMAGE, MULTIMEDIA, 3D
- Xref: senator-bedfellow.mit.edu comp.graphics.misc:18678 comp.answers:23814 news.answers:92563
-
- Posted-By: auto-faq 3.1.1.2
- Archive-name: graphics/fileformats-faq/part4
- Posting-Frequency: monthly
- Last-modified: 20Jan97
-
- Graphics File Formats FAQ (Part 4 of 4): Tips and Tricks of the Trade
-
- ------------------------------
-
- This FAQ (Frequently Asked Questions) list contains information on
- graphics file formats, including raster, vector, metafile, Page
- Description Language, 3D object, animation, and multimedia formats.
-
- This FAQ is divided into four parts, each covering a different area of
- graphics file format information:
-
- Graphics File Formats FAQ (Part 1 of 4): General Graphics Format Questions
- Graphics File Formats FAQ (Part 2 of 4): Image Conversion and Display Programs
- Graphics File Formats FAQ (Part 3 of 4): Where to Get File Format Specifications
- Graphics File Formats FAQ (Part 4 of 4): Tips and Tricks of the Trade
-
- Please email contributions, corrections, and suggestions about this FAQ to
- jdm@ora.com. Relevant information posted to newsgroups will not
- automatically make it into this FAQ.
-
- -- James D. Murray
-
- ------------------------------
-
- Subject: 0. Contents of Tips and Tricks of the Trade
- Subjects marked with <NEW> are new to this FAQ. Subjects marked with <UPD>
- have been updated since the last release of this FAQ.
-
- I. General questions about this FAQ
-
- 0. Maintainer's Comments
- 1. What's new in this latest FAQ release?
-
- II. Programming Tips for Graphics File Formats
-
- 0. What's the best way to read a file header?
- 1. What's this business about endianness?
- 2. How can I determine the byte-order of a system at run-time?
- 3. How can I identify the format of a graphics file?
- 4. What are the format identifiers of some popular file formats?
-
- III. Kudos and Assertions
-
- 0. Acknowledgments
- 1. About The Author
- 2. Disclaimer
- 3. Copyright Notice
-
-
- ------------------------------
-
-
- Subject: I. General questions about this FAQ
-
- ------------------------------
-
- Subject: 0. Maintainer's Comments
-
- Programmers are code-hungry people. They just want the secrets and they want
- them to work NOW! But always in the back of a hack's mind there are the
- questions: "Is this really the best way to do this? Could it be better?".
-
- This FAQ is to share ideas on the implementation details of reading, writing,
- converting, and displaying graphics file formats. You'll probably get some
- good ideas here, find a few things you didn't know about, and even have a few
- suggestions and improvements of your own to add (send them to jdm@ora.com).
-
- If you need to know the best way to do something with file formats, or
- just find it embarrassing to implement a chunk of some other programmer's
- code and then have to admit you really don't understand how it works, then
- this FAQ is for you.
-
- ------------------------------
-
- Subject: 1. What's new in this latest FAQ release?
-
- o Minor bug fixed in GetLittleWord() and GetLittleDoubleWord() functions
-
- ------------------------------
-
- Subject: II. Programming Tips for Graphics File Formats
-
- ------------------------------
-
- Subject: 0. What's the best way to read a file header?
-
- You wouldn't think there's a lot of mystery about reading a few bytes from
- a disk file, eh? Programmers, however, constantly lose time because they
- overlook a few problems that may occur. Consider the following code:
-
-     typedef struct _Header
-     {
-         BYTE Id;
-         WORD Height;
-         WORD Width;
-         BYTE Colors;
-     } HEADER;
-
-     HEADER Header;
-
-     void ReadHeader(FILE *fp)
-     {
-         if (fp != (FILE *)NULL)
-             fread(&Header, sizeof(HEADER), 1, fp);
-     }
-
- Looks good, right? The fread() will read the next sizeof(HEADER) bytes from
- a valid FILE pointer into the Header data structure. So what could go
- wrong?
-
- The problem often encountered with this method is one of element alignment
- within structures. Compilers may pad structures with "invisible" elements
- to allow each "visible" element to align on a 2- or 4-byte address
- boundary. This is done for efficiency in accessing the element while in
- memory. Padding may also be added to the end of the structure to bring
- its total length to an even number of bytes. This is done so the data
- following the structure in memory will also align on a proper address
- boundary.
-
- If the above code is compiled with no (or 1-byte) structure alignment the
- code will operate as expected. With 2-byte alignment an extra two bytes
- would be added to the HEADER structure in memory and make it appear as
- such:
-
-     typedef struct _Header
-     {
-         BYTE Id;
-         BYTE Pad1;      // Added padding
-         WORD Height;
-         WORD Width;
-         BYTE Colors;
-         BYTE Pad2;      // Added padding
-     } HEADER;
-
- As you can see the fread() will store the correct value in Id, but the
- first byte of Height will be stored in the padding byte. This will throw
- off the correct storage of data in the remaining part of the structure
- causing the values to be garbage.
-
- A compiler that aligns every element on a 4-byte boundary would change the
- HEADER in memory as such:
-
-     typedef struct _Header
-     {
-         BYTE Id;
-         BYTE Pad1;      // Added padding
-         BYTE Pad2;      // Added padding
-         BYTE Pad3;      // Added padding
-         WORD Height;
-         WORD Width;
-         BYTE Colors;
-         BYTE Pad4;      // Added padding
-         BYTE Pad5;      // Added padding
-         BYTE Pad6;      // Added padding
-     } HEADER;
-
- What started off as a 6-byte header increased to 8 and 12 bytes thanks to
- alignment. But what can you do? All the documentation and makefiles you
- write will not prevent someone from compiling with the wrong options flag
- and then pulling their (or your) hair out when your software appears not
- to work correctly.
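
If you'd rather see what your own compiler did than guess, you can ask it. The following is a minimal sketch, not a definitive tool: the BYTE and WORD typedefs are assumptions (the FAQ never defines them), and HEADER_FILE_SIZE, HeaderPaddingBytes(), and PrintHeaderLayout() are hypothetical names made up for this illustration.

```c
#include <stddef.h>   /* offsetof, size_t */
#include <stdio.h>

typedef unsigned char  BYTE;   /* assumption: 8-bit unsigned  */
typedef unsigned short WORD;   /* assumption: 16-bit unsigned */

typedef struct
{
    BYTE Id;
    WORD Height;
    WORD Width;
    BYTE Colors;
} HEADER;

/* Bytes the four fields occupy in the file, where there is no padding. */
#define HEADER_FILE_SIZE 6

/* Returns how many padding bytes the compiler added to HEADER. */
size_t HeaderPaddingBytes(void)
{
    return sizeof(HEADER) - HEADER_FILE_SIZE;
}

/* Prints where each element actually landed in memory. */
void PrintHeaderLayout(void)
{
    printf("sizeof(HEADER) = %lu\n", (unsigned long) sizeof(HEADER));
    printf("Id      offset = %lu\n", (unsigned long) offsetof(HEADER, Id));
    printf("Height  offset = %lu\n", (unsigned long) offsetof(HEADER, Height));
    printf("Width   offset = %lu\n", (unsigned long) offsetof(HEADER, Width));
    printf("Colors  offset = %lu\n", (unsigned long) offsetof(HEADER, Colors));
    printf("padding bytes  = %lu\n", (unsigned long) HeaderPaddingBytes());
}
```

Call PrintHeaderLayout() once under each set of compiler options you care about. Any non-zero padding count means a single fread() of the whole structure will not line up with the bytes in the file.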
-
- Now consider this alternative to the ReadHeader() function:
-
-     HEADER Header;
-
-     void ReadHeader(FILE *fp)
-     {
-         if (fp != (FILE *)NULL)
-         {
-             fread(&Header.Id, sizeof(Header.Id), 1, fp);
-             fread(&Header.Height, sizeof(Header.Height), 1, fp);
-             fread(&Header.Width, sizeof(Header.Width), 1, fp);
-             fread(&Header.Colors, sizeof(Header.Colors), 1, fp);
-         }
-     }
-
- What both you and your compiler now see is a lot more code. Rather than
- reading the entire structure in one, elegant shot, you read in each
- element separately using multiple calls to fread(). The trade-off here is
- increased code size for not caring what the structure alignment option of
- the compiler is set to. These cases are also true for writing structures
- to files using fwrite(). Write only the data and not the padding please.
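
The element-by-element approach also makes it easy to notice a truncated file, since every fread() reports how many items it actually read. Here is a hedged sketch of that idea. ReadHeaderChecked() is a hypothetical name, the BYTE and WORD typedefs are assumptions, and the two WORD fields still arrive in whatever byte order the file used.

```c
#include <stdio.h>

typedef unsigned char  BYTE;   /* assumption: 8-bit unsigned  */
typedef unsigned short WORD;   /* assumption: 16-bit unsigned */

typedef struct
{
    BYTE Id;
    WORD Height;
    WORD Width;
    BYTE Colors;
} HEADER;

/* Reads the header one element at a time, so structure padding never
 * matters. Returns 1 if every element was read, 0 on a short read,
 * a read error, or a NULL argument. */
int ReadHeaderChecked(FILE *fp, HEADER *h)
{
    if (fp == NULL || h == NULL)
        return 0;

    if (fread(&h->Id,     sizeof(h->Id),     1, fp) != 1) return 0;
    if (fread(&h->Height, sizeof(h->Height), 1, fp) != 1) return 0;
    if (fread(&h->Width,  sizeof(h->Width),  1, fp) != 1) return 0;
    if (fread(&h->Colors, sizeof(h->Colors), 1, fp) != 1) return 0;

    return 1;
}
```

Passing the structure in as a parameter instead of using a global is purely a design choice for this sketch. It makes the function easier to test and reuse.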
-
- But is there still anything we've overlooked? Will fread() (fscanf(),
- fgetc(), and so forth) always return the data we expect? Will fwrite()
- (fprintf(), fputc(), and so forth) ever write data that we don't want, or
- in a way we don't expect? Read on to the next section...
-
- ------------------------------
-
- Subject: 1. What's this business about endianness?
-
- So you've been pulling your hair out trying to discover why your elegant
- and perfect-beyond-reproach code, running on your Macintosh or Sun, is
- reading garbage from PCX and TGA files. Or perhaps your MS-DOS or Windows
- application just can't seem to make heads or tails out of that Sun Raster
- file. And, to make matters even more mysterious, it seems your most
- illustrious creation will read some TIFF files, but not others.
-
- As was hinted at in the previous section, just reading the header of a
- graphics file one field at a time is not enough to ensure data is always read
- correctly (not enough for portable code, anyway). In addition to structure
- alignment, we must also consider the endianness of the file's data, and the
- endianness of the system architecture our code is running on.
-
- Here are some baseline rules to follow:
-
- 1) Graphics files typically use a fixed byte-ordering scheme. For example,
- PCX and TGA files are always little-endian; Sun Raster and Macintosh
- PICT are always big-endian.
- 2) Graphics files that may contain data using either byte-ordering scheme
- (for example TIFF) will have an identifier that indicates the
- endianness of the data.
- 3) ASCII-based graphics files (such as DXF and most 3D object files),
- have no endianness and are always read in the same way on any system.
- 4) Most CPUs use a fixed byte-ordering scheme. For example, the 80486
- is little-endian and the 68040 is big-endian.
- 5) You can test for the type of endianness of a system using software.
- 6) A few systems are neither big- nor little-endian; these
- middle-endian systems may cause such byte-order detection
- tests to return erroneous results.
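
Rule 2 is easy to put into practice. The sketch below reads the four-byte TIFF identifier (49492a00h for little-endian data, 4d4d002ah for big-endian data) one byte at a time, so the result does not depend on the byte order of the system running the code. TiffByteOrder() and the ORDER_ constants are hypothetical names invented for this example.

```c
#include <stdio.h>

#define ORDER_UNKNOWN 0
#define ORDER_LITTLE  1
#define ORDER_BIG     2

/* Reads four bytes from the current file position and reports which
 * byte order the TIFF identifier announces. Returns ORDER_UNKNOWN if
 * the bytes are not a TIFF identifier or could not be read. */
int TiffByteOrder(FILE *fp)
{
    unsigned char id[4];

    if (fp == NULL || fread(id, 1, sizeof(id), fp) != sizeof(id))
        return ORDER_UNKNOWN;

    /* "II" followed by 42 stored low byte first */
    if (id[0] == 0x49 && id[1] == 0x49 && id[2] == 0x2A && id[3] == 0x00)
        return ORDER_LITTLE;

    /* "MM" followed by 42 stored high byte first */
    if (id[0] == 0x4D && id[1] == 0x4D && id[2] == 0x00 && id[3] == 0x2A)
        return ORDER_BIG;

    return ORDER_UNKNOWN;
}
```

A reader would call this once at the start of the file and use the answer to decide how every later multi-byte value is decoded.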
-
- Now we know that using fread() on a big-endian system to read data from a
- file that was originally written in little-endian order will return
- incorrect data. Actually, the data is correct, but the bytes that make up
- the data are arranged in the wrong order. If we attempt to read the 16-bit
- value 1234h from a little-endian file, it would be stored in memory using
- the big-endian byte-ordering scheme and the value 3412h would result. What
- we need is a swap function to change the resulting position of the bytes:
-
-     WORD SwapTwoBytes(WORD w)
-     {
-         register WORD tmp;
-         tmp = (w & 0x00FF);
-         tmp = ((w & 0xFF00) >> 0x08) | (tmp << 0x08);
-         return(tmp);
-     }
-
- Now we can read a two-byte header value and swap the bytes as such:
-
-     fread(&Header.Height, sizeof(Header.Height), 1, fp);
-     Header.Height = SwapTwoBytes(Header.Height);
-
- But what about four-byte values? The value 12345678h would be stored as
- 78563412h. What we need is a swap function to handle four-byte values:
-
-     DWORD SwapFourBytes(DWORD dw)
-     {
-         register DWORD tmp;
-         tmp = (dw & 0x000000FF);
-         tmp = ((dw & 0x0000FF00) >> 0x08) | (tmp << 0x08);
-         tmp = ((dw & 0x00FF0000) >> 0x10) | (tmp << 0x08);
-         tmp = ((dw & 0xFF000000) >> 0x18) | (tmp << 0x08);
-         return(tmp);
-     }
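
The WORD and DWORD types above are only as wide as your compiler makes them. If your compiler provides the C99 header <stdint.h> (an assumption; many older compilers do not), the same two swaps can be sketched with exact-width types. Swap16() and Swap32() are hypothetical names used only here.

```c
#include <stdint.h>

/* Exchanges the two bytes of a 16-bit value: 1234h becomes 3412h. */
uint16_t Swap16(uint16_t w)
{
    return (uint16_t) ((w >> 8) | (w << 8));
}

/* Reverses the four bytes of a 32-bit value: 12345678h becomes 78563412h. */
uint32_t Swap32(uint32_t dw)
{
    return ((dw & 0x000000FFu) << 24) |
           ((dw & 0x0000FF00u) <<  8) |
           ((dw & 0x00FF0000u) >>  8) |
           ((dw & 0xFF000000u) >> 24);
}
```

Swapping twice returns the original value, which makes a handy sanity check when you port the code to a new system.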
-
- But how do we know when to swap and when not to swap? We always know the
- byte-order of a graphics file that we are reading, but how do we check
- the endianness of the system we are running on? Using the C language,
- we might use preprocessor switches to cause a conditional compile based on
- a system definition flag:
-
-     #define MSDOS     1
-     #define WINDOWS   2
-     #define MACINTOSH 3
-     #define AMIGA     4
-     #define SUNUNIX   5
-
-     #define SYSTEM MSDOS
-
-     #if (SYSTEM == MSDOS)
-         // Little-endian code here
-     #elif (SYSTEM == WINDOWS)
-         // Little-endian code here
-     #elif (SYSTEM == MACINTOSH)
-         // Big-endian code here
-     #elif (SYSTEM == AMIGA)
-         // Big-endian code here
-     #elif (SYSTEM == SUNUNIX)
-         // Big-endian code here
-     #else
-     #error Unknown SYSTEM definition
-     #endif
-
- My reaction to the above code was *YUCK!* (and I hope yours was too!). A
- snarl of fread(), fwrite(), SwapTwoBytes(), and SwapFourBytes() functions
- laced between preprocessor statements is hardly elegant code, although
- sometimes it is our best choice. Fortunately, this is not one of those
- times.
-
- What we first need is a set of functions to read the data from a file
- using the byte-ordering scheme of the data. This effectively combines the
- read/write and swap operations into one set of functions. Consider the
- following:
-
-     WORD GetBigWord(FILE *fp)
-     {
-         register WORD w;
-         w = (WORD) (fgetc(fp) & 0xFF);
-         w = ((WORD) (fgetc(fp) & 0xFF)) | (w << 0x08);
-         return(w);
-     }
-
-     WORD GetLittleWord(FILE *fp)
-     {
-         register WORD w;
-         w = (WORD) (fgetc(fp) & 0xFF);
-         w |= ((WORD) (fgetc(fp) & 0xFF) << 0x08);
-         return(w);
-     }
-
-     DWORD GetBigDoubleWord(FILE *fp)
-     {
-         register DWORD dw;
-         dw = (DWORD) (fgetc(fp) & 0xFF);
-         dw = ((DWORD) (fgetc(fp) & 0xFF)) | (dw << 0x08);
-         dw = ((DWORD) (fgetc(fp) & 0xFF)) | (dw << 0x08);
-         dw = ((DWORD) (fgetc(fp) & 0xFF)) | (dw << 0x08);
-         return(dw);
-     }
-
-     DWORD GetLittleDoubleWord(FILE *fp)
-     {
-         register DWORD dw;
-         dw = (DWORD) (fgetc(fp) & 0xFF);
-         dw |= ((DWORD) (fgetc(fp) & 0xFF) << 0x08);
-         dw |= ((DWORD) (fgetc(fp) & 0xFF) << 0x10);
-         dw |= ((DWORD) (fgetc(fp) & 0xFF) << 0x18);
-         return(dw);
-     }
-
-     void PutBigWord(WORD w, FILE *fp)
-     {
-         fputc((w >> 0x08) & 0xFF, fp);
-         fputc(w & 0xFF, fp);
-     }
-
-     void PutLittleWord(WORD w, FILE *fp)
-     {
-         fputc(w & 0xFF, fp);
-         fputc((w >> 0x08) & 0xFF, fp);
-     }
-
-     void PutBigDoubleWord(DWORD dw, FILE *fp)
-     {
-         fputc((dw >> 0x18) & 0xFF, fp);
-         fputc((dw >> 0x10) & 0xFF, fp);
-         fputc((dw >> 0x08) & 0xFF, fp);
-         fputc(dw & 0xFF, fp);
-     }
-
-     void PutLittleDoubleWord(DWORD dw, FILE *fp)
-     {
-         fputc(dw & 0xFF, fp);
-         fputc((dw >> 0x08) & 0xFF, fp);
-         fputc((dw >> 0x10) & 0xFF, fp);
-         fputc((dw >> 0x18) & 0xFF, fp);
-     }
-
- If we were reading a little-endian file on a big-endian system (or vice
- versa), the previous code:
-
-     fread(&Header.Height, sizeof(Header.Height), 1, fp);
-     Header.Height = SwapTwoBytes(Header.Height);
-
- Would be replaced by:
-
-     Header.Height = GetLittleWord(fp);
-
- The code to write the same value to a file would be changed from:
-
-     Header.Height = SwapTwoBytes(Header.Height);
-     fwrite(&Header.Height, sizeof(Header.Height), 1, fp);
-
- To the slightly more readable:
-
-     PutLittleWord(Header.Height, fp);
-
- Note that these functions are the same regardless of the endianness of a
- system. For example, GetLittleWord() will always read a two-byte value
- from a little-endian file regardless of the endianness of the system;
- PutBigDoubleWord() will always write a four-byte big-endian value, and so
- forth.
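
One thing the Get functions do not do is tell you when the file ran out. Because they mask the return value of fgetc() with 0xFF, an EOF (typically -1) quietly turns into the byte FFh. If that matters to you, a variant along the following lines may help. This is only a sketch: GetLittleWordChecked() is a hypothetical name, and the WORD typedef is an assumption.

```c
#include <stdio.h>

typedef unsigned short WORD;   /* assumption: 16-bit unsigned */

/* Reads a two-byte little-endian value into *w.
 * Returns 1 on success. Returns 0 if the file ended or a read error
 * occurred before both bytes arrived; *w is left unchanged then. */
int GetLittleWordChecked(FILE *fp, WORD *w)
{
    int lo, hi;

    if ((lo = fgetc(fp)) == EOF)    /* low byte comes first in the file */
        return 0;
    if ((hi = fgetc(fp)) == EOF)    /* high byte follows */
        return 0;

    *w = (WORD) (lo | (hi << 8));
    return 1;
}
```

The same pattern extends to the big-endian and four-byte readers. The price is one extra test per byte, which is usually cheap next to the cost of decoding garbage.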
-
- ------------------------------
-
- Subject: 2. How can I determine the byte-order of a system at run-time?
-
- You may wish to optimize how you read (or write) data from a graphics file
- based on the endianness of your system. Using the GetBigDoubleWord()
- function mentioned in the previous section to read big-endian data from a
- file on a big-endian system imposes extra overhead we don't really need
- (although if the actual number of read/write operations in your program is
- small you might not consider this overhead to be too bad).
-
- If our code could tell what the endianness of the system was at run-time,
- it could choose (using function pointers) what set of read/write functions
- to use. Look at the following function:
-
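-     // Some system headers already define BIG_ENDIAN and LITTLE_ENDIAN;
-     // rename these two macros if your compiler complains.
-     #define BIG_ENDIAN      0
-     #define LITTLE_ENDIAN   1
-
-     int TestByteOrder(void)
-     {
-         short int word = 0x0001;
-         char *byte = (char *) &word;
-         return(byte[0] ? LITTLE_ENDIAN : BIG_ENDIAN);
-     }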
-
- This code assigns the value 0001h to a 16-bit integer. A char pointer is
- then assigned to point at the first (lowest-addressed) byte of the
- integer value. If the first byte of the integer is 01h, then the system
- is little-endian (the least-significant byte, 01h, is stored at the lowest
- address). If it is 00h then the system is big-endian.
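
The function-pointer selection mentioned at the start of this section might be sketched as follows. Everything named here is hypothetical (GETWORDFUNC, HostIsLittleEndian(), GetNativeWord(), GetLittleWordPortable(), SelectLittleWordReader()), and the fast path assumes WORD is exactly two bytes wide.

```c
#include <stdio.h>

typedef unsigned short WORD;              /* assumption: exactly 2 bytes */
typedef WORD (*GETWORDFUNC)(FILE *fp);    /* pointer to a word reader   */

/* Same test as above: is the low byte stored at the lowest address? */
int HostIsLittleEndian(void)
{
    unsigned short word = 0x0001;
    return *(unsigned char *) &word == 0x01;
}

/* Fast path: correct only when the file and host byte orders match. */
WORD GetNativeWord(FILE *fp)
{
    WORD w = 0;
    if (fread(&w, sizeof(w), 1, fp) != 1)
        w = 0;                            /* short read: report zero */
    return w;
}

/* Portable path: builds the value byte by byte on any host. */
WORD GetLittleWordPortable(FILE *fp)
{
    WORD w;
    w  = (WORD) (fgetc(fp) & 0xFF);
    w |= (WORD) ((fgetc(fp) & 0xFF) << 8);
    return w;
}

/* Picks the reader for little-endian files once, at run-time. */
GETWORDFUNC SelectLittleWordReader(void)
{
    return HostIsLittleEndian() ? GetNativeWord : GetLittleWordPortable;
}
```

A program would call SelectLittleWordReader() once at start-up, keep the returned pointer, and call through it for every little-endian word. Whether the saving is worth the extra indirection is something to measure, not assume.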
-
- ------------------------------
-
- Subject: 3. How can I identify the format of a graphics file?
-
- When writing any type of file or data stream reader it is very important
- to implement some sort of method for verifying that the input data is in
- the format you expect. Here are a few methods:
-
- 1) Trust the user of your program to always supply the correct data,
- thereby freeing you from the tedious task of writing any type of format
- identification routines. Choose this method and you will provide solid
- proof that contradicts the popular claim that users are inherently far
- more stupid than programmers.
-
- 2) Read the file extension or descriptor. A GIF file will always have the
- extension .GIF, right? Targa files .TGA, yes? And TIFF files will have an
- extension of .TIF or a descriptor of TIFF. So no problem?
-
- Well, for the most part, this is true. This method certainly isn't
- bulletproof, however. Your reader will occasionally be fed the odd batch
- of mislabeled files ("I thought they were PCX files!"). Or files with
- unrecognized mangled extensions (.TAR rather than .TGA or .JFI rather
- than .JPG) that your reader knows how to read, but won't read because it
- doesn't recognize the extensions. File extensions also won't usually tell
- you the revision of the file format you are reading (with some revisions
- creating an almost entirely new format). And more than one file format
- shares the more common file extensions (such as .IMG and .PIC). And last of
- all, data streams have no file extensions or descriptors to read at all.
-
- 3) Read the file and attempt to recognize the format by specific patterns
- in the data. Most file formats contain some sort of identifying pattern of
- data that is identical in all files. In some cases this pattern gives an
- indication of the revision of the format (such as GIF87a and GIF89a) or
- the endianness of the data format.
-
- Nothing is easy, however. Not all formats contain strong identifiers (PCX,
- for example, has only a single 0Ah byte at the start of its header). And
- those that do don't necessarily put them at the beginning of the file. This
- means if the data is in the format of a stream you may have to read (and
- buffer) most or all of the data before you can determine the format. Of
- course, not all graphics formats are suitable to be read as a data stream
- anyway.
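
TGA is a handy example of an identifier that lives at the wrong end of the file: a TGA Version 2 file ends with the 18-byte string "TRUEVISION-XFILE.\0". The sketch below checks for it by seeking to the end, which is exactly what you cannot do with a stream. IsTga2() and TGA_SIG_LEN are hypothetical names, and the code assumes the file was opened in binary mode and is seekable.

```c
#include <stdio.h>
#include <string.h>

#define TGA_SIG_LEN 18   /* 17 characters plus the terminating NUL */

/* Returns 1 if the last 18 bytes of the file are the TGA Version 2
 * signature, 0 otherwise. The file position is restored afterwards. */
int IsTga2(FILE *fp)
{
    static const char sig[TGA_SIG_LEN] = "TRUEVISION-XFILE.";
    char tail[TGA_SIG_LEN];
    long pos;
    int  match = 0;

    if (fp == NULL)
        return 0;

    pos = ftell(fp);                      /* remember where we were */
    if (pos < 0)
        return 0;

    /* fseek() fails here if the file is shorter than the signature. */
    if (fseek(fp, -(long) TGA_SIG_LEN, SEEK_END) == 0 &&
        fread(tail, 1, TGA_SIG_LEN, fp) == TGA_SIG_LEN)
        match = (memcmp(tail, sig, TGA_SIG_LEN) == 0);

    fseek(fp, pos, SEEK_SET);             /* put the position back */
    return match;
}
```

A result of 0 does not prove the file is not a TGA. It only means the Version 2 signature is absent, so the file may be a Version 1 TGA or something else entirely.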
-
- Your best bet for a method of format detection is a combination of methods
- two and three. First believe the file extension or descriptor, read some
- data, and check for identifying data patterns. If this test fails, then
- attempt to recognize all other known patterns.
-
- Run-time file format identification is a black art at best.
-
- ------------------------------
-
- Subject: 4. What are the format identifiers of some popular file formats?
-
- Here are a few algorithms that you can use to determine the format of a
- graphics file at run-time.
-
-
- GIF: The first six bytes of a GIF file will be the byte pattern of
- 474946383761h ("GIF87a") or 474946383961h ("GIF89a").
-
- JFIF: The first three bytes are ffd8ffh (i.e., an SOI marker followed
- by any marker). Do not check the fourth byte, as it will vary.
-
- JPEG: The first three bytes are ffd8ffh (i.e., an SOI marker followed
- by any marker). Do not check the fourth byte, as it will vary.
- This works with most variants of "raw JPEG" as well.
-
- PNG: The first eight bytes of all PNG files are 89504e470d0a1a0ah.
-
- SPIFF: The first three bytes are ffd8ffh (i.e., an SOI marker followed
- by any marker). Do not check the fourth byte, as it will vary.
-
- Sun: The first four bytes of a Sun Rasterfile are 59a66a95h. If you have
- accidentally read this identifier using the little-endian byte order
- this value will be read as 956aa659h.
-
- TGA: The last 18 bytes of a TGA Version 2 file are the string
- "TRUEVISION-XFILE.\0". If this string is not present, then the file
- is assumed to be a TGA Version 1 file.
-
- TIFF: The first four bytes of a big-endian TIFF file are 4d4d002ah; the
- first four bytes of a little-endian TIFF file are 49492a00h.
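
The start-of-file identifiers listed above can be folded into one small routine. The following is a minimal sketch, not a complete detector: IdentifyFormat() is a hypothetical name, the caller is assumed to have read the first few bytes of the file into a buffer already, and TGA is left out because its signature sits at the end of the file.

```c
#include <stddef.h>
#include <string.h>

/* Compares the first bytes of a file against the identifiers listed
 * above. Returns a format name, or NULL if nothing matched. */
const char *IdentifyFormat(const unsigned char *buf, size_t len)
{
    static const unsigned char png[8] =
        { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };
    static const unsigned char sun[4]     = { 0x59, 0xA6, 0x6A, 0x95 };
    static const unsigned char tiff_be[4] = { 0x4D, 0x4D, 0x00, 0x2A };
    static const unsigned char tiff_le[4] = { 0x49, 0x49, 0x2A, 0x00 };

    if (buf == NULL)
        return NULL;

    if (len >= 8 && memcmp(buf, png, 8) == 0)
        return "PNG";
    if (len >= 6 && (memcmp(buf, "GIF87a", 6) == 0 ||
                     memcmp(buf, "GIF89a", 6) == 0))
        return "GIF";
    if (len >= 4 && memcmp(buf, sun, 4) == 0)
        return "Sun";
    if (len >= 4 && (memcmp(buf, tiff_be, 4) == 0 ||
                     memcmp(buf, tiff_le, 4) == 0))
        return "TIFF";
    /* JFIF, SPIFF, and raw JPEG all share this three-byte prefix. */
    if (len >= 3 && buf[0] == 0xFF && buf[1] == 0xD8 && buf[2] == 0xFF)
        return "JPEG";

    return NULL;
}
```

Eight bytes are enough for every test here, so a reader can fread() eight bytes, call IdentifyFormat(), and then rewind before handing the file to the matching decoder.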
-
- ------------------------------
-
- Subject: III. Kudos and Assertions
-
- ------------------------------
-
- Subject: 0. Acknowledgments
-
- Chris M. Cooney <cooney1@imssys.imssys.com>
- Tom Lane <tgl@netcom.com>
- Charles R. Patton <crpatton@ingr.com>
-
- ------------------------------
-
- Subject: 1. About The Author
-
- The author of this FAQ, James D. Murray, lives in the City of Orange,
- Orange County, California, USA. He is the co-author of the book
- Encyclopedia of Graphics File Formats published by O'Reilly and
- Associates, makes a living writing books for O'Reilly, writing
- telecommunications network management software in C++ and Visual Basic,
- and may be reached as jdm@ora.com,
- or via U.S. Snail at: P.O. Box 70, Orange, CA 92666-0070 USA.
-
- ------------------------------
-
- Subject: 2. Disclaimer
-
- While every effort has been taken to ensure the accuracy of the
- information contained in this FAQ list compilation, the author and
- contributors assume no responsibility for errors or omissions, or for
- damages resulting from the use of the information contained herein.
-
- ------------------------------
-
- Subject: 3. Copyright Notice
-
- This FAQ is Copyright 1994-96 by James D. Murray. This work may be
- reproduced, in whole or in part, using any medium, including, but not
- limited to, electronic transmission, CD-ROM, or published in print, under
- the condition that this copyright notice remains intact.
-
- ------------------------------
-
-