FAQ: Audio File Formats (version 2.10) ====================================== Table of contents ----------------- Introduction Device characteristics Popular sampling rates Compression schemes Current hardware File formats File conversions Playing audio files on UNIX Playing audio files on micros The Sound Site Newsletter Posting sounds Appendices: FTP access for non-internet sites AIFF Format (Audio IFF) The NeXT/Sun audio file format IFF/8SVX Format Playing sound on a PC The EA-IFF-85 documentation US Federal Standard 1016 availability Creative Voice (VOC) file format RIFF WAVE (.WAV) file format U-LAW and A-LAW definitions AVR File Format Introduction ------------ This is version 2 of this FAQ, which I started in November 1991 under the name "The audio formats guide". I bumped the major version number since the Subject and Newsgroups headers have changed to make the subject more informative and give the guide a wider audience. I also added a Table of contents section at the top. I am posting this about once a fortnight, either unchanged (just to inform new readers), or updated (if I learn more or when new hardware or software becomes popular). I post to alt.binaries.sounds.{misc,d} and to comp.dsp, for maximal coverage of people interested in audio, and to news.answers, for easy reference. A companion posting with subject "Change to: ..." is occasionally posted listing the diffs between a new version and the last. This is not reposted, and it is suppressed when the diffs are bigger than the new version. NEWSFLASH: This FAQ is now also available in distributed hypertext form! If you have a WWW browser and direct Internet access you can point it to "http://voorn.cwi.nl/audio-formats/a00.html". (WWW is the CERN World-Wide Web initiative; for more info, telnet or ftp to info.cern.ch.) Send updates, comments and questions to ; flames to /dev/null. I'd like to thank everyone who sent me mail with updates for previous versions. 
The list of names is really too long to list you all... --Guido van Rossum, CWI, Amsterdam "Lobster thermidor aux crevettes with a mornay sauce garnished with truffle pate, brandy and a fried egg on top and spam" Device characteristics ---------------------- In this text, I will only use the term "sample" to refer to a single output value from an A/D converter, i.e., a small integer number (usually 8 or 16 bits). Audio data is characterized by the following parameters, which correspond to settings of the A/D converter when the data was recorded. Naturally, the same settings must be used to play the data. - sampling rate (in samples per second), e.g. 8000 or 44100 - number of bits per sample, e.g. 8 or 16 - number of channels (1 for mono, 2 for stereo, etc.) Approximate sampling rates are often quoted in Hz or kHz ([kilo-] Hertz), however, the politically correct term is samples per second (samples/sec). Sampling rates are always measured per channel, so for stereo data recorded at 8000 samples/sec, there are actually 16000 samples in a second. I will sometimes write 8 k as a shorthand for 8000 samples/sec. Multi-channel samples are generally interleaved on a frame-by-frame basis: if there are N channels, the data is a sequence of frames, where each frame contains N samples, one from each channel. (Thus, the sampling rate is really the number of *frames* per second.) For stereo, the left channel usually comes first. The specification of the number of bits for U-LAW (pronounced mu-law -- the u really stands for the Greek letter mu) samples is somewhat problematic. These samples are logarithmically encoded in 8 bits, like a tiny floating point number; however, their dynamic range is that of 14 bit linear data. Source for converting to/from U-LAW (written by Jef Poskanzer) is distributed as part of the SOX package mentioned below; it can easily be ripped apart to serve in other applications. The official definition is the CCITT standard G.711. 
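The logarithmic nature of U-LAW described above can be sketched in a few lines of Python. This is only the continuous companding curve, using the formula from the appendix with u = 255; it is not the bit-exact G.711 codec, which uses a segmented approximation (see the SOX sources for a real converter). The function names and the 8-bit quantization step are my own illustration, not part of any standard.

```python
import math

MU = 255.0  # the value of u quoted in the appendix (assumed here)


def mulaw_compress(x, mu=MU):
    """Map a linear sample x in [-1, 1] to a companded value in [-1, 1]."""
    sign = -1.0 if x < 0 else 1.0
    return sign * math.log1p(mu * abs(x)) / math.log1p(mu)


def mulaw_expand(y, mu=MU):
    """Inverse of mulaw_compress: recover the linear value from y."""
    sign = -1.0 if y < 0 else 1.0
    return sign * math.expm1(abs(y) * math.log1p(mu)) / mu


def to_8bit(y):
    """Illustrative quantization of a companded value to -127..127."""
    return int(round(y * 127))


if __name__ == "__main__":
    for x in (0.001, 0.01, 0.1, 0.5, 1.0):
        y = mulaw_compress(x)
        print(f"x={x:<6} companded={y:.4f} code={to_8bit(y)}")
```

Small inputs get a disproportionately large share of the output codes, which is why 8 companded bits can cover roughly the dynamic range of 14 linear bits.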
There exists another encoding similar to U-LAW, called A-LAW, which is
used as a European telephony standard.  There is less support for it
in UNIX workstations.  (See the Appendix for some formulae describing
U-LAW and A-LAW.)


Popular sampling rates
----------------------

Some sampling rates are more popular than others, for various reasons.
Some recording hardware is restricted to (approximations of) some of
these rates, some playback hardware has direct support for some.  The
popularity of divisors of common rates can be explained by the
simplicity of clock frequency dividing circuits :-).

Samples/sec     Description

5500            One fourth of the Mac sampling rate (rarely seen).

7333            One third of the Mac sampling rate (rarely seen).

8000            Exactly 8000 samples/sec is a telephony standard that
                goes together with U-LAW (and also A-LAW) encoding.
                Some systems use a slightly different rate; in
                particular, the NeXT workstation uses 8012.8210513,
                apparently the rate used by Telco CODECs.

11 k            Either 11025, a quarter of the CD sampling rate,
                or half the Mac sampling rate (perhaps the most
                popular rate on the Mac).

16000           Used by, e.g. the G.722 compression standard.

18.9 k          CD-ROM/XA standard.

22 k            Either 22050, half the CD sampling rate, or the Mac
                rate; the latter is precisely 22254.545454545454 but
                usually misquoted as 22000.  (Historical note:
                22254.5454... was the horizontal scan rate of the
                original 128k Mac.)

32000           Used in digital radio, NICAM (Nearly-Instantaneous
                Companded Audio Multiplex [IBA/BREMA/BBC]) and other
                TV work, at least in the UK; also long play DAT and
                Japanese HDTV.

37.8 k          CD-ROM/XA standard for higher quality.

44056           This weird rate is used by professional audio
                equipment to fit an integral number of samples in a
                video frame.

44100           The CD sampling rate.  (DAT players recording
                digitally from CD also use this rate.)

48000           The DAT (Digital Audio Tape) sampling rate for
                domestic use.
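The relationships between the rates in the table are plain arithmetic, and a short Python sketch makes them easy to check. The constants are taken from the table; the helper names are made up for this example. Note that the 5500 and 7333 entries appear to be fractions of the nominal (misquoted) 22000 rather than of the exact Mac rate.

```python
CD_RATE = 44100
MAC_RATE = 22254.545454545454   # exact Mac rate from the table
MAC_NOMINAL = 22000             # the usual misquoted value
NEXT_RATE = 8012.8210513        # NeXT variant of the 8000 rate


def fractions_of(rate, divisors=(2, 3, 4)):
    """Return {divisor: rate / divisor} for a few small divisors."""
    return {d: rate / d for d in divisors}


def relative_difference(a, b):
    """Relative difference of rate a with respect to rate b."""
    return abs(a - b) / b


if __name__ == "__main__":
    print("CD fractions:         ", fractions_of(CD_RATE))
    print("Mac fractions (exact):", fractions_of(MAC_RATE))
    print("Mac fractions (22000):", fractions_of(MAC_NOMINAL))
    print("CD-ROM/XA 37.8k / 2:  ", 37800 / 2)
    # How far apart are the two "22 k" and the two "8 k" rates?
    print("Mac vs 22050: %.2f%%" % (100 * relative_difference(MAC_RATE, 22050)))
    print("NeXT vs 8000: %.2f%%" % (100 * relative_difference(NEXT_RATE, 8000)))
```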
Files sampled on SoundBlaster hardware have sampling rates that are
divisors of 1000000.

While professional musicians disagree, most people don't have a
problem if recorded sound is played at a slightly different rate, say,
1-2%.  On the other hand, if recorded data is being fed into a
playback device in real time (say, over a network), even the smallest
difference in sampling rate can frustrate the buffering scheme used...

There may be an emerging tendency to standardize on only a few
sampling rates and encoding styles, even if the file formats may
differ.  The suggested rates and styles are:

    rate (samp/sec)  style                   mono/stereo

    8000             8-bit U-LAW             mono
    22050            8-bit linear unsigned   mono and stereo
    44100            16-bit linear signed    mono and stereo


Compression schemes
-------------------

Strange though it seems, audio data is remarkably hard to compress
effectively.  For 8-bit data, a Huffman encoding of the deltas between
successive samples is relatively successful.  For 16-bit data,
companies like Sony and Philips have spent millions to develop
proprietary schemes.

Public standards for voice compression are slowly gaining popularity,
e.g. CCITT G.721 and G.723 (ADPCM at 32 and 24 kbits/sec).  (ADPCM ==
Adaptive Differential Pulse Code Modulation.)  Free source code for a
*fast* 32 kbits/sec ADPCM algorithm is available by ftp from
ftp.cwi.nl as /pub/adpcm.shar.  (** NOTE: if you are using v1.0, you
should get v1.1, released 17-Dec-1992, which fixes a serious bug --
the quality of v1.1 is claimed to be better than uLAW **)

There are also two US federal standards, 1016 (Code excited linear
prediction (CELP), 4800 bits/s) and 1015 (LPC-10E, 2400 bits/s).  See
also the appendix for 1016.

(Note that U-LAW and silence detection can also be considered
compression schemes.)

Here's a note about audio codings by Van Jacobson:

  Several people used the words "LPC" and "CELP" interchangeably.
  They are very different.
  An LPC (Linear Predictive Coding) coder fits speech to a simple,
  analytic model of the vocal tract, then throws away the speech &
  ships the parameters of the best-fit model.  An LPC decoder uses
  those parameters to generate synthetic speech that is usually
  more-or-less similar to the original.  The result is intelligible
  but sounds like a machine is talking.  A CELP (Code Excited Linear
  Predictor) coder does the same LPC modeling but then computes the
  errors between the original speech & the synthetic model and
  transmits both model parameters and a very compressed representation
  of the errors (the compressed representation is an index into a
  'code book' shared between coders & decoders -- this is why it's
  called "Code Excited").  A CELP coder does much more work than an
  LPC coder (usually about an order of magnitude more) but the result
  is much higher quality speech: The FIPS-1016 CELP we're working on
  is essentially the same quality as the 32Kb/s ADPCM coder but uses
  only 4.8Kb/s (the same as the LPC coder).

Finally, the comp.compression FAQ has some text on the 6:1 audio
compression scheme used by MPEG (a video compression standard-to-be).
It's interesting to note that video compression reaches much higher
ratios (like 26:1).  This FAQ is ftp'able from rtfm.mit.edu
[18.72.1.58] in directory /pub/usenet/news.answers/compression-faq,
files part1 and part2.

Comp.compression also carries a regular posting "How to uncompress
anything" by David Lemson, which (tersely) hints at which program you
need to uncompress a file whose name ends in .<type> for almost any
conceivable <type>.  Ftp'able from ftp.cso.uiuc.edu (128.174.5.59) in
the directory /doc/pcnet as the file compression.


Current hardware
----------------

I am aware of the following computer systems that can play back and
(sometimes) record audio data, with their characteristics.  Note that
for most systems you can also buy "professional" sampling hardware,
which supports much better quality, e.g. >= 44.1 k 16 bits stereo.
The characteristics listed here are a rough estimate of the
capabilities of the basic hardware only (and even here I am on thin
ice, with systems becoming ever more powerful).

machine              bits         max sampling rate  #output channels

Mac                  8            22k                1
Apple IIgs           8            32k / >70k         8(st)
PC/Soundblaster v1   8            13k / 22k          1
PC/Soundblaster v2   8            15k / 44.1k        1
PC/PAS-16            16           44.1k              ?(st)
Atari ST             8            22k                1
Atari STe,TT         8            50k                2
Atari Falcon 030     16           50k                8(st)
Amiga                8            ~29k               4(st)
Sun Sparc            U-LAW        8k                 1
Sun Sparcst. 10      U-LAW,8,16   48k                1(st)
NeXT                 U-LAW,8,16   44.1k              1(st)
SGI Indigo           8,16         48k                4(st)
Acorn Archimedes     ~U-LAW       ~180k              8(st)
Sony RISC-NEWS       8, 16        37.8k              ?(st)
VAXstation 4000      U-LAW        8k                 1
Tandy 1000/*L*       8            22k                3
HP9000/705,710,425e  U,A-LAW,8    8k                 1

4(st) means "four voices, stereo"; sampling rates xx/yy are different
recording/playback rates; *L* is any type with 'L' in it.

All these machines can play back sound without additional hardware,
although the needed software is not always standard; only the Sun,
NeXT and SGI come with standard sampling hardware (the NeXT only
samples U-LAW at 8000 samples/sec from the built-in microphone port;
you need a separate board for other rates).

The new VAXstation 4000 (VLC and model 60) series lets you PLAY audio
(.au) files, and the package DECsound will let you do the recording.
In fact, DECsound is given away free with Motif 1.1 and supports the
VAXstation, Sun SPARCstation, DECvoice, and XMedia audio devices.  Sun
sound files work without change.

The SGI Personal IRIS 4D/30 and 4D/35 have the same capabilities as
the Indigo.

The new Apple Macs have more powerful audio hardware; the latest
models have built-in microphones.

Software exists for the PC that can play sound on its 1-bit speaker
using pulse width modulation (see appendix); the Soundblaster board
records at rates up to 13 k and plays back up to 22 k (weird
combination, but that's the way it is).

Here's some info about the newest Atari machine, the Falcon030.
This machine has stereo 16 bit CODECs and a 32 MHz Motorola 56001 that
can handle 8 channels of 16 bit audio, up to 50 kHz/channel with
simultaneous playback and record.  The Falcon DMA sound engine is also
compatible with the 8 bit stereo DMA used on the STe and TT.  All of
these systems use signed data.

On the NeXT, the Motorola 56001 DSP chip is programmable and you can
(in principle) do what you want.  The SGI uses the same DSP chip but
it can't be programmed by users -- SGI prefers to offer it as a shared
system resource to multiple applications, thus enabling developers to
program audio with their Audio Library and avoid code modifications
for execution on future machines with different audio hardware, i.e. a
different DSP.

The Amiga also has a 6-bit volume, which can be used to produce
something like a 14-bit output for each voice.  The hardware can also
use one of each voice-pair to modulate the other in FM (period) or AM
(volume, 6-bits).

The Acorn Archimedes uses a variation on U-LAW with the bit order
reversed and the sign bit in bit 0.  Being a 'minority' architecture,
Arc owners are quite adept at converting sound/image formats from
other machines, and it is unlikely that you'll ever encounter sound in
one of the Arc's own formats (there are several).

CD-I machines form a special category.  The following formats are
used:

    - PCM 44.1 kHz standard CD format
    - ADPCM - Adaptive Delta PCM
      - Level A 37.8 kHz 8-bit
      - Level B 37.8 kHz 4-bit
      - Level C 18.9 kHz 4-bit


File formats
------------

Historically, almost every type of machine used its own file format
for audio data, but some file formats are more generally applicable,
and in general it is possible to define conversions between almost any
pair of file formats -- sometimes losing information, however.

File formats are a separate issue from device characteristics.
There are two types of file formats: self-describing formats, where
the device parameters and encoding are made explicit in some form of
header, and "raw" formats, where the device parameters and encoding
are fixed.

Self-describing file formats generally define a family of data
encodings, where a header field indicates the particular encoding
variant used.  Headerless formats define a single encoding and usually
allow no variation in device parameters (except sometimes sampling
rate, which can be a pain to figure out other than by listening to the
sample).

The header of self-describing formats contains the parameters of the
sampling device and sometimes other information (e.g. a human-readable
description of the sound, or a copyright notice).  Most headers begin
with a simple "magic word".  (Some formats do not simply define a
header format, but may contain chunks of data intermingled with chunks
of encoding info.)  The data encoding defines how the actual samples
are stored in the file, e.g. signed or unsigned, as bytes or short
integers, in little-endian or big-endian byte order, etc.  Strictly
speaking, channel interleaving is also part of the encoding, although
so far I have seen little variation in this area.

Some file formats apply some kind of compression to the data, e.g.
Huffman encoding, or simple silence deletion.

Here's an overview of popular file formats.
        Self-describing file formats
        ----------------------------

extension, name   origin        variable parameters (fixed; comments)

.au or .snd       NeXT, Sun     rate, #channels, encoding, info string
.aif(f), AIFF     Apple, SGI    rate, #channels, sample width, lots of info
.aif(f), AIFC     Apple, SGI    same (extension of AIFF with compression)
.iff, IFF/8SVX    Amiga         rate, #channels, instrument info (8 bits)
.voc              Soundblaster  rate (8 bits/1 ch; can use silence deletion)
.wav, WAVE        Microsoft     rate, #channels, sample width, lots of info
.sf               IRCAM         rate, #channels, encoding, info
none, HCOM        Mac           rate (8 bits/1 ch; uses Huffman compression)
none, MIME        Internet      (see below)
.mod or .nst      Amiga         (see below)

Note that the filename extension ".snd" is ambiguous: it can be either
the self-describing NeXT format or the headerless Mac/PC format, or
even a headerless Amiga format.

I know nothing for sure about the origin of HCOM files, only that
there are a lot of them floating around on our system and probably at
FTP sites over the world.  The filenames usually don't have a ".hcom"
extension, but this is what SOX (see below) uses.  The file format
recognized by SOX includes a MacBinary header, where the file type
field is "FSSD".  The data fork begins with the magic word "HCOM" and
contains Huffman compressed data; after decompression it is 8 bits
unsigned data.

IFF/8SVX allows for amplitude contours for sounds (attack/decay/etc).
Compression is optional (and extensible); volume is variable; author,
notes and copyright properties; etc.

AIFF, AIFC and WAVE are similar in spirit but allow more freedom in
encoding style (other than 8 bit/sample), amongst others.

There are other sound formats in use on Amiga by digitizers and music
programs, such as IFF/SMUS.

Appendices describe the NeXT and VOC formats; pointers to more info
about AIFF, AIFC, 8SVX and WAVE (which are too complex to describe
here) are also in appendices.

DEC systems (e.g.
DECstation 5000) use a variant of the NeXT format that uses
little-endian encoding and has a different magic number (0x0064732E in
little-endian encoding).

Standard file formats used in the CD-I world are IFF but on the disc
they're in realtime files.

An interesting "interchange format" for audio data is described in the
proposed Internet Standard "MIME", which describes a family of
transport encodings and structuring devices for electronic mail.  This
is an extensible format, and initially standardizes a type of audio
data dubbed "audio/basic", which is 8-bit U-LAW data sampled at 8000
samples/sec.

Finally, a somewhat different but popular format is the "MOD" file,
usually with extension ".mod" or ".nst" (they can also have a prefix
of "mod.").  This originated on the Amiga but players now exist for
many platforms.  MOD files are music files containing 2 parts: (1) a
bank of digitized samples; (2) sequencing information describing how
and when to play the samples.  See the appendix "The Amiga MOD Format"
for a description of this file format.

        Headerless file formats
        -----------------------

extension     origin        parameters
or name

.snd, .fssd   Mac, PC       variable rate, 1 channel, 8 bits unsigned
.ul           US telephony  8 k, 1 channel, 8 bit "U-LAW" encoding
.snd?         Amiga         variable rate, 1 channel, 8 bits signed

It is usually easy to distinguish 8-bit signed formats from unsigned
by looking at the beginning of the data with 'od -b " option.
Remember that the most common file type is unsigned bytes, which can
be indicated with "-t ub".  You'll have to guess the proper sampling
rate, but often it's 11k or 22k.

- In particular, with SOX version 4 (or earlier), you have to specify
  "-t 8svx" for files with an .iff extension.

- When converting linear samples to U-LAW using the .au type for the
  output file, you must specify "-U" for the output file, otherwise
  you will end up with a file containing a NeXT/Sun header but linear
  samples -- only the NeXT will play such files correctly.
Also, you must explicitly specify an output sampling rate with "-r 8000". (This may seem fixed for most cases in version 5, but it is still occasionally necessary, so I'm keeping this warning in.) Sun Sparc --------- On Sun Sparcs, starting at SunOS 4.1, a program "raw2audio" is provided by Sun (in /usr/demo/SOUND -- see below) which takes a raw U-LAW file and turns it into a ".au" file by prefixing it with an appropriate header. NeXT ---- On NeXTs, you can usually rename .au files to .snd and it'll work like a charm, but some .au files lack header info that the NeXT needs. This can be fixed by using sndconvert: sndconvert -c 1 -f 1 -s 8012.8210513 -o nextfile.snd sunfile.au SGI Indigo and Personal IRIS ---------------------------- SGI supports "soundfiler" (in /usr/sbin), a program similar in spirit to SOX but with a GUI. Soundfiler plays aiff, aifc, NeXT/Sun and .wav formats. It can do conversions between any of these formats and to and from raw formats including mulaw. It also does sample rate conversions. Three shell commands are also provided that give the same functionality: "sfplay", "sfconvert", and "aifcresample" (all in /usr/sbin). Amiga ----- Mike Cramer's SoundZAP can do no effects except rate change and it only does conversions to IFF, but it is generally much faster than SOX. (Ftp'able from the same directory as amisox above.) Tandy ----- The Tandy 1000 uses a (proprietary?) compressed format. There is a PD Mac to Tandy conversion program called CONVERT. Playing audio files on UNIX --------------------------- The commands needed to play an audio file depend on the file format and the available hardware and software. Most systems can only directly play sound in their native format; use a conversion program (see above) to play other formats. Sun Sparcstation running SunOS 4.x ---------------------------------- Raw U-LAW files can be played using "cat file >/dev/audio". 
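Since a Sun/NeXT ".au" file holding U-LAW data is just raw U-LAW data behind a header, moving between the two forms (which is what "raw2audio" does in one direction) can be sketched in Python as below. The header layout used here (six big-endian 32-bit words: magic ".snd", header size, data size, encoding, sampling rate, channel count, followed by an info string) and the encoding code 1 for 8-bit U-LAW are my understanding of the Sun/NeXT format, not something taken from this section, so treat them as assumptions and check them against the format appendix. The function names are made up.

```python
import struct

AU_MAGIC = 0x2E736E64        # the four bytes ".snd" (assumed layout)
AU_ENCODING_ULAW_8 = 1       # assumed code for 8-bit U-LAW
AU_FIXED_HEADER = 24         # six 32-bit big-endian words


def au_header(data_size, rate=8000, channels=1, info=b""):
    """Build a Sun/NeXT style header for 8-bit U-LAW data."""
    # Pad the info string with NULs to a multiple of 4, at least 4 bytes.
    padded_len = max(4, (len(info) + 3) // 4 * 4)
    info = info.ljust(padded_len, b"\0")
    header_size = AU_FIXED_HEADER + len(info)
    fixed = struct.pack(">6I", AU_MAGIC, header_size, data_size,
                        AU_ENCODING_ULAW_8, rate, channels)
    return fixed + info


def raw_to_au(raw_ulaw):
    """Prefix raw U-LAW bytes with a header."""
    return au_header(len(raw_ulaw)) + raw_ulaw


def au_to_raw(au_bytes):
    """Strip the header, returning only the sample data."""
    magic, header_size = struct.unpack(">2I", au_bytes[:8])
    if magic != AU_MAGIC:
        raise ValueError("not a Sun/NeXT audio file")
    return au_bytes[header_size:]
```

A round trip such as au_to_raw(raw_to_au(data)) should hand back the original bytes unchanged.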
A whole package for dealing with ".au" files is provided by Sun on an
experimental basis, in /usr/demo/SOUND.  You may have to compile the
programs first.  (If you can't find this directory, either you are not
running SunOS 4.1 yet, or your system administrator hasn't installed
it -- go ask him for it, not me!)  The program "play" in this
directory recognizes all files in Sun/NeXT format, but a SS 1 or 2 can
play only those using U-LAW encoding at 8 k -- the SS 10 hardware
plays other encodings, too.

If you can't find "play", you can also cat a ".au" file to /dev/audio,
if it uses U-LAW; the header will sound like a short burst of noise
but the rest of the data will sound OK (really, the only difference in
this case between raw U-LAW and ".au" files is the header; the U-LAW
data is exactly the same).

Finally, OpenWindows 3.0 has a full-fledged audio tool.  You can drop
audio file icons into it, edit them, etc.

        Sun Sparcstation running Solaris 2.0
        ------------------------------------

Under SVR4 (and hence Solaris 2.0), writing to /dev/audio from the
shell is a bad idea, because the device driver will flush its queue as
soon as the file is closed.  Use "audioplay" instead.  The supported
formats and sampling rates are the same as above.

------------------------------------------------------------------------

AIFF Format (Audio IFF) and AIFC
--------------------------------

This format was developed by Apple for storing high-quality sampled
sound and musical instrument info; it is also used by SGI and several
professional audio packages (sorry, I know no names).  An extension,
called AIFC or AIFF-C, supports compression (see the last item below).

I've made a BinHex'ed MacWrite version of the AIFF spec (no idea if
it's the same text as mentioned below) available by anonymous ftp from
ftp.cwi.nl [192.16.184.180]; the file is /pub/AudioIFF1.2.hqx.  But
you may be better off with the AIFF-C specs, see below.
Mike Brindley (brindley@ece.orst.edu) writes:

"The complete AIFF spec by Steve Milne, Matt Deatherage (Apple) is
available in 'AMIGA ROM Kernel Reference Manual: Devices (3rd
Edition)' 1991 by Commodore-Amiga, Inc.; Addison-Wesley Publishing
Co.; ISBN 0-201-56775-X, starting on page 435 (this edition has a
charcoal grey cover).  It is available in most bookstores, and soon in
many good libraries."

Finally, Mark Callow writes (in comp.sys.sgi):

"I have placed a PostScript version of the AIFF-C specification on
sgi.sgi.com for public ftp.  It is in the file sgi/aiff-c.9.26.91.ps.

sgi.sgi.com's internet host number is (I think) 192.48.153.1."

------------------------------------------------------------------------

IFF/8SVX Format
---------------

Newsgroups: alt.binaries.sounds.d,alt.sex.sounds
Subject: Format of the IFF header (Amiga sounds)
Message-ID: <2509@tardis.Tymnet.COM>
From: jms@tardis.Tymnet.COM (Joe Smith)
Date: 23 Oct 91 23:54:38 GMT
Followup-To: alt.binaries.sounds.d
Organization: BT North America (Tymnet)

The first 12 bytes of an IFF file are used to distinguish between an
Amiga picture (FORM-ILBM), an Amiga sound sample (FORM-8SVX), or other
file conforming to the IFF specification.  The middle 4 bytes are the
count of bytes that follow the "FORM" and byte count longwords.
(Numbers are stored in M68000 form, high order byte first.)

                ------------------------------------------

FutureSound audio file, 15000 samples at 10.000KHz, file is 15048
bytes long.

0000: 464F524D 00003AC0 38535658 56484452    FORM..:.8SVXVHDR
       F O R M    15040  8 S V X  V H D R
0010: 00000014 00003A98 00000000 00000000    ......:.........
            20    15000        0        0
0020: 27100100 00010000 424F4459 00003A98    '.......BODY..:.
     10000 1 0      1.0  B O D Y    15000

0000..03            = "FORM", identifies this as an IFF format file.
FORM+00..03 (ULONG) = number of bytes that follow.  (Unsigned long int.)
FORM+04..07         = "8SVX", identifies this as an 8-bit sampled voice.
????+00..03         = "VHDR", Voice8Header, describes the parameters
                      for the BODY.
VHDR+00..03 (ULONG) = number of bytes to follow.
VHDR+04..07 (ULONG) = samples in the high octave 1-shot part.
VHDR+08..0B (ULONG) = samples in the high octave repeat part.
VHDR+0C..0F (ULONG) = samples per cycle in high octave (if repeating),
                      else 0.
VHDR+10..11 (UWORD) = samples per second.  (Unsigned 16-bit quantity.)
VHDR+12     (UBYTE) = number of octaves of waveforms in sample.
VHDR+13     (UBYTE) = data compression (0=none, 1=Fibonacci-delta
                      encoding).
VHDR+14..17 (FIXED) = volume.  (The number 65536 means 1.0 or full
                      volume.)

????+00..03         = "BODY", identifies the start of the audio data.
BODY+00..03 (ULONG) = number of bytes to follow.
BODY+04..NNNNN      = Data, signed bytes, from -128 to +127.

0030: 04030201 02030303 04050605 05060605
0040: 06080806 07060505 04020202 01FF0000
0050: 00000000 FF00FFFF FFFEFDFD FDFEFFFF
0060: FDFDFF00 00FFFFFF 00000000 00FFFF00
0070: 00000000 00FF0000 00FFFEFF 00000000
0080: 00010000 000101FF FF0000FE FEFFFFFE
0090: FDFDFEFD FDFFFFFC FDFEFDFD FEFFFEFE
00A0: FFFEFEFE FEFEFEFF FFFFFEFF 00FFFF01

This small section of the audio sample shows the numbers ranging from
-4 (0xFC) to +8 (0x08).  Warning: Do not assume that the BODY starts
48 bytes into the file.  In addition to "VHDR", chunks labeled "NAME",
"AUTH", "ANNO", or "(c) " may be present, and may be in any order.
You will have to check the byte count in each chunk to determine how
many bytes to skip.

------------------------------------------------------------------------

The EA-IFF-85 documentation
---------------------------

From: dgc3@midway.uchicago.edu

As promised, here's an ftp location for the EA-IFF-85 documentation.
It's the November 1988 release as revised by Commodore (the last
public release), with specifications for IFF FORMs for graphics,
sound, formatted text, and more.  IFF FORMS now exist for other media,
including structured drawing, and new documentation is now available
only from Commodore.
The documentation is at grind.isca.uiowa.edu [128.255.19.233], in the
directory /amiga/f1/ff185.  The complete file list is as follows:

    DOCUMENTS.zoo
    EXAMPLES.zoo
    EXECUTABLE.zoo
    INCLUDE.zoo
    LINKER_INFO.zoo
    OBJECT.zoo
    SOURCE.zoo
    TP_IFF_Specs.zoo

All files except DOCUMENTS.zoo are Amiga-specific, but may be used as
a basis for conversion to other platforms.  Well, I take that
tentatively back.  I don't know what TP_IFF_Specs.zoo contains, so it
might be non-Amiga-specific.

------------------------------------------------------------------------

Creative Voice (VOC) file format
--------------------------------

From: galt@dsd.es.com

(byte numbers are hex!)

    HEADER (bytes 00-19)
    Series of DATA BLOCKS (bytes 1A+) [Must end w/ Terminator Block]

---------------------------------------------------------------

HEADER:
=======
     byte #     Description
     ------     ------------------------------------------
     00-12      "Creative Voice File"
     13         1A (eof to abort printing of file)
     14-15      Offset of first datablock in .voc file (std 1A 00
                in Intel Notation)
     16-17      Version number (minor,major) (VOC-HDR puts 0A 01)
     18-19      2's Comp of Ver. # + 1234h (VOC-HDR puts 29 11)

---------------------------------------------------------------

DATA BLOCK:
===========

   Data Block:  TYPE(1-byte), SIZE(3-bytes), INFO(0+ bytes)
   NOTE: Terminator Block is an exception -- it has only the TYPE byte.
      TYPE   Description     Size (3-byte int)   Info
      ----   -----------     -----------------   -----------------------
      00     Terminator      (NONE)              (NONE)
      01     Sound data      2+length of data    *
      02     Sound continue  length of data      Voice Data
      03     Silence         3                   **
      04     Marker          2                   Marker# (2 bytes)
      05     ASCII           length of string    null terminated string
      06     Repeat          2                   Count# (2 bytes)
      07     End repeat      0                   (NONE)

      *Sound Info Format:       **Silence Info Format:
       ---------------------      ----------------------------
       00   Sample Rate           00-01  Length of silence - 1
       01   Compression Type      02     Sample Rate
       02+  Voice Data

  Marker#           -- Driver keeps the most recent marker in a status byte
  Count#            -- Number of repetitions + 1
                         Count# may be 1 to FFFE for 0 - FFFD repetitions
                         or FFFF for endless repetitions
  Sample Rate       -- SR byte = 256-(1000000/sample_rate)
  Length of silence -- in units of sampling cycle
  Compression Type  -- of voice data
                         8-bits    = 0
                         4-bits    = 1
                         2.6-bits  = 2
                         2-bits    = 3
                         Multi DAC = 3+(# of channels) [interesting--
                                     this isn't in the developer's manual]

------------------------------------------------------------------------

RIFF WAVE (.WAV) file format
----------------------------

RIFF is a format by Microsoft and IBM which is similar in spirit and
functionality to EA-IFF-85, but not compatible (and it's in
little-endian byte order, of course :-).  WAVE is RIFF's equivalent of
AIFF, and its inclusion in Microsoft Windows 3.1 has suddenly made it
important to know about.

Rob Ryan was kind enough to send me a description of the RIFF format.
Unfortunately, it is too big to include here (27 k), but I've made it
available for anonymous ftp as ftp.cwi.nl:/pub/RIFF-format.

And here's a pointer to the official description from Matt Saettler,
Microsoft Multimedia:

"The complete definition of the WAVE file format as defined by
IBM/Microsoft is available for anon. FTP from ftp.uu.net in the
vendor/microsoft/multimedia directory."

(Rob Ryan's version may actually be an extract from one of the files
stored there.)
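The remark that RIFF resembles EA-IFF-85 apart from byte order can be illustrated with a single chunk walker that takes the byte order as a parameter. The following Python sketch assumes the common layout of both containers (a 4-byte ID, a 4-byte size, a 4-byte form type, then chunks each made of a 4-byte ID, a 4-byte size and the data, padded to an even length); the padding rule and the function names are my assumptions, not taken from the descriptions in this FAQ. The example input is the FutureSound header from the IFF/8SVX appendix.

```python
import struct


def walk_chunks(data, byte_order):
    """Return (container_id, form_type, [(chunk_id, payload), ...]).

    byte_order is ">" for EA-IFF-85 (big-endian) or "<" for RIFF.
    """
    container, size, form_type = struct.unpack(byte_order + "4sI4s", data[:12])
    end = min(8 + size, len(data))
    chunks = []
    pos = 12
    while pos + 8 <= end:
        chunk_id, chunk_size = struct.unpack(byte_order + "4sI",
                                             data[pos:pos + 8])
        payload = data[pos + 8:pos + 8 + chunk_size]
        chunks.append((chunk_id, payload))
        # Skip the data plus one pad byte when the size is odd (assumed rule).
        pos += 8 + chunk_size + (chunk_size & 1)
    return container, form_type, chunks


if __name__ == "__main__":
    # The 48 header bytes of the FutureSound example, plus a silent body.
    iff = bytes.fromhex(
        "464F524D00003AC03853565856484452"
        "0000001400003A980000000000000000"
        "2710010000010000424F445900003A98") + bytes(15000)
    container, form_type, chunks = walk_chunks(iff, ">")
    print(container, form_type, [(cid, len(p)) for cid, p in chunks])
    vhdr = dict(chunks)[b"VHDR"]
    # one-shot, repeat, per-cycle, rate, octaves, compression, volume
    print(struct.unpack(">IIIHBBI", vhdr))
```

Walking by chunk size rather than by fixed offsets also follows the warning in the IFF/8SVX appendix that the BODY chunk need not start 48 bytes into the file.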
------------------------------------------------------------------------

U-LAW and A-LAW definitions
---------------------------

[Adapted from information provided by duggan@cc.gatech.edu (Rick
Duggan) and davep@zenobia.phys.unsw.EDU.AU (David Perry)]

u-LAW (really mu-LAW) is

$$ y = \frac{\operatorname{sgn}(m)}{\ln(1+u)} \, \ln\!\left(1 + u\left|\frac{m}{m_p}\right|\right), \qquad \left|\frac{m}{m_p}\right| \le 1 $$

A-LAW is

$$ y = \begin{cases} \dfrac{A}{1+\ln A}\,\dfrac{m}{m_p}, & \left|\dfrac{m}{m_p}\right| \le \dfrac{1}{A} \\[2ex] \dfrac{\operatorname{sgn}(m)}{1+\ln A}\left(1 + \ln\!\left(A\left|\dfrac{m}{m_p}\right|\right)\right), & \dfrac{1}{A} \le \left|\dfrac{m}{m_p}\right| \le 1 \end{cases} $$

Values of u=100 and 255, A=87.6, m_p is the peak message value, m is
the current quantised message value.  (The formulae get simpler if you
substitute x for m/m_p and sgn(x) for sgn(m); then -1 <= x <= 1.)

Converting from u-LAW to A-LAW is in a sense "lossy" since there are
quantizing errors introduced in the conversion.

"..the u-LAW used in North America and Japan, and the A-LAW used in
Europe and the rest of the world and international routes.."

References:

Modern Digital and Analog Communication Systems, B.P.Lathi., 2nd ed.
ISBN 0-03-027933-X

Transmission Systems for Communications
Fifth Edition
by Members of the Technical Staff at Bell Telephone Laboratories
Bell Telephone Laboratories, Incorporated
Copyright 1959, 1964, 1970, 1982

------------------------------------------------------------------------

AVR File Format
---------------

From: hyc@hanauma.Jpl.Nasa.Gov (Howard Chu)

A lot of PD software exists to play Mac .snd files on the ST.  One
other format that seems pretty popular (used by a number of commercial
packages) is the AVR format (from Audio Visual Research).
This format has a 128 byte header that looks like this:

    char magic[4]="2BIT";
    char name[8];           /* null-padded sample name */
    short mono;             /* 0 = mono, 0xffff = stereo */
    short rez;              /* 8 = 8 bit, 16 = 16 bit */
    short sign;             /* 0 = unsigned, 0xffff = signed */
    short loop;             /* 0 = no loop, 0xffff = looping sample */
    short midi;             /* 0xffff = no MIDI note assigned,
                               0xffXX = single key note assignment
                               0xLLHH = key split, low/hi note */
    long rate;              /* sample frequency in hertz */
    long size;              /* sample length in bytes or words (see rez) */
    long lbeg;              /* offset to start of loop in bytes or words.
                               set to zero if unused. */
    long lend;              /* offset to end of loop in bytes or words.
                               set to sample length if unused. */
    short res1;             /* Reserved, MIDI keyboard split */
    short res2;             /* Reserved, sample compression */
    short res3;             /* Reserved */
    char ext[20];           /* Additional filename space, used
                               if (name[7] != 0) */
    char user[64];          /* User defined. Typically ASCII message. */

-----------------------------------------------------------------------

The Amiga MOD Format
--------------------

From: norlin@mailhost.ecn.uoknor.edu (Norman Lin)

MOD files are music files containing 2 parts:

(1) a bank of digitized samples
(2) sequencing information describing how and when to play the samples

MOD files originated on the Amiga, but because of their flexibility
and the extremely large number of MOD files available, MOD players
are now available for a variety of machines (IBM PC, Mac, Sparc
Station, etc.)

The samples in a MOD file are raw, 8 bit, signed, headerless, linear
digital data.  There may be up to 31 distinct samples in a MOD file,
each with a length of up to 128K (though most are much smaller; say,
10K - 60K).  An older MOD format only allowed for up to 15 samples in
a MOD file; you don't see many of these anymore.  There is no standard
sampling rate for these samples.
The sequencing information in a MOD file contains 4 tracks of
information describing which, when, for how long, and at what frequency
samples should be played.  This means that a MOD file can have up to
31 distinct (digitized) instrument sounds, with up to 4 playing
simultaneously at any given point.  This allows a wide variety of
orchestrational possibilities, including use of voice samples or
creation of one's own instruments (with appropriate sampling
hardware/software).  The ability to use one's own samples as instruments
is a flexibility that other music files/formats do not share, and is
one of the reasons MOD files are so popular, numerous, and diverse.

15 instrument MODs, as noted above, are somewhat older than 31
instrument MODs and are not (at least not by me) seen very often
anymore.  Their format is identical to that of 31 instrument MODs
except:

(1) Since there are only 15 samples, the information for the last (15th)
    sample starts at byte 440 and goes through byte 469.
(2) The songlength is at byte 470 (contrast with byte 950 in 31
    instrument MOD)
(3) Byte 471 appears to be ignored, but has been observed to be 127.
    (Sorry, this is from observation only)
(4) Byte 472 begins the pattern sequence table (contrast with byte 952
    in a 31 instrument MOD)
(5) Patterns start at byte 600 (contrast with byte 1084 in 31
    instrument MOD)

"ProTracker," an Amiga MOD file creator/editor, is available for ftp
everywhere as pt??.lzh.

From: Apollo Wong

From: M.J.H.Cox@bradford.ac.uk (Mark Cox)
Newsgroups: alt.sb.programmer
Subject: Re: Format for MOD files...
Message-ID: <1992Mar18.103608.4061@bradford.ac.uk>
Date: 18 Mar 92 10:36:08 GMT
Organization: University of Bradford, UK

wdc50@DUTS.ccc.amdahl.com (Winthrop D Chan) writes:
>I'd like to know if anyone has a reference document on the format of the
>Amiga Sound/NoiseTracker (MOD) files. The author of Modplay said he was going
>to release such a document sometime last year, but he never did.
>If anyone

I found this one, which covers it better than I can explain it - if you
use this in conjunction with the documentation that comes with Norman
Lin's Modedit program it should pretty much cover it.

Mark J Cox

/***********************************************************************

Protracker 1.1B Song/Module Format:
-----------------------------------

Offset  Bytes  Description
------  -----  -----------
   0     20    Songname. Remember to put trailing null bytes at the end...

Information for sample 1-31:

Offset  Bytes  Description
------  -----  -----------
  20     22    Samplename for sample 1. Pad with null bytes.
  42      2    Samplelength for sample 1. Stored as number of words.
               Multiply by two to get real sample length in bytes.
  44      1    Lower four bits are the finetune value, stored as a signed
               four bit number. The upper four bits are not used, and
               should be set to zero.
               Value:  Finetune:
                 0        0
                 1       +1
                 2       +2
                 3       +3
                 4       +4
                 5       +5
                 6       +6
                 7       +7
                 8       -8
                 9       -7
                 A       -6
                 B       -5
                 C       -4
                 D       -3
                 E       -2
                 F       -1

  45      1    Volume for sample 1. Range is $00-$40, or 0-64 decimal.
  46      2    Repeat point for sample 1. Stored as number of words offset
               from start of sample. Multiply by two to get offset in bytes.
  48      2    Repeat Length for sample 1. Stored as number of words in
               loop. Multiply by two to get replen in bytes.

Information for the next 30 samples starts here. It's just like the info for
sample 1.

Offset  Bytes  Description
------  -----  -----------
  50     30    Sample 2...
  80     30    Sample 3...
   .
   .
   .
 890     30    Sample 30...
 920     30    Sample 31...

Offset  Bytes  Description
------  -----  -----------
 950      1    Songlength. Range is 1-128.
 951      1    Well... this little byte here is set to 127, so that old
               trackers will search through all patterns when loading.
               Noisetracker uses this byte for restart, but we don't.
 952    128    Song positions 0-127. Each hold a number from 0-63 that
               tells the tracker what pattern to play at that position.
1080      4    The four letters "M.K." - This is something Mahoney & Kaktus
               inserted when they increased the number of samples from
               15 to 31.
               If it's not there, the module/song uses 15 samples or
               the text has been removed to make the module harder to
               rip. Startrekker puts "FLT4" or "FLT8" there instead.

Offset  Bytes  Description
------  -----  -----------
1084    1024   Data for pattern 00.
   .
   .
   .
xxxx  Number of patterns stored is equal to the highest patternnumber
      in the song position table (at offset 952-1079).

Each note is stored as 4 bytes, and all four notes at each position in
the pattern are stored after each other.

00 -  chan1  chan2  chan3  chan4
01 -  chan1  chan2  chan3  chan4
02 -  chan1  chan2  chan3  chan4
etc.

Info for each note:

 _____byte 1_____   byte2_    _____byte 3_____   byte4_
/                \ /      \  /                \ /      \
0000          0000-00000000  0000          0000-00000000

Upper four    12 bits for    Lower four    Effect command.
bits of sam-  note period.   bits of sam-
ple number.                  ple number.

Periodtable for Tuning 0, Normal
  C-1 to B-1 : 856,808,762,720,678,640,604,570,538,508,480,453
  C-2 to B-2 : 428,404,381,360,339,320,302,285,269,254,240,226
  C-3 to B-3 : 214,202,190,180,170,160,151,143,135,127,120,113

To determine what note to show, scan through the table until you find
the same period as the one stored in byte 1-2. Use the index to look
up in a notenames table.

This is the data stored in a normal song. A packed song starts with the
four letters "PACK", but i don't know how the song is packed: You can
get the source code for the cruncher/decruncher from us if you need it,
but I don't understand it; I've just ripped it from another tracker...

In a module, all the samples are stored right after the patterndata.
To determine where a sample starts and stops, you use the sampleinfo
structures in the beginning of the file (from offset 20). Take a look
at the mt_init routine in the playroutine, and you'll see just how it
is done.

Lars "ZAP" Hamre/Amiga Freelancers

***********************************************************************/

--
Mark J Cox -----
Bradford, UK ---

-----------------------------------------------------------------------
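The Protracker layout quoted above can be turned into code fairly
directly.  What follows is only a minimal sketch in C, not taken from
any existing player: the names (ModSample, ModNote, mod_sample_info and
so on) are made up for illustration, and it assumes the first 1084
bytes of a 31-sample module are already in memory.  It also assumes the
2-byte fields are stored big-endian (as on the Amiga's 68000), and that
because patterns are numbered from 00 the file holds "highest pattern
number plus one" patterns; neither point is spelled out in the
description above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Offsets taken from the Protracker description (31-sample modules). */
enum {
    MOD_SAMPLE_INFO  = 20,    /* first 30-byte sample record */
    MOD_SAMPLE_SIZE  = 30,
    MOD_POSITIONS    = 952,   /* 128 song positions */
    MOD_SIGNATURE    = 1080,  /* "M.K.", "FLT4" or "FLT8" */
    MOD_PATTERN_DATA = 1084,
    MOD_PATTERN_SIZE = 1024
};

typedef struct {
    char     name[23];        /* 22 bytes from the file plus a NUL */
    unsigned length_bytes;
    int      finetune;        /* -8 .. +7 */
    unsigned volume;          /* 0 .. 64 */
    unsigned repeat_bytes;
    unsigned replen_bytes;
} ModSample;

typedef struct {
    unsigned sample;          /* sample number */
    unsigned period;          /* 12-bit note period */
    unsigned effect;          /* 12-bit effect command */
} ModNote;

/* Assumption: 16-bit fields are big-endian, as on the Amiga's 68000. */
static unsigned be16(const uint8_t *p)
{
    return ((unsigned)p[0] << 8) | p[1];
}

/* Decode the record for sample `index` (1..31) from the file header. */
static ModSample mod_sample_info(const uint8_t *hdr, int index)
{
    ModSample s;
    const uint8_t *p;
    int ft;

    assert(index >= 1 && index <= 31);
    p = hdr + MOD_SAMPLE_INFO + (index - 1) * MOD_SAMPLE_SIZE;

    memcpy(s.name, p, 22);
    s.name[22] = '\0';
    s.length_bytes = be16(p + 22) * 2u;     /* stored in words */
    ft = p[24] & 0x0F;                      /* signed four bit number */
    s.finetune = (ft >= 8) ? ft - 16 : ft;
    s.volume = p[25];
    s.repeat_bytes = be16(p + 26) * 2u;
    s.replen_bytes = be16(p + 28) * 2u;
    return s;
}

/* Non-zero if one of the signatures mentioned above is at offset 1080. */
static int mod_has_signature(const uint8_t *hdr)
{
    const uint8_t *sig = hdr + MOD_SIGNATURE;
    return memcmp(sig, "M.K.", 4) == 0 ||
           memcmp(sig, "FLT4", 4) == 0 ||
           memcmp(sig, "FLT8", 4) == 0;
}

/* Scan all 128 song positions; patterns are numbered from 00, so the
   count is assumed to be the highest pattern number plus one. */
static unsigned mod_pattern_count(const uint8_t *hdr)
{
    unsigned highest = 0;
    int i;

    for (i = 0; i < 128; i++)
        if (hdr[MOD_POSITIONS + i] > highest)
            highest = hdr[MOD_POSITIONS + i];
    return highest + 1;
}

/* The sample data follows immediately after the pattern data. */
static unsigned long mod_sample_data_offset(const uint8_t *hdr)
{
    return MOD_PATTERN_DATA +
           (unsigned long)mod_pattern_count(hdr) * MOD_PATTERN_SIZE;
}

/* Split the 4 bytes of one note into sample number, period and effect. */
static ModNote mod_decode_note(const uint8_t *b)
{
    ModNote n;

    n.sample = (b[0] & 0xF0u) | (b[2] >> 4);
    n.period = ((b[0] & 0x0Fu) << 8) | b[1];
    n.effect = ((b[2] & 0x0Fu) << 8) | b[3];
    return n;
}
```

For example, a note stored as the four bytes 0x11 0xAC 0x2C 0x10
decodes to sample number 0x12, period 428 (C-2 in the period table
above) and effect command 0xC10.

-----------------------------------------------------------------------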