The Datafile PD-CD 4

Text File | 1996-04-13 | 30.1 KB | 664 lines

############################################################################
#                                                                          #
#                             Compress utility                             #
#                                                                          #
#==========================================================================#
#                                                                          #
#                             by David Radford                             #
#                                                                          #
#==========================================================================#
#                                                                          #
#     These files are a part of !CompUtils and may not be distributed      #
#     separately other than as laid down in the !ReadMe file.              #
#                                                                          #
############################################################################


Overview
========

Compress is a piece of code that produces compressed samples from raw
linear-signed ones. There is one piece of code for each group of sample
types supported (eg. Type0-2, Type6-7, etc). Unlike Expand and ExpCode,
there is no universal version - this would be utterly pointless in 99.9% of
cases anyway.

Each version has an almost-identical user interface, meaning that any
program you write can quickly take advantage of new compression formats
provided in future releases of !CompUtils with little or no change to your
code. However, certain compression algorithms require additional data to be
passed to them, so a 16-byte block has been set aside for this.

The format of this 16-byte block depends on the particular algorithm, and,
with the possible exception of the flags, the default values are usually
acceptable.


Input/Output
============

Compress passes data in and out through a pair of buffers. These must be
provided for it by your external program, from here on referred to as the
master.

Data transfers are always controlled by Compress, not the master program;
Compress *requests* new blocks of raw sample data from the master, rather
than having them forced upon it. Similarly, Compress *requests* the master
to dispose of the compressed data in the output (destination) buffer when
it becomes full.

Compress simply treats the input and output buffers as blocks of data.
Whether or not you reuse these areas is up to you (thus making them buffers
in the true sense). If you prefer, you could load the source sample into one
or more blocks of memory then pass these one at a time to Compress.
Obviously these blocks could no longer be described as buffers. For
consistency the term 'buffer' will be used throughout this document since
most uses of Compress will require one or the other of these blocks to be a
buffer.


The Compress header
===================

The code has a standard header through which all parameters are passed:

     +0   branch instruction to machine code routine
     +4   returned reason code
     +8   pointer to the source buffer              (user provided)
    +12   length of source buffer                   (user provided)
    +16   pointer to the destination buffer         (user provided)
    +20   length of destination buffer              (user provided)
    +24   pointer to global workspace block         (user provided)
    +28   required length of global workspace, or 0
    +32   pointer to phase 1 workspace block        (user provided)
    +36   required length of phase 1 workspace, or 0
    +40   pointer to phase 2 workspace block        (user provided)
    +44   required length of phase 2 workspace, or 0
    +48   size of source sample                     (user provided)
    +52   number of bytes written into destination buffer
    +56   sample period for output                  (user provided)
    +60   \
    +64    \_ Special data - 16 bytes
    +68    /  (see below)
    +72   /
    +76   Amount of source processed so far
    +80   Filetype for compressed samples

Only those entries listed as 'user provided' may be modified; all others are
read-only and provided for information.


Workspace, phases and passes
============================

Compression is a two-phase process. First the entire sample is scanned from
start to finish to determine important information about the sample, then
Compress works its way through the sample a second time, performing the
actual compression.

In addition, the scanning phase (phase 1) may consist of zero or more
passes. (A pass is one complete run through the sample.)
There is usually only one pass in the scanning phase, and there is *always*
one pass (no more or less) in the compression phase. Note that the scanning
phase may not need to scan the sample at all (ie. just perform
initialisation), in which case no source data is requested until sometime in
the compression phase.

Compress provides one reason code to tell you that it needs to perform
another pass of the sample (4), and another reason code to tell you it has
completed the scanning phase (3). In either case you should arrange matters
so that the next source block read is from the start of the file.

Compress requires a certain amount of additional workspace. Three blocks
must be supplied: global, phase 1 and phase 2.

The global workspace must be available from the start of the operation to
its completion.

The phase 1 workspace must be available from the start of the operation to
the end of the scanning phase (ie. reason code 3).

The phase 2 workspace must be available from the start of the compression
phase (ie. after reason code 3) to the completion of the operation (reason
code 0).

The global workspace must be claimed before an operation begins. The phase 1
workspace only needs to be claimed if Compress specifically requests it
using reason code 7. The sizes needed are given in the header; if a block of
workspace is not needed, its length is given as zero.

When Compress returns a reason code of 3 (end of scanning phase) you should
then claim the phase 2 workspace. You are free to release the phase 1
workspace (if there is any) at this point if this will help. (You cannot
claim the phase 2 workspace before this point because its length may not
have been determined.)


Using Compress
==============

The source file is compressed by setting the value at +4 to zero, then
repeatedly calling the location +0 and examining the reason code returned at
+4 to see what action needs to be taken.
Typical actions are refilling the source buffer or outputting the contents
of the destination buffer. The master program (ie. the one calling the
compression routine) must respond to the reason code returned in the way
outlined below.

There is no need to call the machine code again immediately, and it may be
used quite happily by repeated calls from the polling loop of a multitasking
application. This is how !Compress works. If you want to do this, then make
sure the buffers are fairly small (ie. 16K or so) to avoid soaking up too
much processor time.

If your program decides to abort the operation for some reason, you MUST
ensure that the reason code is at some stage reset to zero before calling
the machine code routine again (otherwise it will attempt to carry on where
it left off, with disastrous consequences).

Note that all source buffers passed, except the last one, must be completely
full. Another way of looking at this is to say that the word at +12 gives
the amount of data actually stored in the source buffer, or that +8 and +12
describe a block of data to be compressed.

Destination buffers will always be completely full when returned to the
master, with the exception of the very last buffer which may only be
partially full (though never completely empty).

When starting a new operation, there is no need to provide a source buffer
until Compress actually asks for one by means of reason code 1. Equally,
there is no need to provide a destination buffer until phase 1 is complete
(which you will be informed of via reason code 3).

Compress may seem a little complicated to use at first, but it is really
quite straightforward. The complexity arises from its flexibility. It is
recommended that you work your way through one of the example programs to
get a better understanding of the way it works.


The code - Technical details
============================

(Basic and C programmers can ignore all this information.)

On entry, R14 contains the return address and R13 must point to a full,
descending stack with space for at least 32 registers. On exit, registers
R0-R12 and R14 will have been corrupted, and the value at +4 will give a
reason code as detailed below. The flags will be corrupted (except for mode
and interrupt), but no SWIs are called so it will operate in any environment
(including machines without a proper operating system). A copy of the reason
code will be returned in R0 to make life easier for you.

If you intend to use Compress on an ARM6-based machine (or later) other than
an Acorn computer then you should ensure you are using the 26-bit
programmer's model rather than the 32-bit one, since Compress assumes the
flags and PC are combined into register R15.

There is a potential problem when Compress is being used in IRQ mode (or FIQ
mode for that matter) with interrupts enabled, since R14_irq can suddenly
become corrupt. Since R14 is extensively used as a subroutine link register
this can be fairly annoying. The solution is to change the ARM to either SVC
or USR mode before calling Compress. Obviously this problem only applies
when trying to use Compress from interrupt code, and will not affect
programs written in Basic or C.


Reason codes
============

Below is a description of the meanings of the various reason codes that may
be returned from Compress, together with information on what action should
be taken. In general, reason codes 0-4 are normal operations, 5-7 are 'soft'
errors (from which a sufficiently capable master program can recover without
having to abort the operation), and 8 upwards are 'hard' errors (which must
always cause the operation to abort).

0   The operation has now finished.

    There is *no* data in the destination buffer, even though the value at
    +52 (the amount of data in the buffer) may be non-zero.
    In fact, the value at +52 is a copy of the value last returned with
    reason code 2 *unless* that buffer was completely full in which case
    +52 is zero. There is a very good reason for this, though it may not be
    clear. If your program is designed so that it only saves the destination
    buffer on reason code 2 *if* the buffer is completely full, then, on
    reason code 0, the amount of data still to be written is given in +52,
    and the data in the buffer is undamaged. A use for this may seem a
    little obscure - the feature is present only because Compress shares
    some code with Expand.

    After this reason code no more calls to Compress need to be made, and
    any remaining workspace can be freed. Since +4 is now set to zero, any
    future calls will start a *new* operation.

1   Source buffer request.

    Compress returns this reason code when it wants the master to provide
    it with a new block of source data. You should transfer some data from
    the source file to a convenient block of memory, set +8 to point to it
    and +12 to give the length of it, then call Compress again. Compress
    treats +8 and +12 as read-only fields from its point of view, so you
    don't have to keep resetting these if you are reusing the same block of
    memory.

    Note that the last buffer does not have to be completely full -
    Compress knows automatically when the end has been reached from the
    sample length given in the header and ignores any further source data.

    No copy is made of the data, but copies *are* taken of +8 and +12,
    which makes it impossible to move the source buffer while it is in use.
    Reason codes 1, 3 and 4 are the only time during an operation when the
    position and size of this buffer can be changed, since it is never in
    use at this point.

    No source buffer needs to be provided to Compress until the first time
    reason code 1 is returned. (Before this, +8 and +12 are considered to be
    undefined and can be changed as you see fit.)
    You can expect this event sometime between starting the operation and
    receiving reason code 3, though it could easily occur after reason code
    3 if the particular version of Compress has no need of a scanning phase.

    By the way, this reason code is *only* for providing a new source
    buffer. You should not attempt to access the destination buffer, and the
    value at +52 will be meaningless. Reason code 2 is used for dealing with
    the destination buffer.

    If you find it useful, +76 tells you how much data you have already
    sent. If you want to calculate the percentage of compression done, use
    the formula:

                             I
        Percentage = 100 * -----
                             S

    where I is the amount of source processed so far (ie. the contents of
    +76) and S is the length of the source sample (ie. the contents of +48).

2   The destination buffer has become full.

    Compress returns this reason code when it has a block of data for you
    to output. This data is held in the destination buffer, which (with the
    exception of the last block returned) will always be completely full.
    The last buffer returned may be completely full or partially full but
    never completely empty.

    Your program should dispose of the data in whatever manner seems fit
    (eg. copy it to disc) then call the Compress routine again.

    If you wish, before invoking the routine again your program may also
    change the buffer pointer and length fields at offsets +16 and +20
    respectively. This is one of the two occasions when you may safely do
    this; the other is on reason code 3 (which is always returned before any
    attempt is made by Compress to access the destination buffer).

    The value at +76 (the number of source bytes processed so far) *may*
    have been updated (depending on the code in use), so you could use this
    to calculate the percentage done so far (eg. for the hourglass).

    Note that the number of bytes in the destination buffer is given by
    +52. This will usually (but not always) be equal to the value at +20
    (the buffer size). Always use +52 instead.

3   End of scanning phase.

    The scanning phase has now been completed. It is possible that no
    source buffer requests (reason code 1) have been made at this point, so
    you should be careful about any assumptions you make.

    At this point you must claim the phase 2 workspace using the length
    provided in +44 (unless the length is zero in which case no workspace is
    needed). You cannot claim the workspace until this point - the size may
    have to be calculated during the scanning phase. If you want to save on
    memory you can of course release the phase 1 workspace, since it will no
    longer be needed.

    No output will have been made yet, so there is no need to set up a
    destination buffer until this reason code is received. The situation
    will change shortly after returning control to Compress, so you *must*
    now make sure the buffer is present.

    You should arrange matters so that the source block passed on the next
    occurrence of reason code 1 is from the start of the sample. For
    example, if the source is a file on disc then you should set the file
    position back to the start of the file. The address and length of the
    source buffer can be changed at the same time if need be.

    After receiving this reason code (which you will always get, barring
    errors) you may alter the destination buffer pointer and length at
    offsets +16 and +20. The only other times you may alter these values are
    after the utility has finished filling the destination buffer (ie. on
    reason code 2). Note that copies are taken of +16 and +20 so it is not
    possible to move or resize the destination buffer at any other time.

    The value at +56 is a single byte giving the sample period of the
    sample in microseconds. This information is stored in the file at the
    time of compression, and is otherwise ignored by Expand and Compress, so
    *could* be used to store other information if absolutely necessary.
    (With type 0 samples it must be non-zero.)
    Many programs using Expand expect it to be the sample period though,
    and you may get odd results from such programs if you use it for
    anything else.

4   Next pass warning - another pass of the source sample is needed

    Only occurs during phase 1, and not all versions of Compress will need
    to generate this reason code. The only action that needs to be taken is
    to arrange that future reads made with reason code 1 are from the start
    of the sample again. For example, if the source is a file on disc, then
    this reason code should be used to reset the file pointer to the start
    of the file.

5   Soft error - source buffer was invalid

    This reason code is only returned if there is a bug in your program,
    and should not occur under normal circumstances. It occurs if an invalid
    source buffer was passed after a request for a new source buffer (reason
    code 1). The only check performed on the buffer is that the length is
    not negative and is non-zero. You should either correct the problem and
    repeat the call, or abort the operation.

6   Soft error - destination buffer was invalid

    This reason code is only returned if there is a bug in your program,
    and should not occur under normal circumstances. It occurs if an invalid
    destination buffer was passed after a request for a new destination
    buffer (reason code 2). The only check performed on the buffer is that
    the length is not negative and is non-zero. You should either correct
    the problem and repeat the call, or abort the operation.

7   Instruction to set up the phase 1 workspace

    This reason code is returned sometime between starting the operation
    and the end of the scanning phase. It instructs you to set up suitable
    phase 1 workspace using the length given at +36. Until you receive this
    reason code, the word at +36 is undefined, so no claim could be made
    before starting the operation.

    Some compressors may not need phase 1 workspace, in which case they
    will never return this value, and the value at +36 will always be zero.
    If you do receive this reason code, the value at +36 will definitely be
    non-zero, and will remain valid until the start of the next operation.

8   Hard error - undefined

    This error is undefined at present. The operation should be aborted. It
    should never occur in code from this particular release of !CompUtils.
    Note that Expand *does* have a use for this reason code.


Filetypes
=========

The filetype I use for compressed samples is:

    &350 - Compressed sample (Squished)

This filetype is not Acorn allocated and may change in the future. To
facilitate a smooth transition your program should read the filetype from
the header on startup. This means that if future versions of Compress use a
different filetype it is a simple matter of replacing the old code with the
new code and everything will still work perfectly (barring renaming the
icons in the !Sprites files).


Special data
============

This 16-byte block in the header has a format that depends on the
compression code being used:

Type0-2
-------

    +60   Flags:  bit 0 - Store in linear form, else store in VIDC form.
                  bit 1 - Allow compressed table. Distinguishes between
                          Type 1 and Type 2 samples.
    +64   Number of 16-byte sample blocks per block group (0-256).
    +68   Number of bits in group header (0-2).

Storing the sample in VIDC format reduces the quality of the sample slightly
but often produces greater compression.

Types 1 and 2 split the 16-byte blocks up into groups of blocks. Each group
has a short header giving attributes to the group, the size of which is
determined in +68. The smaller the header, the less the overheads, but the
result is a restriction on the compression techniques used. Similarly, a
large value for +64 reduces the number of headers needed, at the expense of
slow reaction time to changes in the condition of the sample.

For type 0 samples, +64 and +68 should be zero.
For types 1 and 2, +64 and +68 should both be non-zero, and bit 1 of +60
determines whether type 1 or type 2 is used (there is little difference
between the two). The initial values in the header provide sensible defaults
for producing type 2 samples.

Type4-5
-------

    +60   Flags:  bit 0  - Enable entropy coding.
                  bit 1  - Entropy coding periodically resets.
                  bit 31 - Source is 16-bit linear else 8-bit linear.
    +64   Number of samples (not bytes) in one entropy block.

Type 5 samples are essentially just Type 4 with entropy coding on the
output. Thus bit 0 is the Type 5/Type 4 switch. In addition, when entropy
coding is in use, you can opt to use a single table throughout the sample,
or periodically recalculate the tree to adapt to subtle changes in code
frequency. If you choose the adaptive output coder you should set bit 1 and
specify the maximum number of samples that each table will be used for in
+64.

Type 5 is not currently supported and should not be used. For this reason,
the above information is subject to change in future versions. For now
simply make sure that bit 0 of the flags is clear.

Type6-7
-------

    +60   Flags:  bit 0 - Store in linear form, else store in VIDC form.
                  bit 1 - Use a 32-byte translation table, which improves
                          compression slightly (only for 8-bit samples)
                  bit 2 - Use new sign coding (ie. output as Type 7)

Storing the sample in VIDC format reduces the quality of the sample slightly
but *should* produce greater compression (this isn't always the case).
Storing in linear format results in completely lossless compression.


Using Compress with your own programs
=====================================

Basic
-----

Choose an appropriate Compress file and copy it into your application's
directory.
You can load it using something like this:

    REM a% is the object type (1 = file), l% is the file length
    SYS "OS_File",5,"<App$Dir>.Compress" TO a%,,,,l%
    IF a%<>1 THEN l%=16
    REM reserve a block of memory and load the code into it
    DIM compress% l%
    SYS "OS_File",255,"<App$Dir>.Compress",compress%,0

and the code can be called with:

    CALL compress%+0

Although Compress returns the reason code in R0, I would not advise using
USR instead of CALL. It's better to use CALL and then read the reason code
from the header. It's easier for debugging for a start.

Assembler
---------

When including Compress in your own programs there is no reason why it has
to be kept as a separate file in your application's directory. It is
actually designed to be embedded somewhere in the middle of your own code.
Refer to the Technical Details section for more information on calling
Compress from machine code programs.

ObjAsm
------

Users of Acorn's Desktop Assembler will find an AOF version of Compress in
the 'asm' sub-directory. Header files are provided to import all the symbols
you'll need. Note that unlike Expand and ExpCode, you can link more than one
copy of Compress to your code. For this reason there is one header file
(Universal) defining several constants used by all versions of Compress,
plus one additional header for each group of sample types, giving
version-specific data.

All you have to do is include the line:

    GET CompUtils:Compress.asm.h.Universal

at the start of your source code, followed by something like:

    GET CompUtils:Compress.asm.h.Type0-2

repeated as many times as necessary. Then add the appropriate object file(s)
to the list of objects and libraries to be linked. There is one object file
for each group of sample types supported (eg. Type0-2, Type6-7, etc), but no
Universal version even if there *is* a Universal header.

The version-specific header files import symbols for the start of the
Compress header (eg. Compress_0) and the entry point (eg. Compress_0_Code).
Note the number inserted after 'Compress'.
This is the lowest numbered sample type supported by the object in question
(this example is from Type0-2). Only version-specific symbols need this
number, so that the linker can tell the various versions of Compress apart
if you have more than one linked to your code.

The Universal header also defines symbols for offsets from the start of the
Compress header to various entries within that header (eg.
Compress_SrcBufferPtr, Compress_SamplePeriod, etc). So, to set the sample
period you would use something like this:

    LDR R0,=Compress_0                      ; finds the base address
    STR R1,[R0,#Compress_SamplePeriod]      ; writes the word

The first instruction transfers the address of Compress's header into R0,
then the second instruction transfers the contents of R1 into the sample
period entry.

To call Compress use:

    BL Compress_0_Code

which returns a copy of the reason code in R0 to make life easier for you.

Apart from these symbols, the header files also define a series of symbols
for various bits in the output flags eg. OUTPUT_LINEAR_0, ALLOW_SMALL_TABLE,
etc. These should be used in preference to actual values (eg. 1<<2 or 4+2+1)
where possible to allow these to be changed in the future, should the need
arise. They can also be used to highlight trouble spots if a bit's meaning
changes.

Flag symbols are version-specific so are not held in the Universal file. In
fact, there is no guarantee that +60 is going to be used for flags in future
compression code - all the current ones simply define the first of their
special words as containing flags. Other routines might use this word for
something else.

The version-specific header may also contain symbols for entries in the
special-data part of Compress's header (eg. Compress_0_BlocksPerGroup).

There are also some symbols in the Universal header giving textual
equivalents for reason codes eg. SRC_EMPTY, DEST_FULL, etc which may make
your source code easier to read (unless you're using jump tables of course).
Have a look at the file for more details.
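
The offsets that symbols such as Compress_SrcBufferPtr and
Compress_SamplePeriod stand for are the ones tabulated under 'The Compress
header'. As a cross-check, here is a minimal C sketch of that table as a
struct of 32-bit words. The struct and all of its field names are
hypothetical (they are not the names used in the supplied header files),
and the sketch assumes every entry is a single 32-bit word, which is what
the spacing of the offsets suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the Compress header. Pointer entries are held as
   raw 32-bit words so that the offsets match on any host. */
typedef struct {
    uint32_t branch;          /* +0  branch instruction to the code      */
    uint32_t reason;          /* +4  returned reason code                */
    uint32_t src_ptr;         /* +8  source buffer pointer   (user)      */
    uint32_t src_len;         /* +12 source buffer length    (user)      */
    uint32_t dest_ptr;        /* +16 destination pointer     (user)      */
    uint32_t dest_len;        /* +20 destination length      (user)      */
    uint32_t global_ws_ptr;   /* +24 global workspace        (user)      */
    uint32_t global_ws_len;   /* +28 required length, or 0               */
    uint32_t phase1_ws_ptr;   /* +32 phase 1 workspace       (user)      */
    uint32_t phase1_ws_len;   /* +36 required length, or 0               */
    uint32_t phase2_ws_ptr;   /* +40 phase 2 workspace       (user)      */
    uint32_t phase2_ws_len;   /* +44 required length, or 0               */
    uint32_t sample_size;     /* +48 size of source sample   (user)      */
    uint32_t dest_used;       /* +52 bytes written to destination        */
    uint32_t sample_period;   /* +56 sample period for output (user)     */
    uint32_t special[4];      /* +60 to +72 special data, 16 bytes       */
    uint32_t src_done;        /* +76 amount of source processed so far   */
    uint32_t filetype;        /* +80 filetype for compressed samples     */
} compress_header;

/* Aborts (via assert) if the struct drifts from the documented offsets. */
void assert_compress_header_layout(void)
{
    assert(offsetof(compress_header, reason) == 4);
    assert(offsetof(compress_header, src_ptr) == 8);
    assert(offsetof(compress_header, dest_ptr) == 16);
    assert(offsetof(compress_header, global_ws_ptr) == 24);
    assert(offsetof(compress_header, phase2_ws_len) == 44);
    assert(offsetof(compress_header, dest_used) == 52);
    assert(offsetof(compress_header, special) == 60);
    assert(offsetof(compress_header, src_done) == 76);
    assert(offsetof(compress_header, filetype) == 80);
}
```

Calling assert_compress_header_layout() once at startup would catch any
accidental reordering of the fields. Only the entries marked (user) should
ever be written by the master; the rest are read-only.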

C/APCS
------

For C users, a header file and a series of object files can be found in the
'cc' sub-directory. One header file and one object are provided for each
group of sample formats supported. You can link more than one of these to
your own code without problems.

The object code is APCS-compliant (obviously) so can be used from any APCS
language, such as Pascal, C++, etc. You would have to write your own header
files though - the ones provided are for C, and should be included in your
source with something similar to:

    #include "CompUtils:Compress.cc.h.Type0-2"

They define the entries in Compress's header as global variables that can be
accessed just like normal variables eg.

    int filetype;
    int count;
    char *buffer;
    char bytes[16];

    filetype = Compress_6_SampleType;
    buffer = Compress_6_DestBufferPtr;
    /* copy the first 16 bytes of the destination buffer */
    for (count = 0; count < 16; count++)
        bytes[count] = buffer[count];

Apart from these variables, the header files may also define symbols for
various bits in the output flags, depending on what features the
corresponding object file supports eg. OUTPUT_LINEAR, ALLOW_SMALL_TABLE,
etc. These should be used in preference to actual values (eg. 1<<2 or 4+2+1)
where possible to allow these to be changed in the future, should the need
arise. They can also be used to highlight trouble spots: if a bit's meaning
changes then so would the name attached to it, and the compiler would spot
the fact that the required symbol no longer exists.

There are also some symbols giving textual equivalents for reason codes eg.
SRC_EMPTY, DEST_FULL, etc which may make your source code easier to read,
especially when using 'switch'. Have a look at the header file for more
details.

To call Compress just make a call to function Compress_6(). This takes no
arguments and returns the reason code (which is also held in global variable
Compress_6_ReasonCode). Please look at the example file provided for further
details.

Note that the above description assumes you are using object file 'Type6'.
For 'Type20' you would use names of the form Compress_20_ReasonCode, and so
on. Variables and constants that are not specific to any one version of
Compress do not need the number in the name (eg. SRC_EMPTY, UNKNOWN_FORMAT),
but these are not all that common.

Where an object file supports more than one type of compression eg. Type0-2,
the numeric part of the name is a list of the types supported eg.
Compress_0_1_2_ReasonCode. There are also variables Compress_0_ReasonCode,
Compress_1_ReasonCode, and Compress_2_ReasonCode. All four are equivalent;
they are just different names for the same variable. Generally you should
choose one naming convention and stick to it.

Similarly, there would be four functions defined: Compress_0_1_2(),
Compress_0(), Compress_1(), and Compress_2(). Unlike the variables and
constants these are *not* all equivalent. Instead, the 0_1_2 form is the raw
Compress code and the other three are charged with the responsibility of
setting up the special data themselves to produce a typical output for the
given sample type. For example, Compress_2() forces the ALLOW_SMALL_TABLE
bit in the flags to be set, sets BlocksPerGroup to 6 and BitsPerHeader to 2,
and only then calls Compress_0_1_2(). The output is then 100%
backwards-compatible with old versions of the !Compress program (with Pack
switched on). This short cut may be useful to you, or then again it may not.

If all this has left you a little befuddled (and who would blame you!) try
modifying the example file and seeing what happens. Or, alternatively, send
me an email or a letter at one of the addresses in the !ReadMe file and I'll
see if I can help you out.


Bugs
====

There may well be some bugs lurking around, particularly in the ObjAsm and C
versions of Compress (I never use these myself). They have been tested with
the example programs so they should work, but you never know. Please report
any bugs, omissions or suggestions for improvement to one of the addresses
in the main !ReadMe file.

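
Appendix: the calling sequence sketched in C
============================================

The calling sequence described under 'Using Compress' and 'Reason codes'
may be easier to follow as code. The sketch below is NOT the real Compress:
mock_compress_step() is a hypothetical stand-in that copies bytes through
unchanged and only imitates the handshake for reason codes 0 to 3. The
struct, the RC_ names and the tiny buffer sizes are likewise assumptions
made for illustration. The mock never scans, needs no workspace, and does
not model the special behaviour of +52 on reason code 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reason codes 0-3 from the text; these enum names are hypothetical. */
enum { RC_DONE = 0, RC_SRC_REQUEST = 1, RC_DEST_FULL = 2, RC_SCAN_DONE = 3 };

/* Simplified stand-in for the header (comments give the real offsets). */
typedef struct {
    int reason;                     /* +4  */
    const unsigned char *src;       /* +8  */
    size_t src_len;                 /* +12 */
    unsigned char *dest;            /* +16 */
    size_t dest_len;                /* +20 */
    size_t sample_size;             /* +48 */
    size_t dest_used;               /* +52 */
    size_t src_done;                /* +76 */
    const unsigned char *cur_src;   /* mock-internal copy of +8  */
    size_t cur_len, cur_pos;        /* mock-internal copy of +12, read index */
} mock_header;

/* MOCK: copies bytes through unchanged. It only imitates the protocol. */
int mock_compress_step(mock_header *h)
{
    if (h->reason == RC_DONE) {                 /* +4 was zero: new operation */
        h->src_done = h->dest_used = 0;
        h->cur_len = h->cur_pos = 0;
        return h->reason = RC_SCAN_DONE;        /* this mock has nothing to scan */
    }
    if (h->reason == RC_SRC_REQUEST) {          /* master supplied a new block */
        h->cur_src = h->src;
        h->cur_len = h->src_len;
        h->cur_pos = 0;
    }
    if (h->reason == RC_DEST_FULL)              /* master emptied the buffer */
        h->dest_used = 0;
    while (h->src_done < h->sample_size) {
        if (h->cur_pos == h->cur_len) return h->reason = RC_SRC_REQUEST;
        if (h->dest_used == h->dest_len) return h->reason = RC_DEST_FULL;
        h->dest[h->dest_used++] = h->cur_src[h->cur_pos++];
        h->src_done++;
    }
    if (h->dest_used > 0) return h->reason = RC_DEST_FULL;  /* last buffer */
    return h->reason = RC_DONE;
}

/* Percentage = 100 * I / S, widened so 100 * I cannot overflow 32 bits. */
unsigned percent_done(size_t done, size_t total)
{
    return total ? (unsigned)((unsigned long long)done * 100u / total) : 100u;
}

/* Master: feeds `in` in 4-byte blocks, collects output via a 3-byte buffer. */
size_t run_operation(const unsigned char *in, size_t n,
                     unsigned char *out, unsigned *progress)
{
    unsigned char dest_buf[3];
    mock_header h = {0};
    size_t in_pos = 0, out_len = 0, chunk;

    h.sample_size = n;
    h.reason = RC_DONE;                         /* zero +4 to start afresh */
    for (;;) {
        switch (mock_compress_step(&h)) {
        case RC_SRC_REQUEST:                    /* refill the source block */
            chunk = (n - in_pos < 4) ? n - in_pos : 4;
            h.src = in + in_pos;
            h.src_len = chunk;
            in_pos += chunk;
            break;
        case RC_DEST_FULL:                      /* use +52, not +20 */
            assert(h.dest_used <= h.dest_len);
            memcpy(out + out_len, h.dest, h.dest_used);
            out_len += h.dest_used;
            *progress = percent_done(h.src_done, h.sample_size);
            break;
        case RC_SCAN_DONE:                      /* rewind; provide dest buffer */
            in_pos = 0;
            h.dest = dest_buf;
            h.dest_len = sizeof dest_buf;
            break;
        case RC_DONE:
            return out_len;
        default:                                /* abort: reset +4 to zero */
            h.reason = RC_DONE;
            return 0;
        }
    }
}
```

With the real code, the call to mock_compress_step() would become a call to
the appropriate Compress function, and the handlers for reason codes 4 to 8
(and the workspace claims on codes 3 and 7) would need to be filled in.
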
(c) David Radford