- ############################################################################
- # #
- # Compress utility #
- # #
- #==========================================================================#
- # #
- # by David Radford #
- # #
- #==========================================================================#
- # #
- # These files are a part of !CompUtils and may not be distributed #
- # separately other than as laid down in the !ReadMe file. #
- # #
- ############################################################################
-
- Overview
- ========
-
- Compress is a piece of code that produces compressed samples from raw
- linear-signed ones. There is one piece of code for each group of sample
- types supported (eg. Type0-2, Type6-7, etc). Unlike Expand and ExpCode, there
- is no universal version - this would be utterly pointless in 99.9% of cases
- anyway. Each version has an almost-identical user interface, meaning that
- any program you write can quickly take advantage of new compression formats
- provided in future releases of !CompUtils with little or no changes needed
- to your code. However, certain compression algorithms require additional
- data to be passed to them, so a 16-byte block has been set aside for this.
- The format of this 16-byte block depends on the particular algorithm, and,
- with the possible exception of the flags, the default values are usually
- acceptable.
-
-
- Input/Output
- ============
-
- Data is passed to and from the Compress utility using a pair of buffers.
- These must be provided by your external program, referred to from here on
- as the master. Data transfers are always controlled by Compress, not the
- master program; Compress *requests* new blocks of raw sample data from the
- master, rather than having them forced upon it. Similarly, Compress
- *requests* the master to dispose of the compressed data in the output
- (destination) buffer when it becomes full.
-
- Compress simply treats the input and output buffers as blocks of data.
- Whether or not you reuse these areas is up to you (thus making them buffers
- in the true sense). If you prefer, you could load the source sample into one
- or more blocks of memory then pass these one at a time to Compress.
- Obviously these blocks could no longer be described as buffers. For
- consistency the term 'buffer' will be used throughout this document since
- most uses of Compress will require one or the other of these blocks to be a
- buffer.
-
-
- The Compress header
- ===================
-
- The code has a standard header through which all parameters are passed:
-
- +0 branch instruction to machine code routine
- +4 returned reason code
- +8 pointer to the source buffer (user provided)
- +12 length of source buffer (user provided)
- +16 pointer to the destination buffer (user provided)
- +20 length of destination buffer (user provided)
- +24 pointer to global workspace block (user provided)
- +28 required length of global workspace, or 0
- +32 pointer to phase 1 workspace block (user provided)
- +36 required length of phase 1 workspace, or 0
- +40 pointer to phase 2 workspace block (user provided)
- +44 required length of phase 2 workspace, or 0
- +48 size of source sample (user provided)
- +52 number of bytes written into destination buffer
- +56 sample period for output (user provided)
- +60 \
- +64 \_ Special data - 16 bytes
- +68 / (see below)
- +72 /
- +76 Amount of source processed so far
- +80 Filetype for compressed samples
-
- Only those entries listed as 'user provided' may be modified; all others are
- read-only and provided for information.
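- For C programmers who want to picture the layout, the header can be
- sketched as a structure of 32-bit words. This is only an illustration.
- The struct and field names are hypothetical, because the supplied C header
- files expose the entries as global variables instead. The sketch also
- assumes that every entry, including the pointers, occupies one 32-bit word
- as it does on the 26-bit ARM.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of the Compress header (84 bytes, 21 words).
   Pointers are held as 32-bit addresses, as on the 26-bit ARM. */
typedef struct {
    uint32_t branch;         /* +0  branch to the machine code routine */
    uint32_t reason;         /* +4  returned reason code */
    uint32_t src_ptr;        /* +8  source buffer (user provided) */
    uint32_t src_len;        /* +12 source buffer length (user provided) */
    uint32_t dest_ptr;       /* +16 destination buffer (user provided) */
    uint32_t dest_len;       /* +20 destination buffer length (user provided) */
    uint32_t global_ws;      /* +24 global workspace (user provided) */
    uint32_t global_ws_len;  /* +28 required global workspace, or 0 */
    uint32_t phase1_ws;      /* +32 phase 1 workspace (user provided) */
    uint32_t phase1_ws_len;  /* +36 required phase 1 workspace, or 0 */
    uint32_t phase2_ws;      /* +40 phase 2 workspace (user provided) */
    uint32_t phase2_ws_len;  /* +44 required phase 2 workspace, or 0 */
    uint32_t sample_size;    /* +48 size of source sample (user provided) */
    uint32_t dest_written;   /* +52 bytes written into destination */
    uint32_t sample_period;  /* +56 sample period for output (user provided) */
    uint32_t special[4];     /* +60..+72 algorithm-specific special data */
    uint32_t src_processed;  /* +76 amount of source processed so far */
    uint32_t filetype;       /* +80 filetype for compressed samples */
} compress_header;

/* Compile-time checks that the layout matches the documented offsets. */
_Static_assert(offsetof(compress_header, reason) == 4, "+4");
_Static_assert(offsetof(compress_header, sample_size) == 48, "+48");
_Static_assert(offsetof(compress_header, special) == 60, "+60");
_Static_assert(offsetof(compress_header, filetype) == 80, "+80");
_Static_assert(sizeof(compress_header) == 84, "84-byte header");
```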
-
-
- Workspace, phases and passes
- ============================
-
- Compression is a two-phase process. First the entire sample is scanned from
- start to finish to determine important information about the sample, then
- Compress works its way through the sample a second time, performing the
- actual compression. In addition, the scanning phase (phase 1) may consist of
- zero or more passes. (A pass is one complete run through the sample.) There
- is usually only one pass in the scanning phase, and there is *always*
- one pass (no more or less) in the compression phase. Note that the scanning
- phase may not need to scan the sample at all (ie. just perform
- initialisation), in which case no source data is requested until sometime
- in the compression phase.
-
- Compress provides one reason code to tell you that it needs to perform
- another pass of the sample (4), and another reason code to tell you it has
- completed the scanning phase (3). In either case you should arrange matters
- so that the next source block read is from the start of the file.
-
- Compress requires a certain amount of additional workspace. Three blocks
- must be supplied: global, phase 1 and phase 2. The global workspace must be
- available from the start of the operation to its completion. The phase 1
- workspace must be available from the start of the operation to the end of
- the scanning phase (ie. reason code 3). The phase 2 workspace must be
- available from the start of the compression phase (ie. after reason code 3)
- to the completion of the operation (reason code 0).
-
- The global workspace must be claimed before an operation begins. The phase 1
- workspace only needs to be claimed if Compress specifically requests it
- using reason code 7. The sizes needed are given in the header; if a block of
- workspace is not needed, its length is given as zero. When Compress returns a
- reason code of 3 (end of scanning phase) you should then claim the phase 2
- workspace. You are free to release the phase 1 workspace (if there is any)
- at this point if this will help. (You cannot claim the phase 2 workspace
- before this point because its length may not have been determined.)
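- The workspace lifecycle above might be sketched as follows, assuming the
- master uses C's malloc() and free(). A real RISC OS program might claim
- memory another way. The struct and function names are hypothetical.
- len36 and len44 stand for the words Compress reports at +36 and +44. The
- global workspace (+28) is claimed separately before the first call, and
- this sketch releases it on reason code 0.

```c
#include <stdlib.h>

/* Hypothetical record of the three workspace blocks owned by the master. */
typedef struct {
    void *global, *phase1, *phase2;
} workspace;

/* Adjust the workspace for one reason code.  len36 and len44 are the
   lengths read from +36 and +44.  Returns 0 on success, -1 if an
   allocation fails. */
static int workspace_update(workspace *ws, int reason, size_t len36, size_t len44)
{
    switch (reason) {
    case 7:                          /* claim phase 1 workspace now */
        ws->phase1 = malloc(len36);  /* +36 is non-zero on this code */
        return ws->phase1 ? 0 : -1;
    case 3:                          /* end of scanning phase */
        free(ws->phase1);            /* optional: phase 1 is finished */
        ws->phase1 = NULL;
        if (len44 == 0)
            return 0;                /* no phase 2 workspace required */
        ws->phase2 = malloc(len44);
        return ws->phase2 ? 0 : -1;
    case 0:                          /* operation complete: release all */
        free(ws->global);
        free(ws->phase1);
        free(ws->phase2);
        ws->global = ws->phase1 = ws->phase2 = NULL;
        return 0;
    default:
        return 0;                    /* other codes leave workspace alone */
    }
}
```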
-
-
- Using Compress
- ==============
-
- The source file is compressed by setting the value at +4 to zero, then
- repeatedly calling the location +0 and examining the reason code returned
- at +4 to see what action needs to be taken. Typical actions are refilling
- the source buffer or outputting the contents of the destination buffer.
-
- The master program (ie. the one calling the compression routine) must
- respond to the reason code returned in the way outlined below. There is no
- need to call the machine code again immediately, and it may be used quite
- happily by repeated calls from the polling loop of a multitasking
- application. This is how !Compress works. If you want to do this, then make
- sure the buffers are fairly small (ie. 16K or so) to avoid soaking up too
- much processor time.
-
- If your program decides to abort the operation for some reason, you MUST
- ensure that the reason code is at some stage reset to zero before calling
- the machine code routine again (otherwise it will attempt to carry on where
- it left off, with disastrous consequences).
-
- Note that all source buffers passed, except the last one, must be completely
- full. Another way of looking at this is to say that the word at +12 gives
- the amount of data actually stored in the source buffer, or that +8 and +12
- describe a block of data to be compressed. Destination buffers will always
- be completely full when returned to the master, with the exception of the
- very last buffer which may only be partially full (though never completely
- empty).
-
- When starting a new operation, there is no need to provide a source buffer
- until Compress actually asks for one by means of reason code 1. Equally,
- there is no need to provide a destination buffer until phase 1 is complete
- (which you will be informed of via reason code 3).
-
- Compress may seem a little complicated to use at first, but it is really
- quite straightforward. The complexity arises from its flexibility. It is
- recommended that you work your way through one of the example programs to
- get a better understanding of the way it works.
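- The following sketch drives a stand-in for Compress to illustrate the
- protocol. Everything in it is hypothetical. mock_compress() copies bytes
- rather than compressing them, needs no scanning pass, and keeps its
- private state in the same struct as the header fields. The master loop
- does have the documented shape: set +4 to zero, then call repeatedly and
- act on each reason code. Source is read in 3-byte chunks and output goes
- to a 4-byte destination buffer, so that several requests of each kind
- occur.

```c
#include <string.h>

/* Reason codes 0-4 as documented. */
enum { RC_DONE = 0, RC_SRC_EMPTY = 1, RC_DEST_FULL = 2,
       RC_END_SCAN = 3, RC_NEXT_PASS = 4 };

/* Hypothetical subset of the header, plus private state for the mock. */
typedef struct {
    int reason;                 /* +4  reason code */
    const unsigned char *src;   /* +8  source buffer */
    int src_len;                /* +12 source buffer length */
    unsigned char *dest;        /* +16 destination buffer */
    int dest_len;               /* +20 destination buffer length */
    int sample_size;            /* +48 size of source sample */
    int written;                /* +52 bytes in destination buffer */
    int consumed;               /* +76 source processed so far */
    int in_off, out, have_src;  /* mock-only bookkeeping */
} hdr;

/* Stand-in for Compress: copies bytes instead of compressing them. */
static int mock_compress(hdr *h)
{
    if (h->reason == RC_DONE) {              /* +4 == 0: new operation */
        h->consumed = h->out = h->in_off = h->have_src = 0;
        return h->reason = RC_END_SCAN;      /* no scanning pass needed */
    }
    for (;;) {
        if (h->out == h->dest_len) {         /* destination full */
            h->written = h->out;
            h->out = 0;
            return h->reason = RC_DEST_FULL;
        }
        if (h->consumed == h->sample_size) { /* whole sample handled */
            if (h->out > 0) {                /* last, partially full buffer */
                h->written = h->out;
                h->out = 0;
                return h->reason = RC_DEST_FULL;
            }
            return h->reason = RC_DONE;
        }
        if (!h->have_src || h->in_off == h->src_len) {
            h->have_src = 1;
            h->in_off = 0;
            return h->reason = RC_SRC_EMPTY; /* ask the master for source */
        }
        h->dest[h->out++] = h->src[h->in_off++];
        h->consumed++;
    }
}

/* Master loop over an in-memory "file".  Returns the number of output
   bytes, or -1 if an unexpected (error) reason code is returned. */
static long run_master(const unsigned char *file, int file_len, unsigned char *out)
{
    static unsigned char destbuf[4];
    hdr h = {0};
    long total = 0;
    int filepos = 0;

    h.reason = RC_DONE;          /* +4 = 0 starts a new operation */
    h.sample_size = file_len;    /* +48 */
    for (;;) {
        switch (mock_compress(&h)) {
        case RC_SRC_EMPTY: {     /* refill: next chunk of the file */
            int n = file_len - filepos < 3 ? file_len - filepos : 3;
            h.src = file + filepos;
            h.src_len = n;
            filepos += n;
            break;
        }
        case RC_DEST_FULL:       /* save +52 bytes, not +20 */
            memcpy(out + total, h.dest, (size_t)h.written);
            total += h.written;
            break;
        case RC_END_SCAN:        /* destination is needed from now on */
            h.dest = destbuf;
            h.dest_len = (int)sizeof destbuf;
            /* fall through: next source read starts at the beginning */
        case RC_NEXT_PASS:
            filepos = 0;
            break;
        case RC_DONE:
            return total;
        default:
            return -1;           /* soft or hard error: abort */
        }
    }
}
```

- With a real Compress the structure of run_master() would be the same. Only
- the call, the header access and the error handling would change.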
-
-
- The code - Technical details
- ============================
-
- (Basic and C programmers can ignore all this information.)
-
- On entry, R14 contains the return address and R13 must point to a full,
- descending stack with space for at least 32 registers. On exit, registers
- R0-R12 and R14 will have been corrupted, and the value at +4 will give a
- reason code as detailed below. The flags will be corrupted (except for mode
- and interrupt), but no SWIs are called so it will operate in any environment
- (including machines without a proper operating system). A copy of the reason
- code will be returned in R0 to make life easier for you.
-
- If you intend to use Compress on an Arm6-based machine (or later) other than
- an Acorn computer then you should ensure you are using the 26-bit
- programmer's model rather than the 32-bit one, since Compress assumes the
- flags and PC are combined into register R15.
-
- There is a potential problem when Compress is being used in IRQ mode (or FIQ
- mode for that matter) with interrupts enabled, since R14_irq can suddenly
- become corrupt. Since R14 is extensively used as a subroutine link register
- this can be fairly annoying. The solution is to switch the ARM to either
- SVC or USR mode before calling Compress. Obviously this problem only applies
- when trying to use Compress from interrupt code, and will not affect
- programs written in Basic or C.
-
-
- Reason codes
- ============
-
- Below is a description of the meanings of the various reason codes that may
- be returned from Compress, together with information on what action should
- be taken. In general, reason codes 0-4 are normal operations, 5-7 are 'soft'
- errors (from which a sufficiently capable master program can recover without
- having to abort the operation), and 8 upwards are 'hard' errors (which must
- always cause the operation to abort).
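- The ranges above can be expressed as a small helper. The enum and function
- names are hypothetical and are not part of the supplied header files.

```c
/* Hypothetical classification of a Compress reason code. */
typedef enum { RC_NORMAL, RC_SOFT, RC_HARD } rc_class;

static rc_class classify_reason(unsigned code)
{
    if (code <= 4)
        return RC_NORMAL;  /* 0-4: normal operation */
    if (code <= 7)
        return RC_SOFT;    /* 5-7: recoverable without aborting */
    return RC_HARD;        /* 8 upwards: operation must be aborted */
}
```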
-
-
- 0 The operation has now finished.
-
- There is *no* data in the destination buffer, even though the value at
- +52 (the amount of data in the buffer) may be non-zero. In fact, the
- value at +52 is a copy of the value last returned with reason code 2
- *unless* that buffer was completely full in which case +52 is zero.
-
- There is a very good reason for this, though it may not be clear. If
- your program is designed so that it only saves the destination buffer
- on reason code 2 *if* the buffer is completely full, then, on reason
- code 0, the amount of data still to be written is given in +52, and
- the data in the buffer is undamaged. A use for this may seem a little
- obscure - the feature is present only because Compress shares some
- code with Expand.
-
- After this reason code no more calls to Compress need to be made, and
- any remaining workspace can be freed. Since +4 is now set to zero, any
- future calls will start a *new* operation.
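- The "save only full buffers" policy described above might be sketched as
- follows. The function is hypothetical. written52 and len20 stand for the
- words at +52 and +20. On reason code 0, +52 is zero if the last buffer was
- full (and therefore already saved), so nothing is written twice.

```c
#include <stdint.h>

/* Hypothetical output policy.  Returns how many bytes of the destination
   buffer the master should write out for this reason code. */
static uint32_t bytes_to_save(int reason, uint32_t written52, uint32_t len20)
{
    if (reason == 2)                  /* buffer returned by Compress */
        return written52 == len20 ? written52 : 0;  /* full buffers only */
    if (reason == 0)                  /* finished: flush any remainder */
        return written52;
    return 0;                         /* other codes: nothing to write */
}
```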
-
-
- 1 Source buffer request.
-
- Compress returns this reason code when it wants the master to provide
- it with a new block of source data. You should transfer some data from
- the source file to a convenient block of memory, set +8 to point to it
- and +12 to give the length of it, then call Compress again. Compress
- treats +8 and +12 as read-only fields from its point of view, so you
- don't have to keep resetting these if you are reusing the same block of
- memory. Note that the last buffer does not have to be completely full -
- Compress knows automatically when the end has been reached from the
- sample length given in the header and ignores any further source data.
-
- No copy is made of the data, but copies *are* taken of +8 and +12,
- which makes it impossible to move the source buffer while it is in use.
- Reason codes 1, 3 and 4 are the only times during an operation when the
- position and size of this buffer can be changed, since it is never in
- use at this point.
-
- No source buffer needs to be provided to Compress until the first time
- reason code 1 is returned. (Before this, +8 and +12 are considered to
- be undefined and can be changed as you see fit.) You can expect this
- event sometime between starting the operation and receiving reason code
- 3, though it could easily occur after reason code 3 if the particular
- version of Compress has no need of a scanning phase.
-
- By the way, this reason code is *only* for providing a new source
- buffer. You should not attempt to access the destination buffer, and
- the value at +52 will be meaningless. Reason code 2 is used for
- dealing with the destination buffer.
-
- If you find it useful, +76 tells you how much data you have already
- sent. If you want to calculate the percentage of compression done, use
- the formula:
-
- I
- Percentage = 100 * ---
- S
-
- where I is the amount of source processed so far (ie. the contents of
- +76) and S is the length of the source sample (ie. the contents of
- +48).
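- In integer arithmetic the multiplication must come before the division.
- Otherwise the result stays at 0 until the very end. The sketch below
- forms 100 * I in 64 bits so that it cannot overflow a 32-bit word for very
- large samples. The function name is hypothetical, and the clamp to 100 is
- purely defensive.

```c
#include <stdint.h>

/* Percentage done = 100 * I / S, where I is the word at +76 and S the
   word at +48. */
static unsigned percent_done(uint32_t processed, uint32_t sample_len)
{
    if (sample_len == 0)
        return 100;                            /* empty sample */
    uint64_t p = (uint64_t)processed * 100u / sample_len;
    return p > 100 ? 100u : (unsigned)p;       /* defensive clamp */
}
```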
-
-
- 2 The destination buffer has become full.
-
- Compress returns this reason code when it has a block of data for you
- to output. This data is held in the destination buffer, which (with the
- exception of the last block returned) will always be completely full.
- The last buffer returned may be completely full or partially full but
- never completely empty.
-
- Your program should dispose of the data in whatever manner seems fit
- (eg. copy it to disc) then call the Compress routine again. If you
- wish, before invoking the routine again your program may also change
- the buffer pointer and length fields at offsets +16 and +20
- respectively. This is one of the two occasions when you may safely do
- this; the other is on reason code 3 (which is always returned before
- any attempt is made by Compress to access the destination buffer).
-
- The value at +76 (the number of source bytes processed so far) *may*
- have been updated (depending on the code in use), so you could use this
- to calculate the percentage done so far (eg. for the hourglass).
-
- Note that the number of bytes in the destination buffer is given by
- +52. This will usually (but not always) be equal to the value at +20
- (the buffer size). Always use +52 instead.
-
-
- 3 End of scanning phase.
-
- The scanning phase has now been completed. It is possible that no
- source buffer requests (reason code 1) have been made at this point, so
- you should be careful about any assumptions you make.
-
- At this point you must claim the phase 2 workspace using the length
- provided in +44 (unless the length is zero in which case no workspace
- is needed). You cannot claim the workspace until this point - the
- size may have to be calculated during the scanning phase. If you want
- to save on memory you can of course release the phase 1 workspace,
- since it will no longer be needed.
-
- No output will have been made yet, so there is no need to set up a
- destination buffer until this reason code is received. The situation
- will change shortly after returning control to Compress, so you *must*
- now make sure the buffer is present.
-
- You should arrange matters so that the source block passed on the next
- occurrence of reason code 1 is from the start of the sample. For
- example, if the source is a file on disc then you should set the file
- position back to the start of the file. The address and length of the
- source buffer can be changed at the same time if need be.
-
- After receiving this reason code (which you will always get, barring
- errors) you may alter the destination buffer pointer and length at
- offsets +16 and +20. The only other times you may alter these values
- are after the utility has finished filling the destination buffer
- (ie. on reason code 2). Note that copies are taken of +16 and +20 so
- it is not possible to move or resize the destination buffer at any
- other time.
-
- The value at +56 is a single byte giving the sample period of the sample
- in microseconds. This information is stored in the file at the time of
- compression, and is otherwise ignored by Expand and Compress, so
- *could* be used to store other information if absolutely necessary.
- (With type 0 samples it must be non-zero.) Many programs using Expand
- expect it to be the sample period though, and you may get odd results
- from such programs if you use it for anything else.
-
-
- 4 Next pass warning - another pass of the source sample is needed
-
- Only occurs during phase 1, and not all versions of Compress will need
- to generate this reason code. The only action that needs to be taken is
- to arrange that future reads made with reason code 1 are from the start
- of the sample again. For example, if the source is a file on disc, then
- this reason code should be used to reset the file pointer to the start
- of the file.
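- If the source sample is a file read through C's stdio, the source-side
- actions for reason codes 1, 3 and 4 could be sketched as below. The
- function name is hypothetical. On reason code 1 the caller would set +8 to
- buf and +12 to the returned count. No copy is made of the data, so buf
- must stay intact until Compress next asks for source data.

```c
#include <stdio.h>

/* Hypothetical source handler.  Reason code 1: read the next chunk into
   buf and return its length (short only at end of file, barring errors).
   Reason codes 3 and 4: rewind so the next read starts at byte 0. */
static size_t source_event(FILE *f, int reason, unsigned char *buf, size_t cap)
{
    switch (reason) {
    case 1:
        return fread(buf, 1, cap, f);
    case 3:
    case 4:
        rewind(f);          /* next pass starts from the beginning */
        return 0;
    default:
        return 0;           /* other codes do not touch the source */
    }
}
```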
-
-
- 5 Soft error - source buffer was invalid
-
- This reason code is only returned if there is a bug in your program,
- and should not occur under normal circumstances. It occurs if an
- invalid source buffer was passed after a request for a new source
- buffer (reason code 1). The only check performed on the buffer is that
- the length is not negative and is non-zero.
-
- You should either correct the problem and repeat the call, or abort
- the operation.
-
-
- 6 Soft error - destination buffer was invalid
-
- This reason code is only returned if there is a bug in your program,
- and should not occur under normal circumstances. It occurs if an
- invalid destination buffer was passed after a request for a new
- destination buffer (reason code 2). The only check performed on the
- buffer is that the length is not negative and is non-zero.
-
- You should either correct the problem and repeat the call, or abort
- the operation.
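- Compress checks only the length, so a master can catch this class of bug
- before it makes the call. The sketch below assumes the length word is
- interpreted as a signed 32-bit value. On that reading, "not negative and
- non-zero" means a value from 1 to &7FFFFFFF. The function name is
- hypothetical.

```c
#include <stdint.h>

/* Mirrors the documented check behind reason codes 5 and 6: the length
   word at +12 or +20 must be positive when read as a signed word. */
static int buffer_length_ok(uint32_t len_word)
{
    return len_word != 0 && len_word <= (uint32_t)INT32_MAX;
}
```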
-
-
- 7 Instruction to set up the phase 1 workspace
-
- This reason code is returned sometime between starting the operation
- and the end of the scanning phase. It instructs you to set up
- suitable phase 1 workspace using the length given at +36. Until you
- receive this reason code, the word at +36 is undefined, so no claim
- could be made before starting the operation.
-
- Some compressors may not need phase 1 workspace, in which case they
- will never return this value, and the value at +36 will always be
- zero. If you do receive this reason code, the value at +36 will
- definitely be non-zero, and will remain valid until the start of
- the next operation.
-
-
- 8 Hard error - undefined
-
- This error is undefined at present. The operation should be aborted.
- It should never occur in code from this particular release of
- !CompUtils. Note that Expand *does* have a use for this reason code.
-
-
- Filetypes
- =========
-
- The filetype I use for compressed samples is:
-
- &350 - Compressed sample (Squished)
-
- This filetype is not Acorn-allocated and may change in the future. To
- facilitate a smooth transition your program should read the filetype from
- the header on startup. This means that if future versions of Compress use
- a different filetype it is a simple matter of replacing the old code with
- the new code and everything will still work perfectly (barring renaming the
- icons in the !Sprites files).
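- A program that holds the header as raw bytes might read the filetype as
- shown below, rather than hard-wiring &350. The helper and the HDR_FILETYPE
- name are hypothetical. The sketch assumes the little-endian byte order
- used by Acorn's ARM machines.

```c
#include <stdint.h>

#define HDR_FILETYPE 80   /* hypothetical name for header offset +80 */

/* Read the little-endian 32-bit word at byte offset off of the header. */
static uint32_t header_word(const unsigned char *hdr, unsigned off)
{
    return (uint32_t)hdr[off]
         | (uint32_t)hdr[off + 1] << 8
         | (uint32_t)hdr[off + 2] << 16
         | (uint32_t)hdr[off + 3] << 24;
}
```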
-
-
- Special data
- ============
-
- This 16-byte block in the header has a format that depends on the
- compression code being used:
-
-
- Type0-2
- -------
-
- +60 Flags:
- bit 0 - Store in linear form, else store in VIDC form.
- bit 1 - Allow compressed table. Distinguishes between Type 1 and
- Type 2 samples.
- +64 Number of 16-byte sample blocks per block group (0-256).
- +68 Number of bits in group header (0-2).
-
- Storing the sample in VIDC format reduces the quality of the sample slightly
- but often produces greater compression.
-
- Types 1 and 2 split the 16-byte blocks up into groups of blocks. Each group
- has a short header giving the attributes of the group, the size of which is
- determined by +68. The smaller the header, the lower the overheads, but the
- result is a restriction on the compression techniques used. Similarly, a
- large value for +64 reduces the number of headers needed, at the expense of
- reacting more slowly to changes in the character of the sample.
-
- For type 0 samples, +64 and +68 should be zero. For types 1 and 2 +64 and
- +68 should both be non-zero, and bit 1 of +60 determines whether type 1 or
- type 2 is used (there is little difference between the two). The initial
- values in the header provide sensible defaults for producing type 2 samples.
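- The rules above can be checked with a helper like the one below. The macro
- and function names are hypothetical. The supplied header files define
- their own flag symbols, and those should be preferred. The helper assumes
- that setting bit 1 selects Type 2, as the description of Compress_2() in
- the C/APCS section suggests.

```c
#include <stdint.h>

#define T02_LINEAR      (1u << 0)  /* +60 bit 0: linear, else VIDC form */
#define T02_ALLOW_TABLE (1u << 1)  /* +60 bit 1: Type 2 rather than Type 1 */

/* Which sample type do +60, +64 and +68 request?  Returns 0, 1 or 2,
   or -1 for a combination the rules above do not allow. */
static int type02_output(uint32_t flags, uint32_t blocks_per_group,
                         uint32_t header_bits)
{
    if (blocks_per_group > 256 || header_bits > 2)
        return -1;                                  /* out of range */
    if (blocks_per_group == 0 && header_bits == 0)
        return 0;                                   /* Type 0 */
    if (blocks_per_group == 0 || header_bits == 0)
        return -1;                                  /* must both be set */
    return (flags & T02_ALLOW_TABLE) ? 2 : 1;
}
```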
-
- Type4-5
- -------
-
- +60 Flags:
- bit 0 - Enable entropy coding.
- bit 1 - Entropy coding periodically resets.
- bit 31 - Source is 16-bit linear else 8-bit linear.
- +64 Number of samples (not bytes) in one entropy block.
-
- Type 5 samples are essentially just Type 4 with entropy coding on the output.
- Thus bit 0 is the Type 5/Type 4 switch.
-
- In addition, when entropy coding is in use, you can opt to use a single table
- throughout the sample, or periodically recalculate the tree to adapt to
- subtle changes in code frequency. If you choose the adaptive output coder
- you should set bit 1 and specify the maximum number of samples that each
- table will be used for in +64.
-
- Type 5 is not currently supported and should not be used. For this reason,
- the above information is subject to change in future versions. For now simply
- make sure that bit 0 of the flags is clear.
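- Until Type 5 is supported, a cautious master might mask the entropy bits
- out of any flags word it builds. The names in this sketch are hypothetical.
- Clearing bit 1 as well is a choice made here, because that bit only has
- meaning when entropy coding is enabled.

```c
#include <stdint.h>

#define T45_ENTROPY       (1u << 0)   /* +60 bit 0: entropy coding (Type 5) */
#define T45_ENTROPY_RESET (1u << 1)   /* +60 bit 1: periodic table reset */
#define T45_SOURCE_16BIT  (1u << 31)  /* +60 bit 31: 16-bit linear source */

/* Force Type 4 output: clear the entropy-related bits, keep the rest. */
static uint32_t type45_sanitise(uint32_t flags)
{
    return flags & ~(T45_ENTROPY | T45_ENTROPY_RESET);
}
```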
-
- Type6-7
- -------
-
- +60 Flags:
- bit 0 - Store in linear form, else store in VIDC form.
- bit 1 - Use a 32-byte translation table, which improves compression
- slightly (only for 8-bit samples)
- bit 2 - Use new sign coding (ie. output as Type 7)
-
- Storing the sample in VIDC format reduces the quality of the sample slightly
- but *should* produce greater compression (this isn't always the case).
- Storing in linear format results in completely lossless compression.
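- The following sketch assembles the Type6-7 flags word. The macro and
- function names are hypothetical. The caller is assumed to know whether the
- source is 8-bit, because this document does not say how Compress learns
- the sample width for these types.

```c
#include <stdint.h>

#define T67_LINEAR     (1u << 0)  /* lossless linear storage, else VIDC */
#define T67_XLAT_TABLE (1u << 1)  /* 32-byte translation table, 8-bit only */
#define T67_NEW_SIGN   (1u << 2)  /* new sign coding: output as Type 7 */

static uint32_t type67_flags(int lossless, int want_table, int source_8bit,
                             int type7)
{
    uint32_t f = 0;
    if (lossless)
        f |= T67_LINEAR;
    if (want_table && source_8bit)
        f |= T67_XLAT_TABLE;      /* the table only applies to 8-bit data */
    if (type7)
        f |= T67_NEW_SIGN;
    return f;
}
```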
-
-
-
- Using Compress with your own programs
- =====================================
-
- Basic
- -----
-
- Choose an appropriate Compress file and copy it into your application's
- directory. You can load it using something like this:
-
- SYS "OS_File",5,"<App$Dir>.Compress" TO a%,,,,l%
- IF a%<>1 THEN l%=16
- DIM compress% l%
- SYS "OS_File",255,"<App$Dir>.Compress",compress%,0
-
- and the code can be called with:
-
- CALL compress%+0
-
- Although Compress returns the reason code in R0, I would not advise using USR
- instead of CALL. It's better to use CALL and then read the reason code from
- the header. It's easier for debugging for a start.
-
-
- Assembler
- ---------
-
- When including Compress in your own programs there is no reason why it has
- to be kept as a separate file in your application's directory. It is
- actually designed to be embedded somewhere in the middle of your own code.
- Refer to the Technical Details section for more information on calling
- Compress from machine code programs.
-
-
- ObjAsm
- ------
-
- Users of Acorn's Desktop Assembler will find an AOF version of Compress in
- the 'asm' sub-directory. Header files are provided to import all the symbols
- you'll need. Note that unlike Expand and ExpCode, you can link more than one
- copy of Compress to your code. For this reason there is one header file
- (Universal) defining several constants used by all versions of Compress, plus
- one additional header for each group of sample types, giving version-specific
- data.
-
- All you have to do is include the line:
-
- GET CompUtils:Compress.asm.h.Universal
-
- at the start of your source code, followed by something like:
-
- GET CompUtils:Compress.asm.h.Type0-2
-
- repeated as many times as necessary.
-
- Then add the appropriate object file(s) to the list of objects and libraries
- to be linked. There is one object file for each group of sample types
- supported (eg. Type0-2, Type6-7, etc), but no Universal version even if there
- *is* a Universal header.
-
- The version-specific header files import symbols for the start of the Compress
- header (eg. Compress_0) and the entry point (eg. Compress_0_Code). Note the
- number inserted after 'Compress'. This is the lowest numbered sample type
- supported by the object in question (this example is from Type0-2). Only
- version-specific symbols need this number, so that the linker can tell the
- various versions of Compress apart if you have more than one linked to your
- code.
-
- The Universal header also defines symbols for offsets from the start of the
- Compress header to various entries within that header (eg.
- Compress_SrcBufferPtr, Compress_SamplePeriod, etc). So, to set the sample
- period you would use something like this:
-
- LDR R0,=Compress_0 ; finds the base address
- STR R1,[R0,#Compress_SamplePeriod] ; writes the word
-
- The first instruction transfers the address of Compress's header into R0, then
- the second instruction transfers the contents of R1 into the sample period
- entry. To call Compress use:
-
- BL Compress_0_Code
-
- which returns a copy of the reason code in R0 to make life easier for you.
-
- Apart from these symbols, the header files also define a series of symbols
- for various bits in the output flags eg. OUTPUT_LINEAR_0, ALLOW_SMALL_TABLE,
- etc. These should be used in preference to actual values (eg. 1<<2 or 4+2+1)
- where possible to allow these to be changed in the future, should the need
- arise. They can also be used to highlight trouble spots if a bit's meaning
- changes.
-
- Flag symbols are version-specific so are not held in the Universal file. In
- fact, there is no guarantee that +60 is going to be used for flags in future
- compression code - all the current ones simply define the first of their
- special words as containing flags. Other routines might use this word for
- something else.
-
- The version-specific header may also contain symbols for entries in the
- special-data part of Compress's header (eg. Compress_0_BlocksPerGroup).
-
- There are also some symbols in the Universal header giving textual equivalents
- for reason codes eg. SRC_EMPTY, DEST_FULL, etc which may make your source code
- easier to read (unless you're using jump tables of course). Have a look at the
- file for more details.
-
-
- C/APCS
- ------
-
- For C users, a series of header files and object files can be found
- in the 'cc' sub-directory. One header file and one object are provided for
- each group of sample formats supported. You can link more than one of these
- to your own code without problems.
-
- The object code is APCS-compliant (obviously) so can be used from any APCS
- language, such as Pascal, C++, etc. You would have to write your own header
- files though - the ones provided are for C, and should be included in your
- source with something similar to:
-
- #include "CompUtils:Compress.cc.h.Type0-2"
-
- They define the entries in Compress's header as global variables that can be
- accessed just like normal variables eg.
-
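- int filetype;
- int count;
- char *buffer;
- char bytes[16];
-
- filetype = Compress_6_SampleType;      /* read an entry from the header */
- buffer = Compress_6_DestBufferPtr;     /* address of the destination buffer */
- for (count = 0; count < 16; count++)   /* copy its first 16 bytes */
-     bytes[count] = buffer[count];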
-
- Apart from these variables, the header files may also define symbols for
- various bits in the output flags, depending on what features the corresponding
- object file supports eg. OUTPUT_LINEAR, ALLOW_SMALL_TABLE, etc. These should
- be used in preference to actual values (eg. 1<<2 or 4+2+1) where possible to
- allow these to be changed in the future, should the need arise. It can also be
- used to highlight trouble spots: if a bit's meaning changes then so would the
- name attached to it, and the compiler would spot the fact that the required
- symbol no longer exists.
-
- There are also some symbols giving textual equivalents for reason codes
- eg. SRC_EMPTY, DEST_FULL, etc which may make your source code easier to read,
- especially when using 'switch'. Have a look at the header file for more
- details.
-
- To call Compress just make a call to function Compress_6(). This takes no
- arguments and returns the reason code (which is also held in global variable
- Compress_6_ReasonCode). Please look at the example file provided for further
- details.
-
- Note that the above description assumes you are using object file 'Type6'. For
- 'Type20' you would use names of the form Compress_20_ReasonCode, and so on.
- Variables and constants that are not specific to any one version of Compress
- do not need the number in the name (eg. SRC_EMPTY, UNKNOWN_FORMAT), but these
- are not all that common.
-
- Where an object file supports more than one type of compression eg. Type0-2,
- the numeric part of the name is a list of the types supported eg.
- Compress_0_1_2_ReasonCode. There are also variables Compress_0_ReasonCode,
- Compress_1_ReasonCode, and Compress_2_ReasonCode. All four are equivalent;
- they are just different names for the same variable. Generally you should
- choose one naming convention and stick to it.
-
- Similarly, there would be four functions defined: Compress_0_1_2(),
- Compress_0(), Compress_1(), and Compress_2(). Unlike the variables and
- constants these are *not* all equivalent. Instead, the 0_1_2 form is the raw
- Compress code and the other 3 are charged with the responsibility of setting
- up the special data themselves to produce a typical output for the given
- sample type. For example, Compress_2() forces the ALLOW_SMALL_TABLE bit in the
- flags to be set, sets BlocksPerGroup to 6 and BitsPerHeader to 2, and only
- then calls Compress_0_1_2(). The output is then 100% backwards-compatible with
- old versions of the !Compress program (with Pack switched on). This short cut
- may be useful to you, or then again it may not.
-
- If all this has left you a little befuddled (and who would blame you!) try
- modifying the example file and seeing what happens. Or, alternatively, send me
- an email or a letter at one of the addresses in the !ReadMe file and I'll see
- if I can help you out.
-
-
- Bugs
- ====
-
- There may well be some bugs lurking around, particularly in the ObjAsm and C
- versions of Compress (I never use these myself). They have been tested with
- the example programs so they should work, but you never know. Please report
- any bugs, omissions or suggestions for improvement to one of the addresses
- in the main !ReadMe file.
-
-
- (c) David Radford
-
-
-