- How to Use File Compression
- Rick Weerts
-
- Fred, a novice modem user, heads towards the download section of his
- favorite Bulletin Board System. He pulls off about four files and logs
- off an hour later.
-
- Fred then attempts to execute the programs, expecting marvelous things
- on his microcomputer. But no matter what he does or whom he calls, the
- files he received refuse to work.
-
- Fred is the victim of file compression. Numerous utilities exist today
- which allow the modem user to compress a file before transmission to a
- board or another user. This compression saves on users' time and phone
- bills, and the BBS itself receives more room for download files.
- Nevertheless, unless the files are converted properly after they are
- downloaded, they are useless, as the computer cannot read them. This
- article attempts to cut through some of the fog regarding file
- compression and its effects on the bulletin board community...
-
- The Concept
- -----------
-
- Most files on a bulletin board (and most others on microcomputers in
- general) make use of a lot of text (alphabetic) information. This
- information (when stored in a standard format known as ASCII) can be
- analyzed by virtually any other computer system operating today.
- However, with the invention of 8 bit protocols that are used for most
- telecommunications, this storage format is wasteful. It is also much
- more expensive when telephone usage is paid for by the hour.
-
- Before I go on, it is important to explain the bit protocols and the
- information they attempt to represent. As most people know, each piece
- of information in a computer is coded with a series of 1's and 0's. The
- computer can interpret and act on these codes. A combination of seven
- 1's and 0's is called a 7 bit protocol (one bit for each number in the
- combination). This protocol makes use of the ASCII character standards.
- For you math wizards out there, the combination of seven binary bits of
- 1's and 0's allows for 128 numeric combinations. Each number in the 128
- numeric combinations stands for a specific alphanumeric code. These
- codes make up the ASCII table and include all alphabetic and some
- special characters the computer generates. The numeric representation
- of characters is what the computer deals in, not the characters
- themselves.
-
- However, my IBM can generate 256 CHR$(x) codes! And I just said there
- were only 128?
-
- IBM and most other modern computer makers have also included an 8th
- (high) bit in the character codes for their systems. This additional
- bit, worth 128, doubles the number of possible combinations to 256
- (2 to the 8th power), adding another 128 to the previous 128 on the
- original ASCII table. The additional 128 codes can represent
- additional symbols or even block graphic characters.
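-
- The arithmetic above can be checked with a short Python sketch. Python
- is used here only for illustration (it is not part of the original
- article), and the names SEVEN_BIT_CODES, EIGHT_BIT_CODES and
- has_high_bit are made up for this example.
-
```python
# Count the distinct codes available with 7 and with 8 bits.
SEVEN_BIT_CODES = 2 ** 7   # 128 codes: the standard ASCII table
EIGHT_BIT_CODES = 2 ** 8   # 256 codes: ASCII plus the high-bit set


def has_high_bit(code: int) -> bool:
    """Return True when the 8th (high) bit of a character code is set."""
    return (code & 0x80) != 0


if __name__ == "__main__":
    print(SEVEN_BIT_CODES, EIGHT_BIT_CODES)
    # The letter "A" is code 65, which fits in seven bits.
    print(ord("A"), format(ord("A"), "07b"))
    # Codes 128 and up are the "extra" characters.
    print(has_high_bit(ord("A")), has_high_bit(200))
```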
-
- I am not going to get into the specifics on bit protocols. The idea
- behind file compression (squeezing, compressing, crunching, whatever you
- want to call it) is to make use of these extra 128 characters that are
- used less than the first 128. These extra characters can stand for
- twelve spaces, sixteen hyphens, or whatever else the compression
- software allows. This coding allows more information to be included in
- the same amount of space. Since the computer is using all 256
- characters in such an environment, you must transmit using the 8 bit
- modem protocol (8-N-1, referring to eight bits, no parity, and one stop
- bit).
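-
- To make the idea concrete, here is a toy sketch in Python. It only
- illustrates the principle described above and is not the method used
- by SQ, LU or ARC. It assumes the input is pure 7-bit ASCII and lets
- each extra code (128 plus n) stand for a run of n spaces. The names
- squeeze_spaces and unsqueeze_spaces are hypothetical.
-
```python
HIGH_BIT = 0x80   # codes 128..255 are the "extra" characters
MAX_RUN = 0x7F    # longest run a single extra character can stand for
SPACE = 0x20


def squeeze_spaces(text: bytes) -> bytes:
    """Replace each run of two or more spaces with one high-bit byte."""
    if any(b >= HIGH_BIT for b in text):
        raise ValueError("input must be 7-bit ASCII")
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == SPACE:
            # Measure the run, capped at what one byte can describe.
            j = i
            while j < len(text) and text[j] == SPACE and j - i < MAX_RUN:
                j += 1
            run = j - i
            out.append(HIGH_BIT | run if run >= 2 else SPACE)
            i = j
        else:
            out.append(text[i])
            i += 1
    return bytes(out)


def unsqueeze_spaces(data: bytes) -> bytes:
    """Expand every high-bit byte back into the run of spaces it stands for."""
    out = bytearray()
    for b in data:
        if b >= HIGH_BIT:
            out.extend(b" " * (b & MAX_RUN))
        else:
            out.append(b)
    return bytes(out)
```
-
- With this toy scheme, a line made of NAME, twelve spaces and SIZE
- shrinks from 20 bytes to 9, and unsqueeze_spaces restores it exactly.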
-
- Putting it into Action
- ----------------------
-
- When a file is compressed, it passes through a utility program that
- removes redundancy (such as runs of "white space") from the source
- document, resulting in a compact file. In some cases, this space
- savings can be more than 50% of
- the previous file size. However, the file is now in a format that the
- computer cannot read "stand-alone". It requires special software to
- interpret how to unsqueeze the file into standard storage specs. In
- addition, the file may be stored in a library or archive before or after
- it is compressed. The file is stored together with other files under
- one single filename on the disk. This process can then be reversed by
- the end user. The time savings involved in this clever process become
- readily apparent. Compare the time it takes to download one 200K file
- versus twenty 10K files, each with its own distinct filename.
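-
- The library idea (many files stored under one filename, and reversible
- by the end user) can be sketched in Python. The layout below is
- invented for this example and is NOT the real LBR or ARC file format;
- pack_library and unpack_library are hypothetical names.
-
```python
import struct
from typing import Dict

# Toy entry header: 1-byte name length, 4-byte data length (little-endian).
ENTRY_HEADER = struct.Struct("<BI")


def pack_library(files: Dict[str, bytes]) -> bytes:
    """Join several named files into one blob under a single 'filename'."""
    blob = bytearray()
    for name, data in files.items():
        encoded = name.encode("ascii")
        blob += ENTRY_HEADER.pack(len(encoded), len(data))
        blob += encoded
        blob += data
    return bytes(blob)


def unpack_library(blob: bytes) -> Dict[str, bytes]:
    """Reverse pack_library, returning each file under its own name."""
    files = {}
    pos = 0
    while pos < len(blob):
        name_len, data_len = ENTRY_HEADER.unpack_from(blob, pos)
        pos += ENTRY_HEADER.size
        name = blob[pos:pos + name_len].decode("ascii")
        pos += name_len
        files[name] = blob[pos:pos + data_len]
        pos += data_len
    return files
```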
-
- File Extensions
- ---------------
-
- Most disk files have an eight letter name and a three letter extension
- (separated by a period or slash). Often, the extension indicates what
- method of compressing or archiving was used. Below are some common file
- extensions that BBSs and archiving software use to denote files which
- use packing techniques.
-
- File Name                                  Packing Method
- =============================================================
- FIREFLY.EXE   Original File                None
- FIREFLY.EQE                                Squeeze (SQ)
- FIREFLY.LBR                                Library (LU)
- FIREFLY.LQR                                (SQ), (LU)
- FIREFLY.ARC                                Archive (ARC)
-
- FIDONWS.DOC   Original Text File           None
- FIDONWS.DQC                                Squeeze (SQ)
- FIDONWS.LBR                                Library (LU)
- FIDONWS.LQR                                (SQ), (LU)
- FIDONWS.ARC                                Archive (ARC)
-
- BOOMERS.BAS   Original BASIC Program       Binary
- BOOMERS.BQS                                Squeeze (SQ)
- BOOMERS.LBR                                Library (LU)
- BOOMERS.LQR                                (SQ), (LU)
- BOOMERS.ARC                                Archive (ARC)
-
- Squeeze and Unsqueeze
- ---------------------
-
- SQ and SQPC are two of the first available software packages designed to
- compress files into the smallest possible form. AUSQ, UNSQ, and NUSQ
- are their counterparts. They put files back into expanded format on
- request. Squeezed files usually have a Q as the second letter of their
- three-letter file name extension. Simply typing AUSQ or SQPC alone on a
- command line at the DOS level brings up a small help screen that shows
- how to operate the system. Since there are many such programs on the
- market, my object is to explain the concept behind them, not how a
- specific package operates. Nevertheless, the documentation on these
- packages is usually enough to operate them successfully.
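-
- The naming convention for squeezed files can be sketched in Python as
- follows. The helper names are hypothetical, and the rule is only the
- "usually" stated above, so a Q in the middle of an extension is a hint
- rather than proof that a file is squeezed.
-
```python
def squeezed_name(filename: str) -> str:
    """Return the conventional squeezed name: Q replaces the middle
    letter of a three-letter extension (FIREFLY.EXE -> FIREFLY.EQE)."""
    name, dot, ext = filename.rpartition(".")
    if not dot or len(ext) != 3:
        raise ValueError("expected a three-letter extension")
    return f"{name}.{ext[0]}Q{ext[2]}"


def looks_squeezed(filename: str) -> bool:
    """Guess from the extension alone whether a file was squeezed."""
    name, dot, ext = filename.rpartition(".")
    return bool(dot) and len(ext) == 3 and ext[1].upper() == "Q"


if __name__ == "__main__":
    for original in ("FIREFLY.EXE", "FIDONWS.DOC", "BOOMERS.BAS"):
        print(original, "->", squeezed_name(original))
```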
-
- The Data LIBRARY
- ----------------
-
- LU.EXE is the original library (LBR) utility. It allows the packing
- (and unpacking) of files into one large file. LUed files usually have
- an LBR file extension. Similarly, an LQR extension indicates that
- the file must be unsqueezed (using AUSQ or NUSQ) BEFORE it is converted
- with the library utility. Also, libraries may consist of one or more
- libraries residing inside of each other, some squeezed beforehand. As
- you can see, the standards for the file extensions are important to
- follow when dealing with such a variety of systems.
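-
- The order of operations implied by each extension can be written down
- as a small Python lookup. This is only a sketch of the conventions
- described in this article; unpack_steps is a hypothetical helper, and
- the strings it returns are simply the utility names used above.
-
```python
from typing import List


def unpack_steps(filename: str) -> List[str]:
    """List the utilities to run, in order, to recover the original files."""
    ext = filename.rpartition(".")[2].upper()
    if ext == "ARC":
        return ["ARC"]            # ARC unsqueezes and unpacks in one step
    if ext == "LBR":
        return ["LU"]             # plain library: unpack only
    if ext == "LQR":
        return ["AUSQ", "LU"]     # unsqueeze BEFORE using the library utility
    if len(ext) == 3 and ext[1] == "Q":
        return ["AUSQ"]           # single squeezed file
    return []                     # nothing to undo


if __name__ == "__main__":
    for name in ("FIREFLY.EQE", "FIREFLY.LBR", "FIREFLY.LQR", "FIREFLY.ARC"):
        print(name, unpack_steps(name))
```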
-
- Usually, typing LU will give a brief command line summary of the
- function. The library utility usually provides a command which will
- remove all the files from the library file to stand-alone files. The
- LBR file will then serve as a compressed backup of the information you
- have unpacked.
-
- It is usually handy to unpack a library by putting it in its own
- subdirectory (DOS MKDIR Command). In this way, it becomes clearly
- evident which files have been removed from the library. You will not
- get them confused with other files with similar or identical names. You
- can then move the files (DOS COPY Command) outside of the subdirectory
- or onto the disk of your choosing.
-
- Archiving Systems
- -----------------
-
- Finally, we come to ARC.EXE, short for Archive. This handy little
- utility takes all the guesswork out of squeezing and packing files into
- an archive (or library). ARC automatically decides the best way to
- compress a file and then adds it to the archive. ARC also unpacks the
- file in the same way, eliminating the separate unsqueeze step. The
- archive utility is compact and it makes the other packing schemes
- obsolete. Of course, if you have only one file to pack, you may only
- want to squeeze it. In this case SQ (and AUSQ, to reverse it) comes
- into play.
-
- Typing ARC at the DOS command line prompt causes the program to supply
- you with an informative help screen. To unpack all the files from an
- archive, you type ARC E archive_name. Addition of files to an archive
- is just as simple.
-
- Once a file has been archived, there is no need for further squeezing.
- The file has already been squeezed about as tightly as ARC can manage,
- and further attempts at compression will usually only add to the file's
- size. Files with the file name extension .ARC are archive files.
-
- The instructions about putting the archive data files in separate
- directories still stand. This technique certainly makes for a much
- easier time of packing and unpacking.
-
- What All This Means to Me
- -------------------------
-
- Archiving and squeezing are not requirements (in most cases) before a
- file is transmitted to a bulletin board. However, most BBS
- system operators will ARC or squeeze the files they receive from users.
- Compressing the files ahead of time saves time for the sysop and also
- allows more room on the BBS for additional download files, good for
- everyone involved. Also, a sysop of a Commodore board, for example,
- will probably not have the capability to squeeze files intended for IBM
- systems.
-
- Compressing files is also a good idea from the KISS (Keep It Simple,
- Stupid!) concept of file transfer. All the files necessary for a
- software system to run should be placed under one archive name. The
- users of the BBS are much more likely to get a working system than if
- they have to sift through 500 files until they find all the correct ones
- for that particular system. It also allows for easy updates when you
- improve the software. Archiving makes sure there is only one filename
- to delete and one to add.
-
- Finally, compressing files SAVES MONEY! Files can be shrunk by more
- than 50%, cutting AT&T's share of a long distance call in half.
- Compressed files are also convenient for pay services such as
- CompuServe or The Source, where every second costs money. And
- compression saves money in both directions, as both the sender and
- the receivers benefit from lower bills.
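-
- A rough Python sketch shows the size of the savings. It assumes a
- 1200 bps modem (the article does not name a speed) and ignores the
- overhead of the transfer protocol. With 8-N-1, each byte costs 10 bits
- on the line: one start bit, eight data bits and one stop bit. The
- function name transfer_seconds is hypothetical.
-
```python
BITS_PER_BYTE_8N1 = 10  # 1 start bit + 8 data bits + 1 stop bit


def transfer_seconds(size_bytes: int, bits_per_second: int) -> float:
    """Estimate raw line time for a file sent with 8-N-1 framing."""
    return size_bytes * BITS_PER_BYTE_8N1 / bits_per_second


if __name__ == "__main__":
    original = 200 * 1024      # a 200K file
    squeezed = original // 2   # assumed 50% savings
    for label, size in (("original", original), ("squeezed", squeezed)):
        minutes = transfer_seconds(size, 1200) / 60
        print(f"{label}: {minutes:.1f} minutes at 1200 bps")
```
-
- Under these assumptions the 200K file ties up the line for roughly 28
- minutes, and the squeezed copy for about half of that.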
-
- So the next time you send a file to your favorite BBS, do everyone a
- favor and do a squeeze play on Ma Bell.
-
- Other Confusing Items
- ---------------------
-
- After writing this piece, I noticed that I fluttered from one verb usage
- to another with uncanny regularity. So below is a list of terms and
- their (my) definitions.
-
- Archive       File(s) that are squeezed and lumped under a
-               single heading by the ARC.EXE package.
-
- ASCII         Seven bit protocol standard agreed upon by
-               all major microcomputer manufacturers.
-
- Compress      To make a file smaller by shrinking the
-               space the file occupies.
-
- Crunch        Same as compress.
-
- Library       File(s) lumped under a single heading by
-               LU.EXE or a similar package.
-
- Pack          Placing numerous files under a single
-               heading by either the ARC or LU utilities.
-
- Protocol      Code of 1's and 0's indicating
-               characters stored in a computer's
-               memory.
-
- Squeeze       Same as compress.
-
- Unpack        Remove from library or archive one or more
-               sections into stand-alone files.
-
- Unsqueeze     Return file to original structure.
-
- I hope you will find the above article helpful. If you have any
- questions, comments, additions, corrections, gripes, etc., please send
- them to me. I will make an attempt to respond as soon as possible. Try
- to leave a Fido, GEnie or CompuServe address.
-
- December 4, 1985
- Rick Weerts
-