ClusterDir v1.00 (1995-06-26)
Why the FAT file system is inefficient
-------------------------------------
Did you know that DOS lies to you each time you type the "DIR" command?
It's true. Well, sort of. You see, DOS reports a file's length as the
number of bytes in that specific file, not as the amount of space the
file takes up on disk. "Wait a minute," you say, "how can this be?
Isn't the number of bytes in a file the amount of space that file takes
up on a disk?" The answer is no. No no no. Allow me to explain. DOS
uses a file system called the FAT (File Allocation Table) file system.
The FAT file system has been in use since the days of 360k diskettes,
and it was ideal back then. Then again, having two big black floppy
drives on the front of your PC looked pretty spiffy back then. These
days, when a one gigabyte hard disk is standard on new systems, the FAT
file system seems totally prehistoric. The problem with the FAT file
system (well, one of its many problems) is that it stores files in
allocation units called clusters. Clusters are equal-sized portions of
your hard disk used to store files. The cluster size of a given
partition depends on its size. By multiplying the number of bytes per
sector on a given partition by its sectors per cluster, we can determine
its cluster size. The table below shows cluster sizes as they relate to
various partition sizes.
Volume Size      Cluster Size
-------------    ------------
16MB - 128MB      2048 bytes
128MB - 256MB     4096 bytes
256MB - 512MB     8192 bytes
512MB - 1GB      16384 bytes
1GB - 2GB        32768 bytes
2GB - 4GB        65536 bytes (Windows NT only)
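As a rough illustration, here are the two rules from this section in Python: cluster size as bytes per sector times sectors per cluster, and a lookup that mirrors the table above. The function names are hypothetical, the 512-byte sector with 16 sectors per cluster is just an example, and treating each range's upper bound as exclusive is an assumption, since the table does not say which row a volume sitting exactly on a boundary falls into.

```python
# Hypothetical helpers illustrating the FAT16 cluster-size rules above.

MB = 1024 * 1024

# (upper bound of volume size, cluster size) pairs copied from the table.
# Treating each upper bound as exclusive is an assumption for this sketch.
FAT16_CLUSTER_TABLE = [
    (128 * MB, 2048),
    (256 * MB, 4096),
    (512 * MB, 8192),
    (1024 * MB, 16384),
    (2048 * MB, 32768),
    (4096 * MB, 65536),  # 64K clusters are only usable under Windows NT
]


def cluster_size(bytes_per_sector: int, sectors_per_cluster: int) -> int:
    """Cluster size is bytes per sector times sectors per cluster."""
    return bytes_per_sector * sectors_per_cluster


def fat16_cluster_for_volume(volume_bytes: int) -> int:
    """Return the cluster size the table gives for a volume of this size."""
    if volume_bytes < 16 * MB:
        raise ValueError("the table starts at 16MB")
    for upper, size in FAT16_CLUSTER_TABLE:
        if volume_bytes < upper:
            return size
    raise ValueError("FAT16 volumes top out at 4GB")


print(cluster_size(512, 16))               # 8192
print(fat16_cluster_for_volume(550 * MB))  # 16384
```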
For a file to be stored on a FAT system, it must (at the very least)
occupy one whole cluster. So if you have a 550MB partition, your cluster
size is 16384 bytes, and every file on that partition that holds any
data will take up at least 16384 bytes. DOS will report a one byte file
as taking up only one byte, when in reality that one byte file takes up
16384 bytes of hard disk real estate. Those extra 16383 bytes are what
is referred to as "Cluster Overhang." What happens when a file is longer
than one cluster? It occupies as many whole clusters as it needs, and
the unused tail of the last cluster is wasted. So an 18,411 byte file on
a system with an 8192 (8k) cluster size will take up a total of 24,576
bytes (three clusters).
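The rounding rule reduces to a ceiling division. This is a minimal sketch with hypothetical helper names; it treats a zero-length file as occupying no clusters, which is how FAT records empty files, and it reproduces the two examples from the paragraph above.

```python
# Hypothetical helpers: the space a file really occupies on a FAT volume.

def allocated_size(file_size: int, cluster: int) -> int:
    """Round file_size up to a whole number of clusters.

    A zero-length file gets no clusters, which matches how FAT records
    empty files.
    """
    clusters = -(-file_size // cluster)  # ceiling division without floats
    return clusters * cluster


def overhang(file_size: int, cluster: int) -> int:
    """Unused bytes in the tail of the file's last cluster."""
    return allocated_size(file_size, cluster) - file_size


print(allocated_size(1, 16384), overhang(1, 16384))        # 16384 16383
print(allocated_size(18411, 8192), overhang(18411, 8192))  # 24576 6165
```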
Knowing this, you may now see why disk space seems to disappear a lot
faster than it should. The problem is most apparent on a system with a
large cluster size and many small files. Each small file is rounded up
to a whole cluster, and the result is a lot of wasted disk space. On a
system with many large files, cluster overhang is less apparent, since
each file wastes at most part of one cluster, but it is still there. It
is rare to find a system where large files make up most of the disk, so
it is extremely rare to find a disk that is well suited to FAT's large
clusters.
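To put numbers on the small-file versus large-file point, the sketch below (hypothetical names, made-up file populations) applies the same 16384-byte cluster to a thousand 2,000-byte files and to ten 1,000,000-byte files, expressing the waste as a percentage of the bytes DOS reports.

```python
# Hypothetical comparison: one cluster size, two very different disks.
CLUSTER = 16384


def allocated(size: int, cluster: int = CLUSTER) -> int:
    """Round size up to a whole number of clusters."""
    return -(-size // cluster) * cluster


def wasted_percent(sizes) -> float:
    """Overhang as a percentage of the bytes DOS reports."""
    reported = sum(sizes)
    actual = sum(allocated(s) for s in sizes)
    return 100.0 * (actual - reported) / reported


small_files = [2_000] * 1_000   # 1,000 files of 2,000 bytes each (made up)
large_files = [1_000_000] * 10  # 10 files of 1,000,000 bytes each (made up)

print(f"small files: {wasted_percent(small_files):.2f}% wasted")  # 719.20%
print(f"large files: {wasted_percent(large_files):.2f}% wasted")  # 1.58%
```

Each 2,000-byte file wastes 14,384 bytes, while each 1,000,000-byte file wastes 15,808 bytes, less than one cluster, spread over far more data.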
Is there any way out? If you wish to continue running DOS/Windows, there
are a few solutions. The best, but far from the most convenient, method
would be to reformat your disk to get a smaller cluster size. If you
have a one gigabyte disk set up as one big partition, the cluster size
is 32768 bytes (32k), which is an insane waste of space. If you were to
format that disk into two 500 megabyte partitions, the cluster size
would drop way down to 8192 bytes, saving a lot of wasted disk space.
Another solution (which I do not recommend to loved ones) is to install
Stacker, by Stac Electronics. Stacker uses its own file system for
compressed drives. These drives, while FAT compatible, do not fall prey
to cluster overhang. If you are looking to add drive space via software
compression and luck is on your side, Stacker is a viable option; if
not, I wouldn't recommend it.
Yet another solution is switching to either OS/2, which can use HPFS
(High Performance File System), or Windows NT, which can use both NTFS
(New Technology File System) and HPFS. What do these file systems buy
you? Under NTFS, the default cluster size never exceeds 4096 bytes, no
matter how big the partition. HPFS allocates disk space in 512 byte
sectors, no matter how big the drive. The space saved on a large drive
formatted with either of these two file systems is substantial.
"Wait!" you say, "What about the new VFAT file system which ships with
Windows '95?" No go. VFAT is simply the FAT file system, updated to
handle long file names. Apart from escaping the boneheaded 8.3 file
naming convention, VFAT inherits all of FAT's shortcomings, including
cluster overhang.
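The options above differ mainly in the allocation unit, so a sketch can compare them on one file set. The labels, names and sample sizes below are made up for illustration, and the sketch counts slack space only: it ignores the metadata each file system keeps (the FAT itself, NTFS's Master File Table, HPFS's own structures), so it is not a full accounting of any of them.

```python
# Hypothetical comparison: slack space for one made-up set of files under
# different allocation units. File system metadata is deliberately ignored.

ALLOCATION_UNITS = {
    "FAT, one 1GB partition (32K clusters)": 32768,
    "FAT, two 500MB partitions (8K clusters)": 8192,
    "NTFS (4K clusters)": 4096,
    "HPFS (512 byte sectors)": 512,
}

# Made-up sample of file sizes in bytes, for illustration only.
SAMPLE = [300, 1_200, 5_000, 18_411, 75_000]


def allocated(size: int, unit: int) -> int:
    """Round size up to a whole number of allocation units."""
    return -(-size // unit) * unit


def total_allocated(sizes, unit: int) -> int:
    """Total disk space the files occupy with this allocation unit."""
    return sum(allocated(s, unit) for s in sizes)


reported = sum(SAMPLE)
for label, unit in ALLOCATION_UNITS.items():
    actual = total_allocated(SAMPLE, unit)
    print(f"{label:40} {actual:>9,} bytes on disk, "
          f"{actual - reported:>9,} wasted")
```

With this sample, the single 32K-cluster partition more than doubles the space the files occupy, while 512 byte allocation stays within about 1% of the reported size.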
Using The Program
-----------------
ClusterDir (CDIR.EXE) is a simple directory utility. It is by no means
meant to be a "DIR" replacement. The principal use of CDIR is to get a
good look at how much space is being wasted on your hard disk due to
cluster overhang. Using CDIR is very simple: just type "CDIR" at the DOS
prompt for a listing of the current directory. You can optionally supply
a different path name to get a listing of another directory on the
current drive or on another drive. CDIR's output is much the same as
that of DIR, with the exception of the "true size" and attribute fields.
An example.
...
C:\>CDIR C:\DOS\*.TXT
Reported Cluster Size for Drive C is 8,192 bytes
Directory of C:\DOS\*.TXT
COUNTRY  TXT   15,920   16,384  5-31-94  6:22a  .....A
DRVSPACE TXT   41,512   49,152  5-31-94  6:22a  .....A
NETWORKS TXT   17,465   24,576  5-31-94  6:22a  .....A
README   TXT   60,646   65,536  5-31-94  6:22a  .....A
4 files       135,543 bytes used
              155,648 bytes used (actual)
               20,105 bytes wasted (14.83%)
          193,683,456 bytes free
...
The heading of the output lists the cluster size of the partition being
listed, as reported by DOS, and the directory and wildcard the listing
covers...
---
Reported Cluster Size for Drive C is 8,192 bytes
Directory of C:\DOS\*.TXT
---
The body of the output is much the same as the DOS "DIR" command...
---
Filename Extension   Size   True size  Date     Time Attribute
   |        |          |        |        |       |         |
README     TXT     60,646    65,536   5-31-94  6:22a    .....A
---
The filename, extension, size, date, and time fields are the same as in
DIR's output. The true size field is the amount of disk space the file
actually occupies: its size rounded up to a whole number of clusters.
The attribute field contains the first letter of each of the listed
file's attributes: Archive, Read-only, Hidden, System, Directory, or
Volume-label.
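For reference, DOS keeps these attributes as bits in each directory entry. The sketch below turns an attribute byte into a six-character field like the ".....A" in the sample output. The letter order is an assumption: the text does not say how CDIR orders the columns, and the sample only shows that Archive comes last, so this version simply walks the bits from lowest to highest. The constant and function names are hypothetical.

```python
# Standard DOS attribute bits, as stored in a FAT directory entry.
READ_ONLY = 0x01
HIDDEN = 0x02
SYSTEM = 0x04
VOLUME = 0x08
DIRECTORY = 0x10
ARCHIVE = 0x20

# Assumed column order: lowest bit first, so Archive lands in the last column.
ATTR_LETTERS = [
    (READ_ONLY, "R"),
    (HIDDEN, "H"),
    (SYSTEM, "S"),
    (VOLUME, "V"),
    (DIRECTORY, "D"),
    (ARCHIVE, "A"),
]


def attr_string(attr: int) -> str:
    """Six-character flag field: a letter for each set bit, '.' otherwise."""
    return "".join(letter if attr & bit else "." for bit, letter in ATTR_LETTERS)


print(attr_string(ARCHIVE))                      # .....A
print(attr_string(READ_ONLY | HIDDEN | SYSTEM))  # RHS...
```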
The tail of the output contains the number of files listed, how many
bytes are in use as reported by DOS, how many bytes are actually in use
(including cluster overhang), the amount of disk space wasted due to
cluster overhang along with that waste as a percentage of the reported
bytes, and the amount of free disk space on the drive being queried.
---
4 files       135,543 bytes used
              155,648 bytes used (actual)
               20,105 bytes wasted (14.83%)
          193,683,456 bytes free
---
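The tail figures can be checked by hand, and doing so also shows what the percentage is measured against: 20,105 / 135,543 is about 14.83%, whereas 20,105 / 155,648 would be about 12.92%, so the waste is expressed relative to the size DOS reports. The sketch below recomputes the three totals from the four sample files; the names are hypothetical and the layout only approximates CDIR's.

```python
# Hypothetical re-creation of CDIR's totals from the sample listing.
CLUSTER = 8192  # "Reported Cluster Size for Drive C is 8,192 bytes"

files = {
    "COUNTRY.TXT": 15_920,
    "DRVSPACE.TXT": 41_512,
    "NETWORKS.TXT": 17_465,
    "README.TXT": 60_646,
}


def true_size(size: int, cluster: int = CLUSTER) -> int:
    """Round size up to a whole number of clusters."""
    return -(-size // cluster) * cluster


reported = sum(files.values())
actual = sum(true_size(s) for s in files.values())
wasted = actual - reported
percent = 100.0 * wasted / reported  # relative to the bytes DOS reports

print(f"{len(files)} files {reported:>15,} bytes used")
print(f"{actual:>23,} bytes used (actual)")
print(f"{wasted:>23,} bytes wasted ({percent:.2f}%)")
```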
Conclusion
----------
CDIR is a utility I put together after purchasing a 1.1 gigabyte hard
disk and splitting it into two 550MB partitions. I soon found out the
troubles of a 16k cluster size. After reading an article in PC Magazine
Vol. 14 No. 12 by Jeff Prosise, _Drive Size vs. Storage Efficiency_, I
coded CDIR to get a more fine-grained, file-by-file view of the cluster
overhang shortcoming of the FAT file system than the utility listed in
that article, "CHKDRIVE", provides. I suggest you also check out
CHKDRIVE; its source code is listed in the above-mentioned article. The
utility can also be downloaded via anonymous FTP at ftp.pcmag.ziff.com.
CHKDRIVE will scan your entire disk and compute the amount of space
wasted on the entire drive due to cluster overhang and FAT entries. It
will also show how much space could be saved by re-sizing your
partitions. Another interesting utility I have found is called "WASTED."
WASTED is similar to CHKDRIVE in that it computes the total wasted space
on a drive; however, WASTED is much quicker. WASTED can be found at any
SimTel site in the /msdos/diskutil directory.
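A whole-drive tally in the spirit of CHKDRIVE and WASTED can be sketched with os.walk. This is not either program's method: the function name is hypothetical, the caller supplies the cluster size instead of asking DOS for it, and it counts only file overhang, ignoring the clusters taken by directories and the space used by the FAT itself.

```python
import os


def allocated(size: int, cluster: int) -> int:
    """Round size up to a whole number of clusters."""
    return -(-size // cluster) * cluster


def scan_tree(root: str, cluster: int):
    """Walk root and total reported sizes versus cluster-rounded sizes.

    Returns (file count, bytes reported, bytes allocated). The cluster
    size is supplied by the caller; this sketch does not query the OS.
    """
    reported = actual = count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # skip files that vanish or cannot be read
            reported += size
            actual += allocated(size, cluster)
            count += 1
    return count, reported, actual
```

For example, scan_tree(".", 16384) returns the file count, total reported bytes and total allocated bytes for the current directory tree, assuming 16K clusters; the difference between the last two is the overhang.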
CDIR is not meant to be a DIR replacement. It has no sorting
capabilities, nor any of DIR's command line options. It produces a
fairly raw listing, showing every file that matches, whatever its
attributes. Nonetheless, it does what it was meant to do and is, in its
own right, very useful. There is no fee for use, so go ahead and use it.
Send any comments or flames by Internet email to hraiser@cloud9.net, or
reach me via the World Wide Web at http://cloud9.net/~hraiser.