Internet Message Format | 2009-12-11 | 25KB
From: "Salle Arobase" <salle.arob...@ville-rochefort.fr>
Newsgroups: comp.os.cpm
Subject: Re: French Luser news
Date: Tue, 19 Aug 2003 16:08:08 +0200
Organization: Ville de Rochefort
Lines: 580
Message-ID: <bhtagr$s89$1@news-reader4.wanadoo.fr>
References: <be3ocf$k4v$1@news-reader1.wanadoo.fr> <bf4dtp$sg8$1@news.hobby.nl> <1058421800snz@nospam.demon.co.uk> <bfbfjm$7kq$1@news-reader1.wanadoo.fr> <1058725031snz@nospam.demon.co.uk> <bfj8n6$3ar$1@news-reader4.wanadoo.fr> <1058904250snz@nospam.demon.co.uk> <bfr17q$jml$1@news-reader5.wanadoo.fr>
Reply-To: "Salle Arobase" <salle.arob...@ville-rochefort.fr>
NNTP-Posting-Host: apoitiers-106-2-3-98.w81-248.abo.wanadoo.fr
X-Trace: news-reader4.wanadoo.fr 1061301595 28937 81.248.43.98 (19 Aug 2003 13:59:55 GMT)
X-Complaints-To: abuse@wanadoo.fr
NNTP-Posting-Date: 19 Aug 2003 13:59:55 GMT
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 6.00.2800.1158
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165
BYTESTAT.TXT by Emmanuel ROCHE
------------
A solution in search of a problem...
Since last time, I have been busy working on the WS4 to HTML
converter. I had a lot to learn about the internals of WS4 and
HTML. Since the main problem was how to display WS4 tables
properly, I was surprised to see how difficult this is in HTML.
There are hundreds of Web pages dealing with this subject alone.
It seems that the origin of this difficulty is the lack of
backward compatibility. Instead of starting from an ASR-33 TTY
with its 72 columns of (monospaced) characters, "the powers that
be on the Internet" started from the number of pixels displayed
on the screen. So, when you want to display something in HTML,
you are obliged to say how many pixels to use...
Of course, since most stuff does not fit on a single "page", you
need to add scroll bars ("elevators", as we call them in French)
on the right side, and a border around your text, all taking
some more pixels from the screen. As a result, most texts I have
read so far advise assuming a usable screen width of 600 pixels,
rather than the full 640...
Well, this will make for interesting stuff when CP/M computers
get a browser, since most of them did not have 640 pixels. Back
then, in the prehistoric dark ages, we used to think in terms of
the number of characters displayed... and most CP/M computers
could display 80 characters (for example, to run WordStar), but
rarely had 640 pixels (use CP/M on an Apple IIe, and you will
understand).
One day, while I was thinking about this strange request to use
only 64 columns (even in the 21st century), it occurred to me
that we usually think in decimal. And we often use percentages,
and values less than 100. For instance, in France, phone numbers
have 10 digits, which are written as 01.23.45.67.89 (this could
be a valid phone number). You never pronounce the "hundred":
only values less than 100, which are used more often. So, I
asked myself: "Would it be possible to display 100 things on a
64-column line?"
I noticed that 64 is somewhat more than 50. The problem is that,
to display 100 things in 50 columns, you need one character that
displays 2 symbols, and another existing character that displays
only one of the same symbol... Now, it so happens that, in
French, the colon is called "deux points" (two dots) and, of
course, the period is "point (final)" ((final) dot). So, here I
had my 2 characters: one displaying one dot, the other
displaying the same dot twice. Each ":" counts for 2, and a
final "." adds 1 when the value is odd.
(As you can see, I think a lot, and sometimes (most of the
time?) about things that are "obvious" to everybody else. Me, I
spend my time asking: "Why this... or that?" I seem never to
have grown up.)
I rushed to my computer and produced the following histogram,
which should be self-explanatory:
run"percent
0|
1|.
2|:
3|:.
4|::
5|::.
6|:::
7|:::.
8|::::
9|::::.
10|:::::
Break in 50
Ok
So, we are now able to display percentages on a 64-column line.
The program that produced the above output follows:
list
10 REM PERCENT.BAS by Emmanuel ROCHE
20 :
30 FOR i = 0 TO 100
40 GOSUB 90
50 IF i MOD 21 = 20 THEN WHILE INKEY$ = "" : WEND
60 NEXT i
70 END
80 :
90 ' Percent
100 PRINT USING "###" ; i ;
110 PRINT "|" ;
120 ' 0 = even
130 ' 1 = odd
140 IF i MOD 2 = 0 THEN PRINT STRING$ (i/2, ":") ELSE PRINT STRING$ ((i-1)/2, ":") "."
150 RETURN
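For readers without a CP/M machine at hand, here is a minimal Python sketch of the same colon-and-period encoding used by PERCENT.BAS. It is a port written for illustration, not the author's code, and the function name bar is hypothetical. Each ":" is worth 2 and a trailing "." is worth 1, so a value of 100 needs exactly 50 columns.

```python
def bar(percent: int) -> str:
    """Draw an integer percentage (0-100) with ':' worth 2 and '.' worth 1."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    colons, remainder = divmod(percent, 2)  # remainder is 1 for odd values
    return ":" * colons + "." * remainder


# Reproduce the first lines of the PERCENT.BAS run shown above
# ("###" in PRINT USING corresponds to a right-aligned width of 3).
for i in range(11):
    print(f"{i:3d}|{bar(i)}")
```

With the 3-digit label and the "|" printed by PERCENT.BAS, the widest line (100) takes 3 + 1 + 50 = 54 columns.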
Now, it so happens that, while working on the WS4 to HTML
converter, I was wondering which characters are used most often
in a file. A difficult question, since characters are coded as
bytes with 256 possible values, and most files hold thousands of
characters... Counting them by hand would be quite a chore!
Now, it so happens that I recently wrote a "general-purpose
filter program in BASIC". A simple variation of this program
would solve this problem: instead of acting upon the occurrence
of each of the 256 possible values of a byte, it would count the
number of times each byte value is found inside a file...
But there are still some problems along the way. How do you
prove that such a program works accurately? Everything in a
computer (including files and bytes) comes in powers of 2,
whereas we think in decimal (and want the results displayed as
percentages!)
The only solution was to create some test files. I reused the
MAKEASCF.BAS program that was mentioned in the FILTER.TXT file.
To be sure that a file really contains the expected values, the
only solution is to inspect it with a DUMP program.
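MAKEASCF.BAS is not listed here, so the following Python sketch is only a reconstruction of the four test patterns from the dumps shown below. The file names match the directory listing; the function names make_patterns and write_patterns are hypothetical.

```python
from pathlib import Path
from typing import Dict


def make_patterns() -> Dict[str, bytes]:
    """Build the four 256-byte test files as they appear in the dumps."""
    return {
        # every byte value once, 00h to FFh
        "ASCII.BIN": bytes(range(256)),
        # 256 bytes of 00h
        "ZEROES.BIN": bytes(256),
        # 128 bytes of 00h, then 128 bytes of 1Ah
        "ZEREOF.BIN": bytes(128) + b"\x1a" * 128,
        # 16 lines of 16 bytes: line n starts with n copies of byte n,
        # and the rest of the line is padded with 00h
        "SUITEA.BIN": b"".join(bytes([n]) * n + bytes(16 - n)
                               for n in range(16)),
    }


def write_patterns(directory: str = ".") -> None:
    """Write each pattern to its own file in the given directory."""
    for name, data in make_patterns().items():
        Path(directory, name).write_bytes(data)
```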
Ok
dir *.bin
ASCII .BIN ZEROES .BIN ZEREOF .BIN SUITEA .BIN
Ok
run"dumpfile
DUMPFILE: Enter filename.ext: ? ascii.bin
0000: 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F ................
0010: 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F ................
0020: 20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F  !"#$%&'()*+,-./
0030: 30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F 0123456789:;<=>?
0040: 40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F @ABCDEFGHIJKLMNO
0050: 50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F PQRSTUVWXYZ[\]^_
0060: 60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F `abcdefghijklmno
0070: 70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E 7F pqrstuvwxyz{|}~.
0080: 80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F ................
0090: 90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F ................
00A0: A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF ................
00B0: B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA BB BC BD BE BF ................
00C0: C0 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF ................
00D0: D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 DA DB DC DD DE DF ................
00E0: E0 E1 E2 E3 E4 E5 E6 E7 E8 E9 EA EB EC ED EE EF ................
00F0: F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA FB FC FD FE FF ................
Ok
I know what those people (probably using 2-GHz computers) who
asked me to use 64 columns will complain about: the above dump
is 71 columns wide... Well, isn't it about time you upgraded to
an ASR-33 TTY, guys? At least you could print those dumps on a
30-year-old Teletype. What use are Windows and 2-GHz computers,
if they cannot display 80 columns of text on 17" screens?
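The DUMPFILE program is not listed either. Judging from its output, each line holds a 4-digit hex offset, a colon, sixteen hex bytes each preceded by a space, a gap, and the same sixteen bytes as characters, with anything outside 20h to 7Eh shown as ".". The Python sketch below assumes that layout with a two-space gap (the gap width is an assumption); a full line is then exactly 71 columns, the width discussed above. The name dump_lines is hypothetical.

```python
from typing import List


def dump_lines(data: bytes) -> List[str]:
    """Format data as dump lines: offset, 16 hex bytes, printable text."""
    lines = []
    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        # " XX" per byte; pad a short final chunk so the text column lines up
        hex_part = "".join(f" {b:02X}" for b in chunk).ljust(48)
        # only printable US-ASCII (20h..7Eh) is shown as itself
        text = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        lines.append(f"{offset:04X}:{hex_part}  {text}")
    return lines


for line in dump_lines(bytes(range(0x30, 0x50))):
    print(line)
```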
So, these were the usual 256 values of a byte, with their
corresponding US-ASCII characters. Now, let us see what our
program displaying the percentage of use of each byte value in a
file produces:
run"bytestat
BYTESTAT: Enter filename.typ : ? ascii.bin
Percentages of bytes usage inside file ASCII.BIN.
% | .0 .1 .2 .3 .4 .5 .6 .7 .8 .9 .A .B .C .D .E .F
--+------------------------------------------------
0.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
1.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
2.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
3.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
4.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
5.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
6.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
7.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
8.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
9.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
A.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
B.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
C.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
D.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
E.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
F.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
Do you want a histogram? (y/N) n
Ok
Big surprise! We have just seen that there is one occurrence of
each byte value in the file, yet the program says that each one
occurs "zero percent"! The reason is that we think in decimal
(in hundredths), while the file holds 256 bytes containing the
256 possible values of a byte. And 1/256 = 0.0039, that is,
about 0.39 percent. Since the program only displays whole
percentages (it assumes that no single value will occur 100% of
the time, so it only uses 2 digits to display the percentages)
and the value is 0.39, it only displays "00" (which is
automatically shrunk down to "0").
Well... this seems a reasonable explanation. Could we test the
case that should never occur, when all the bytes have the same
value? Sure: we modify the relevant line of MAKEASCF so that,
instead of writing the value of the loop index into the file, it
writes a 00h byte 256 times. One more DUMP, to be sure that the
file really contains them. See below.
Just one remark about the use of "0." and ".0". This goes back
to the ANSI text (1967?) defining the ASCII character set. For
some unknown reason, the table was laid out vertically. At the
beginning, I had a slight problem understanding which were the
axes of the ASCII table (I started programming using EBCDIC on
IBM mainframes: I was a COBOL programmer). Since then (many,
many years ago...), I have used this notation to indicate which
is the "high-order axis" and which is the "low-order axis": the
row labelled "4." holds bytes 40h to 4Fh, and the column
labelled ".1" holds the bytes whose low digit is 1. This works
nicely when displaying values of 2 digits at most. When
displaying only single-digit values (or characters), I hope
that the reader will understand that the table is laid out
horizontally.
(The carriage of the printwheel of my TTY runs horizontally, not
vertically. ASCII was standardised based on the TTY, which was a
best-seller, the standard I/O device for more than 20 years. A
whole generation learned to use computers on one. Screens (or
"glass TTYs") were quite a revolution when they were introduced.
In fact, the first CP/M system had no screen (hence the names of
the "virtual devices" of CP/M 2.2: CON, PUN, RDR, LST, and NUL,
which was a wheel inside the TTY generating a standard answer
message of 40 characters (a string of 00h if not set), as I have
explained several times.))
run"dumpfile
DUMPFILE: Enter filename.ext: ? zeroes.bin
0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00A0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00B0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00C0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00D0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00E0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00F0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Ok
run"bytestat
BYTESTAT: Enter filename.typ : ? zeroes.bin
Percentages of bytes usage inside file ZEROES.BIN.
% | .0 .1 .2 .3 .4 .5 .6 .7 .8 .9 .A .B .C .D .E .F
--+------------------------------------------------
0.| %100  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
1.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
2.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
3.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
4.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
5.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
6.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
7.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
8.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
9.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
A.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
B.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
C.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
D.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
E.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
F.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
Do you want a histogram? (y/N) y
00|::::::::::::::::::::::::::::::::::::::::::::::::::
Ok
The program worked correctly. It is not designed to display
values above 99%, so an overflow occurred, which produced the
"%100" displayed above (PRINT USING puts a "%" in front of a
number that is too wide for its format). All the other 255 byte
values are not used, so they count for "zero percent". Finally,
we take this opportunity to test the output of the histogram.
When we move the cursor to the end of the line, the word
processor indicates "Column 54". Since there are 50 colons, plus
the 2-digit hex value at the left and a border (and the cursor),
we are correct in getting 54 columns. (We could even have
preceded the histogram with a space or a tab: 1+54=55, or
8+54=62...)
Now, let us see what happens when testing the contents of a file
filled with only 2 values: 00h and 1Ah (zero, and the CP/M
end-of-file character):
run"dumpfile
DUMPFILE: Enter filename.ext: ? zereof.bin
0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0080: 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A ................
0090: 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A ................
00A0: 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A ................
00B0: 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A ................
00C0: 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A ................
00D0: 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A ................
00E0: 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A ................
00F0: 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A ................
Ok
run"bytestat
BYTESTAT: Enter filename.typ : ? zereof.bin
Percentages of bytes usage inside file ZEREOF.BIN.
% | .0 .1 .2 .3 .4 .5 .6 .7 .8 .9 .A .B .C .D .E .F
--+------------------------------------------------
0.| 50  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
1.|  0  0  0  0  0  0  0  0  0  0 50  0  0  0  0  0
2.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
3.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
4.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
5.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
6.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
7.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
8.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
9.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
A.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
B.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
C.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
D.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
E.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
F.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
Do you want a histogram? (y/N) y
00|:::::::::::::::::::::::::
1A|:::::::::::::::::::::::::
Ok
It works. 00h is used 50% of the time. 1Ah is used 50% of the
time. The remaining 254 values are used 0% of the time.
Let us finish with a more difficult case. We start with a line
of zeroes, then one 01h, then two 02h, then three 03h, etc., each
line being padded with zeroes. The following DUMP shows the
internals of the file.
run"dumpfile
DUMPFILE: Enter filename.ext: ? suitea.bin
0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0010: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0020: 02 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0030: 03 03 03 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0040: 04 04 04 04 00 00 00 00 00 00 00 00 00 00 00 00 ................
0050: 05 05 05 05 05 00 00 00 00 00 00 00 00 00 00 00 ................
0060: 06 06 06 06 06 06 00 00 00 00 00 00 00 00 00 00 ................
0070: 07 07 07 07 07 07 07 00 00 00 00 00 00 00 00 00 ................
0080: 08 08 08 08 08 08 08 08 00 00 00 00 00 00 00 00 ................
0090: 09 09 09 09 09 09 09 09 09 00 00 00 00 00 00 00 ................
00A0: 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 00 00 00 00 00 00 ................
00B0: 0B 0B 0B 0B 0B 0B 0B 0B 0B 0B 0B 00 00 00 00 00 ................
00C0: 0C 0C 0C 0C 0C 0C 0C 0C 0C 0C 0C 0C 00 00 00 00 ................
00D0: 0D 0D 0D 0D 0D 0D 0D 0D 0D 0D 0D 0D 0D 00 00 00 ................
00E0: 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 00 00 ................
00F0: 0F 0F 0F 0F 0F 0F 0F 0F 0F 0F 0F 0F 0F 0F 0F 00 ................
Ok
run"bytestat
BYTESTAT: Enter filename.typ : ? suitea.bin
Percentages of bytes usage inside file SUITEA.BIN.
% | .0 .1 .2 .3 .4 .5 .6 .7 .8 .9 .A .B .C .D .E .F
--+------------------------------------------------
0.| 53  0  1  1  2  2  2  3  3  4  4  4  5  5  5  6
1.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
2.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
3.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
4.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
5.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
6.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
7.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
8.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
9.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
A.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
B.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
C.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
D.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
E.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
F.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
Do you want a histogram? (y/N) y
00|::::::::::::::::::::::::::.
01|
02|.
03|.
04|:
05|:
06|:
07|:.
08|:.
09|::
0A|::
0B|::
0C|::.
0D|::.
0E|::.
0F|:::
Ok
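As can be seen (I hope), the program produces correct values.
The sixteen lines hold 1+2+...+15 = 120 non-zero bytes, so 00h
fills the remaining 136 of the 256 bytes: 136/256 = 53.1%, shown
as 53. Likewise, 0Fh occurs 15 times: 15/256 = 5.9%, shown as 6.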
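The same check can be automated. This Python sketch rebuilds SUITEA.BIN from its description (an assumption based on the dump above, not the author's MAKEASCF code) and rounds each share half up, which reproduces the first row of the table:

```python
# SUITEA.BIN as described: line n (0..15) holds n copies of byte n,
# padded with 00h up to 16 bytes.
data = b"".join(bytes([n]) * n + bytes(16 - n) for n in range(16))

total = len(data)  # 256 bytes
counts = [data.count(bytes([v])) for v in range(16)]
# integer round-half-up of 100 * count / total
percents = [(200 * c + total) // (2 * total) for c in counts]
print(percents)  # [53, 0, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 6]
```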
Now that we have some confidence in the program, let us see what
it displays when computing the percentage of use of bytes inside
WS4 files.
run"bytestat
BYTESTAT: Enter filename.typ : ? printtst.ws4
Percentages of bytes usage inside file PRINTTST.WS4.
% | .0 .1 .2 .3 .4 .5 .6 .7 .8 .9 .A .B .C .D .E .F
--+------------------------------------------------
0.|  0  0  1  0  0  0  0  0  0  0  3  0  0  2  0  0
1.|  0  0  0  0  0  0  0  0  0  0  1  1  3  0  0  0
2.| 21  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0
3.|  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0
4.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
5.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
6.|  0  4  1  3  1  4  1  0  2  4  0  0  2  1  3  3
7.|  2  0  3  1  4  1  0  1  0  0  0  0  0  0  0  0
8.|  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0
9.|  0  0  0  0  0  0  0  0  0  0  0  2  0  0  0  0
A.|  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
B.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
C.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
D.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
E.|  0  0  0  0  1  2  0  0  0  0  0  0  0  0  1  0
F.|  0  0  1  1  1  0  0  0  0  0  0  0  0  0  0  0
Do you want a histogram? (y/N) n
Ok
Big surprise! 20h (that is to say, "space") accounts for 21% of
the bytes. One character out of five in a WS4 file is a space...
We must, for sure, handle spaces before any other character when
scanning a file! Another surprise is that none of the uppercase
letters (41h to 5Ah) shows up in the table, despite so many
sentences starting with an "A" or a "T". Instead, notice the
percentages of the lowercase letters (61h to 7Ah), particularly
"a" (4%), "e" (4%), "i" (4%) and "t" (4%). This time, we do find
"a" and "t", but not in the case we were expecting.
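Since the idea is to handle the most frequent byte values first, a ranking is more convenient than a 16 x 16 table. A minimal Python sketch using collections.Counter from the standard library (most_common_bytes is a hypothetical helper; applied to PRINTTST.WS4 it should put 20h first, judging from the table above, but that file is not reproduced here):

```python
from collections import Counter
from typing import List, Tuple


def most_common_bytes(data: bytes, n: int = 5) -> List[Tuple[str, float]]:
    """Return the n most frequent byte values as ('XXh', percent) pairs."""
    if not data:
        return []
    counts = Counter(data)  # iterating over bytes yields ints 0..255
    return [(f"{value:02X}h", 100.0 * count / len(data))
            for value, count in counts.most_common(n)]


print(most_common_bytes(b"aaab", 2))  # [('61h', 75.0), ('62h', 25.0)]
```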
The table above provides a fascinating view of the internals of
WordStar 4. But I won't bore you with the details.
Since writing this program (you will find a summary of the
listing below; the missing lines are just repetitions of the
enclosing patterns), I have been wondering what other uses it
could be put to. If you have any ideas, let us know.
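The abridged listing that follows sends each byte through two levels of ON ... GOSUB (high nibble, then low nibble) to one of 256 one-line leaves. Judging from the lines that are shown (an inference, not something stated in the listing), the second-level dispatcher for high nibble h sits at line 600 + 80*h and the leaf for byte v at line 1870 + 40*v. In Microsoft-style BASICs, ON n GOSUB falls through to the next statement when n is 0 or exceeds the number of targets, which is what lets lines 520-550 split 16 cases into two groups of 8. This Python sketch models that dispatch under those assumptions (all names are hypothetical) and checks that every byte lands on its own leaf, i.e. that the tree amounts to t(byte) = t(byte) + 1.

```python
from typing import List, Optional

# Line numbers taken from the visible part of the listing; the steps
# between them are inferred, since the middle of the listing is elided.
FIRST_DISPATCH = 600  # "High Nibble: 0" dispatcher
DISPATCH_STEP = 80    # 600, 680, ..., 1160, 1240, ..., 1800
FIRST_LEAF = 1870     # T (&H0) = T (&H0) + 1
LEAF_STEP = 40        # 1870, 1910, ..., 12070 (the FF leaf)


def on_gosub(selector: int, targets: List[int]) -> Optional[int]:
    """Model ON n GOSUB: the n-th target (1-based), or None to fall through."""
    if 1 <= selector <= len(targets):
        return targets[selector - 1]
    return None


def split_dispatch(nibble: int, targets: List[int]) -> int:
    """Model 'ON n+1 GOSUB (8 targets)', then 'IF n > 7 ... ON n-7 GOSUB'."""
    line = on_gosub(nibble + 1, targets[:8])
    if line is None:  # nibble 8..F fell through the first ON ... GOSUB
        line = on_gosub(nibble - 7, targets[8:])
    if line is None:
        raise ValueError(f"nibble out of range: {nibble}")
    return line


def leaf_line(byte: int) -> int:
    """Return the line of the leaf subroutine reached for this byte value."""
    hini, loni = divmod(byte, 16)  # like lines 170-180 of BYTESTAT.BAS
    dispatchers = [FIRST_DISPATCH + DISPATCH_STEP * h for h in range(16)]
    dispatcher = split_dispatch(hini, dispatchers)  # lines 520-550
    h = (dispatcher - FIRST_DISPATCH) // DISPATCH_STEP
    leaves = [FIRST_LEAF + LEAF_STEP * (16 * h + lo) for lo in range(16)]
    return split_dispatch(loni, leaves)
```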
list
10 REM BYTESTAT.BAS by Emmanuel ROCHE
20 :
30 PRINT
40 INPUT "BYTESTAT: Enter filename.typ : " ; file$
50 PRINT
60 nofile$ = FIND$ (file$)
70 IF nofile$ = "" THEN PRINT CHR$ (7) "File not found." : PRINT : END
80 OPTION BASE 0
90 DIM t (&HFF)
100 tot = 0
110 OPEN "R", 1, file$, 1
120 FIELD #1, 1 AS byte$
130 :
140 GET #1
150 IF EOF (1) THEN GOTO 230
160 byte = ASC (byte$)
170 hini = INT (byte / 16)
180 loni = byte - hini * 16
190 GOSUB 510
200 tot = tot + 1
210 GOTO 140 ' Main Loop
220 :
230 PRINT "Percentages of bytes usage inside file " UPPER$ (file$) "."
240 PRINT
250 PRINT "% | .0 .1 .2 .3 .4 .5 .6 .7 .8 .9 .A .B .C .D .E .F"
260 PRINT "--+------------------------------------------------"
270 FOR i = 0 TO &HF
280 PRINT HEX$ (i) ".|" ;
290 FOR j = 0 TO &HF
300 PRINT " " USING "##" ; (t (i * 16 + j) ) * 100 / tot ;
310 NEXT j
320 PRINT
330 NEXT i
340 PRINT
350 :
360 z$ = "" : PRINT "Do you want a histogram? (y/N) " ;
370 z$ = INPUT$ (1)
380 z$ = UPPER$ (z$)
390 IF z$ <> "Y" THEN PRINT : GOTO 480
400 :
410 PRINT : PRINT
420 FOR k = 0 TO &HFF
430 IF t (k) < 1 THEN GOTO 460
440 PRINT RIGHT$ ("0" + HEX$ (k), 2) "|" ;
450 IF t (k) * 100 / tot MOD 2 = 0 THEN PRINT STRING$ ( (t (k) * 100 / tot) / 2, ":") ELSE PRINT STRING$ ( (t (k) * 100 / tot - 1) / 2, ":") "."
460 NEXT k
470 :
480 PRINT
490 END
500 :
510 ' High Nibble: 0 1 2 3 4 5 6 7
520 ON hini+1 GOSUB 600, 680, 760, 840, 920, 1000, 1080, 1160
530 IF hini > 7 THEN hini2 = hini - 7 ELSE RETURN
540 ' High Nibble: 8 9 A B C D E F
550 ON hini2 GOSUB 1240, 1320, 1400, 1480, 1560, 1640, 1720, 1800
560 RETURN
570 '
580 ' High Nibble: 0
590 ' Low Nibble: 0 1 2 3 4 5 6 7
600 ON loni+1 GOSUB 1870, 1910, 1950, 1990, 2030, 2070, 2110, 2150
610 IF loni > 7 THEN loni2 = loni - 7 ELSE RETURN
620 ' Low Nibble: 8 9 A B C D E F
630 ON loni2 GOSUB 2190, 2230, 2270, 2310, 2350, 2390, 2430, 2470
640 RETURN
1770 '
1780 ' High Nibble: F
1790 ' Low Nibble: 0 1 2 3 4 5 6 7
1800 ON loni+1 GOSUB 11470, 11510, 11550, 11590, 11630, 11670, 11710, 11750
1810 IF loni > 7 THEN loni2 = loni - 7 ELSE RETURN
1820 ' Low Nibble: 8 9 A B C D E F
1830 ON loni2 GOSUB 11790, 11830, 11870, 11910, 11950, 11990, 12030, 12070
1840 RETURN
1850 '
1860 ' 00
1870 T (&H0) = T (&H0) + 1
1880 RETURN
12050 '
12060 ' FF
12070 T (&HFF) = T (&HFF) + 1
12080 RETURN
Ok
system
A>That's all, Folks!
Yours Sincerely,
"French Luser"
EOF