ftp.barnyard.co.uk

Internet Message Format | 2009-12-11 | 25KB

From: "Salle Arobase" <salle.arob...@ville-rochefort.fr>
Newsgroups: comp.os.cpm
Subject: Re: French Luser news
Date: Tue, 19 Aug 2003 16:08:08 +0200
Organization: Ville de Rochefort
Lines: 580
Message-ID: <bhtagr$s89$1@news-reader4.wanadoo.fr>
References: <be3ocf$k4v$1@news-reader1.wanadoo.fr>
 <bf4dtp$sg8$1@news.hobby.nl> <1058421800snz@nospam.demon.co.uk>
 <bfbfjm$7kq$1@news-reader1.wanadoo.fr> <1058725031snz@nospam.demon.co.uk>
 <bfj8n6$3ar$1@news-reader4.wanadoo.fr> <1058904250snz@nospam.demon.co.uk>
 <bfr17q$jml$1@news-reader5.wanadoo.fr>
Reply-To: "Salle Arobase" <salle.arob...@ville-rochefort.fr>
NNTP-Posting-Host: apoitiers-106-2-3-98.w81-248.abo.wanadoo.fr
X-Trace: news-reader4.wanadoo.fr 1061301595 28937 81.248.43.98 (19 Aug 2003 13:59:55 GMT)
X-Complaints-To: abuse@wanadoo.fr
NNTP-Posting-Date: 19 Aug 2003 13:59:55 GMT
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 6.00.2800.1158
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165

BYTESTAT.TXT by Emmanuel ROCHE
------------

A solution in search of a problem...

Since last time, I have been busy working on the WS4 to HTML converter.
I had a lot to learn about the internals of WS4 and HTML.

Since the main problem was how to properly display WS4 tables, I was
surprised to see how difficult it is with HTML. There are hundreds of
Web pages dealing with this subject alone.

It seems that the origin of this difficulty is the lack of backward
compatibility. Instead of starting from an ASR-33 TTY with its 72
columns of (monospaced) characters, "the powers that are on the
Internet" started with the number of pixels displayed on the screen.
So, when you want to display something under HTML, you are obliged to
say how many pixels to use...

Of course, since most stuff does not fit on a single "page", you need
to add "elevators" (scroll bars) on the right side, and a border around
your text, all taking some more pixels from the screen.
As a result, most texts I have read so far advise assuming that,
instead of 640 pixels wide, the lowest resolution for a screen should
be 600 pixels... Well, this will make for interesting stuff when CP/M
computers have a browser, since most of them did not have 640 pixels.

Back then, in the prehistoric dark ages, we used to think in terms of
the number of characters displayed... and most CP/M computers were able
to display 80 characters (for example, to use WordStar), but rarely had
640 pixels (use CP/M on an Apple IIe, and you will understand).

One day, while I was thinking about this strange demand for only 64
columns (even in the 21st century), it came to me that we usually think
in decimal. And we often use percentages, and values less than 100.

For instance, in France phone numbers have 10 digits, which are written
as 01.23.45.67.89 (this could be a valid phone number). You never
pronounce the "hundred". Only values less than 100, which are more
often used.

So, I asked myself: "Would it be possible to display 100 things on a
64-column line?"

I noticed that 64 is slightly more than 50. The problem is that, to
display 100 things on 50 columns, you need one character that displays
2 symbols, and another one (also already existing as a character) that
displays only one...

Now, it so happens that, in French, the colon is "deux points" (two
dots) and, of course, the period is "point (final)" ((ending) dot). So,
here I had my 2 characters, one displaying one symbol, the other
displaying the same symbol twice.

(As you can see, I think a lot, and sometimes (most of the time?) about
things that are "obvious" to anybody else. Me, I spend my time asking:
"Why... this or that?" I seem to never have grown up.)

I jumped to my computer, and produced the following histogram, which
should be self-explanatory:

run"percent
  0|
  1|.
  2|:
  3|:.
  4|::
  5|::.
  6|:::
  7|:::.
  8|::::
  9|::::.
 10|:::::
Break in 50
Ok

So, we are now able to display percentages on a 64-column line.
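The encoding described above (one ":" for every two percent, plus a single "." when the value is odd) can be sketched in a few lines. This is only an illustration in Python, not the BASIC used in this post, and the names `percent_bar` and `histogram_line` are invented for the example.

```python
def percent_bar(percent: int) -> str:
    """Return a bar where ':' stands for 2 percent and '.' for 1 percent."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Whole pairs become colons; an odd remainder becomes one trailing dot.
    return ":" * (percent // 2) + "." * (percent % 2)


def histogram_line(percent: int) -> str:
    """Number right-aligned in three columns, a border, then the bar."""
    return f"{percent:3d}|{percent_bar(percent)}"


if __name__ == "__main__":
    for p in range(11):
        print(histogram_line(p))
```

The longest possible bar (100 percent) is 50 characters, so a full line needs 3 + 1 + 50 = 54 columns and stays under the 64-column limit.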
The program which produced the above follows:

list
10 REM PERCENT.BAS by Emmanuel ROCHE
20 :
30 FOR i = 0 TO 100
40 GOSUB 90
50 IF i MOD 21 = 20 THEN WHILE INKEY$ = "" : WEND
60 NEXT i
70 END
80 :
90 ' Percent
100 PRINT USING "###" ; i ;
110 PRINT "|" ;
120 ' 0 = even
130 ' 1 = odd
140 IF i MOD 2 = 0 THEN PRINT STRING$ (i/2, ":") ELSE PRINT STRING$ ((i-1)/2, ":") "."
150 RETURN

Now, it so happens that, while working on the WS4 to HTML converter, I
was wondering which characters are the most often used in a file.
Difficult question, since characters are coded using bytes with 256
values, and most files usually hold thousands of characters... Counting
them by hand would be quite a chore!

Now, it so happens that, recently, I wrote a "general-purpose filter
program in BASIC". Instead of acting upon the occurrence of each of the
256 possible values of a byte, a simple variation of this program,
counting the number of times a particular byte value was found inside a
file, would solve this problem...

But, there are still some problems on the way. How do you prove that
such a program works accurately? Everything in a computer (including
files and bytes) comes in powers of 2, and we think in decimal (and
want the results displayed in percentages!)

The only solution was to create some test files. I re-used the
MAKEASCF.BAS program that was mentioned in the FILTER.TXT file. To be
sure that the file really contains the wanted values, the only solution
is to inspect it with a DUMP program.

Ok
dir *.bin
ASCII .BIN ZEROES .BIN ZEREOF .BIN SUITEA .BIN
Ok
run"dumpfile
DUMPFILE: Enter filename.ext: ? ascii.bin

0000: 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F  ................
0010: 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F  ................
0020: 20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F   !"#$%&'()*+,-./
0030: 30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F  0123456789:;<=>?
0040: 40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F  @ABCDEFGHIJKLMNO
0050: 50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F  PQRSTUVWXYZ[\]^_
0060: 60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F  `abcdefghijklmno
0070: 70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E 7F  pqrstuvwxyz{|}~.
0080: 80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F  ................
0090: 90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F  ................
00A0: A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF  ................
00B0: B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA BB BC BD BE BF  ................
00C0: C0 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF  ................
00D0: D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 DA DB DC DD DE DF  ................
00E0: E0 E1 E2 E3 E4 E5 E6 E7 E8 E9 EA EB EC ED EE EF  ................
00F0: F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA FB FC FD FE FF  ................
Ok

I know what those people (probably using 2-GHz computers) who ask me to
use 64 columns will complain about: the above dump is 71 columns
wide... Well, would it not be about time to upgrade to an ASR-33 TTY,
you guys? At least, you could print those dumps on a 30-year-old
Teletype. What use are Windows and 2-GHz computers, if they are not
able to display 80 columns of text on 17" screens?

So, these were the usual 256 values of a byte, with their corresponding
USASCII characters. Now, let us see what our program displaying the
percentages of usage of byte values in a file produces:

run"bytestat

BYTESTAT: Enter filename.typ : ? ascii.bin

Percentages of bytes usage inside file ASCII.BIN.
% | .0 .1 .2 .3 .4 .5 .6 .7 .8 .9 .A .B .C .D .E .F
--+------------------------------------------------
0.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
1.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
2.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
3.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
4.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
5.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
6.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
7.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
8.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
9.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
A.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
B.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
C.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
D.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
E.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
F.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

Do you want a histogram? (y/N) n
Ok

Big surprise! We have just seen that there is one occurrence of each
byte value in the file, yet the program says that each one occurs "zero
percent"! The reason is that we think in decimal (or hundreds), and the
file holds 256 bytes holding the 256 possible values of a byte. And
1/256 = 0.0039, that is, about 0.39 percent. Since the program only
displays whole percentages (it assumes that no single value will occur
100% of the time in all the bytes, so it only uses 2 digits to display
the percentages) and the value is 0.39, it only displays a "00" (which
is automatically shrunk down to "0").

Well... This seems a reasonable explanation. Could we test the case
that must never occur, where all the bytes have the same value? Sure,
we modify the line of MAKEASCF so that, instead of writing the value of
the loop index into the file, it writes a 00h byte 256 times. One more
DUMP to be sure that the file really contains them. See below.

Just one remark about the use of "0." and ".0". This goes back to the
ANSI text (1967?) defining the ASCII character set. For some unknown
reason, it was displayed vertically. I had a slight problem (at the
beginning, since I started programming using EBCDIC on IBM Mainframes.
I was a COBOL programmer) understanding what the axes of the ASCII
table were.
Since then (many, many years ago...), I have used this way of
indicating which are the "high order axis" and the "low order axis".
This works nicely when displaying only 2-digits max values. When
displaying only single digit values (or characters), I hope that the
reader will understand that the table is positioned horizontally. (The
carriage of the printwheel of my TTY runs horizontally, not vertically.
ASCII was standardised based on the TTY, which was a best-seller, the
standard I/O device for more than 20 years. A whole generation learned
to use computers using one. Screens (or "glass TTYs") were quite a
revolution when they were introduced. In fact, the first CP/M system
had no screen (hence the names of the "virtual devices" of CP/M 2.2:
CON, PUN, RDR, LST, and NUL, which was a wheel inside the TTY
generating a standard answer message of 40 characters (a string of 00h
if not set), as I explained several times.))

run"dumpfile

DUMPFILE: Enter filename.ext: ? zeroes.bin

0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00A0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00B0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00C0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00D0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00E0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00F0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Ok
run"bytestat

BYTESTAT: Enter filename.typ : ? zeroes.bin

Percentages of bytes usage inside file ZEROES.BIN.

% | .0 .1 .2 .3 .4 .5 .6 .7 .8 .9 .A .B .C .D .E .F
--+------------------------------------------------
0.| %100  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
1.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
2.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
3.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
4.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
5.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
6.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
7.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
8.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
9.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
A.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
B.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
C.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
D.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
E.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
F.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

Do you want a histogram? (y/N) y

00|::::::::::::::::::::::::::::::::::::::::::::::::::
Ok

The program worked correctly. It is not designed to display values
above 99%, so an overflow occurred, which produced the "%100" displayed
above. All the other 255 byte values are not used, so they count for
"zero percent".

Finally, we take this opportunity to test the output of the histogram.
When we move the cursor to the end of the line, the word processor
indicates "Column 54". Since there are 50 colons, the 2-digit hex value
at left and a border (and the cursor), we are correct in getting 54
columns. (We could even have preceded the histogram with a space or a
tab (8+54=62)...)

Now, let us see what happens when testing the contents of a file filled
with only 2 values: 00h and 1Ah (zero and eof):

run"dumpfile

DUMPFILE: Enter filename.ext: ? zereof.bin

0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0080: 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A  ................
0090: 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A  ................
00A0: 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A  ................
00B0: 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A  ................
00C0: 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A  ................
00D0: 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A  ................
00E0: 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A  ................
00F0: 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A 1A  ................
Ok
run"bytestat

BYTESTAT: Enter filename.typ : ? zereof.bin

Percentages of bytes usage inside file ZEREOF.BIN.

% | .0 .1 .2 .3 .4 .5 .6 .7 .8 .9 .A .B .C .D .E .F
--+------------------------------------------------
0.| 50  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
1.|  0  0  0  0  0  0  0  0  0  0 50  0  0  0  0  0
2.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
3.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
4.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
5.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
6.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
7.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
8.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
9.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
A.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
B.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
C.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
D.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
E.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
F.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

Do you want a histogram? (y/N) y

00|:::::::::::::::::::::::::
1A|:::::::::::::::::::::::::
Ok

It works. 00h is used 50% of the time. 1Ah is used 50% of the time. The
remaining 254 values are used 0% of the time.
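The three checks so far (ASCII.BIN, ZEROES.BIN and ZEREOF.BIN) can be reproduced in memory. What follows is a minimal sketch in Python rather than the BASIC of this post. It assumes the display rounds to the nearest whole percent; the helper names and the in-memory test data are inventions for the illustration, not the output of MAKEASCF.BAS.

```python
from collections import Counter


def byte_percentages(data: bytes) -> list[float]:
    """Return, for each of the 256 byte values, its share of data in percent."""
    total = len(data)
    if total == 0:
        return [0.0] * 256
    counts = Counter(data)  # iterating over bytes yields integers 0..255
    return [counts.get(value, 0) * 100 / total for value in range(256)]


def whole_percent(share: float) -> int:
    """Round to the nearest whole percent (assumed display behaviour)."""
    return int(share + 0.5)


# Hypothetical in-memory equivalents of the three test files.
ascii_bin = bytes(range(256))                     # each value exactly once
zeroes_bin = bytes(256)                           # 256 bytes of 00h
zereof_bin = bytes([0x00] * 128 + [0x1A] * 128)   # half 00h, half 1Ah

if __name__ == "__main__":
    for name, data in [("ASCII", ascii_bin), ("ZEROES", zeroes_bin),
                       ("ZEREOF", zereof_bin)]:
        shares = byte_percentages(data)
        used = {f"{v:02X}": whole_percent(s) for v, s in enumerate(shares) if s > 0}
        print(name, "exact share of 00h:", shares[0], "rounded:", used)
```

Keeping the exact share next to the rounded one makes the "zero percent" case easier to interpret: a value that occurs once in 256 bytes has a share of 0.390625 percent, which is not zero but rounds to it.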
Let us finish with a more difficult case. We start with a line of
zeroes, then one "1", then two "2", then three "3", etc. See the
following DUMP to see the internals of the file.

run"dumpfile

DUMPFILE: Enter filename.ext: ? suitea.bin

0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0010: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0020: 02 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0030: 03 03 03 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0040: 04 04 04 04 00 00 00 00 00 00 00 00 00 00 00 00  ................
0050: 05 05 05 05 05 00 00 00 00 00 00 00 00 00 00 00  ................
0060: 06 06 06 06 06 06 00 00 00 00 00 00 00 00 00 00  ................
0070: 07 07 07 07 07 07 07 00 00 00 00 00 00 00 00 00  ................
0080: 08 08 08 08 08 08 08 08 00 00 00 00 00 00 00 00  ................
0090: 09 09 09 09 09 09 09 09 09 00 00 00 00 00 00 00  ................
00A0: 0A 0A 0A 0A 0A 0A 0A 0A 0A 0A 00 00 00 00 00 00  ................
00B0: 0B 0B 0B 0B 0B 0B 0B 0B 0B 0B 0B 00 00 00 00 00  ................
00C0: 0C 0C 0C 0C 0C 0C 0C 0C 0C 0C 0C 0C 00 00 00 00  ................
00D0: 0D 0D 0D 0D 0D 0D 0D 0D 0D 0D 0D 0D 0D 00 00 00  ................
00E0: 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 0E 00 00  ................
00F0: 0F 0F 0F 0F 0F 0F 0F 0F 0F 0F 0F 0F 0F 0F 0F 00  ................
Ok
run"bytestat

BYTESTAT: Enter filename.typ : ? suitea.bin

Percentages of bytes usage inside file SUITEA.BIN.
% | .0 .1 .2 .3 .4 .5 .6 .7 .8 .9 .A .B .C .D .E .F
--+------------------------------------------------
0.| 53  0  1  1  2  2  2  3  3  4  4  4  5  5  5  6
1.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
2.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
3.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
4.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
5.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
6.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
7.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
8.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
9.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
A.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
B.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
C.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
D.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
E.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
F.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

Do you want a histogram? (y/N) y

00|::::::::::::::::::::::::::.
01|
02|.
03|.
04|:
05|:
06|:
07|:.
08|:.
09|::
0A|::
0B|::
0C|::.
0D|::.
0E|::.
0F|:::
Ok

As can be seen (I hope), the program produces correct values.

Now that we have some confidence in the program, let us see what it
displays when computing the percentage of use of bytes inside WS4
files.

run"bytestat

BYTESTAT: Enter filename.typ : ? printtst.ws4

Percentages of bytes usage inside file PRINTTST.WS4.

% | .0 .1 .2 .3 .4 .5 .6 .7 .8 .9 .A .B .C .D .E .F
--+------------------------------------------------
0.|  0  0  1  0  0  0  0  0  0  0  3  0  0  2  0  0
1.|  0  0  0  0  0  0  0  0  0  0  1  1  3  0  0  0
2.| 21  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0
3.|  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0
4.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
5.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
6.|  0  4  1  3  1  4  1  0  2  4  0  0  2  1  3  3
7.|  2  0  3  1  4  1  0  1  0  0  0  0  0  0  0  0
8.|  0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0
9.|  0  0  0  0  0  0  0  0  0  0  0  2  0  0  0  0
A.|  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
B.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
C.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
D.|  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
E.|  0  0  0  0  1  2  0  0  0  0  0  0  0  0  1  0
F.|  0  0  1  1  1  0  0  0  0  0  0  0  0  0  0  0

Do you want a histogram? (y/N) n
Ok

Big surprise! 21% of usage by 20h (that is to say: "space"). One char
out of five in a WS4 file is a space...
We must, for sure, treat "spaces" before any other characters when
scanning a file! Another surprise is that none of the uppercase letters
(41h to 5Ah) make it, despite so many sentences starting with an A or a
T. Instead, notice the percentage of use of lowercase letters (61h to
7Ah), and particularly "a" (4%), "e" (4%), "i" (4%) and "t" (4%). This
time, we find "a" and "t", but not in the case we were expecting.

The above table provides a fascinating view inside the internals of
WordStar 4. But I won't bore you with the details.

Since writing this program (you will find a summary of the listing
below; the missing lines are just repetitions of the enclosing
patterns), I have been wondering what other uses it could be applied
to. If you have any idea, let us know.

list
10 REM BYTESTAT.BAS by Emmanuel ROCHE
20 :
30 PRINT
40 INPUT "BYTESTAT: Enter filename.typ : " ; file$
50 PRINT
60 nofile$ = FIND$ (file$)
70 IF nofile$ = "" THEN PRINT CHR$ (7) "File not found." : PRINT : END
80 OPTION BASE 0
90 DIM t (&HFF)
100 tot = 0
110 OPEN "R", 1, file$, 1
120 FIELD #1, 1 AS byte$
130 :
140 GET #1
150 IF EOF (1) THEN GOTO 230
160 byte = ASC (byte$)
170 hini = INT (byte / 16)
180 loni = byte - hini * 16
190 GOSUB 510
200 tot = tot + 1
210 GOTO 140 ' Main Loop
220 :
230 PRINT "Percentages of bytes usage inside file " UPPER$ (file$) "."
240 PRINT
250 PRINT "% | .0 .1 .2 .3 .4 .5 .6 .7 .8 .9 .A .B .C .D .E .F"
260 PRINT "--+------------------------------------------------"
270 FOR i = 0 TO &HF
280 PRINT HEX$ (i) ".|" ;
290 FOR j = 0 TO &HF
300 PRINT " " USING "##" ; (t (i * 16 + j) ) * 100 / tot ;
310 NEXT j
320 PRINT
330 NEXT i
340 PRINT
350 :
360 z$ = "" : PRINT "Do you want a histogram? (y/N) " ;
370 z$ = INPUT$ (1)
380 z$ = UPPER$ (z$)
390 IF z$ <> "Y" THEN PRINT : GOTO 480
400 :
410 PRINT : PRINT
420 FOR k = 0 TO &HFF
430 IF t (k) < 1 THEN GOTO 460
440 PRINT RIGHT$ ("0" + HEX$ (k), 2) "|" ;
450 IF t (k) * 100 / tot MOD 2 = 0 THEN PRINT STRING$ ( (t (k) * 100 / tot) / 2, ":") ELSE PRINT STRING$ ( (t (k) * 100 / tot - 1) / 2, ":") "."
460 NEXT k
470 :
480 PRINT
490 END
500 :
510 ' High Nibble: 0 1 2 3 4 5 6 7
520 ON hini+1 GOSUB 600, 680, 760, 840, 920, 1000, 1080, 1160
530 IF hini > 7 THEN hini2 = hini - 7 ELSE RETURN
540 ' High Nibble: 8 9 A B C D E F
550 ON hini2 GOSUB 1240, 1320, 1400, 1480, 1560, 1640, 1720, 1800
560 RETURN
570 '
580 ' High Nibble: 0
590 ' Low Nibble: 0 1 2 3 4 5 6 7
600 ON loni+1 GOSUB 1870, 1910, 1950, 1990, 2030, 2070, 2110, 2150
610 IF loni > 7 THEN loni2 = loni - 7 ELSE RETURN
620 ' Low Nibble: 8 9 A B C D E F
630 ON loni2 GOSUB 2190, 2230, 2270, 2310, 2350, 2390, 2430, 2470
640 RETURN

1770 '
1780 ' High Nibble: F
1790 ' Low Nibble: 0 1 2 3 4 5 6 7
1800 ON loni+1 GOSUB 11470, 11510, 11550, 11590, 11630, 11670, 11710, 11750
1810 IF loni > 7 THEN loni2 = loni - 7 ELSE RETURN
1820 ' Low Nibble: 8 9 A B C D E F
1830 ON loni2 GOSUB 11790, 11830, 11870, 11910, 11950, 11990, 12030, 12070
1840 RETURN

1850 '
1860 ' 00
1870 T (&H0) = T (&H0) + 1
1880 RETURN

12050 '
12060 ' FF
12070 T (&HFF) = T (&HFF) + 1
12080 RETURN
Ok
system

A>That's all, Folks!

Yours Sincerely,
"French Luser"

EOF
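The listing reaches each counter through two levels of ON ... GOSUB, one subroutine per byte value. Since the byte value is already a valid index into the 256-entry array, the same counting can be done by indexing directly. The sketch below shows that idea in Python (not the BASIC of the listing), together with a 16 x 16 table in the same layout. The names `count_bytes`, `render_table` and `suitea` are invented for the illustration, and rounding to the nearest whole percent is an assumption about the display.

```python
def count_bytes(data: bytes) -> list[int]:
    """Count occurrences of each byte value by direct indexing."""
    counts = [0] * 256
    for value in data:
        counts[value] += 1
    return counts


def render_table(data: bytes) -> list[str]:
    """Return the 16 x 16 percentage table as a list of text lines."""
    counts = count_bytes(data)
    total = len(data) or 1  # avoid dividing by zero on an empty input
    lines = [
        "% |" + "".join(f" .{low:X}" for low in range(16)),
        "--+" + "-" * 48,
    ]
    for high in range(16):
        cells = "".join(
            # high nibble selects the row, low nibble the column
            f" {int(counts[high * 16 + low] * 100 / total + 0.5):2d}"
            for low in range(16)
        )
        lines.append(f"{high:X}.|{cells}")
    return lines


# In-memory stand-in for SUITEA.BIN: row n holds n bytes of value n,
# padded with zeroes to 16 bytes.
suitea = b"".join(bytes([n] * n + [0] * (16 - n)) for n in range(16))

if __name__ == "__main__":
    print("\n".join(render_table(suitea)))
```

Unlike the BASIC PRINT USING "##" field, this sketch does not mark an overflow with "%"; a value of 100 simply widens its cell by one column.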