WHAT?format 2.9
---------------
A file format recognition utility for
IBM PCs and compatibles
INTRODUCTION TO WHAT?format
WHAT?format started life as a simple utility whose
purpose was to distinguish between text files
created by WordStar and WordPerfect. It was
originally written for people working in a
typesetting house which received a lot of raw text
on floppy disks -- all too often without any
information regarding the system that had created
the files. As time has gone on, the program's
abilities have been extended so that the present
version can distinguish between the native formats
of 27 of the most common word processors. In many
cases it will also distinguish between files
created by different versions of the same program
(e.g. WordPerfect 4.2, 5.0, 5.1, etc.).
In addition to native file formats, version 2.9
also supports a number of text interchange formats
(e.g. Document Content Architecture and Rich Text
Format), various text-only formats (ASCII, DOS,
EBCDIC, etc.), and a few print-formatted/page
description formats (PostScript, PCL, etc.).
Although it is primarily concerned with text files,
WHAT will also recognise an assortment of other
common data formats stemming from database,
spreadsheet, graphics and other applications. A
complete list of supported formats is given at the
end of this document. Other features include the
ability to create a pure hex dump and character set
map. These are described in detail below.
USAGE
To use WHAT in its basic mode as a format
recognition utility, simply type WHAT at the DOS
prompt, followed by the name of the file to be
analysed. The file name may include optional drive
and path specifications, as well as standard DOS
wildcards:
C:\>WHAT myfile.doc
(analyse myfile.doc in current directory)
C:\>WHAT a:*.*
(analyse all files on disk in drive A:)
WHAT takes each file matching the file
specification and writes its name and size on the
screen, analyses it and reports the result.
Unrecognised files are reported as being of UNKNOWN
FORMAT. The file name and size are written to DOS'
CON device; the result is written to the standard
output device (normally the console), making it
possible to redirect the result using the usual DOS
redirection techniques.
───────────────────────────────────────────────────────────────────────
WHAT?format 2.9 1
OPTIONS
NOTE: All options can be used in either upper or
lower case, and may be preceded by either a slash
or a hyphen. Those used in conjunction with file
names may appear either before or after the file
specification.
Character set
USAGE: WHAT <filespec> /c [ >filename ]
Creates an on-screen map of all characters
appearing in the first file that matches
<filespec>. The user is then given the option of
writing more detailed information (including the
offset and context of the first occurrence of each
character) to the standard output. If redirection
has been specified on the command line, the result
will be a text file suitable for viewing with Vern
Buerg's LIST. The character set option is
particularly useful with plain text files that do
not use one of the standard character sets. Note
that /C uses the underline attribute in order to
create a 16x16 character set matrix on the screen.
This gives better results on monochrome than on
colour monitors.
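The idea behind the character set map can be sketched in a few lines of Python. This is only an illustration of the technique described above, not WHAT's actual code: the function names charset_map and render are invented for the example, and the 'X'/'.' rendering stands in for the underlined on-screen matrix.

```python
def charset_map(data: bytes):
    """Return (first_offset, grid) for the byte values seen in data.

    first_offset maps each byte value to the offset of its first
    occurrence; grid is a 16x16 list of booleans indexed
    [high nibble][low nibble].
    """
    first_offset = {}
    for offset, value in enumerate(data):
        # setdefault keeps the earliest offset and ignores later repeats
        first_offset.setdefault(value, offset)
    grid = [[(row * 16 + col) in first_offset for col in range(16)]
            for row in range(16)]
    return first_offset, grid


def render(grid):
    """Render the grid as 16 text lines: 'X' = present, '.' = absent."""
    return ["".join("X" if cell else "." for cell in row) for row in grid]


if __name__ == "__main__":
    sample = b"Hi\r\nHi\x1a"
    offsets, grid = charset_map(sample)
    print("\n".join(render(grid)))
    for value in sorted(offsets):
        print(f"{value:02X}h first seen at offset {offsets[value]}")
```

Row 4, column 8 of the grid corresponds to 48h ("H"), and so on. The detailed listing written to standard output by /C also includes the context of each first occurrence, which this sketch leaves out.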
Errorlevel
USAGE: WHAT /e [ >filename ]
Upon completion of a file analysis WHAT sets the
DOS errorlevel to a value corresponding to the
result arrived at. This switch generates a list of
which errorlevels correspond to which formats. The
output can be redirected to a disk file and
modified to create a batch file for automating file
handling procedures.
Hex dump
USAGE: WHAT <filespec> /h [ >filename ]
Creates a hex dump of the first file that matches
<filespec>. The output contains only hex values --
no file offsets or character equivalents. The main
purpose of this switch is to simplify the analysis
of long and complicated formatting instructions
contained within a text file. (The resulting file
is easy to edit since it only contains hex
values.) If you merely want to view the contents of
a file in hex format, you will be better off using
a file browsing utility like LIST, PC-Tools or
Norton Utilities, that also displays file offsets
and character equivalents.
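For readers who want to see what a "pure" hex dump amounts to, here is a minimal Python sketch. It is an assumption-laden illustration rather than a description of WHAT's output: the function name pure_hex_dump is invented, and the choice of 16 space-separated values per line is a guess, since the layout used by /H is not specified here.

```python
def pure_hex_dump(data: bytes, per_line: int = 16) -> str:
    """Return data as lines of two-digit hex values and nothing else.

    No offsets and no character column are emitted, so the result can be
    edited freely. The per_line width is an assumption.
    """
    lines = []
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        lines.append(" ".join(f"{value:02X}" for value in chunk))
    return "\n".join(lines)


if __name__ == "__main__":
    print(pure_hex_dump(b"\xffWPC example header bytes"))
```

Because every line consists solely of hex values, long formatting sequences can be cut, moved and compared in any text editor.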
List formats
USAGE: WHAT /l
Presents an on-screen list of all file formats
supported by the current version of WHAT.
COMMENTARY
WHAT is not foolproof, nor is it meant to be. It
belongs to the venerable family of Q&D-utilities,
and its basic philosophy is to be right as often as
possible but without spending all day about it. It
is not as Quick as it could be, and it is no doubt
a good deal Dirtier than it would have been if I'd
been a real programmer. That said, it has been
tested fairly thoroughly on a number of systems and
performs as described in this documentation. No
problems have been reported that would constitute a
threat to your computer or data, but as always, no
responsibility is taken for damage resulting from
incorrect or careless use of the program.
WHAT works by scanning the beginning of a file and
looking for specific formatting features that can
identify its format. The precise features looked
for vary. Some applications -- especially newer ones
-- create files with headers containing an ID-tag, a
kind of "thumbprint" consisting of a special
sequence of bytes that the application itself uses
to determine whether or not the file is in its
native format. For example, all files created by
WordPerfect 5.0 or later begin with the byte
sequence FF 57 50 43 (-1,"WPC"). These kinds of
files are an easy match, and WHAT will handle them
quickly and flawlessly.
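An ID-tag test of this kind is about the simplest check there is. The following Python sketch shows the principle using the WordPerfect byte sequence quoted above; the names WP5_SIGNATURE and looks_like_wordperfect5 are made up for the example, and nothing here is taken from WHAT's own source.

```python
# FF 57 50 43: the byte FFh followed by the letters "WPC"
WP5_SIGNATURE = bytes([0xFF, 0x57, 0x50, 0x43])


def looks_like_wordperfect5(head: bytes) -> bool:
    """Return True if the start of a file carries the WordPerfect ID-tag."""
    return head.startswith(WP5_SIGNATURE)


if __name__ == "__main__":
    print(looks_like_wordperfect5(b"\xffWPC\x00\x00"))  # tag present
    print(looks_like_wordperfect5(b"Plain text"))       # tag absent
```

Only the first four bytes need to be examined, which is why formats with an ID-tag are the quick and reliable cases.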
Other programs present greater problems, especially
those with a native format closely akin to pure
ASCII. PC-Write, for example, produces ASCII files
if the document doesn't contain guide line font
commands or text with attributes such as bold,
underline etc. Such a file will be reported as
being ASCII by WHAT.
If, on the other hand, the PC-Write document
contains a few words that are underlined, the file
will resemble an ASCII file interspersed with the
odd 17h -- a "non-ASCII" character. This will
probably be enough for WHAT to reach a verdict of
PC-Write, but it is not difficult to imagine that
the file could have been produced by another
program and that the 17h means something quite
different. In such borderline cases a programming
decision has been made based upon the assumed
popularity of particular applications. (If you
disagree with the decision, don't hesitate to let
me know!) When WHAT makes a mistake, it is often in
this kind of situation.
Another example will further illustrate the
problems involved in differentiating between word
processing systems that use similar formats. I
recently downloaded an ARChive file containing a
number of text files from a bulletin board system.
These files looked like ASCII when I viewed them
with LIST, but WHAT said they were WordPerfect 4.x.
In actual fact they turned out to be UNIX-type
ASCII files with line endings marked by a single LF
instead of the CR/LF pair used under DOS. (The
archive file seems to have been put together on an
Amiga.) LF (0Ah) is the code used by WordPerfect to
represent a hard return (hence WHAT's diagnosis),
so the files could equally well have been prepared
using WordPerfect (except that they also had hard
returns where there should have been soft returns).
The question here is whether the result reached by
WHAT was acceptable. My answer -- based mainly upon
pragmatic considerations -- is yes: Wherever the
file might have come from, it is now on a PC
(otherwise I wouldn't be using WHAT!), and if it is
to be edited on a PC, the best program to use is
WordPerfect. Most ASCII editors would complain
bitterly about the missing CR at the end of each
line; but WordPerfect is over the moon, and it will
even allow me to regenerate most of the soft
returns (by reading in the file, saving it as DOS
text, and reading it in again using the option of
converting hard returns in the hyphenation zone to
soft returns). So in this case, WordPerfect is the
best answer -- even though strictly speaking it is
the wrong one.
Dirty tricks
If there is one thing that really slows WHAT down
it is a lot of files in unsupported formats. A
couple of dirty tricks are used to minimise this
problem. Firstly, WHAT never reads more than the
first 5 KB of a file, reasoning that if it hasn't
made up its mind by then, it probably never will.
This could in theory lead to problems. For
example, a PC-Write document consisting of 2-3
pages of straight ASCII followed by a few pages of
heavily formatted text will be judged to be ASCII
-- but you'll be in trouble if you try to import it
to, say, WordPerfect as "DOS Text". Such situations
occur so rarely in practice, however, that the
speed advantages of just looking at the beginning
of a document outweigh the potential disadvantages.
Secondly, WHAT doesn't bother to try to ascertain
whether an .EXE-file really is executable: The
present version quite simply ignores files with the
extensions .EXE and .COM (except when the only
files that match the file specification have one of
these extensions, in which case WHAT will attempt
to analyse the last one -- hopefully
unsuccessfully).
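Both tricks can be sketched in Python as follows. This is a hedged illustration only: the names select_candidates and read_head are invented, and taking "5 Kb" to mean 5 x 1024 bytes is an assumption.

```python
import os

SKIPPED_EXTENSIONS = {".exe", ".com"}
HEAD_LIMIT = 5 * 1024  # assumption: "5 Kb" taken as 5120 bytes


def select_candidates(names):
    """Drop .EXE and .COM files from a list of matching file names.

    If every match has one of those extensions, fall back to the last
    name only, mirroring the exception described above.
    """
    kept = [name for name in names
            if os.path.splitext(name)[1].lower() not in SKIPPED_EXTENSIONS]
    if kept or not names:
        return kept
    return [names[-1]]


def read_head(path, limit=HEAD_LIMIT):
    """Read at most limit bytes from the beginning of a file."""
    with open(path, "rb") as handle:
        return handle.read(limit)


if __name__ == "__main__":
    print(select_candidates(["LETTER.DOC", "WHAT.EXE", "NOTES.TXT"]))
    print(select_candidates(["WHAT.EXE", "LIST.COM"]))
```

Whatever analysis follows only ever sees the bytes returned by read_head, which is exactly why formatting that first appears further into a long document goes unnoticed.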
ASCII files and DOS files
The criterion for differentiating between what WHAT
calls "ASCII text files" and "DOS text files" is
whether or not characters from the Extended ASCII
set appear in the file, and more specifically
whether the file contains 7-bit or 8-bit
Norwegian/Danish characters -- [ \ ] { | } in ASCII
files and Æ Ø Å æ ø å in DOS files. This is an
important distinction in certain European countries
where accented characters may be represented by
national versions of the (7-bit) ISO 646 character
set, so English-speaking users will just have to
live with it! In neither format does WHAT expect to
encounter any control characters other than TAB
(09h), CR (0Dh), LF (0Ah), FF (0Ch) or a single
Control-Z end-of-file marker (1Ah).
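The distinction can be expressed as a simple test on the set of byte values in a file. The Python sketch below uses the ranges given in the list of supported formats (09, 0A, 0C, 0D, 1A and 20..7E for ASCII, plus 80..FE for DOS text). The function name classify_text is invented, and the real program undoubtedly applies further checks that are not reproduced here.

```python
ASCII_CONTROLS = {0x09, 0x0A, 0x0C, 0x0D, 0x1A}  # TAB, LF, FF, CR, Ctrl-Z
ASCII_BYTES = ASCII_CONTROLS | set(range(0x20, 0x7F))  # plus 20..7E
DOS_BYTES = ASCII_BYTES | set(range(0x80, 0xFF))       # plus 80..FE


def classify_text(data: bytes) -> str:
    """Classify data as ASCII text, DOS text or neither.

    Note that empty input counts as ASCII here, which is a simplification.
    """
    if data.count(b"\x1a") > 1:
        # More than a single Control-Z end-of-file marker is not expected
        return "UNKNOWN FORMAT"
    seen = set(data)
    if seen <= ASCII_BYTES:
        return "ASCII text file"
    if seen <= DOS_BYTES:
        return "DOS text file"
    return "UNKNOWN FORMAT"


if __name__ == "__main__":
    print(classify_text(b"Hello, world\r\n"))
    print(classify_text(b"bl\x91b\x91r\r\n"))  # 91h is the 8-bit code for ae
    print(classify_text(b"some \x17bold\x17 text"))
```

A Norwegian text in 7-bit form therefore still counts as ASCII, since [ \ ] { | } all lie within 20..7E; only the 8-bit versions of the same letters push the file into the DOS text category.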
FEEDBACK
The biggest problem with a program like WHAT is
keeping it up to date. New word processing programs
are appearing all the time, and most of them use
their own native format. Occasionally the format is
described in the documentation that accompanies the
application, but usually that is not the case. Some
software publishers are willing to make the
details of the format available to developers;
others (like Microsoft and IBM) keep them a closely
guarded secret.
Upgrades of existing programs also present
problems. As new formatting features are added to
the application, the native format changes in order
to accommodate them. Sometimes these changes amount
to no more than the addition of new codes to the
old format (as when WordPerfect was upgraded from
4.1 to 4.2). More major revisions, on the other
hand, can lead to a complete revamping of the
native format (as was the case with WordPerfect
5.0). WHAT has been designed as far as possible to
be able to handle new versions of formats that are
already supported, but no guarantees are made. (I
am fairly certain that WHAT will recognise
documents created by version 6.5 of WordPerfect,
but what happens with 9.0 documents is anybody's
guess!)
Keeping abreast of all these changes and additions
is no easy matter (I have yet to find a company
that runs a mailing list for people interested in
this kind of information!). What that means is that
WHAT can only be improved and kept up to date with
the assistance of its users. So if you find that
WHAT makes a mistake when analysing a supported
format, experience trouble with the latest version
of a particular program, or can provide
information on file formats not currently supported
by WHAT, please do not hesitate to get in touch.
The more example files and technical information
you can provide for a particular format the better.
Your efforts will be rewarded with an
acknowledgement in the next version of WHATDOC and
a typeset copy of this one. (The "wish list" for
the next version of WHAT includes support for Amí;
Excel; PCX, IMG, CGM and GEM graphics; the latest
versions of DisplayWrite and Lotus 1-2-3; and many
more.)
THANKS TO...
Dag Hasvold, Aron Gurski, Tor Nordahl, Gisle
Hannemyr, Truls Meland and Chris Wolf for helpful
suggestions.
Send comments, files and format documentation to:
Steve Pepper, Pilestredet 97, N-0358 Oslo 3, Norway
or log on to:
Computertext BBS (2400 8-N-1) +47-2-420825.
One final thing: Don't bother suggesting that the
next version of WHAT ought to be able to recognise
non-DOS disk formats unless you are prepared to
tell me how to implement such a feature. I know it
would be enormously useful, but I am a typographer
-- not a programmer!
SUPPORTED FORMATS
Here is a complete list of all formats supported by
version 2.9 of WHAT. Those formats for which a
version number or other additional information is
given are marked by an asterisk. Please support
WHAT by helping to make this list more
comprehensive!
Word processors
Ability WP*
Acto WP*
ASCII text file (09, 0A, 0C, 0D, 1A, 20..7E)
ASCII even parity
Cicero
DisplayWrite*
DOS text file (as ASCII, plus 80..FE)
DSI Tekst
EBCDIC file
Enable WPF*
Framework
Manuscript*
MASS-11
Microsoft Word
MicroWord
Multimate
Notis WP*
OfficeWriter
Ordbehandling
Palantir*
PC-Write
Samna Word* 1.0 and 2.0
Sprint
Super WP
Symphony* 1.0 and 1.1
Ventura Publisher
Volkswriter
WordPerfect* 4.x and higher
WordStar* 4.0 and higher
WordStar 2000*
XyWrite
Formatted text
Adobe PostScript (at least min. conforming)
DCA/RFT (Revisable Form Text)
DEC DX
HP LaserJet (PCL)
IBM DCF-GML (Generalised Markup Language)
Rich Text Format
Data bases
Ability DBF*
DataPerfect*
dBASE*
Enable DBF*
Reflex*
Spreadsheets
Ability s'sheet*
DIF spreadsheet
Enable s'sheet*
Lotus 1-2-3*
PlanPerfect*
SuperCalc
SYLK spreadsheet
Graphics
Ability graphics*
GIF* resolution and no. of colours
IFF* resolution for ILBM files
TIFF* version + processor type
WPG* version + bitmap/drawing
Various
Ability comms*
ARC archive
LZH archive
MacBinary* TYPE resource
StuffIt! archive
ZIP archive*
COPYRIGHT
WHAT?format is copyright Boots & Pepper 1989-90.
It may be freely used and distributed to others
provided no changes are made in the files WHAT.EXE
and WHATDOC.
Steve Pepper
2 June 1990