OS/2 Shareware BBS: 11 Util / 11-Util.zip / file39a.zip / src / notes2.os2 (text file, 1993-08-05, 52KB, 1,302 lines)
Additional notes on the OS/2 port of Darwin's "file". Note that
the email exchanges have been edited. Some of the edits are
indicated with [[...]].
Some of the magic-file entries in the first OS/2 release were
originally from Greg Roelofs (roe2@midway.uchicago.edu). These appear
below.
There are several notes from Jouni Miettunen (jon@stekt.oulu.fi)
with good ideas for enhancements. His magic-file entries for DOS
executables are included.
=============================================================================
From: Greg Roelofs (roe2@midway.uchicago.edu)
...would like to be notified if and when my various
guessed-at entries (particularly for lha and zoo) are
definitively shown to be either correct or incorrect.
My magic file, in more or less the state in which it was posted
to alt.sources (I think) several months ago, is appended below.
Note that the image-format entries are from Cameron Simpson, on
whose previously posted magic file mine is based.
# From: cameron@spectrum.cs.unsw.oz.au (Cameron Simpson)
# Newsgroups: alt.sources,comp.unix.programming
# Subject: some useful additions to your magic file ...
# Message-ID: <1992Mar21.055523.27025@usage.csd.unsw.OZ.AU>
# Date: 21 Mar 92 05:55:23 GMT
# Sender: news@usage.csd.unsw.OZ.AU
# Organization: CS&E Computing Facility, Uni Of NSW, Oz
#
# This is simply a list of a few entries which seem missing from most systems'
# magic files. The first 10 or so entries recognise and classify executable
# scripts, which otherwise tend to be called C source etc. The last few entries
# recognise a couple of common image formats, always useful when trolling for
# users who are wasting disc space when things are tight.
# - Cameron Simpson
# cameron@cs.unsw.oz.au
#
0 string #! exec()able script
>2 string /bin/sh - Bourne shell
>2 string /bin/ksh - Korn shell
>2 string /usr/local/bin/zsh - Paul Falstad's zsh
>2 string /usr/local/bin/ash - NeilBrown's ash
>2 string /usr/local/bin/ae - NeilBrown's ae
>2 string /usr/local/bin/perl - Perl
>2 string /usr/bin/perl - Perl
>2 string /bin/awk - AWK
>2 string /bin/nawk - new AWK
>2 string /usr/bin/nawk - new AWK
>2 string /usr/local/bin/nawk - new AWK
>2 string /bin/gawk - GNU AWK
>2 string /usr/local/bin/gawk - GNU AWK
>2 string /bin/csh - C shell
0 string GIF GIF image archive
>3 string 87a - version %3s
>3 string 87A - version %3s
>3 string 89a - version %3s
>3 string 89A - version %3s
0 long 0xffd8ffe0 JPEG image, big endian
0 long 0xe0ffd8ff JPEG image, little endian
0 string hsi1 HSI1 image (wrapper for JPEG?)
#
# Newtware Specials: compressed and PC-based files (also zsh, above).
# Greg Roelofs, 15 May 92. Most recent revisions: 7 Jan 93.
#
0 string MZ MS-DOS executable
>24 string @ (OS/2 or Windows format)
#
# >>>>> ARC <<<<<
#
0 string \032\010 Arc archive
# 0 short 0x1a08 Arc archive
# 0 short 0x081a Arc archive
#
# >>>>> LHARC/LHA <<<<<
#
2 string -lh0- Lharc 1.x archive
2 string -lh1- Lharc 1.x archive
2 string -lz4- Lharc 1.x archive
2 string -lz5- Lharc 1.x archive
# [never seen any but the last:]
2 string -lzs- LHa 2.x? archive [lzs]
2 string -lh - LHa 2.x? archive [lh ]
2 string -lhd- LHa 2.x? archive [lhd]
2 string -lh2- Lha 2.x? archive [lh2]
2 string -lh3- LHa 2.x? archive [lh3]
2 string -lh4- LHa 2.x? archive [lh4]
2 string -lh5- LHa (2.x) archive
#
# >>>>> ZIP <<<<<
#
# [newer, smarter "file" programs]
0 string PK\003\004 Zip archive
>4 string \011 (at least v0.9 to extract)
>4 string \012 (at least v1.0 to extract)
>4 string \013 (at least v1.1 to extract)
>4 string \024 (at least v2.0 to extract)
# [stupid "file" programs, big-endian]
# 0 long 0x504b0304 Zip archive
# >1 long 0x4b030409 (at least v0.9 to extract)
# >1 long 0x4b03040a (at least v1.0 to extract)
# >1 long 0x4b03040b (at least v1.1 to extract)
# >1 long 0x4b030414 (at least v2.0 to extract)
# [stupid "file" programs, little-endian]
# 0 long 0x04034b50 Zip archive
# >1 long 0x0904034b (at least v0.9 to extract)
# >1 long 0x0a04034b (at least v1.0 to extract)
# >1 long 0x0b04034b (at least v1.1 to extract)
# >1 long 0x1404034b (at least v2.0 to extract)
#
# >>>>> ZOO <<<<<
#
# [GRR: don't know if all of these versions exist, or if some missing...]
0 string ZOO Zoo archive
>4 string 1.00 (v%4s)
>4 string 1.10 (v%4s)
>4 string 1.20 (v%4s)
>4 string 1.30 (v%4s)
>4 string 1.40 (v%4s)
>4 string 1.50 (v%4s)
>4 string 1.60 (v%4s)
>4 string 1.70 (v%4s)
>4 string 1.71 (v%4s)
>4 string 2.00 (v%4s)
>4 string 2.01 (v%4s)
>4 string 2.10 (v%4s)
# [newer, smarter "file" programs]
>32 string \001\000 (modify: v1.0+)
>32 string \001\004 (modify: v1.4+)
>32 string \002\000 (modify: v2.0+)
>70 string \001\000 (extract: v1.0+)
>70 string \002\001 (extract: v2.1+)
# [stupid "file" programs, big-endian]
# >32 short 0x0100 (modify: v1.0+)
# >32 short 0x0104 (modify: v1.4+)
# >32 short 0x0200 (modify: v2.0+)
# >70 short 0x0100 (extract: v1.0+)
# >70 short 0x0201 (extract: v2.1+)
# [stupid "file" programs, little-endian]
# >32 short 0x0001 (modify: v1.0+)
# >32 short 0x0401 (modify: v1.4+)
# >32 short 0x0002 (modify: v2.0+)
# >70 short 0x0001 (extract: v1.0+)
# >70 short 0x0102 (extract: v2.1+)
# [GRR: the following are alternate identifiers]
#20 long 0xdca7c4fd Zoo archive
#20 long 0xc4fddca7 Zoo archive
#
# >>>>> GZIP <<<<<
#
# [newer, smarter "file" programs]
0 string \037\213 gzip'd file
# [stupid "file" programs, big-endian]
# 0 short 0x1f8b gzip'd file
# [stupid "file" programs, little-endian]
# 0 short 0x8b1f gzip'd file
#
# >>>>> COMPRESS <<<<<
#
# [newer, smarter "file" programs]
# [GRR: are the upper three bits (block size) ever different from 100?]
0 string \037\235 compress'd file
>2 string \211 (9 bits)
>2 string \212 (10 bits)
>2 string \213 (11 bits)
>2 string \214 (12 bits)
>2 string \215 (13 bits)
>2 string \216 (14 bits)
>2 string \217 (15 bits)
>2 string \220 (16 bits)
# [stupid "file" programs, big-endian]
# 0 short 0x1f9d compress'd file
# >1 short 0x9d89 (9 bits)
# >1 short 0x9d8a (10 bits)
# >1 short 0x9d8b (11 bits)
# >1 short 0x9d8c (12 bits)
# >1 short 0x9d8d (13 bits)
# >1 short 0x9d8e (14 bits)
# >1 short 0x9d8f (15 bits)
# >1 short 0x9d90 (16 bits)
# [stupid "file" programs, little-endian]
# 0 short 0x9d1f compress'd file
# >1 short 0x899d (9 bits)
# >1 short 0x8a9d (10 bits)
# >1 short 0x8b9d (11 bits)
# >1 short 0x8c9d (12 bits)
# >1 short 0x8d9d (13 bits)
# >1 short 0x8e9d (14 bits)
# >1 short 0x8f9d (15 bits)
# >1 short 0x909d (16 bits)
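The "[stupid "file" programs]" alternatives above exist because a "short" or "long" test compares a number read from the file in some byte order, while a "string" test compares raw bytes. A minimal Python sketch of the difference, using the compress header as the example. The helper names are hypothetical, and the flag-byte layout (low five bits = maximum code width, high bit = block-compression flag) is the usual compress(1) convention, stated here as an assumption rather than taken from these notes.

```python
import struct

# First three bytes of a compress'd file: magic \037\235, then a flag byte.
HEADER = b"\037\235\220"  # \220 == 0x90


def read_short(data: bytes, offset: int, big_endian: bool) -> int:
    """Read an unsigned 16-bit value the way a 'short' magic test would."""
    fmt = ">H" if big_endian else "<H"
    return struct.unpack_from(fmt, data, offset)[0]


def compress_info(data: bytes) -> tuple[int, bool]:
    """Decode the flag byte behind the '(NN bits)' annotations."""
    if data[:2] != b"\037\235":
        raise ValueError("not a compress'd file")
    flags = data[2]
    max_bits = flags & 0x1F          # low five bits: maximum code width
    block_mode = bool(flags & 0x80)  # high bit: block-compression flag
    return max_bits, block_mode


print(hex(read_short(HEADER, 0, big_endian=True)))   # 0x1f9d, cf. "0 short 0x1f9d"
print(hex(read_short(HEADER, 0, big_endian=False)))  # 0x9d1f, cf. "0 short 0x9d1f"
print(hex(read_short(HEADER, 1, big_endian=True)))   # 0x9d90, cf. ">1 short 0x9d90"
print(compress_info(HEADER))                          # (16, True)
```

Because the byte order is explicit here, one test serves both kinds of host, which is the same advantage the "string" entries have.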
=============================================================================
Date: Sun, 16 May 93 15:56:23 CDT
From: Darrel R Hankerson <hankedr>
To: roe2@midway.uchicago.edu
Cc: darwin@cs.toronto.edu, ian@sq.com, j.adams@ucl.ac.uk
In-Reply-To: "Cave Newt"'s message of Sun, 16 May 93 14:39:15 CDT <9305161939.AA19475@midway.uchicago.edu>
Subject: magic entries for compressors/archivers
[[...]]
I have not heard from Mr Darwin for some time. Originally, he
suggested that he may be able to integrate the OS/2 changes into the
main sources. There is one obscure buglet in the OS/2 port (I managed
to change the nature of an existing buglet, rather than fix it--it's
an embarrassing programming mistake on my part...). There is also the
problem of some bugs in the OS/2 DosQ[uery]Apptype call.
=============================================================================
Date: Sun, 16 May 93 20:31:20 CDT
From: Darrel R Hankerson <hankedr>
To: roe2@midway.uchicago.edu
In-Reply-To: "Cave Newt"'s message of Sun, 16 May 93 18:21:54 CDT <9305162321.AA26546@midway.uchicago.edu>
Subject: magic entries for compressors/archivers
>> There is one obscure buglet in the OS/2 port (I managed
>> to change the nature of an existing buglet, rather than fix it--it's
>> an embarrassing programming mistake on my part...). There is also the
>> problem of some bugs in the OS/2 DosQ[uery]Apptype call. I plan to
>
> I assume one of these is the "OS/2 executableDOS executable (EXE)"
> bug--I forgot to mention that. It occurs on various executables,
> possibly non-window-compatible ones (though I thought Phoenix was
> a PM program...).
Another silly programming mistake. This occurs if the application type is
not set in the header. The "obscure buglet" occurs only if using the "-c"
option, and then only with certain bad magicfile entries. The programming
error is elementary (and embarrassing).
There are more serious problems with the call. Eberhard Mattes reported
that it identifies the new Windows Emacs exe incorrectly.
As I posted on .apps, there is some strange bug which can be seen
on my machine with:
1. copy du.exe .
2. copy \autoexec.bat .\du.ini
Now do "file du.*" or "apptype du.*". The report can change on repeated runs.
A partial solution would be to check for the exe signature.
[[OS/2 2.1 may have solved some of the problems. DH 30-Jul-93]]
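A check for the exe signature, as suggested above, might look like the following Python sketch. It assumes the usual DOS EXE layout: "MZ" at offset 0, a relocation-table offset of 0x40 or more in the word at 0x18 (the traditional hint that a new-style header follows; Greg's ">24 string @" entry tests the low byte of that same word), and a little-endian 32-bit pointer at 0x3c to the new header, whose first two bytes name its format. The function names and description strings are hypothetical, and this is not a reimplementation of DosQueryAppType.

```python
import struct

# Two-byte signatures found at the start of the new-style header.
NEW_HEADER_KINDS = {
    b"NE": "NE executable (16-bit OS/2 1.x or Windows)",
    b"LX": "LX executable (32-bit OS/2 2.x)",
    b"LE": "LE executable",
    b"PE": "PE executable (Win32)",
}


def classify_exe(data: bytes) -> str:
    if data[:2] != b"MZ":
        return "not an MZ executable"
    if len(data) < 0x40:
        return "MS-DOS executable"
    # Word at 0x18: offset of the relocation table. 0x40 or more is the
    # traditional sign that a new-style header may follow.
    reloc_offset = struct.unpack_from("<H", data, 0x18)[0]
    if reloc_offset < 0x40:
        return "MS-DOS executable"
    # Dword at 0x3c: file offset of the new-style header.
    new_offset = struct.unpack_from("<I", data, 0x3C)[0]
    signature = data[new_offset:new_offset + 2]
    return NEW_HEADER_KINDS.get(signature, "MS-DOS executable (unrecognized new header)")


def make_test_image(signature: bytes) -> bytes:
    """Build a tiny fake executable header for trying the check."""
    image = bytearray(0x80)
    image[0:2] = b"MZ"
    struct.pack_into("<H", image, 0x18, 0x40)
    struct.pack_into("<I", image, 0x3C, 0x60)
    image[0x60:0x62] = signature
    return bytes(image)


print(classify_exe(make_test_image(b"LX")))  # LX executable (32-bit OS/2 2.x)
```

Since the sketch only reads header bytes, repeated runs on the same file always give the same answer.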
> Looking forward to the next version--this port was heaven-sent.
> I had even grabbed some ancient version of it from either the
> Berkeley Net2 or 386BSD sources, intending to port it myself.
I had been looking for "file" for some time. I started with a port of
Selke's Pascal "filetype" program. His handling of magicfile entries
is different, but I decided that compatibility with "file" on
magicfile entries was essential. Darwin's code appears to be very
well-done.
Additional magicfile entries (INF, .obj, etc.) are needed. The TeX-stuff
does not always work. I don't know if the "compress'd file unwinding"
is all that useful, but gzip-unwinding should be added. I had planned
to wait for Darwin to release 3.10, but I may try to do a bugfix this week.
--Darrel Hankerson hankedr@mail.auburn.edu or hank@ducvax.auburn.edu
=============================================================================
Date: Mon, 31 May 93 17:37:03 CDT
From: "Cave Newt" <roe2@midway.uchicago.edu>
To: hankedr@mail.auburn.edu
Subject: Re: Unix 'file' cmd?
Just a few more OS/2 entries for file/2 3.9b (or whatever :-) ).
I wrote a Unix program to give me lots of information about these
things, but I find that often I only want to know which format
a bitmap is, not the gory details of its size, palette, etc. I
haven't tested any but the bitmap entries...
# >>>>> BMP, etc. <<<<<
#
0 string BM bitmap
>14 byte 12 (OS/2 1.x format)
>14 byte 64 (OS/2 2.x format)
>14 byte 40 (Windows 3.x format)
0 string IC icon
0 string PI pointer
0 string CI color icon
0 string CP color pointer
0 string BA bitmap array
Greg
P.S. The >14 entries should really be strings of length four, but
at least Ultrix file(1) won't let you reference "\000" in
strings. The alternative would be longs, but that gets into
endianness.
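Reading the four bytes at offset 14 with an explicitly little-endian decode avoids the endianness problem the P.S. mentions, assuming (as in BMP files) that the header-size field is stored little-endian. A Python sketch; the function name is hypothetical, and the size-to-format table is copied from the entries above.

```python
import struct

# Header sizes at offset 14, as listed in the magic entries above.
BMP_HEADER_SIZES = {
    12: "OS/2 1.x format",
    64: "OS/2 2.x format",
    40: "Windows 3.x format",
}


def describe_bitmap(data: bytes) -> str:
    if data[:2] != b"BM" or len(data) < 18:
        return "not a bitmap"
    # "<I" = little-endian unsigned 32-bit, whatever the host CPU is.
    size = struct.unpack_from("<I", data, 14)[0]
    kind = BMP_HEADER_SIZES.get(size)
    return f"bitmap ({kind})" if kind else "bitmap"


sample = b"BM" + bytes(12) + struct.pack("<I", 40)
print(describe_bitmap(sample))  # bitmap (Windows 3.x format)
```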
=============================================================================
Date: Mon, 31 May 93 18:16:30 CDT
From: "Cave Newt" <roe2@midway.uchicago.edu>
To: hankedr@mail.auburn.edu
Subject: Re: Unix 'file' cmd?
[[...]]
> Speaking of speed...the magic file should probably be trimmed. Perhaps
> I should include a sed script to strip the comments. (There are many entries
> which could be deleted on OS/2.)
Or you might simply include a much-truncated version in addition to
the full one--that way people can add anything from the full version
which they might find useful (for example, VMS .exe's are hard to
distinguish from MS-DOS ones by name alone...). Perhaps the short
one could just have OS/2-relevant executables/archivers/images?
> I've only given a little thought to some of the other problems. I have not
> yet looked into identifying inf files. There is some "ascii magic" code which
> could be modified to identify rexx code, if I could think of some intelligent
> way to do this.
Good luck. :-) Aside from searching for "say" commands and/or the
absence of curly braces, that could be very tricky...
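Greg's heuristic can be written down as a small rule. A Python sketch in which every detail is an assumption for illustration: the text must begin with a "/*" comment (OS/2 runs a .cmd file as REXX only when it starts with one), curly braces count against REXX, and a hypothetical keyword list stands in for "say" and friends. It is not the "ascii magic" code in file.

```python
import re

# Hypothetical keyword list; REXX keywords are not case-sensitive.
REXX_KEYWORDS = {"say", "parse", "pull", "call", "signal", "arg", "exit"}


def looks_like_rexx(text: str, sample_size: int = 1024) -> bool:
    """Guess whether text is OS/2 REXX, looking only at its first part."""
    head = text[:sample_size]
    # OS/2 treats a .cmd file as REXX only if it begins with a comment.
    if not head.startswith("/*"):
        return False
    # Curly braces point to C-like source instead.
    if "{" in head or "}" in head:
        return False
    words = set(re.findall(r"[a-z]+", head.lower()))
    return bool(words & REXX_KEYWORDS)


print(looks_like_rexx("/* REXX */\nsay 'Hello'\n"))  # True
```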
=============================================================================
Date: Thu, 17 Jun 93 20:23:56 CDT
From: Darrel R Hankerson <hankedr>
To: jon@stekt.oulu.fi
Subject: file-like programs
In article <9306171643.FA22240@tacom-emh1.army.mil> jon@stekt.oulu.fi (Jouni Miettunen) writes:
>["file"-like programs; identify files based on contents]
Thanks for this summary. I've been working on an OS/2 and MSDOS port
of Darwin's "file" program. My main interest is OS/2, but there are also
MSDOS versions in the archive:
ftp.luth.se:pub/os2/all/unix/unixutils/file39.zip
This version has some "ascii magic" code and the (fairly) standard unix
"magic" file. Under OS/2, there is additional code to determine the
application type (DOS, Windows, 32-bit, DLL, WINDOWCOMPAT, etc.).
You mention:
>Name: FileType
>Version: 1.1
>Date: 19 August 1991
>Description: identify files
>IdType: external magic (about 105)
>Where: garbo:\pc\fileutil simtel:filutl
>Archive: filtyp11.zip (21394k)
>Status: freeware
>Author: Gilbert W. Selke (s00100@dbnrhrz1.bitnet)
>Comments: pascal source included
In fact, my first "file" program was a port of Selke's version to C. However,
Selke uses a magic file which is not compatible with the ones found on unix
(it does have some useful ways to make entries), and I believe that
compatibility with the UNIX magicfile format is essential (note, however,
that the magic-file format is not the same on every machine, but most
entries I've seen on the net are of the type handled by Darwin's version).
If anyone would like to continue work on Selke's version, you are welcome
to my C version. I was not able to reach Selke about this (no reply), and
I do think that Darwin's version may be better if you are getting
magicfile entries from the net.
I had planned to announce this port after I fix a silly OS/2 buglet, and
I would like to add more OS/2 (and MSDOS) magicfile entries. Comments
are welcome. Thanks to Greg Roelofs for the bug reports and contributions
to the magic file.
--
--Darrel Hankerson hankedr@mail.auburn.edu or hank@ducvax.auburn.edu
=============================================================================
Date: Fri, 18 Jun 93 16:48:37 CDT
From: Darrel R Hankerson <hankedr>
To: jon@stekt.oulu.fi
In-Reply-To: Jouni Miettunen's message of Fri, 18 Jun 93 19:21:48 +0300 <9306181621.AA16706@stekt.oulu.fi>
Subject: file-like programs
> Thanx! I'll add this to the list, however I'm not familiar with any
> Darwin version. Where I could find this? Is this The Original?
I don't know if it is "The Original" (the man page suggests that there is
no such thing :), but it appears to be in wide use. From the man page:
DESCRIPTION
File tests each argument in an attempt to classify it. There are three
sets of tests, performed in this order: filesystem tests, magic number
tests, and language tests. The first test that succeeds causes the file
type to be printed.
You can obtain the original author's latest version by anonymous FTP on
ftp.cs.toronto.edu in the directory /pub/darwin/file.
Darwin's version will also "unwind" compress'd files to examine the contents;
e.g., the compressed file autoexec.bat.Z will show up as
[c:\tmp]file -z autoexec.bat.Z
autoexec.bat.Z: ascii text (block compressed data - 16 bits)
The "-z" option tells file to unwind if the file has been compress'd.
I don't know if this feature is really needed, but it is interesting. gzip
capability needs to be added.
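Unwinding extended to gzip could be sketched in Python like this: recognize the \037\213 magic from the gzip entry earlier in these notes, decompress with the standard gzip module, and classify what comes out. The classify_plain helper and its crude ASCII test are hypothetical stand-ins for file's real tests, and the "(gzip'd)" suffix only imitates the format of the compress example above.

```python
import gzip

GZIP_MAGIC = b"\037\213"


def classify_plain(data: bytes) -> str:
    """Hypothetical stand-in for file's normal magic and ascii tests."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return "data"
    return "ascii text"


def classify(data: bytes, unwind: bool = False) -> str:
    """Classify data, optionally looking inside gzip'd files (like -z)."""
    if data[:2] == GZIP_MAGIC:
        if not unwind:
            return "gzip'd file"
        inner = gzip.decompress(data)
        return f"{classify(inner, unwind=True)} (gzip'd)"
    return classify_plain(data)


packed = gzip.compress(b"@echo off\r\nprompt $p$g\r\n")
print(classify(packed))               # gzip'd file
print(classify(packed, unwind=True))  # ascii text (gzip'd)
```

The recursion means a file that was gzip'd twice is unwound twice.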
> Well, my favorite is Ricki's File and it uses a kind of extended unix
> magicfile format. I collected all different unix magic files I could
> find (about a half a dozen) and emailed them to him and he said it was
> still compatible. I believe he still preserved the big/ little indian
> compatible data format, but someone would have to actually try it in
> such machines.
I briefly examined the magic file, and it appears to have approximately
the same flexibility as Darwin's version. Darwin's version does not have
the "#@" pseudo-comment. The names of some of the data types are slightly
different.
> [Selke's FileType]
> Well, I could make an entry for it and add a note that source is
> available. Do I put into it that source is available from you or from
> me? I'd prefer it'd be you, but it's not a big deal. Some ftp place
> maybe?? Unfortunately I have none.
Perhaps with file10 and Darwin's file39, there is no need to bother with
an update to Selke's version. I could upload a new FileType archive to simtel,
with both the original and the new versions, but I'd like Selke's permission.
> Maybe you could let me know, too, when you make that public.. about
> magic files, well, I did email my collection to Ricki (Richard Breuer)
> and it's all in there in an external ascii
> unix-more-or-less-compatible format. No OS/2 as far as I know, and
> actually I do know. Maybe you could send them to me and I could send
> you the whole unix collection :-) I'll forward them to Ricki. Let me
> know and I'll try to find them.
I will be examining the file10 magic file when I do the update to Darwin's
version (I need to fix a small OS/2 buglet). Darwin's collection may be of
interest to you. We should try to compare your unix collection with Darwin's.
My port on ftp.luth.se is "public"; I did not announce earlier since I wanted
to wait for any bug reports.
> My version is entirely based on DOS and DOS filenameextensions.
> Almost 2000 entries now.. should fix the #$ redblack library.
In the OS/2 and MSDOS world, a combination of your method
and the "file" method could be more effective than just a "file" approach.
Darwin's version does have some "ascii magic" which tries to identify stuff
like fortran source code by looking through the first part of the file for
certain keywords.
Thanks again for your summary. I did not know of some of the versions
mentioned.
--Darrel Hankerson hankedr@mail.auburn.edu
=============================================================================
Date: Sat, 19 Jun 93 20:35:37 +0300
From: Jouni Miettunen <jon@stekt.oulu.fi>
To: hankedr@mail.auburn.edu
In-Reply-To: <9306182148.AA16798@ducserv.duc.auburn.edu> (message from Darrel R Hankerson on Fri, 18 Jun 93 16:48:37 CDT)
Subject: Re: file-like programs
[[...]]
However, file39.zip agreed to unpack itself. Most Impressive! I only took
a quick look at all the files, but I noticed that you are
more serious about this than I am. I'll pack the magic and email it on
Monday.. I noticed file 3.9/10 is part of Linux and BSD (?), but
how about GNU? Are they interested?
>The "-z" option tells file to unwind if the file has been compress'd.
>I don't know if this feature is really needed, but it is interesting. gzip
>capability needs to be added.
Well, IMO it's the same as unarchiving a .zip file and identifying its
contents. An interesting idea considering the dozens of different
archivers, but it might be a bit too far from the original purpose.
>I briefly examined the magic file, and it appears to have approximately
>the same flexibility as Darwin's version. Darwin's version does not have
>the "#@" pseudo-comment. The names of some of the data types are slightly
>different.
There were a couple of nice additions to the normal unix format. First,
the keyword, with which you could restrict the check to text-related,
graphics-related, etc. magic; and then a pointer-like operator, which
could either check the following bytes or copy them to the screen. This
was because of self-extracting archivers and their all too many
identification strings. Maybe you could contact Ricki and try to create a
more common and portable magic file format.. The last time I heard from
him, he was quite busy with his new job, but he seems to check email
every now and then.
>> [Selke's FileType]
>Perhaps with file10 and Darwin's file39, there is no need to bother with
>an update to Selke's version. I could upload a new FileType archive to simtel,
>with both the original and the new versions, but I'd like Selke's permission.
Well, it wouldn't be the first update made by someone other than the
original author. As it stands, Selke's FileType is outdated, and if you
have a better version ready and include information about the history of
that program, then I think you should upload it. It seems like Selke is
not connected to the net anymore. With a new version you'd keep his
memory and work alive :) I would like that, but unfortunately I haven't
released any source to the net.
>I will be examining the file10 magic file when I do the update to Darwin's
>version (I need to fix a small OS/2 buglet). Darwin's collection may be of
>interest to you. We should try to compare your unix collection with Darwin's.
>My port on ftp.luth.se is "public"; I did not announce earlier since I wanted
>to wait for any bug reports.
[[...]]
>> My version is entirely based on DOS and DOS filenameextensions.
>> Almost 2000 entries now.. should fix the #$ redblack library.
>
>In the OS/2 and MSDOS world, a combination of your method
>and the "file" method could be more effective than just a "file" approach.
Yes, that's the reason I originally dumped everything to Ricki. He
already had a better program than I could hope to make, and I decided to
concentrate on the other version. Besides, there really wasn't enough
magic around to make the program useful. Ricki did a good job, IMHO.
>Darwin's version does have some "ascii magic" which tries to identify stuff
>like fortran source code by looking through the first part of the file for
>certain keywords.
This is what the What program does for text files. I once checked a
couple of books about various text file formats. That wasn't enough to
make a useful program, and there was already a very good tool available.
--jouni
============================================================================
Date: Tue, 22 Jun 93 20:02:04 +0300
From: Jouni Miettunen <jon@stekt.oulu.fi>
To: hankedr@mail.auburn.edu
Subject: about magic collection + new AIX370 + comments
I just sent all the magic files I had, and at the end of this message
I'll enclose a new version from AIX370. Alliant (Concentrix 5.7.00)
didn't seem to have an external magic file, and when I checked the
executable itself, it indeed had built-in data, including the ascii
magic as in file39. I'll check if VM/XA has any file utility, too, but
I believe I won't be able to locate it. I never could get hold of that
2-level deep directory structure..
The magic collection has lots of redundant data, but since I prefer
checking original documents, I sent them as-is.. including the various
copyright notices, which you can't find in the news articles. I hope
these won't cause any trouble. I'm not even sure if such magic number
information can be copyrighted at all..
The Ascii Magic (AM), which I first saw in file39's code and then in
the Alliant file's code, is a great idea. I was thinking about adding
such a thing to filex as the next improvement (filex is my file
extension identifier), and after that a thing I call Environmental
Magic (EM)..
When you have tried all the conventional ways, e.g. looking for magic
numbers, trying the file extension and, with text files, scanning the
contents (AM), and still haven't been able to identify a file, you could
try checking the file's environment:
- what type are the other files in the same directory
. an unknown file is probably the same type as the majority (if there is a clear one)
- what is the name of the directory/path
. some keywords there? Doc - music - pict
- what executables are in the directory
. data files usually belong to an executable in the same directory
Only identifying the other files in the same directory seems to be a
useful operation, e.g. if a couple are identified as printer drivers by
magic numbers and the rest have the same file extension, then all are
probably printer drivers. If the absolute path includes the keyword
"win", they are probably printer drivers for Windows.. Better than
"Unknown file"..
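The directory-majority idea above can be sketched in Python, assuming the neighbouring files have already been classified by the usual methods. The strict-majority rule and the "win" keyword check are illustrative choices based on the printer-driver example, and the function name is hypothetical; this sketches the idea, not any existing program.

```python
import re
from collections import Counter


def guess_from_environment(path: str, neighbour_types: dict) -> str:
    """Guess an unidentified file's type from its directory neighbours.

    neighbour_types maps sibling file names to types already found by
    magic numbers, extensions, or ascii magic ("unknown" if none).
    """
    known = [t for t in neighbour_types.values() if t != "unknown"]
    if not known:
        return "unknown"
    kind, count = Counter(known).most_common(1)[0]
    if count * 2 <= len(known):  # no clear majority
        return "unknown"
    guess = f"probably {kind}"
    # Directory names are a second hint, e.g. "win" in the path.
    directories = re.split(r"[\\/]", path.lower())[:-1]
    if any("win" in d for d in directories):
        guess += " (for Windows?)"
    return guess


siblings = {"hpljet.drv": "printer driver", "epson9.drv": "printer driver",
            "readme.txt": "ascii text"}
print(guess_from_environment(r"c:\windows\system\ibm4019.drv", siblings))
# probably printer driver (for Windows?)
```

A substring test like "win" is crude (it would also match a directory named "darwin"), which is one reason this can only be a last resort after the other tests.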
The Ascii Magic part should be enhanced to be useful in OS/2 and DOS
environments. First, there are too few of them (too many would slow the
overall operation; on the other hand, I believe AM is the last chance
anyway, not a supporting identification method), and secondly they
really are too Unix-oriented. E.g. I doubt one could find any of the
assembly text it looks for (it looks like Motorola 68000, but I really
don't know; except that it isn't for the 80x86 processor), and it should
be changed to identify something that really could exist, in this case
TASM- or MASM-compatible assembly for the processor in use. Also, the
shell-script checks should be changed to DOS batch files, and OS/2 .cmd
files and REXX code should be added. The programming languages should
be changed to.. (what was there? Fortran, APL and C?) C/C++, Pascal etc.
Even though it's good that you ported Darwin's file to OS/2, it's not
enough, since it's way too Unix-oriented. Just take a look at the
magic file: it's full of various Unix executables, shells (btw, bash
(Bourne Again SHell) was missing), etc., totally useless data for these
operating systems.. and it's missing things that would be
valuable for OS/2 and DOS:
- executables
. Windows, OS/2, 32-bit, maybe even GEOS etc. executables
. compressed executables (about half a dozen different)
. self-extracting file archives
. a dozen or two different archivers, each with a few different versions
- graphics files
. a couple dozen different formats, pretty well covered
- text formats
. a couple dozen different, another useful class
. I know two old books, but they don't cover any modern text processors
- file archivers
. almost all covered
. the 4 most common Mac archivers should be added
Now I see only two ways: either you branch file39 into a separate
program only loosely connected to Darwin's file, or you try to make
changes to the original one.
Well, these were some of my opinions about file39. It's a good port of
a unix utility, but it could use some changes or additions to support
more target OSes and be even more useful. Enhancing the Ascii Magic
part would give the most with the least work.
--jouni
=============================================================================
Some misc info I collected.. Each entry gives first the offset (in hex)
from the beginning of the file, and then either characters or hex
numbers inside parentheses. Only the self-extracting archive stuff is
interesting.
--jouni
zip (4 chars):
0:PK(03)(04)
lzh (5 chars, 3rd tells compression type 1-5):
3:-lh(??)-
zoo (text string):
0:ZOO 2.00 Archive
0:ZOO 2.10 Archive
arc:
0:(01)(08)
lharc self-extracting package:
24:LHA's SFX 2.12S (c) Yoshi, 1991 (.exe file)
24:LHA's SFX 2.13L (c) Yoshi, 1991 (.exe file)
6:SFX of LHarc 1.00 (c)Yoshi, 1989 (.com file)
zip self-extracting package:
1e:Copyright 1989-1990 PKWARE Inc. All Rights Reserved.
arj self-extracting package:
1c:RJSX(ff)(ff)
lzexe compressed executable:
19:(00)(00)(00)LZ91
pklite compressed executable (offset 1e = .exe, offset 2e/30 = .com):
1e:PKLITE Copr. 1990-92 PKWARE Inc. All Rights Reserved(07)(00)(00)(00)
diet compressed executable:
1c:diet(f9)(9c)
Microsoft executable:
0:MZ
Norton Guides database:
0:NG(00)(01)(00)(00)(xx)(00)
8 to 2f info header
30 to 179 long info
HPACK file archive:
0:HPAK
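These fixed-offset signatures translate directly into a lookup table. The Python sketch below encodes a few of the entries above (offsets taken from the hex values in the list); the table layout and function name are hypothetical. Order matters: the generic "MZ" test goes last so that the more specific executable signatures get a chance first, the same first-match-wins behaviour as the magic file.

```python
# (offset, bytes, description); offsets are the hex values from the list.
SIGNATURES = [
    (0x00, b"PK\x03\x04", "zip archive"),
    (0x00, b"HPAK", "HPACK file archive"),
    (0x1C, b"RJSX\xff\xff", "arj self-extracting package"),
    (0x19, b"\x00\x00\x00LZ91", "lzexe compressed executable"),
    (0x1E, b"PKLITE Copr.", "pklite compressed executable"),
    (0x1C, b"diet\xf9\x9c", "diet compressed executable"),
    (0x1E, b"Copyright 1989-1990 PKWARE", "zip self-extracting package"),
    (0x00, b"MZ", "Microsoft executable"),  # generic: keep it last
]


def identify(data: bytes) -> str:
    """Return the description of the first signature that matches."""
    for offset, signature, description in SIGNATURES:
        if data[offset:offset + len(signature)] == signature:
            return description
    return "unknown"


header = bytearray(0x40)
header[0:2] = b"MZ"
header[0x1C:0x20] = b"LZ91"
print(identify(bytes(header)))  # lzexe compressed executable
```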
=============================================================================
From: ddissett@netcom.com (Daniel Dissett)
Subject: Re: Fetch Title of an .INF
The title of an .INF file is located 108 bytes into the file and can be
up to 64 bytes long. The title is null terminated. I use the following
REXX procedure to strip the title out of .INF files:
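GetInfTitle: procedure
  parse arg infFile
  /* charin() positions start at 1, so this reads bytes 108 to 171 */
  data = charin( infFile, 108, 64 )
  call stream infFile, 'c', 'close'
  if data = '' then return ''
  /* the title ends at the first NUL; without one, keep all 64 bytes */
  nul = pos( d2c( 0 ), data )
  if nul = 0 then return strip( data )
  return left( data, nul - 1 )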
[[...]]
I did some quick checking and the ten or so .INF files I looked at
start with the following sequence:
48 53 50 01 9b 00 02 00
[[See inf02a.doc]]
=============================================================================
Date: Sun, 1 Aug 93 10:27:59 CDT
From: Darrel R Hankerson <hankedr>
To: marcusg@ph-cip.uni-koeln.de, chauser.parc@xerox.com
Subject: inf02a.doc
Thanks for inf01 and inf02! This arrived just in time for an update to
the "file" program. I see that you have
int16 ID; // ID magic word (5348h = "HS")
int8 unknown1; // unknown purpose, could be third letter of ID
int8 flags; // probably a flag word...
// bit 0: set if INF style file
// bit 4: set if HLP style file
According to limited testing by three people, INF files start with
HSP\x01\x9b\x00
and HLP files start with
HSP\x10\x9b\x00
Of course, this agrees with the "bit 0, bit 4" and other info in your docs.
I'm using the above as magic-file entries, hoping that they (almost)
positively identify the files and work better than just checking "HS".
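The two prefixes above, together with the title position from Dissett's REXX procedure, fit into a short Python sketch. Assumptions: the flags byte at offset 3 distinguishes INF (bit 0) from HLP (bit 4), as the inf02a.doc excerpt says, and the title starts at 0-based offset 107, which is where charin(infFile, 108, 64) begins reading because REXX stream positions start at 1. The function name is hypothetical; the output imitates the "OS/2 INF (title)" lines reported in these notes.

```python
from typing import Optional

INF_TITLE_OFFSET = 107   # REXX charin(file, 108, 64); REXX positions start at 1
INF_TITLE_LENGTH = 64


def describe_inf(data: bytes) -> Optional[str]:
    """Return 'OS/2 INF (title)' or 'OS/2 HLP (title)', or None if no match."""
    # Both kinds start with "HSP", a flags byte, then 9b 00.
    if data[:3] != b"HSP" or data[4:6] != b"\x9b\x00":
        return None
    flags = data[3]
    if flags & 0x01:        # bit 0: INF style
        kind = "OS/2 INF"
    elif flags & 0x10:      # bit 4: HLP style
        kind = "OS/2 HLP"
    else:
        return None
    raw = data[INF_TITLE_OFFSET:INF_TITLE_OFFSET + INF_TITLE_LENGTH]
    title = raw.split(b"\x00", 1)[0].decode("latin-1").strip()
    return f"{kind} ({title})" if title else kind


header = bytearray(256)
header[0:6] = b"HSP\x01\x9b\x00"
header[107:118] = b"EMX Library"
print(describe_inf(bytes(header)))  # OS/2 INF (EMX Library)
```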
=============================================================================
Date: Fri, 30 Jul 93 16:01:45 CDT
From: "Cave Newt" <roe2@midway.uchicago.edu>
To: hankedr@mail.auburn.edu
Subject: Re: file 3.9a
[[...]]
> 1. You have "0 ZOO ..." for zoo archives, with alternate choice
> at offset 20. I was planning to continue to use your stuff listed at
> offset 20, but I find that it is really at 0x2a. (I need to check if this
> is DOS/OS2-specific.)
Oops...you're quite correct. I probably got my hex/dec conversion
mixed up or something (0x20+10?). I've never used that identifier
myself; I think I found it in the source code somewhere.
> If you have any "unusual" inf files, perhaps you could test this. I've
> tested everything on my 2.1 machine and prcp.inf, guiref.inf, the
> posted GNU texi-to-os2inf files, and saaref.inf. The inf01.doc
> lists these bytes as unknown.
Works OK on the following (which was all I could find):
4os2.inf: OS/2 INF (4OS2 1.11 Online Documentation)
CMDREF.INF: OS/2 INF (OS/2 Command Reference)
COMMON.INF: OS/2 INF (Common/2 Subroutine Library version 1.0.1)
EDMI1.INF: OS/2 INF (EDM/2 - Issue #1 - March 1993)
emxfn.inf: OS/2 INF (EMX Library)
GG243730.INF: OS/2 INF (OS/2 V2.0 Volume 1: Control Program)
GG243731.INF: OS/2 INF (OS/2 V2.0 Volume 2: DOS and Windows Environment)
GG243732.INF: OS/2 INF (OS/2 V2.0 Volume 3: PM and Workplace Shell)
GG243774.INF: OS/2 INF (OS/2 V2.0 Volume 4: Application Development)
OS2TNT.INF: OS/2 INF (Tips and Techniques)
REXX.INF: OS/2 INF (OS/2 Procedures Language 2/REXX)
GUIREF20.INF: OS/2 INF (Control Program Reference)
IPFC20.INF: OS/2 INF (Information Presentation Facility)
IPFCEXMP.INF: OS/2 INF (Compiled Examples)
PMFUN.INF: OS/2 INF (PM Reference)
PMGPI.INF: OS/2 INF
PMHOK.INF: OS/2 INF
PMMSG.INF: OS/2 INF
PMREL.INF: OS/2 INF
PMWIN.INF: OS/2 INF
PMWKP.INF: OS/2 INF
prcp.inf: OS/2 INF (OS/2 Programming Reference)
REXXAPI.INF: OS/2 INF (REXX Program Reference)
SOM.INF: OS/2 INF (System Object Model (SOM) Reference)
TOOLINFO.INF: OS/2 INF (Tools Reference)
DDE3WF.INF: OS/2 INF (IBM WorkFrame/2* Online Reference)
DDE4HELP.INF: OS/2 INF (IBM C Set/2 Online Reference)
=============================================================================
Date: Sun, 1 Aug 93 12:52:42 CDT
From: Darrel R Hankerson <hankedr>
To: roe2@midway.uchicago.edu
Subject: file 3.9a
I do not have many interesting files either. Much more work is needed
on the magic-file. Some magic entries match on too-limited
information. The "tentative-match" idea might help. Adding regular
expressions to string tests may be a good idea for the next
version. Also, it might be nice to be able to make better use of those
text fields which are preceded by a length byte/int.
Note that both PKZIP and LH2 are capable of making self-extracting
executables under OS/2. Also, it just occurred to me that checking
the magic file is unlikely to be helpful, because the signatures are
buried in the middle of the executable. (Certainly the ZIP magic
entries will fail--they check the first four bytes.) I think it
really needs some sort of smart checking (perhaps figuring out where
the executable's data segment begins and looking at that?). Probably
not so easy...
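A cruder alternative to the "smart checking" described above (not the approach proposed here) is to search the whole executable for the zip local-header signature PK\003\004 that the Zip entries test at offset 0. A Python sketch; the names are hypothetical, and it can report a false positive if the extractor's own code happens to contain those four bytes.

```python
ZIP_LOCAL_HEADER = b"PK\x03\x04"


def find_embedded_zip(data: bytes) -> int:
    """Offset of the first zip local header inside an executable, or -1."""
    if data[:2] != b"MZ":
        return -1
    return data.find(ZIP_LOCAL_HEADER, 2)  # start searching after "MZ"


def describe(data: bytes) -> str:
    offset = find_embedded_zip(data)
    if offset < 0:
        return "no embedded zip data found"
    return f"executable with embedded zip data at 0x{offset:x}"


stub = b"MZ" + bytes(100)  # stand-in for the extractor code
print(describe(stub + ZIP_LOCAL_HEADER + bytes(26)))
# executable with embedded zip data at 0x66
```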
I placed some additional entries in Magdir/ms-dos for these. However,
the current OS/2 beta only checks the magic-file if DosQueryAppType
fails or indicates DOS. In addition, the check for PKSFX is from
1.01; I do not know how to check the version used on
ftp-os2.cdrom.com:pub/os2/all/archiver/unz50x16.exe
This is one of those things that I haven't decided how to
handle. There are arguments for always doing both the AppType-call and
the magic-check. I wanted to minimize the number of switches, and
keep the output clean on executables. On the other hand,
DosQueryAppType is not without its problems.
On the magic-file: I see that some of your zoo entries are of form
>4 string 1.00 (v%4s)
Apparently, you meant to write "(v%.4s)". In any case, I've used
0x2a string \xdc\xa7\xc4\xfd
>0 string >0 %.16s
for zoo files. Perhaps you can scan the magic-file entries which you
have contributed, and see if I have these correct (both in form and
credits).
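This is the usual printf distinction. `file` uses, roughly, the message text of a matching entry as a printf format for the value it read from the file. A minimal C sketch follows (the function name and the sample value are hypothetical). In "%4s" the 4 is only a minimum field width, so the whole string is printed. In "%.4s" it is a precision, so at most four characters are printed.

```c
#include <stdio.h>

/* Format a value taken from a file, using a magic entry's message text
 * as the printf format (roughly what file does when it reports a match).
 * Returns the length of the complete result, as snprintf does. */
int format_magic_message(char *out, size_t n, const char *desc,
                         const char *value)
{
    return snprintf(out, n, desc, value);
}
```

With the hypothetical value "1.00 archive", the format "(v%4s)" yields "(v1.00 archive)" and "(v%.4s)" yields "(v1.00)". With a value shorter than four characters, "%4s" pads on the left, so "2" becomes "(v   2)".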
=============================================================================
Date: Sun, 1 Aug 93 17:03:02 CDT
From: Darrel R Hankerson <hankedr>
To: jon@stekt.oulu.fi
In-Reply-To: <9307301022.AA06564@stekt.oulu.fi> (message from Jouni Miettunen on Fri, 30 Jul 93 13:22:15 +0300)
Subject: Re: new versions of programs
[[...]]
And while we are talking about mysterious errors, I experienced some
w/file39:
Eg. in garbo.uwasa.fi:/pc/arcers you can find unz50p1.exe and
zoo210.exe, which are self-extracting file archivers. Your file
identifies them as "Linux/i386 not stripped".
The magic-file needs more work. Currently, the file "magic" is created
by cat'ing, in alphabetical order, all the stuff in ./Magdir/. This
means that the "linux" stuff comes before the "ms-dos" stuff. (The OS/2
version uses a DosQueryAppType call.)
Another weird problem I found w/Tristan (a flipper game) configuration
file.. it's a plain ASCII file:
Video = 1
Again, this shows problems with the magic-file. Someone has used a
1-byte check
0 byte 0126 ps database
which is probably a bad idea.
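The arithmetic explains the false match. As in C, a leading 0 in a magic number marks it as octal, so 0126 is decimal 86 (0x56), the ASCII code for 'V'. Any file whose first byte is 'V', such as a configuration file beginning "Video = 1", passes the test. Here is a sketch of that single-byte comparison (the function name is hypothetical):

```c
#include <stddef.h>

/* Mimics the one-byte magic test "0 byte 0126": the leading 0 makes the
 * number octal, so the test is buf[0] == 86 (0x56), the ASCII code of 'V'. */
int matches_byte_0126(const unsigned char *buf, size_t len)
{
    return len > 0 && buf[0] == 0126;
}
```

If first bytes were uniformly distributed, such a test would fire on about one binary file in 256. On text, it fires on every file that starts with a capital V.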
OS/2 info and I was wondering, if you would know this.. what are the
magic numbers for OS/2 executables in the New EXE Header (offset 0ch)?
At the end of this note are the new ms-dos (and OS/2) exe-magic entries
that I've been testing (a number of these are from you). Roelofs gives
the one for OS/2 and Windows exe. Note that under OS/2, the
DosQueryAppType call is used.
I'm planning to upload the bugfixes soon. I was hoping to hear from
Darwin on 3.10. You mentioned a number of good ideas in earlier notes.
From your notes and the work on this port, I see the following as possible
future enhancements to file:
1. regular expressions in the string
2. the "pseudo-comment" idea and the keyword idea
3. "tentative match" requiring a match in some following line
4. additions from the filex approach
5. capability to use, for example, the length-byte which precedes
a string.
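Item 5 refers to strings stored with a leading count (Pascal style) instead of a terminating NUL. A fixed "%.Ns" format cannot adapt to such a count. Below is a minimal sketch that assumes a one-byte count at the given offset. The function is hypothetical and is not part of file 3.9a.

```c
#include <stddef.h>
#include <string.h>

/* Copy a length-prefixed string found at `offset` in buf into out
 * (NUL-terminated). The byte at `offset` gives the number of characters
 * that follow. Returns the number of characters copied, or -1 if the
 * string would run past the end of buf or does not fit in out. */
int read_counted_string(const unsigned char *buf, size_t buflen,
                        size_t offset, char *out, size_t outlen)
{
    size_t count;

    if (offset >= buflen)
        return -1;
    count = buf[offset];
    if (count > buflen - offset - 1 || count + 1 > outlen)
        return -1;
    memcpy(out, buf + offset + 1, count);
    out[count] = '\0';
    return (int)count;
}
```

A 16-bit count could be handled the same way, with the byte order as an extra parameter.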
[[...]]
One last thing: I plan to include edited versions of some of your
notes and suggestions, provided that you have no objections. I can
forward a copy if you wish to review.
=============================================================================
Date: Tue, 3 Aug 93 16:51:34 +0300
From: Jouni Miettunen <jon@stekt.oulu.fi>
To: hankedr@mail.auburn.edu
In-Reply-To: <9308012203.AA27657@ducserv.duc.auburn.edu> (message from Darrel R Hankerson on Sun, 1 Aug 93 17:03:02 CDT)
Subject: Re: new versions of programs
[[...]]
>The magic-file needs more work. Currently, the file "magic" is created
>by cat'ing, in alphabetical order, all the stuff in ./Magdir/. This
...
>Again, this shows problems with the magic-file. Someone has used a
Have you read the latest (it's July, the graphics magazine) Dr. Dobb's
Journal? There is an interesting article about fuzzy logic by Abrasm
(sp) and it seems to express more clearly, what I had in my mind.
In case of magic numbers one should check _all_ magic. If there's only
one identification, that'll be shown. If there are more, one has to
recheck all found identifications in another way, eg. if one is for
unix and another for dos, then we'll check the used operating system
and show the one labeled to that OS, since it'll be much more likely
the correct one. The used OS can be hardcoded while compiling the
program. Besides using labels in magic file (Ricki's idea), in dos and
OS/2 one can check the filename extension, in which case extensions
and magic numbers should be connected somehow. The idea behind File is
that files can have "wrong" name and so extensions could be used only
to select one from a couple possible id's. A major problem w/Filex..
W/Tristan file there'd be Linux and ASCII identification and since OS
was DOS, the ASCII would have been the correct id. BTW there's the same
kind of trouble w/device drivers in the \dos directory. One reason to
write Filex was to get quantity. I'd rather see even false id's than
almost all files as unidentified. With File39 you have critical mass,
too, to get people to use it, but now you need quality, e.g. correct
identifications, and so do I. Here we go Fuzzy :) Too bad I haven't
found any books about this, it seems this could use some nice data
structure..
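Jouni's scheme scans all the magic and breaks ties by operating system only when more than one identification turns up. It might be sketched as below. Everything here is hypothetical: the per-entry OS label (in the spirit of the labels idea), the HOST_OS macro, and the ranking rule. Under that rule, an unlabeled type such as plain text outranks one labeled for a different OS, which would turn the Tristan file from "Linux" into ASCII on DOS.

```c
#include <stddef.h>
#include <string.h>

#ifndef HOST_OS
#define HOST_OS "dos"   /* hypothetical: chosen when the program is compiled */
#endif

/* One identification found by scanning all the magic.
 * os is NULL for OS-independent types such as plain text. */
struct candidate {
    const char *desc;
    const char *os;
};

/* Assumed ranking: labeled for this OS > unlabeled > labeled for another OS. */
static int rank(const struct candidate *c)
{
    if (c->os == NULL)
        return 1;
    return strcmp(c->os, HOST_OS) == 0 ? 2 : 0;
}

/* Pick one description from n candidates; earlier entries win ties.
 * Returns NULL when nothing matched. */
const char *choose_identification(const struct candidate *c, size_t n)
{
    size_t i, best = 0;

    if (n == 0)
        return NULL;
    for (i = 1; i < n; i++)
        if (rank(&c[i]) > rank(&c[best]))
            best = i;
    return c[best].desc;
}
```

With a single candidate, the function simply returns it, as the proposal describes.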
Another way to decide on the wanted identification is to use some predefined
keywords in the output strings; that's what I'll do w/filex. To find
out the keyword one could use 1) directory name (dbase, graphics) 2)
part of path (windows) 3) external configuration file (bc3 - Borland
C++ 3.1) 4) one given on command line.
[[...]]
> 4. additions from the filex approach
Only if one can combine magic and extensions. You got 900 magic
numbers and I got 1500 extensions :) And the file can always have a
wrong extension..
[[...]]
Old EXE header can be identified by MZ (or ZM) as the first chars of
file. If some offset has value 0x40 or more (the size of header?),
then there is a new EXE header starting from place mentioned in some
other offset, this might be the second line above.. The new EXE header
starts with chars NE (as in New Executable; MZ are the initials of the
designer of the format) and at some offset there is the id byte. Value
0x2 is for Windows exe, MS Windows SDK manuals said only that values
1, 3 and 4 are "reserved", so one might be for OS/2, but which one?
In another offset there's mentioned the needed Windows version number,
if program indeed is a win app.
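Jouni's header walk can be made concrete. The offsets below come from the commonly published MZ and NE layouts, not from this port, and are worth checking against Microsoft's documentation:

* The word at 0x18 of the MZ header (the relocation-table offset) is 0x40 or more when a new header exists.
* The long at 0x3C gives the new header's file offset.
* The byte at offset 0x36 of the NE header is the target operating system.

In those listings the target byte is 1 for OS/2 and 2 for Windows, which would answer the question above. Treat that as an assumption until confirmed.

```c
#include <stddef.h>

enum exe_kind {
    EXE_NOT_MZ,      /* no MZ/ZM signature */
    EXE_DOS,         /* old-style header only */
    EXE_NE_OS2,      /* NE header, target OS byte 1 */
    EXE_NE_WINDOWS,  /* NE header, target OS byte 2 */
    EXE_NE_OTHER,    /* NE header, some other target value */
    EXE_NEW_OTHER    /* new header present but not "NE", or unreadable */
};

static unsigned rd16le(const unsigned char *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

static unsigned long rd32le(const unsigned char *p)
{
    return (unsigned long)p[0] | ((unsigned long)p[1] << 8) |
           ((unsigned long)p[2] << 16) | ((unsigned long)p[3] << 24);
}

/* Walk from the MZ header to the NE header, if any. Offsets assumed from
 * published layouts: 0x18 relocation-table offset, 0x3C offset of the new
 * header, NE+0x36 target operating system. */
enum exe_kind classify_exe(const unsigned char *buf, size_t len)
{
    unsigned long ne;

    if (len < 2 || !((buf[0] == 'M' && buf[1] == 'Z') ||
                     (buf[0] == 'Z' && buf[1] == 'M')))
        return EXE_NOT_MZ;
    if (len < 0x40 || rd16le(buf + 0x18) < 0x40)
        return EXE_DOS;
    ne = rd32le(buf + 0x3C);
    if (ne >= len || len - ne < 0x37)
        return EXE_NEW_OTHER;
    if (buf[ne] != 'N' || buf[ne + 1] != 'E')
        return EXE_NEW_OTHER;
    switch (buf[ne + 0x36]) {
    case 1:  return EXE_NE_OS2;
    case 2:  return EXE_NE_WINDOWS;
    default: return EXE_NE_OTHER;
    }
}
```

32-bit OS/2 2.x programs use a different new header (LX rather than NE), which this sketch reports only as EXE_NEW_OTHER.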
=============================================================================
Date: Tue, 3 Aug 93 11:01:00 CDT
From: Darrel R Hankerson <hankedr>
To: roe2@midway.uchicago.edu
Subject: notes on file39a
In summary, I think extensions to the magic-file entries are needed
(some ideas appear in the notes). I'm hoping that Darwin decides
to coordinate a new version. I'd like to add
* match on first line only if some ">" line matches
* "confidence" pseudo-comment, indicating the chances that
a match does imply the type of file. A switch could then
indicate what level of confidence is desired.
* ability to use, for example, a length-byte which precedes a string
* regular expressions
* a new format ("%S" ?) which indicates "print while isprint()"
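The proposed "%S" is not part of magic(4) as described in these notes. As a sketch of the intended behavior, the value would be copied from the matched offset up to the first non-printable byte (or a size cap) before printing. A Creative Voice File header has a 0x1A byte right after its 19-character signature. On such a header this rule stops at exactly the text that the "%.19s" entry quoted later must hard-code. The function name and its cap parameter are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Copy bytes from buf into out while they are printable (isprint),
 * stopping at the end of buf or when out is full. Always NUL-terminates
 * out (outlen must be at least 1). Returns the number of characters copied. */
size_t copy_printable(const unsigned char *buf, size_t buflen,
                      char *out, size_t outlen)
{
    size_t i = 0;

    while (i < buflen && i + 1 < outlen && isprint(buf[i])) {
        out[i] = (char)buf[i];
        i++;
    }
    out[i] = '\0';
    return i;
}
```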
I plan to upload 3.9a. This has been an interesting project, but
I doubt that I will do 3.10 on my own.
=============================================================================
Date: Wed, 4 Aug 93 14:30:11 CDT
From: "Cave Newt" <roe2@midway.uchicago.edu>
To: hankedr@mail.auburn.edu
Subject: Re: notes on file39a
> I am considering including some of your email notes in the dist
> of file39a.zip. These would be useful to someone who decides to
> do an enhancement to the program. I include a copy. I would need
> your permission for this.
Your second message about them came just as I was reading through
them. You certainly have my permission, and I even think it would
be interesting to include them--they contain pointers to other ports,
questions about possible errors, etc. I would go ahead with your
original plan.
Btw, the notes say something about the copy du.exe problem being
fixed in 2.1; I'm still running 2.0(+SP), and everything works for
me now, too. Recompiling must have helped somehow (or maybe there
was a pointer bug in the wildcard/readdir code?).
> * regular expressions
Ooo, most especially this one. (All of your ideas were good, but
this is the one I've most missed, along with indirect offsets.)
> Did you find any problems with the beta that I sent?
A few oddities, but mostly problems you already know about, I think.
Here's a list:
> pbrush.ico: bitmap array
I have no idea why icon files are given the type "bitmap array"
(by IBM, I mean), but I double-checked for myself--that's the
correct ID.
> desktop.cmd: ascii text
> nmack.bat: English text
As you noted, the ascii magic needs some work. In these cases
(very short batch files), it would be difficult to make a positive
ID, but ascii magic might check the file extension under OS/2 and
MS-DOS (or VMS, for that matter).
> spin.pmg: c program text
> test.pmg: c program text
These are actually weird data files, but it would be hard to check
that.
> cpumeter.exe: PM executable [WINDOWAPI]
Would it be better to label these as 16-bit PM exe's? Or possibly
even 16-bit OS/2 PM exe's? "PM" might not mean a lot to clueless
MS-DOS types...
> diskmap.exe: OS/2 executable
Ditto (16-bit OS/2 exe?)
> esctilde.dcp: data
I'm sure code pages must have some identifying feature...
> phoenix.exe: OS/2 executable
This is actually a PM program--I don't understand why it doesn't
show up as such.
> pushd.cmd: c program text
A REXX script--another ascii magic problem which you've already
mentioned.
> pbrush.hlp: data
> wizunzip.hlp: data
These are both Windows-format HLP files.
> syscols.ini: ispell hash file and 20 string characters
> Mine.INI: ispell hash file and 20 string characters
> klondike.ini: ispell hash file and 20 string characters
> neko.ini: ispell hash file and 20 string characters
More ascii magic work needed...
> wizunzip.wav: data
> boom1a.snd: data
I think these are standard sound formats; I even seem to recall
seeing some magic entries posted a while back, but I don't know
if I saved them.
> menu.ini: DOS executable, magic(4)-> ascii text
Oops...(goes with a DOS executable, though).
> p1.pal: ascii text (with escape sequences)
> power.pal: ascii text (with escape sequences)
Some sort of palette file for a DOS game, I think (not a standard,
but interesting to see what file comes up with).
> tetris.scr: 8086 relocatable (Microsoft)
> TrashMan.SCO: VAX-order2 68k Blit mpx/mux executable
Score files don't seem to fare too well with magic.
> WOLF3D.EXE: DOS executable, magic(4)-> DOS executable (lzexe compressed)
Impressive.
> DDE4MODL.DLL: DLL
> DDE4NBS.DLL: 32-bit DLL
> DDE4RESS.DLL: DLL
> DDE4SBM.DLL: 32-bit DLL
I'm surprised that these aren't all 32-bit, but assuming the ID
is correct, I'd make the same comment as above: insert "16-bit"
first?
=============================================================================
Date: Wed, 4 Aug 93 21:23:35 CDT
From: Darrel R Hankerson <hankedr>
To: roe2@midway.uchicago.edu
In-Reply-To: <9308041930.AA09245@midway.uchicago.edu> (roe2@midway.uchicago.edu)
Subject: Re: notes on file39a
Btw, the notes say something about the copy du.exe problem being
fixed in 2.1; I'm still running 2.0(+SP), and everything works for
me now, too. Recompiling must have helped somehow (or maybe there
was a pointer bug in the wildcard/readdir code?).
I think it is a bug in 2.0. I cannot reproduce the bug in 2.1, and your
note below about "menu.ini" indicates that you still see the bug.
very short batch files), it would be difficult to make a positive
ID, but ascii magic might check the file extension under OS/2 and
MS-DOS (or VMS, for that matter).
This is one of the additions that Jouni Miettunen suggests. Some
combination of "isascii" and the file-extension might be ok.
Sometimes the input will be from a pipe, however.
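Here is a minimal sketch of that combination. It assumes the content tests have already classified the input as text. The extension is consulted only when a filename exists, so input from a pipe (no name) keeps the content-based answer. The extension table and the function names are illustrative, not taken from file or filex.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Illustrative table: extension -> refined description for text files. */
static const struct {
    const char *ext;
    const char *desc;
} text_exts[] = {
    { "cmd", "OS/2 command script" },
    { "bat", "DOS batch file" },
};

/* ASCII case-insensitive equality (DOS and OS/2 names ignore case). */
static int same_ci(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++, b++;
    }
    return *a == *b;
}

/* Refine a content-based text result using the filename extension.
 * name is NULL for input from a pipe; the content result is then kept. */
const char *refine_text_type(const char *name, const char *content_type)
{
    const char *base, *p, *dot;
    size_t i;

    if (name == NULL)
        return content_type;
    /* Look only at the last path component, so a dot in a directory
     * name is not mistaken for an extension. */
    for (base = p = name; *p; p++)
        if (*p == '/' || *p == '\\' || *p == ':')
            base = p + 1;
    if ((dot = strrchr(base, '.')) == NULL)
        return content_type;
    for (i = 0; i < sizeof text_exts / sizeof text_exts[0]; i++)
        if (same_ci(dot + 1, text_exts[i].ext))
            return text_exts[i].desc;
    return content_type;
}
```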
> cpumeter.exe: PM executable [WINDOWAPI]
Would it be better to label these as 16-bit PM exe's? Or possibly
even 16-bit OS/2 PM exe's? "PM" might not mean a lot to clueless
MS-DOS types...
> diskmap.exe: OS/2 executable
Ditto (16-bit OS/2 exe?)
The idea (basically from Mattes' apptype) is to use DosQueryAppType to
report on the file. I simply print messages corresponding to the
flag-bits. However, this call has some problems (it cannot properly
identify all executables). In addition, there is no "16-bit" flag,
and it may be fine to assume 16-bit (I wish I knew...).
This call is only done on the OS/2 exe, so perhaps the "PM" message
is ok.
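Printing messages that correspond to the flag bits amounts to a table lookup. The sketch below decodes a flags word without calling OS/2. The constant values are assumptions, recalled from the FAPPTYP_* definitions in the toolkit header bsedos.h, and should be checked against the real headers. The message strings are illustrative rather than the port's exact output. Note that nothing in this set of flags says "16-bit".

```c
#include <stdio.h>

/* Assumed values of the FAPPTYP_* flags from the OS/2 toolkit header
 * bsedos.h; check them against your toolkit before relying on them. */
#define AT_EXETYPE_MASK 0x0003u /* low two bits: window compatibility */
#define AT_WINDOWAPI    0x0003u /* exetype value: uses the PM API */
#define AT_BOUND        0x0008u /* bound (Family API) program */
#define AT_DLL          0x0010u
#define AT_DOS          0x0020u

/* Turn an application-type flags word into a one-line description,
 * one message per flag. Returns the length of the full description. */
int describe_apptype(unsigned flags, char *out, size_t n)
{
    static const char *const exetype[] = {
        "",                     /* 0: not specified */
        " [NOTWINDOWCOMPAT]",   /* 1: full-screen only */
        " [WINDOWCOMPAT]",      /* 2: can run in a window */
        " [WINDOWAPI]"          /* 3: Presentation Manager */
    };
    const char *kind = "OS/2 executable";

    if (flags & AT_DOS)
        kind = "DOS executable";
    else if (flags & AT_DLL)
        kind = "DLL";
    else if ((flags & AT_EXETYPE_MASK) == AT_WINDOWAPI)
        kind = "PM executable";

    return snprintf(out, n, "%s%s%s", kind,
                    exetype[flags & AT_EXETYPE_MASK],
                    (flags & AT_BOUND) ? " [BOUND]" : "");
}
```

With these assumed values, a flags word of 0x0003 produces "PM executable [WINDOWAPI]". That matches the cpumeter.exe line above and shows the redundancy between the two messages.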
> phoenix.exe: OS/2 executable
This is actually a PM program--I don't understand why it doesn't
show up as such.
The flag is not set in the header (try markexe).
> pbrush.hlp: data
> wizunzip.hlp: data
These are both Windows-format HLP files.
I've made a guess on Windows HLP files. Know any good info on this?
0 string ?_\3\0 Windows HLP
> syscols.ini: ispell hash file and 20 string characters
> Mine.INI: ispell hash file and 20 string characters
> klondike.ini: ispell hash file and 20 string characters
> neko.ini: ispell hash file and 20 string characters
[[fixed both ispell and INI]]
> wizunzip.wav: data
> boom1a.snd: data
I think these are standard sound formats; I even seem to recall
seeing some magic entries posted a while back, but I don't know
if I saved them.
I've made some guesses:
0 string RIFF
>8 string AVI movie
>8 string WAVE sound
0 string MThd\0 MIDI sound
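These guesses follow the RIFF container layout:

* the four bytes "RIFF";
* a four-byte little-endian chunk size;
* a four-character form type at offset 8 ("WAVE" for wave audio, "AVI " with a trailing space for AVI).

A Standard MIDI File begins with "MThd" followed by a 32-bit big-endian header length of 6, so the byte after "MThd" is 0. That is what "MThd\0" checks. Here is a C sketch of the same tests (the function name is hypothetical):

```c
#include <stddef.h>
#include <string.h>

/* Return a description for RIFF WAVE/AVI or MIDI data, or NULL if the
 * buffer matches none of them. Mirrors the magic entries above. */
const char *sound_or_movie(const unsigned char *buf, size_t len)
{
    if (len >= 12 && memcmp(buf, "RIFF", 4) == 0) {
        if (memcmp(buf + 8, "WAVE", 4) == 0)
            return "WAVE sound";
        if (memcmp(buf + 8, "AVI", 3) == 0)
            return "AVI movie";
        /* The magic entry prints nothing here; this label is an addition. */
        return "RIFF data";
    }
    if (len >= 5 && memcmp(buf, "MThd", 4) == 0 && buf[4] == 0)
        return "MIDI sound";
    return NULL;
}
```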
> menu.ini: DOS executable, magic(4)-> ascii text
Oops...(goes with a DOS executable, though).
I think this illustrates the DosQueryAppType bug under 2.0.
> DDE4MODL.DLL: DLL
> DDE4NBS.DLL: 32-bit DLL
> DDE4RESS.DLL: DLL
> DDE4SBM.DLL: 32-bit DLL
I'm surprised that these aren't all 32-bit, but assuming the ID
is correct, I'd make the same comment as above: insert "16-bit"
first?
I suppose these could have the corresponding problem to that
illustrated by phoenix.exe. Since I'm not certain about this, I think
I'll leave the code as-is (just report which bits are set). You
are probably correct on the 16-bit thing.
=============================================================================
Date: Thu, 5 Aug 93 08:27:51 CDT
From: Darrel R Hankerson <hankedr>
To: roe2@midway.uchicago.edu
In-Reply-To: <9308041930.AA09245@midway.uchicago.edu> (roe2@midway.uchicago.edu)
Subject: Re: notes on file39a
Just a few more notes...
this is the one I've most missed [reg exp], along with indirect offsets.)
There is some indirect-offset capability discussed in magic(4). I agree
on the reg exp. If I recall correctly, Darwin did mention the idea (but
this was long ago, and I don't see any such note in notes.os2).
> cpumeter.exe: PM executable [WINDOWAPI]
Would it be better to label these as 16-bit PM exe's? Or possibly
even 16-bit OS/2 PM exe's? "PM" might not mean a lot to clueless
MS-DOS types...
I've thought a little more on this. The msg "PM executable" implies
"WINDOWAPI", so this is redundant (but perhaps it is consistent with
the other types).
> syscols.ini: ispell hash file and 20 string characters
I see that someone had hacked the ispell entry (must have caused
problems on some system). I think I've fixed this (for both ispell
and INI).
> boom1a.snd: data
I'm not familiar with this one. I do see
0 string .snd audio data:
>12 long 1 8-bit u-law,
>12 long 2 8-bit linear PCM,
...
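These entries describe the Sun/NeXT audio format. Its header is a series of 32-bit big-endian fields:

* the ".snd" magic;
* the data offset;
* the data size;
* the encoding (the long at offset 12);
* the sample rate;
* the channel count.

Here is a sketch of the offset-12 lookup, covering only the two encodings quoted above (function names hypothetical):

```c
#include <stddef.h>
#include <string.h>

/* Read a 32-bit big-endian value. */
static unsigned long be32(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] << 8) | (unsigned long)p[3];
}

/* Describe the encoding of a .snd header; NULL if the magic is absent. */
const char *snd_encoding(const unsigned char *buf, size_t len)
{
    if (len < 16 || memcmp(buf, ".snd", 4) != 0)
        return NULL;
    switch (be32(buf + 12)) {
    case 1:  return "8-bit u-law";
    case 2:  return "8-bit linear PCM";
    default: return "other encoding";
    }
}
```

A file that does not start with ".snd" never reaches the encoding test. That fits boom1a.snd, which turns out to be a Creative Voice File rather than this format.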
============================================================================
Date: Thu, 5 Aug 93 10:13:53 CDT
From: "Cave Newt" <roe2@midway.uchicago.edu>
To: hankedr@mail.auburn.edu
Subject: Re: notes on file39a
> I've thought a little more on this. The msg "PM executable" implies
> "WINDOWAPI", so this is redundant (but perhaps it is consistent with
> the other types).
I think it's fine.
> I see that someone had hacked the ispell entry (must have caused
> problems on some system). I think I've fixed this (for both ispell
> and INI).
Great!
>> boom1a.snd: data
> I'm not familiar with this one. I do see
> 0 string .snd audio data:
I should have looked more closely at it. I still can't tell if it's
a standard format or not, but the following entry works (these are
found in the Nick Macki beta, btw):
0 string Creative\ Voice\ File audio data (%.19s)
=============================================================================
Date: Thu, 5 Aug 1993 06:21:37 -0400
From: ian@sq.com
To: hankedr@mail.auburn.edu
Subject: Re: file 3.9 for OS/2
Date: Mon, 2 Aug 1993 08:45:08 -0400
From: Darrel R Hankerson <hankedr@mail.auburn.edu>
>I will soon be updating file 3.9 for OS/2 (the original port was
>completed several months ago). I have not received any mail
>from you since
>
> Date: Mon, 5 Apr 1993 10:51:49 -0400
Sorry, I've not done much with file in several months. I'm thinking of
turning it over to the FSF so they can bloat (er expand) it over
time. Or somebody else, I'm not sure yet. It's a bit of a job to
maintain all the variants and changes.
Will keep you posted. Thanks.
Ian
=============================================================================