Preramble:
Much of this program has changed from the original version:
- Now handles version .9x Yarn news files.
- Actual regular expressions have been added.
- Search capabilities have been greatly enhanced.
- HTML output has been added.
- Wildcards in the filenames should make it much easier to use.
- Output is now sorted by the date field in the header.
- Some parts have been sped up a lot (although I found a big
slowdown during testing when many input files must be sorted).
- Dos version has been dropped.
I am hoping that the HTML/cgi-bin capabilities will be useful. PLEASE
let me know if you would like me to pursue this!
- Rick Curry
Wed 98-01-21
YTSG version 4.01
ytsg - Convert Yarn files to SOUP offline reader format using grep
syntax to select messages.
Copyright (C) 1998, Richard Curry Consulting: trindflo@fishnet.net All
Rights Reserved
Parts of this software were copied from source code made available as
part of IBM's Developer's Connection. IBM copyright information has been
retained in the source.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
SUMMARY:
This file contains the code to translate/copy Yarn news files into
soup offline reader format. This was initially intended to allow
cleanup of a corrupted Yarn newsbase.
Usage:
ytsg [-{BFHNPSW}[{\:!}]str] [-D{SE}date] [-{GL}n] [-{QUX}] [-Oh] File[s]
Flags:
-{BFHNPSW}:str -
Specify one or multiples of these switches to search the Body,
From, Header, Newsgroups, Path, Subject or Whole message for the
string 'str' and disqualify a message which fails to match any one
of the strings specified.
If none of these strings is specified, then all messages will be
selected.
'str' may start with \, :, and/or ! (order is important).
\ causes case insensitive matching.
: causes str to be interpreted as a unix style regular expression.
If not using a unix regular expression, ! can be used to negate the
search (the search will match the absence of 'str').
Multiple search criteria are allowed, but only one per field (one
subject search, one body search, etc.)
An example of a search of the subject using regular expressions:
-s\:[[(][0-9]+/[0-9]+[)]]
-D{SE}: Dates (start/end) to include. You may specify one or both
to restrict the search. The format is rigid: YYYYMMDD e.g.
-DS:19970225. All 4 digits of the year must be specified. There
is one alternate form (suggested by Tim Middleton) of a number of
days previous to today: e.g. -DS-5 would restrict the search to
exclude all messages with an internal date stamp more than 5 days
earlier than today.
-Gn Messages containing more than 'n' bytes are disqualified.
-Ln Logging threshold. Errors and informational messages are logged
to the file 'ytsg.log'. A smaller number logs less information.
Any number can be specified, but a value greater than 600 does not
create any more logging (and produces a lot of text). When the log
level is 500 or more the index file is not deleted at the end of
the run and it remains in your %TMP% directory. The default is 300
and equates to warning level messages. 400=Informational,
500=Debug, and 600=Noise.
-Oh Output links in html to message location for later retrieval.
This will generate output which can be used from a WEB browser to
read messages. This feature allows you to create a custom Yarn
newsbase and share it over the WEB.
-Q Silence as much of the work messages as possible. Normally a dot
(.) prints for each message which matches. This option will
remove this output and some other "warm fuzzy" messages designed to
let the user know that work is actually being done.
-U Scan "unused" (deleted) messages as well. Only applies to
version .9x Yarn news files. Ytsg cannot detect if a message has
been deleted in earlier versions of Yarn.
-X Show the hex offset of each selected message. Generally this is
useful for diagnostics.
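
To make the search-string prefixes concrete, here is a small Python sketch of
how the leading \, : and ! might be interpreted. It is not taken from the ytsg
source. The names Search, parse_search and matches are made up for
illustration, the prefix order (\ first, then either : or !) is an assumption
based on the description above, and Python's re module stands in for unix
style regular expressions, whose syntax differs in some details.

```python
import re
from dataclasses import dataclass


@dataclass
class Search:
    pattern: str
    ignore_case: bool = False
    is_regex: bool = False
    negate: bool = False


def parse_search(arg):
    """Split the optional prefixes off a search string (hypothetical helper)."""
    s = Search(pattern=arg)
    # Assumed order: '\' comes first and turns on case insensitive matching.
    if s.pattern.startswith("\\"):
        s.ignore_case = True
        s.pattern = s.pattern[1:]
    # Then either ':' (regular expression) or '!' (negated plain search).
    if s.pattern.startswith(":"):
        s.is_regex = True
        s.pattern = s.pattern[1:]
    elif s.pattern.startswith("!"):
        s.negate = True
        s.pattern = s.pattern[1:]
    return s


def matches(s, text):
    """Return True if the text satisfies the search."""
    if s.is_regex:
        flags = re.IGNORECASE if s.ignore_case else 0
        return re.search(s.pattern, text, flags) is not None
    if s.ignore_case:
        found = s.pattern.lower() in text.lower()
    else:
        found = s.pattern in text
    # For a negated search, the absence of the string is the match.
    return found != s.negate
```

For example, parse_search("!spam") yields a negated plain-text search, so
matches() is true only for text that does not contain "spam".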
Notes:
Switches are not case sensitive ( i.e. -DS: = -dS: ).
The default search is case sensitive: it is faster.
The date algorithm is fairly simple and when it gets confused it
allows the date.
Usually a space can separate the switch letter and its associated
value.
A value is required for all switches which accept a value: if you
specify the switch, plug in the value.
A search of the 'Path' field will match either Path: or Return-Path:.
STDIN is not allowed as an input source. This is to allow for ytsg to
operate as a cgi-bin.
File[s] can be a mixture of .8x or .9x Yarn files.
Wildcards may be used in specifying File[s].
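
The two forms of the -D{SE} value, and the rule that a confusing date is
allowed, can be sketched in Python as follows. This is only an illustration
under stated assumptions: the function names are hypothetical, the value is
assumed to be what follows -DS or -DE (":YYYYMMDD" or "-N"), and the start and
end dates are assumed to be inclusive. None of this is taken from the ytsg
source.

```python
from datetime import date, timedelta


def parse_date_value(value, today):
    """Turn ':YYYYMMDD' or '-N' (days before today) into a date."""
    if value.startswith(":"):
        digits = value[1:]
        # The format is rigid: exactly 8 digits, 4 of them for the year.
        if len(digits) != 8 or not digits.isdigit():
            raise ValueError("expected YYYYMMDD, got %r" % digits)
        return date(int(digits[:4]), int(digits[4:6]), int(digits[6:]))
    if value.startswith("-") and value[1:].isdigit():
        return today - timedelta(days=int(value[1:]))
    raise ValueError("unrecognized date value %r" % value)


def date_allowed(msg_date, start=None, end=None):
    """Decide whether a message date passes the start/end restriction."""
    # A date that could not be understood is represented as None and allowed.
    if msg_date is None:
        return True
    if start is not None and msg_date < start:
        return False
    if end is not None and msg_date > end:
        return False
    return True
```

With today taken as 1998-01-21, the value "-5" gives a start date of
1998-01-16, so older messages would be excluded.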
The ytsg command searches the specified input file(s) for news or mail
messages which match your specified pattern(s).
The ytsg command outputs message(s) which match your criteria to either
soup formatted files, or as HTML summary information. The patterns that
you specify must exist within the first 64K bytes of the article.
If you are outputting the information as soup, the string 'Newsgroups:'
is searched for. This line is searched for strings which already exist
in your newsrc file (these are newsgroups that you are already reading).
If no Newsgroups string is found, the message is formatted into email
format. If no string in your newsrc file matches the Newsgroups string,
then the message is identified as belonging to the first group named in
the Newsgroups line. This search will recognize and ignore the
'X-Newsgroups' line added by Yarn version .85.
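
The group selection described above can be sketched in Python like this. It
is an illustration, not the ytsg code: choose_area is a hypothetical name,
the header is assumed to be given as a list of lines, the header name is
matched exactly as 'Newsgroups:', and when several groups on the line appear
in newsrc the first one listed is assumed to win.

```python
def choose_area(header_lines, newsrc_groups):
    """Return the newsgroup to file a message under, or None for email."""
    prefix = "Newsgroups:"
    for line in header_lines:
        # Only a line that starts with 'Newsgroups:' counts, so an
        # 'X-Newsgroups:' line is ignored.
        if line.startswith(prefix):
            groups = [g.strip() for g in line[len(prefix):].split(",")]
            groups = [g for g in groups if g]
            if not groups:
                return None
            # Prefer a group the user already reads (listed in newsrc).
            for g in groups:
                if g in newsrc_groups:
                    return g
            # Otherwise fall back to the first group on the line.
            return groups[0]
    # No Newsgroups line at all: treat the message as email.
    return None
```

A cross-posted article whose Newsgroups line names no group from newsrc
lands in the first group on that line, which is how unfamiliar groups can
show up after an import.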
In order to create newsgroups properly, ytsg needs to know where your
newsrc file is located. It uses the 'HOME' variable to find this file.
If this variable is not set and you are requesting soup output, ytsg
will complain and terminate.
In order to avoid corruption of existing soup files, if the file 'AREAS'
exists in the current directory ytsg will complain and terminate.
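
The two start-up checks just described (HOME must be set for soup output, and
no AREAS file may already exist) could look roughly like this in Python. The
function name preflight and its arguments are hypothetical, and the real
program's messages and exit behaviour are not shown here.

```python
import os


def preflight(env, workdir, soup_output=True):
    """Return a list of reasons to refuse to run (empty if all is well)."""
    problems = []
    # HOME is needed to find newsrc, which only matters for soup output.
    if soup_output and not env.get("HOME"):
        problems.append("HOME is not set; cannot locate newsrc")
    # An existing AREAS file may belong to other soup files, so stop.
    if os.path.exists(os.path.join(workdir, "AREAS")):
        problems.append("AREAS already exists in " + workdir)
    return problems
```

A caller would pass os.environ and os.getcwd(), print any problems, and
terminate if the list is not empty.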
SETUP:
With thanks to several Yarn users who have BETA tested and suggested
improvements for this software.
You need to have your HOME environment variable set. The program uses
this to locate your newsrc file.
You need to have a newsrc file. It can be an empty file, but you should
probably use the one you had. I use the newsrc file to group the
messages into groups that you probably want to see. If I don't know
which groups you use, I just go with the first thing on the
Newsgroups line of the message. So you will end up seeing strange
newsgroups which you never subscribed to appear in Yarn because of
cross-posting. I don't use any of the numeric information in the newsrc
file, but I do pay attention to the ':' and '!' marks.
Certain errors will crash ytsg after the AREAS file has already been
created. As a safety mechanism, ytsg will always terminate if an AREAS
file already exists in the directory. If the AREAS file resulted from a
crashed or terminated run, just delete the AREAS file.
Ytsg is now using a B+ index which it places in the directory specified
in your %TMP% environment variable (I think this is the current
directory if you do not set TMP). This file can get quite large (on the
order of 7% of the size of the input files which are being analyzed). If
Ytsg crashes for any reason, you should make sure that files like *.CTN
in your TMP directory are deleted (Actually ytsg is not the only program
which might leave garbage in this directory!) Ytsg intentionally leaves
this file if you set the log level to 500 or greater with the '-l'
switch.
USES:
As stated earlier, this program was initially intended to allow cleanup
of a corrupted Yarn newsbase. All of the files in the %YARN%\news
directories are converted back into SOUP files and the whole lot is
imported into a fresh copy of Yarn. The 'Grep' or search capability was
later added and allows limited filtering of a newsbase: this has been
used to manipulate pseudo-newsgroups created with the "newgroup" command
for email filtering.
INBOX and folder files can also be processed by ytsg with a warning:
they will go back to whatever groups they originated from. This means a
message in a folder which originally was part of a newsgroup will be
imported into that newsgroup. A message without a Newsgroups: line will
be imported into email -- this includes email messages 'filtered' by
Yarn into pseudo-newsgroups even if the message contains a X-Newsgroups:
line.
EXAMPLES:
Full rebuild:
This software is fully tested and nothing can go wrong. So first we
make a copy...
SET YARN=H:\OLDYARN
SET HOME=H:\OLDYARN\RRC
Change to a drive/directory which is convenient to hold the output of
ytsg. The output will be about the same size as the files you are
reading. David Meade tells me that during testing of ytsg, he
experienced a noticeable improvement in speed by running ytsg on a
different physical drive than the one he was reading from.
Rem The following represents where the old style, .8x Yarn news files live.
ytsg %YARN%\news\*
Rem The following represents where the new style, .9x Yarn news file lives.
ytsg %YARN%\news.dat
SET YARN=H:\newYARN
SET HOME=H:\newYARN\RRC
clearall
makegps
import -u (pathname of the directory where you ran ytsg)
- Finis -
Refilter INBOX:
SET YARN=H:\YARN
SET HOME=H:\YARN\RRC
cd %HOME%\mail
copy INBOX INBOX.sav
ytsg INBOX
cd %YARN%
import -u %HOME%\mail
- Finis -
-Begin clearall.cmd (I think this works for version .9x) ----------------------
cd %YARN%
del history.*
del supersed.*
del active
del overview
del news
del news.dat
cd %HOME%\yarn
del readart.*
del newsrc
del scores\saved
-End -- Cut -------------------------------------------------------------------
-Begin makegps.cmd (only a small excerpt. mine has over 200 lines.) ----------
newgroup junk 14
newgroup list.yarn 9999 9999 yarn-list@lists.colorado.edu
newgroup rain.local.general 9999
newgroup rain.local.rain-l 9999
newgroup rain.local.help 9999
newgroup tri.politics 9999
newgroup tri.politics.sb 9999
-End -- Cut -------------------------------------------------------------------