-
-
-
- CookieTool V2.1
- ===============
-
- A team of programs to help you maintain your cookie database:
-
- "CookieTool" itself eliminates duplicate entries,
- and sorts alphabetically if you wish.
- "CdbSplit" extracts parts to a separate file,
- by keyword, by size, as groups of 'similar' cookies, or a fixed number.
-
-
-
- 0. Who needs it?
- ----------------
-
- These tools are intended for users of "Cookie", "IntuiCookie" (both
- available on Aminet, util/misc/), or generally for any plain text cookie
- database with entries separated by "%%" lines. They are nice for
- crunching your cookie collection by a few KByte, but also for splitting
- it into separate files of e.g. poems, quotations, anecdotes and
- miscellaneous.
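-
- As a rough illustration of that file format, here is a minimal Python
- sketch (not taken from the CookieTool sources). It assumes that a line
- consisting of just "%%" ends a cookie; the name read_cookies is made up
- for this example.
-
```python
def read_cookies(text):
    """Split the text of a cookie database into a list of cookies.

    Assumption: a line containing only "%%" separates two entries.
    Entries that turn out to be empty are skipped.
    """
    cookies, current = [], []
    # The extra "%%" is a sentinel that flushes a last, unterminated cookie.
    for line in text.splitlines() + ["%%"]:
        if line.strip() == "%%":
            body = "\n".join(current).strip("\n")
            if body:
                cookies.append(body)
            current = []
        else:
            current.append(line)
    return cookies


database = "First cookie\n%%\nSecond cookie,\nline two\n%%\n"
cookies = read_cookies(database)   # a list with two entries
```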
-
- Note that "CookieTool" and "CdbSplit" know how to handle the database
- itself, but not the corresponding index file (also called 'hash file').
- That means you still need "cookhash" (which should be included with your
- cookie display program).
-
-
- 1. CookieTool command summary
- -----------------------------
-
- cookietool [options] <cookiefile> [logfile]
-
- The crunched cookie database will be WRITTEN BACK to the input file (quite
- different from cookietool V1.x behaviour). The deleted cookies will be
- written to <logfile>, if one is specified. (Thus one could restore the
- original database by appending the logfile to the cookiefile again.)
-
- options:   meaning:
- -c         case-sensitive comparisons (for both deleting and sorting)
- -d[0-3]    how fussy about word delimiters?
-              -d3: fussy, compare character by character
-              -d2: ignore number and kind of spaces between words (DEFAULT)
-              -d1: treat punctuation signs as spaces, too
-              -d0: completely ignore punctuation signs and spaces
- -a         delete cookies that are "abbreviations" of another, too
- -p         passive, don't delete anything
- -s         sort output
- -sl        sort, looking at the last line only
- -sw        sort, looking at the last word only
- -s<sep>    sort, starting at the last occurrence of <sep>,
-              e.g. '-s--' or '-s...'
-            (-sl, -sw and -s<sep> are intended to sort quotations by source)
- -o         overwrite the input file directly (no tempfile), risky!
-            Use this *only* if your disk is so full that cookietool
-            couldn't create its tempfile.
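-
- To make the -d levels and -c a bit more tangible, here is a small Python
- sketch of how a comparison key could be built. It is only one reading of
- the option list above, not the actual implementation; normalize and its
- parameters are hypothetical names, and "punctuation" is assumed to mean
- anything that is neither a letter, digit, underscore nor whitespace.
-
```python
import re


def normalize(cookie, fussiness=2, case_sensitive=False):
    """Return a comparison key, loosely modelled on -d[0-3] and -c."""
    text = cookie if case_sensitive else cookie.lower()
    if fussiness >= 3:
        return text                       # -d3: compare character by character
    if fussiness == 2:
        return " ".join(text.split())     # -d2: any run of whitespace = one space
    # -d1 and -d0: punctuation no longer counts as part of a word.
    words = re.sub(r"[^\w\s]", " ", text).split()
    if fussiness == 1:
        return " ".join(words)            # -d1: punctuation acts like a space
    return "".join(words)                 # -d0: spaces and punctuation ignored


a = "Kill ugly radio.  -- Frank Zappa"
b = "Kill ugly radio... Frank Zappa"
same_at_d0 = normalize(a, 0) == normalize(b, 0)   # True
same_at_d2 = normalize(a, 2) == normalize(b, 2)   # False
```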
-
-
- 2. CdbSplit command summary
- ---------------------------
-
- cdbsplit [options] <cookiefile> <hitfile>
-
- The input file will always be OVERWRITTEN by a reduced version of the
- database, so that cookies are moved (not copied) to the hit file.
- An existing hit file will never be overwritten, but may be appended to.
-
- options:   meaning:
- -c         case-sensitive comparisons (for both keywords and groups)
- -d[0-3]    how fussy about word delimiters? (see above for details)
- -f<n>      move only the first <n> cookies
- -F<n>      move all but the first <n> cookies
- -k<keywd>  search for a keyword
- -K<keywd>  avoid a keyword
- -l<l_min>  accept only cookies with <l_min> lines or more
- -L<l_max>  accept only cookies with <l_max> lines or less
- -w<w_min>  accept only cookies <w_min> chars wide or more
- -W<w_max>  accept only cookies <w_max> chars wide or less
- -m<n>      find groups of cookies starting with <n> matching characters
-            (database must have been sorted!)
- -a         append, if <hitfile> exists (instead of failing)
-
-
- 3. Examples
- -----------
-
- These examples assume that your cookie database is in a single file
- called "cookies" (tacky name, hah :). Oh, and I'd suggest that you make
- a backup of your cookies somewhere before trying "cookietool" on them.
-
-
- 3.1. Do what "onecookie" used to do
- -----------------------------------
-
- The classic "onecookie" could only delete verbatim copies of a cookie,
- where even two spaces instead of one would make a difference. CookieTool
- can be told to behave like this, too:
-
- cookietool cookies -c -d3
-
- The default settings are a bit more generous:
-
- cookietool cookies
-
- might delete a few cookies more. Upper- and lowercase letters are now
- considered the same, and it doesn't matter if two words are separated by
- one or several spaces, by a tab sign, by a line break, etc. So two
- copies of the same text, but formatted in different ways, will still be
- recognized as identical.
-
- The question is: do you really want such copies deleted automatically, or
- would you rather decide yourself which one of such *almost* identical
- cookies should be deleted? This question arises even more with the really
- liberal settings like
-
- cookietool cookies -d0
-
- which for example recognizes "Kill ugly radio. -- Frank Zappa" and
- "Kill ugly radio... Frank Zappa" as identical. (Both of these two styles
- of supplying sources to quotations are frequently used.) More on that
- question later.
-
-
- 3.2. Deleting abbreviations
- ---------------------------
-
- It occurs rather frequently that one cookie seems to be an "abbreviation"
- of another. Sayings may consist of more than one sentence, but the first
- sentence is sometimes quoted by itself. And quotations are sometimes
- written down with, sometimes without their author. In both cases the
- shorter cookie may be deleted, and cookietool can do that, too (-a).
-
- However, one should not ignore punctuation signs with this option (don't
- use -d1 or -d0), because that would consider "A penny saved is a penny."
- as an abbreviation of "A penny saved is a penny earned.", which is not
- desirable. It might be a good idea to create a log file of the deleted
- cookies and look at least at the shortest ones among them:
-
- cookietool cookies -a log
- cdbsplit log log2 -L1 -W50 ; extract the shortest cookies
- Ed log2 ; edit to leave only those cookies you want to put back
- cdbsplit log2 cookies -a ; put them back
- Delete log log2
-
- Using 'cdbsplit -a' without any search options is a nice way of moving
- cookies back into your main database. Personally, I usually prefer
- "Type log2 >>cookies", "Delete log2" to do this, but note that this is
- risky: If you accidentally type '>' instead of '>>', that would overwrite
- your main database instead of appending to it! Such a thing can't happen
- with cdbsplit.
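-
- The idea behind -a, and the punctuation caveat from this section, can be
- sketched in a few lines of Python. This assumes that "abbreviation"
- simply means "proper prefix after normalisation"; the real test inside
- cookietool may differ, and key and is_abbreviation are made-up names.
-
```python
import re


def key(cookie, ignore_punctuation):
    """Lowercase, collapse whitespace, optionally turn punctuation into spaces."""
    text = cookie.lower()
    if ignore_punctuation:
        text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def is_abbreviation(short, long, ignore_punctuation=False):
    """True if `short` is a proper prefix of `long` after normalisation."""
    a = key(short, ignore_punctuation)
    b = key(long, ignore_punctuation)
    return a != b and b.startswith(a)


short = "A penny saved is a penny."
full = "A penny saved is a penny earned."

strict = is_abbreviation(short, full)                           # False: "." blocks it
sloppy = is_abbreviation(short, full, ignore_punctuation=True)  # True: not desirable
```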
-
-
- 3.3. Move cookies to and fro between files
- ------------------------------------------
-
- Let's say you want to keep cookies which are quotations in a separate
- file. That's easy, they should be recognized by the "--" which precedes
- the source of the saying:
-
- cdbsplit cookies quotes -k--
-
- However, sometimes "--" is used in the middle of sentences, too. You
- might want to edit these occurrences to single "-"'s, so you can put those
- cookies back which aren't really quotations. This is where the "avoid
- keyword" feature comes in handy. And I'd suggest sorting the quotes file
- by source first; it usually becomes easier to read:
-
- cookietool quotes -sl ; you might also try "-s--"
- Ed quotes
- cdbsplit quotes cookies -K-- -a
-
- Or another example: You're looking for a rather short keyword that may
- appear as part of other words as well. Let's say you want to move all
- Bart Simpson quotes to a separate "simpsons" file. Start with a cautious
- pass:
-
- cdbsplit cookies simpsons "-kBart " -d1 -c
-
- Note how -d1 will make "Bart!" but not "Barton" be identified as "Bart ".
- But as this keyword fails if "Bart" appears at the very end of a cookie,
- you still have to collect the rest:
-
- cdbsplit cookies simpsons -kBart -a
-
- Now look at the end of your "simpsons" file and check if anything went
- wrong in this second pass. In my case, I found a quotation by a guy named
- "Barth". Put it back:
-
- cdbsplit simpsons cookies -kBarth -a
-
-
- 3.4. Support for editing manually
- ---------------------------------
-
- CdbSplit can help you collect all cookies that need reformatting (because
- they are too wide) in an extra file, and put them back later:
-
- cdbsplit -w76 cookies wide
- Ed wide ; add some line breaks
- cdbsplit wide cookies -a
-
- Now this was easy. But cdbsplit can even help you to find groups of
- "similar" cookies! That's helpful to eliminate cookies that differ only
- by some typing error (e.g. 'seperate'/'separate'), something that
- cookietool will *never* handle automatically. To do this, you must sort
- your database first, then tell cdbsplit how many agreeing characters make
- "similar" cookies (I think 10 - 20 characters is usually a good choice):
-
- cookietool cookies -s -d0 -p
- cdbsplit cookies temp -d0 -m20
- Ed temp ; delete some manually
- cdbsplit temp cookies -a
-
- When editing the "temp" file, you should find groups of two or more
- cookies with identical beginnings. If you think they are really the same,
- you can delete all but one (!) of each group. This is tedious work,
- I know, but it's far easier than just sorting the database and looking
- for similar cookies with your eyes only. :)
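-
- What -m<n> does on a sorted database can be pictured like this. It is a
- Python sketch under the assumption that a "group" is a run of two or
- more neighbouring cookies sharing their first n characters; the function
- name similar_groups is invented for the example.
-
```python
from itertools import groupby


def similar_groups(sorted_cookies, n):
    """Return runs of 2+ neighbouring cookies that share their first n chars.

    The input must already be sorted, otherwise similar cookies are not
    neighbours and will be missed.
    """
    groups = []
    for _, run in groupby(sorted_cookies, key=lambda c: c[:n]):
        run = list(run)
        if len(run) > 1:
            groups.append(run)
    return groups


cookies = sorted([
    "keep them seperate",      # typo on purpose
    "something else entirely",
    "keep them separate",
])
groups = similar_groups(cookies, 10)   # one group holding the two "keep them" cookies
```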
-
- Here's a more sophisticated procedure that will extract groups of cookies
- starting and ending with the same word (well, almost):
-
- cookietool cookies -s -d1 -p ; regular sorting first
- cookietool cookies -sw -d1 -p ; *then* sort by last word
- cdbsplit cookies temp -d1 -m3 ; yes, 3 matching characters will do!
- Ed temp
- cdbsplit temp cookies -a
-
- Applying -s-- instead of -sw in the second pass could help you find
- similar sayings that are attributed to the same person.
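-
- The three "sort by source" variants differ only in which part of the
- cookie is used as the sort key. A hedged Python sketch follows; source_key
- is a hypothetical helper, the sample quotes are made up, and the fallback
- for a cookie that lacks <sep> is purely an assumption.
-
```python
def source_key(cookie, mode="l", sep=None):
    """Sort key loosely modelled on -sl (mode "l"), -sw (mode "w"), -s<sep>."""
    text = cookie.strip().lower()
    if not text:
        return ""
    if sep is not None:
        pos = text.rfind(sep.lower())
        # Assumption: without <sep> in the cookie, use the whole text.
        return text[pos:] if pos >= 0 else text
    if mode == "l":
        return text.splitlines()[-1].strip()   # last line only
    return text.split()[-1]                    # last word only


quotes = ["Zebras rule. -- Adams", "Apples fall. -- Zappa"]
plain = sorted(quotes)                                          # Apples first
by_source = sorted(quotes, key=lambda c: source_key(c, sep="--"))  # Adams first
```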
-
-
- 3.5. Joining "good" and "bad" cookie files
- ------------------------------------------
-
- Suppose you have a well maintained cookie database, without duplicate
- entries, all the cookies are formatted the way you want them, and all the
- authors of quotations are written down in your preferred style. Now you
- find an archive with new cookies somewhere and you want to add them to
- your database, but you have reason to believe that this will introduce a
- lot of duplicate entries. Here's how I would proceed.
-
- In the following, assume that your good cookies are in a file called
- "cookies", the new cookies are in a file called "visitors".
-
- First make sure there are no duplicate entries left in your main file, at
- least none that cookietool can find:
-
- cookietool cookies -d0 log
-
- Note the number of cookies that cookietool reports; you'll need it later.
- Suppose it's 4711. B.t.w., normally this pass shouldn't delete anything
- if your database is really in such good shape ;). (Don't worry if it did,
- those cookies are in the "log" file now, but if you want to put them
- back, please do that only after this procedure is complete!)
-
- Now append the "visitors" file, then delete all duplicates from the new
- and larger "cookies" file:
-
- cdbsplit visitors cookies -a
- cookietool cookies -d0
-
- This will delete only new cookies (if any), because cookietool starts
- deleting from the end of the file. Of course, for this to work, it is
- essential that you assemble the files in this order (i.e. don't append
- "cookies" to "visitors")!
-
- Finally you might want to move the new cookies to their own file again.
- That's easy, tell cdbsplit to extract all but the 4711 first:
-
- cdbsplit cookies visitors -a -F4711
-
- Now you can look at "visitors" to see what you've got, edit and reformat
- where needed, and then finally join the two databases for good.
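-
- Why the order of the two files matters can be shown with a toy model in
- Python. Note that this sketch uses a set to remember what it has seen,
- which is not how cookietool works internally; it only mimics the
- observable effect that the earlier copy survives. All names and sample
- cookies are invented.
-
```python
def dedupe_keep_first(cookies, key=str.lower):
    """Drop later duplicates, so the earliest copy of each cookie survives."""
    seen, kept = set(), []
    for cookie in cookies:
        k = key(cookie)
        if k not in seen:
            seen.add(k)
            kept.append(cookie)
    return kept


good = ["Cookie A", "Cookie B"]                 # already free of duplicates
visitors = ["cookie b", "Cookie C", "Cookie A", "Cookie D"]

merged = dedupe_keep_first(good + visitors)     # good cookies must come first!
new_ones = merged[len(good):]                   # like -F<n>: all but the first n
```
-
- Because the good cookies come first and none of them is deleted, the
- first len(good) entries of the result are exactly the old database.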
-
-
- 3.6. Extract all poems :)
- -------------------------
-
- You might do this by browsing through your database using a text editor
- and marking all poems by an extra "#P" or some other unique piece of text.
- Extracting the poems is very straightforward then:
-
- cdbsplit -k#p cookies poems
-
- Don't forget to edit "poems" once more to remove the "#P" marks, but this
- should be very easy using search/replace.
-
- Of course, such a method is very versatile and powerful, but mainly
- because it involves a lot of manpower :->. Fortunately, there is another
- solution for this problem: Would you agree that a poem is something that
- has at least four lines, but doesn't use the full line width?
- So let's try this:
-
- cdbsplit -l4 -W60 cookies poems
-
- You should check the contents of "poems" manually now, and maybe you will
- want to move some of the wider cookies back. Not a problem:
-
- cdbsplit poems cookies -w51 -a
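-
- The "poem" rule of thumb from this section, written out as a Python
- predicate. This is a sketch only: it assumes that the width of a cookie
- is the length of its longest line, and looks_like_poem is a made-up
- name.
-
```python
def looks_like_poem(cookie, min_lines=4, max_width=60):
    """Mimic 'cdbsplit -l4 -W60': enough lines, and none of them too wide."""
    lines = cookie.splitlines()
    return (len(lines) >= min_lines
            and max(len(line) for line in lines) <= max_width)


poem = "Roses are red\nViolets are blue\nSugar is sweet\nAnd so are you"
prose = "A short saying.\n  -- Somebody"
wide = "\n".join(["x" * 70, "a", "b", "c"])

results = [looks_like_poem(c) for c in (poem, prose, wide)]  # [True, False, False]
```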
-
-
- 4. Background information
- -------------------------
-
- Just like "onecookie", "cookietool" has to load the complete database into
- memory first. (Tough luck for those with a 1 Meg Amiga and a 1.2 MB
- database :-). But unlike "onecookie", it doesn't compare each cookie
- against all others (an O(n*n) operation); it sorts them first and then
- compares each against its neighbours only (an O(n*log n) operation). For a
- database of 1000 cookies, that's about 100 times faster!
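-
- Here is the principle as a short Python sketch, together with the
- arithmetic behind the "100 times" figure (taking log to base 2). It is
- not the original C code; dedupe_sorted is a hypothetical name, and for
- simplicity the result comes out in sorted order.
-
```python
import math


def dedupe_sorted(cookies, key=str.lower):
    """Remove duplicates by sorting, then comparing neighbours only."""
    ordered = sorted(cookies, key=key)           # O(n * log n)
    kept = []
    for cookie in ordered:                       # one linear pass
        if not kept or key(kept[-1]) != key(cookie):
            kept.append(cookie)
    return kept


unique = dedupe_sorted(["b", "A", "a", "B", "c"])

n = 1000
pairwise = n * n                   # each against all others: 1,000,000
sort_based = n * math.log2(n)      # roughly 10,000
speedup = pairwise / sort_based    # roughly 100
```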
-
- Overwriting input files is done by creating a tempfile and renaming it
- when all else is done. So breaking (or crashing :) the programs won't
- lead to data loss. Unless, of course, you use cookietool with the '-o'
- option, but I already warned you about that! (For those who absolutely
- need to know: Breaking cookietool while it is still reading data is
- safe, even with -o, because the output file won't be opened until after
- all deleting and sorting is done. But please, kids, don't try this at
- home! Or better still: Don't use -o at all.)
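-
- The tempfile-and-rename technique looks roughly like this in Python. It
- is a sketch of the general idea, not of the Amiga implementation;
- write_back_safely is an invented name, and the temp file is assumed to
- live in the same directory as the database so that the rename stays on
- one disk.
-
```python
import os
import tempfile


def write_back_safely(path, text):
    """Write `text` to a temp file next to `path`, then rename it over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
        os.replace(tmp, path)    # the old file stays intact until this point
    except BaseException:
        os.unlink(tmp)           # clean up; the original is still untouched
        raise
```
-
- If the program is interrupted before the rename, the original database
- is still complete; that is exactly the safety net -o gives up.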
-
- Note that breaking "cdbsplit" while it is appending to another file is not
- a good idea. All cookies that were already copied are then present in both
- files, and most likely the output file even ends with an incomplete
- cookie! The same can happen through no fault of yours, if cdbsplit
- encounters a "Disk Full" error.
- In both cases, don't append any further data to this output file, or the
- first of the new cookies will be merged with that incomplete cookie, due
- to the missing %% separator! You might run "cookietool" once on the
- output file; that will ensure a valid file format again, and the
- incomplete cookie will be removed.
-
-
- 5. History
- ----------
-
- V1.0 -
- V1.3 forget them, they were all crap, too hard to use
-
- V2.0 no more reformatting of cookies, sorry for those who miss it :'(
-
- V2.1 fixed a bug that would unnecessarily lose data after "Disk Full"
- errors
-
-
- 6. The author
- -------------
-
- Wilhelm Nöker <wnoeker@t-online.de>
- Hertastr. 8, D-44388 Dortmund
-
- Drop me an eMail,
-
- - if you like these programs,
- - if you want to suggest some more features,
- - or if you know a good source for cookies (perhaps other than Aminet).
-
- But please *don't* mail me your cookies, unless I ask for them.
- I have to pay for my online time, and receiving a 2 Meg mail or such
- would really spoil my day. :-(
-
-
- 7. Credits
- ----------
-
- CookieTool and CdbSplit were written using EdWord Pro
- and the GNU C compiler (with libnix).
-
- Thanks to Christian Kemp (author of IntuiCookie :) for reporting the Big
- Bug in V2.0.
-
-