DB-Dupe 1.1
by BC Enterprises
Contents:
I. Introduction to DB-Dupe ...................... 2
I.A. Features ................................ 2
I.B. System Requirements ..................... 3
I.C. Speed and Compatibility ................. 3
I.D. Safety .................................. 4
I.E. Shareware ............................... 5
I.F. Disclaimer .............................. 5
II. Using DB-Dupe ............................... 6
II.A. Files Needed ........................... 6
II.B. Extended Video Support ................. 6
II.C. Help and Function Keys ................. 7
II.D. The Main Menu .......................... 8
II.E. The File Menu .......................... 8
II.E.1. Use(ing) a database ............... 9
II.E.2. Info .............................. 11
II.E.3. Setting Search Parameters ......... 11
II.E.4. Save .............................. 18
II.E.5. Load .............................. 18
II.E.6. Index ............................. 18
II.E.7. Recall ............................ 19
II.E.8. Return ............................ 19
II.F. The Search Menu ........................ 19
II.F.1. Delete 1 & Delete 2 ............... 19
II.F.2. Zoom .............................. 20
II.F.3. Memo .............................. 20
II.F.4. Undelete .......................... 21
II.F.5. Continue .......................... 21
II.F.6. Stop .............................. 21
II.G. The Goto Menu .......................... 21
II.G.1. Goto # ............................ 22
II.G.2. Top ............................... 22
II.G.3. Next and Previous ................. 22
II.G.4. Advance ........................... 22
II.G.5. Deleted ........................... 22
II.G.6. Return ............................ 23
II.H. Utils .................................. 23
II.J. Quit ................................... 23
III. Suggestions for Use ........................ 24
Appendix A - DLX Format ......................... 25
Appendix B - Errors ............................. 26
Appendix C - Using DB-Dupe with Windows ......... 27
Index ............................................28
Order Form ......................................
BC Enterprises
P.O. Box 18
Front Royal, VA 22630
(703) 636-9990 (Voice)
(703) 635-7528 (Modem and Fax)
I. Introduction to DB-Dupe
I.A. Features
The DB-Dupe program is a duplicate record deleter for
FoxBase+, FoxPro, DBase III Plus, DBase IV*, and compatible
files. If you have a database of more than a thousand or so
records, you know that you get duplicates and they are hard to
weed out. You could make a printout of your file and then go
through, record by record, trying to find duplicates. But that
would take a very long time and would not be very effective.
A better way to find duplicate records is to let your
computer do it. That's where DB-Dupe comes in. DB-Dupe is more
than just an easier way to delete duplicate records. It is a
whole environment dedicated to giving you all the tools you need
to make deleting duplicates simple, and even fun.
Most duplicate record deleters don't give you enough
information or enough ways of viewing your data. They don't let
you use format files, or browse through your data, or even start
checking at any record other than the first. DB-Dupe is not like
that at all. It is a real record deletion system. DB-Dupe is
loaded with features to help you. Some of these distinctive
features include:
- 43- or 50-line mode on EGA or VGA monitors
- Fully network compatible
- Compatible with a wide range of database products
- Use of FoxBase+/FoxPro .IDX indexes, convert DBase
III/IV indexes to fast .DLX index format, and
create new .DLX indexes on the fly
- Use Dbase/FoxBase style format files, including support
for multiple page format files
- View attached memo files
- 3 search modes
- View two records in separate windows at one time
- Soundex searches to find records that sound alike
- Alphanumeric only searches
- Field swapping between records
- Deleted record recall function
- Save and load search-criteria files
- Deleted record reports
- Extensive on-line help
- User-defined color scheme
- Friendly user interface
- Fast operation
* Products mentioned, other than products of BC Enterprises, are
trademarks of their respective companies. Dbase III Plus and
DBase IV are trademarks of Ashton Tate (Borland?). FoxBase+,
FoxPro, and the Fox name are trademarks of Fox Software Inc.
*** DB-Dupe 1.1, Copyright 1991, BC Enterprises. Page 2 ***
I.B. System Requirements
To run DB-Dupe, you need an IBM or compatible computer
with at least 384K of RAM (or approximately 320K of free RAM).
The actual RAM use of the program depends in part on what
features you use. For example, using a database with a format
file and index requires more RAM than a database used with
neither of these "extras."
A hard drive is strongly recommended, but is not
absolutely required, although anyone who could benefit from this
program more than likely has a hard drive.
A disk-caching program is also strongly recommended. The
DB-Dupe program has been written with the assumption that those
running it will make use of a disk cache. The cache does not
need to be large, even 64K is enough, although a larger cache
will probably improve performance. If a disk cache is not used,
program speed will be SEVERELY reduced.
Disk cache programs are very easy to come by, if you
don't already have one. Virtually every program Microsoft sells,
including Windows and all their language programs, comes with a
disk cache called SMARTDRV.SYS. Many utility packages include
caches as well, and there are also numerous cache programs
available in the public domain or as shareware.
Putting some files on a RAM disk is also helpful, as
explained later on.
I.C. Fast Operation! Compatibility!
The DB-Dupe program is not written in one of the many
common DBase compatible database languages. It is written in
Microsoft BASIC 7. What does that mean to you?
It means that the program runs FAST! Database languages
are good for writing large database applications, but they run
notoriously slowly. Even the fastest database language cannot
hope to compete with BASIC. Some unscientific testing by BC
Enterprises has shown that database operations using BASIC can be
up to 8 times faster than using FoxBASE+ -- and FoxBASE+ is one
of the fastest database languages.
The DB-Dupe program, depending on the database, searches
through approximately 8000-12000 records per minute, on average,
doing automatic search and using the special DLX index format (of
course, that's on a 25 MHz 80386). The fastest possible search
speed, under ideal conditions on a 386, is probably close to
60,000 records per minute, or 1000 records per second.
Since the program is written in BASIC, we have also been
able to make it compatible with a wide range of databases that
are not compatible with each other. For example, DB-Dupe is
compatible with FoxPro and DBase IV, including indexes and memo
files, even though FoxPro and DBase IV are not compatible with
each other. DB-Dupe is also compatible with any program
compatible with FoxBase/FoxPro/DBase III/DBase IV. We have
verified compatibility with such programs as Dbxl, Alpha Four,
PC-File:db, Wampum, Zephyr, and DataPlus. Many, many programs,
however, use the Dbase file structure. If you are unsure about a
file, just try to load it. DB-Dupe will tell you whether or not
it can be used.
I.D. Safety Considerations
Whenever you use a program such as DB-Dupe which can
write data to your database file, MAKE A BACKUP FIRST!!!!!!!!!!
PLEASE!!!!!!!! The DB-Dupe program is very careful not to harm
your data in any way, but power glitches, program bugs, etc., do
happen. If you have an important database, it should be
important enough to back up.
DB-Dupe does not actually remove deleted records from a
database. Rather, it marks each deleted record by writing an
asterisk into the record's first byte, the deletion flag.
Dbase-format files use an asterisk there to signify that a record
is marked for deletion, whereas a space character signifies an
active record. A record is only physically removed when a "pack" is
performed on the file. This pack deletes all marked records,
and, optionally, updates any associated indexes.
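To make the marking scheme concrete, here is a minimal Python sketch (not DB-Dupe's own code) that lists the records of a Dbase-format file whose deletion flag is set. It assumes the standard DBF header layout: the record count as a 32-bit little-endian integer at offset 4, and the header length and record length as 16-bit integers at offsets 8 and 10. The function name deleted_records is hypothetical.

```python
import struct

def deleted_records(path):
    """Return the 1-based numbers of records whose deletion flag is '*'.

    Assumes the standard DBF header layout:
      offset 4:  number of records (uint32, little-endian)
      offset 8:  header length in bytes (uint16)
      offset 10: record length in bytes (uint16, includes the flag byte)
    """
    with open(path, "rb") as f:
        header = f.read(32)
        n_records, header_len, record_len = struct.unpack("<IHH", header[4:12])
        marked = []
        for recno in range(1, n_records + 1):
            # The deletion flag is the first byte of each record.
            f.seek(header_len + (recno - 1) * record_len)
            if f.read(1) == b"*":      # '*' = marked for deletion
                marked.append(recno)
            # b" " means an active record; any other byte is left alone
        return marked
```

Nothing here removes data: a pack in the original database program is still what physically drops the marked records.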
Why does the DB-Dupe program not perform a pack itself?
First, because generally indexes are associated with a database
file, and these must be updated after a pack. If DB-Dupe did
the pack, it would also have to update the indexes, which would
make the job much more complicated. Second, because it is simple
to go into the database program that created the file and pack
it. Any Dbase-compatible system should use the asterisk
designation for deleted records (if it doesn't, it's not
Dbase-compatible), and should have a pack function. Third, by not
packing the file itself, the DB-Dupe program won't do any
accidental harm to your data.
DB-Dupe is very protective of your data. It will not
write an asterisk (to mark for deletion) or a space (to undelete,
or "recall") over a character which is not currently a space or
an asterisk. In other words, if you ask the DB-Dupe program to
mark a record for deletion, but the record contains a misplaced
character where a space or asterisk should be, the DB-Dupe
program will not write over it.
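The safety rule above can be sketched in a few lines of Python. This is an illustration under the same standard DBF header assumptions, not the program's actual routine: set_deletion_flag is a hypothetical name, and the sketch does no record locking, which DB-Dupe performs on shared files.

```python
import struct

def set_deletion_flag(path, recno, delete=True):
    """Mark (delete=True) or recall (delete=False) record `recno` (1-based).

    The flag byte is only overwritten if it currently holds a space or an
    asterisk. Returns True if the byte was written, False if it was left
    alone because it held some other (unexpected) character.
    """
    with open(path, "r+b") as f:
        f.seek(4)
        n_records, header_len, record_len = struct.unpack("<IHH", f.read(8))
        if not 1 <= recno <= n_records:
            raise ValueError(f"record {recno} out of range 1..{n_records}")
        offset = header_len + (recno - 1) * record_len
        f.seek(offset)
        current = f.read(1)
        if current not in (b" ", b"*"):
            return False                # misplaced character: do not touch it
        f.seek(offset)
        f.write(b"*" if delete else b" ")
        return True
```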
I.E. Shareware
The DB-Dupe program is marketed as shareware. This means
that you may try the program out, to see if it suits your needs,
before you buy it. However, if you use the program on an on-
going basis, then you are expected (and morally obligated) to pay
for it. We ask that you limit your evaluation time to 30 days.
That should be more than enough time to put this software through
extensive use.
The registration price of the DB-Dupe program is $59.
This single registration allows you to use the program on all
computers in one location, regardless of the number of computers
used. You do NOT need to purchase a copy for each computer.
When you register, you will receive the latest version of
the program, along with a typeset printed manual. You will also
receive free or low-cost upgrades. Unlike some other companies,
we never charge fees for upgrades that fix program bugs.
You will also receive discounts on further Dbase-
compatible utility software. Our next utility will probably be a
data integrity checker. This will search your databases for
corrupted records, such as garbage characters, string characters
in numeric fields, etc. If you have other suggestions for high-
speed database utilities, we'd be happy to hear from you.
Please refer to the end of this documentation for more
information on ordering, and for an order form.
I.F. Disclaimer
This software, along with any manuals and supporting
materials, is sold "as is," without warranty as to their
performance, merchantability, or fitness for a particular
purpose.
BC Enterprises, along with its owners, agents, employees,
and/or associates specifically disclaim all liability for the
operation of this program, including, but not limited to,
consequential, special, and/or indirect damages, even if the
possibility of such damages has been made known. Some states do
not allow the exclusion or limitation of incidental and
consequential damages, so the above limitations might not apply
to you.
The entire risk of using this software is borne solely by
the user of this software. In no case will BC Enterprises, along
with its owners, agents, employees and/or associates be liable
for more than the purchase price of this software. YOUR USE OF
THIS SOFTWARE SIGNIFIES YOUR AGREEMENT TO THESE TERMS.
The above statement will be construed and interpreted
according to the laws of the state of Virginia.
II. Using DB-Dupe
This documentation is written with the assumption that
those using it are fairly knowledgeable about Dbase-format
databases. People who need to maintain large databases are
generally not computer novices. Thus, this documentation file is
devoted to telling you about the operation of the DB-Dupe
program, and not about the operation of Dbase III/IV or
compatible databases.
II.A. Files Needed to Run DB-Dupe
The files on your disk should include:
DBDUP.EXE -- the DB-Dupe executable program
DBDUP.HLP -- the DB-Dupe on-line help file
DBDUP.MSG -- DB-Dupe message file
DBDUP.DOC -- this documentation file
DBDUP.PIF -- Windows 3.0 PIF file
DBDUP.ICO -- Windows 3.0 icon
QUICK.DOC -- Quick start instructions
SORTDLX.EXE -- High-speed DLX index sorter
The only two of these files you absolutely need to run
the DB-Dupe program are DBDUP.EXE and DBDUP.MSG. The DBDUP.HLP
file is good to have for on-line help, but it is not necessary to
run the program. If the SORTDLX.EXE program is not available,
the DB-Dupe program defaults to internal routines. Of course,
you only need the two Windows files if you intend to run the
program under Windows.
The program generates some other files itself as you run
it. For example, if you change the colors, it will generate a
file called DBDUP.COL. When you create DLX indexes or save
search parameters, it creates additional files.
To run DB-Dupe, just copy the files from the diskette to
your hard drive. Copying into your database directory is
probably most convenient, but not necessary. Start the program
by typing DBDUP at the prompt.
II.B. Extended Video Support
One of the most important features of DB-Dupe is that it
runs in 43 or 50 line modes on EGA and VGA monitors,
respectively. This makes it much easier to compare duplicate
records, because you are able to see almost twice as much in 43-
line mode, and exactly twice as much in 50-line mode.
The DB-Dupe program automatically uses the highest
number of lines possible on your monitor (up to 50 lines).
However, it allows you to override this manually. To start DB-
Dupe with a certain number of lines, you must specify the number
after the program name on the command line, using the format
"/L:number". For example, to start DB-Dupe in 25-line mode on
an EGA or VGA monitor, you would type:
DBDUP /L:25
The only three values that are valid on the command line are 25,
43, and 50. Any other value on the command line is ignored.
If you have a CGA or monochrome monitor, you would never
need to use this option, since the program can only run in 25-
line mode on those displays.
Another video option you can specify on the command line
is to run in monochrome mode. You do this by adding "/M" on the
command line. This forces the program to run in monochrome mode
when it would normally go into color mode. It is helpful on AT&T
computers and some laptops that can fool programs into thinking
they have color displays when the displays are really monochrome. Note,
however, that if you have used the color picking function to
choose colors other than the default colors, your color
selections override the "/M" on the command line.
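As an illustration only, the following Python sketch models how the /L and /M switches are interpreted according to the rules above. The function name is hypothetical, treating the switches case-insensitively is an assumption, and the override of /M by user-chosen colors is left to the caller.

```python
VALID_LINE_COUNTS = {25, 43, 50}   # the only /L values the manual accepts

def parse_video_options(args):
    """Return (lines, mono) for DB-Dupe-style switches such as /L:43 and /M.

    lines is None when no valid /L:n switch was given, meaning "use the
    highest line count the display supports"; mono is True if /M is present.
    """
    lines = None
    mono = False
    for arg in args:
        switch = arg.upper()           # assumption: switches are case-insensitive
        if switch.startswith("/L:"):
            value = switch[3:]
            if value.isdigit() and int(value) in VALID_LINE_COUNTS:
                lines = int(value)
            # any other /L value is ignored
        elif switch == "/M":
            mono = True
    return lines, mono
```

For example, the arguments of DBDUP /L:30 /M would yield (None, True): the invalid line count is ignored, so the program falls back to choosing the line count automatically.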
II.C. Help System and Function Keys
The DB-Dupe program offers extensive on-line help. Help
is available at all menus by pushing F1. Help is also available,
by pushing 'H', from virtually every pop-up message that DB-Dupe
shows you.
You can scroll through the help message by pushing the up
and down arrows. To remove the help, push Esc.
Besides F1, the other function keys and what they do are
listed along the bottom edge of the Record window. These keys
perform various operations on your database records: Swap, Delete,
Undelete, Zoom, and View Memo. Excluding F2, the odd-numbered
keys work on the top window, and the even-numbered keys work on
the bottom window. Precisely what each of these operations does
is discussed later.
II.D. The Main Menu
The Main Menu of the DB-Dupe program has six options on
the menu bar: File, Search, Goto, Utils, About, and Exit.
FILE lets you load a database, along with indexes,
formats, etc. You also specify duplicate record search
parameters from the FILE sub-menus. Whenever you run DB-Dupe,
you will want to go to this sub-menu first.
SEARCH does the actual searching through the loaded
database for duplicate records. You cannot use SEARCH until you
have loaded a database.
GOTO lets you move through a loaded database. You can
view records, advance the record pointer, look at memos attached
to records, and delete or undelete records from this menu. You
cannot use GOTO until you have loaded a database.
UTILS lets you set colors for the program.
ABOUT tells about the program, and begs you to PLEASE
register the program.
EXIT closes all files and ends the program. Note that
the DB-Dupe program does not have a "Shell" function. But we
might add one if you write and say it needs one.
II.E. The FILE Sub-Menu
The File sub-menu is where you load all database files,
indexes, and formats, etc. You can load these files
individually, or load them all at once by loading a previously
saved search parameter (SPT) file.
The menu options on the File sub-menu are: Use, Info,
Params, Load, Save, Index, Recall, and Return.
USE opens a database file. It is functionally equivalent
to the USE statement in Dbase. If you have another database
loaded when you choose this, it asks you if you want to save the
current file parameters before loading another file.
INFO tells you some things about a loaded database file,
such as the last update, the number of records, the size of a
record, and the size of the file.
PARAMS opens up another window where you can set search
parameters, specify an index and format, etc.
LOAD lets you load an SPT (Search ParameTer) file that
you previously saved.
SAVE lets you save an SPT file for later use. The SPT
file saves the name of the database, index, and format file you
are using, along with other search parameters. When you load the
SPT file, these are automatically reloaded and opened. If you do
regular searches on certain files, you will want to use this
function.
INDEX lets you create a specially designed DLX (DeLete
indeX) file for searching your database. The index created is
automatically loaded as the current index, replacing any other
loaded index. The DLX index is optimized for fast searching, and
is much faster than using other index formats. The DLX index is
discussed later in this document.
RECALL undeletes all records in the database which have
been marked for deletion.
RETURN takes you to the previous menu.
II.E.1. USE(ing) a Database
As mentioned before, you can load a database by the USE
function, or by loading a previously saved search parameter (SPT)
file.
The DB-Dupe program is far more liberal than most
programs as to what it considers to be a DBase-compatible
file. Most DBase-compatible databases, such as FoxPro and
DBase IV, use what is called a "signature byte." This is a
special code at the beginning of the file which identifies the
database as being in a certain format (and also indicates whether
there is a memo file associated with the database). If this signature byte is
not correct, then the database will not be loaded, regardless of
whether the data in the file is actually compatible or not. This
is the reason that DBase IV and FoxPro are incompatible. They
use different signature codes, and simply refuse to load a file
without the right code.
The DB-Dupe program, however, ignores such signature
bytes. Thus, it will attempt to load any file you tell it to,
whether you tell it to load a database file, a word processing
file, a spreadsheet file, or an executable program. Only when
the file itself is found to be incompatible is it rejected. A
file is deemed incompatible by the DB-Dupe program when its
header information does not end at the right place, or when the
file has more than 128 fields, or when the header reports a
record length of 0, or when the year of the last update to the
file is less than 80 or greater than 99, or the record length is
less than 1. These five checks provide pretty good safety that a
file deemed compatible is compatible. It is possible, although
unlikely, that a non-Dbase file could have data that would fool
the DB-Dupe program into thinking it was a DBase file when it was
not. However, from the record display DB-Dupe makes upon
loading the file, you could easily see the file was not
compatible. Needless to say, you should not do a duplicate
record search on a file that is clearly not compatible. The
results might be unpredictable, to say the least.
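The compatibility checks listed above can be approximated as follows. This Python sketch assumes the standard DBF layout (last-update year in byte 1, 32-byte field descriptors starting at offset 32, ended by a 0x0D byte) and interprets "ends at the right place" as the terminator sitting at the last byte of the stated header length; that interpretation is an assumption. The list above gives two conditions on the record length, so the sketch tests them once. The function name looks_like_dbf is hypothetical.

```python
import struct

MAX_FIELDS = 128

def looks_like_dbf(path):
    """Return (ok, reason) after sanity-checking a file's DBF header."""
    with open(path, "rb") as f:
        header = f.read(32)
        if len(header) < 32:
            return False, "file too short"
        year = header[1]                      # last update, YY
        n_records, header_len, record_len = struct.unpack("<IHH", header[4:12])
        if record_len < 1:
            return False, "record length is 0"
        if not 80 <= year <= 99:
            return False, f"last-update year {year} out of range"
        # Walk the 32-byte field descriptors until the 0x0D terminator.
        n_fields = 0
        while True:
            first = f.read(1)
            if first == b"":
                return False, "header is truncated"
            if first == b"\x0d":
                break
            f.read(31)                        # rest of this descriptor
            n_fields += 1
            if n_fields > MAX_FIELDS:
                return False, "more than 128 fields"
        if 32 + 32 * n_fields != header_len - 1:
            return False, "header does not end where its length says"
        return True, f"fields={n_fields}, records={n_records}"
```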
Because the DB-Dupe program is so liberal as to what it
considers a compatible file, the program should work with almost
any program calling itself Dbase-compatible.
a. The DB-Dupe Screen Layout
When you load a database, the program displays the
first database record, or as many fields as will fit,
in the top window. It displays the database fields,
or as many fields as will fit, along with their type
and length, in the bottom window. (If there are too
many fields to see the whole database structure in
the second window, use the F8 key to "Zoom" the
window.) If this information does not look right,
then perhaps you have loaded a file which is not
actually compatible. You should not perform a
duplicate search on a file that does not appear to
have the proper data and/or fields. You most
emphatically should not perform an automatic search on
such a file.
The name of the currently loaded database appears in
the upper left corner of the screen. The record
number of the record currently being read appears in
the top center. This number changes as a search is
performed.
Along the bottom line of the first window, and the
bottom line of the second window, notices about the
record in the window are displayed. At a minimum, the
record number of the record in each window will be
shown. Notices are also placed on these lines if the
records have attached memos, or if the records have
been marked for deletion.
b. Network Compatibility
If you are using DOS release 3.10 or later, database
files are automatically opened for shared use, whether
you are using a network or not. If you are using DOS
3.0 or earlier, all files are opened for your own
exclusive use.
The DB-Dupe program is fully network compatible, which
means that it opens files in shared mode, and checks
for file and record locks as it operates. Thus, other
people can continue to use a database which the DB-
Dupe program is processing. The only problem you
might encounter when others are using the database
you are also using is that the program may not let you
delete or undelete or swap fields using a record which
is being edited by another person. This will not
cause an error -- the program simply won't let you
access the record.
You should also be aware of the difference between
hard locks and soft locks on a network. A hard lock
uses DOS calls to physically prevent any other program
from writing to a locked record. This is the type of
lock which DB-Dupe uses. A soft lock is internal to a
particular program which "remembers" what records are
locked, and does not let anyone else use those
records. Soft locks will not prevent DB-Dupe from
altering records. Thus, you could inadvertently alter
a record while someone else is editing it, causing the
changes made to be lost. Unfortunately, there is no
way that DB-Dupe can detect such a situation.
Fortunately, it would be a rather unlikely occurrence.
Of course, you will need network versions of whatever
database you are using in order to make use of the
network functions. Single-user databases usually open
files in exclusive mode, which means that no other
program may access them. If a file is being used
exclusively by another program, the DB-Dupe program will
not be able to open the database.
Files which are used only by the DB-Dupe program
(such as DLX indexes, SPT files, etc.) are opened in
exclusive mode, since it is highly unlikely that any
other program would need to access them. All
database-related files (DBF's, IDX's, NDX's, and
FMT's) are opened for shared use.
Even though the DB-Dupe program can use files across a
network, it runs much faster when files are located on
the local computer. Thus, you should try to use the
DB-Dupe program on local files, rather than network
files, whenever possible.
II.E.2. Info
This is pretty straightforward. It tells you the number
of records in the file, the file size, the last update, and the
length of a record. It doesn't really have any utility as far as
searching for duplicates goes, but sometimes the information is
helpful. For example, if the info doesn't look right, the file
might not really be compatible with the DB-Dupe program.
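Here is a sketch of the Info data, with one consistency check added to support the point above. In the usual DBF layout the file size should equal the header length plus the record count times the record length, often plus a single 0x1A end-of-file byte. That size rule, the MM/DD/YY display, and the name dbf_info are assumptions made for illustration.

```python
import os
import struct

def dbf_info(path):
    """Collect Info-style facts about a DBF file, plus a size cross-check."""
    with open(path, "rb") as f:
        header = f.read(12)
    yy, mm, dd = header[1], header[2], header[3]
    n_records, header_len, record_len = struct.unpack("<IHH", header[4:12])
    size = os.path.getsize(path)
    expected = header_len + n_records * record_len
    return {
        "records": n_records,
        "record_length": record_len,
        "last_update": f"{mm:02d}/{dd:02d}/{yy:02d}",
        "file_size": size,
        # Many DBF writers append one 0x1A end-of-file byte.
        "size_consistent": size in (expected, expected + 1),
    }
```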
II.E.3. Setting Search Parameters
Setting search parameters is done via the 'Params'
menu option. Once you choose this option, a pop-up box will
appear with ten search parameters listed. These parameters
determine the search fields, the way in which you will view your
data, and the type of search.
II.E.3.1. Search Fields
Lines 1 and 2 of the pop-up box ask you which fields of
the database you want to compare when you are doing an
interactive search (more on interactive vs. automatic later).
You can only specify 2 fields because, frankly, more than that
would be superfluous. (You can, however, specify only one search
field, which is often more effective.) If a record is the same
on two fields of your choosing, then it is a pretty good bet you
will want to see the record.
For example, suppose the two fields you pick are LASTNAME
and ADDRESS1 (which contains the street address). That would
mean that the program would show you records for people who have
the same last name and the same street address. Such people are
probably the same person, and, even if they aren't, there is
enough chance that they are the same that you would want to see
the record. On the other hand, if you want to check as many
duplicates as possible, choose only one search field.
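The field test described above amounts to the following Python sketch. Records are modeled as dictionaries of raw field text, None stands in for "Not defined", and returning False when no field is defined is an assumption; the function name is_candidate is hypothetical.

```python
def is_candidate(rec_a, rec_b, search_fields):
    """True if two records agree on every defined search field.

    rec_a, rec_b:  dicts mapping field name -> raw field text.
    search_fields: up to two field names; None means "Not defined".
    """
    fields = [name for name in search_fields if name is not None]
    if not fields:
        return False                   # nothing to compare on
    return all(rec_a[name] == rec_b[name] for name in fields)
```

With LASTNAME and ADDRESS1 both chosen, two SMITH records at different street addresses are not shown; with ADDRESS1 set back to "Not defined", the same pair is shown, which is why a single search field catches more duplicates.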
When specifying fields, the program opens up another pop-
up box from which you may choose the field you want. The first
"field" listed is "Not defined". If you choose "Not defined", it
wipes out any previously chosen field. For example, if you have
previously chosen "ADDRESS" and "NAME" as search fields, and you
decided you only wanted to search by name, you could pop up the
Search Fields box again and replace "ADDRESS" with "Not defined".
Numeric fields can present a special problem when they
are used as a compare field. A numeric field which has not been
edited at all will be filled with blanks, but a numeric field
which has been edited, even if the cursor simply passed through
it, is filled in with at least 1 zero character. If the
field has decimal places, then the field is filled with zeros to
the right of the decimal. Blank numeric fields and zero filled
fields are *not* considered to be the same, no matter what search
mode you are using.
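The distinction is easy to see if numeric fields are compared as the raw, right-justified text stored in the file. This short Python sketch assumes an N(6,2) field; same_raw is a hypothetical helper, not part of DB-Dupe.

```python
def same_raw(field_a: str, field_b: str) -> bool:
    """Compare two numeric fields as raw file text, not as numbers."""
    return field_a == field_b

blank = " " * 6    # N(6,2) field that was never edited
zero = "  0.00"    # the same field after the cursor passed through it

print(same_raw(blank, zero))     # False: blanks are not zeros
print(same_raw(zero, "  0.00"))  # True
```

A comparison by numeric value would either treat the blank as zero or fail to convert it at all; comparing the text keeps the two apart, which is consistent with the behavior described above.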
Memo fields cannot be chosen as fields to be compared.
II.E.3.2. Sort Field
Sort Field tells the DB-Dupe program what field the
current database is sorted by. The database must either be
physically sorted by some field, or an index must be loaded, for
the DB-Dupe program to effectively find duplicates. Because of
the way the program searches, DB-Dupe cannot find duplicates in
an unsorted, un-indexed file. YOU MUST SPECIFY A SORT FIELD IN
ORDER FOR THE DB-DUPE PROGRAM TO SEARCH FOR DUPLICATES.
The DB-Dupe program searches by comparing all records
which are the same on the sort field. When it reaches a record
with a different sort value, it stops comparing. Thus, the
program would only find duplicates in an unsorted file if they
just happened to be physically next to each other -- not the most
efficient way to purge duplicates.
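The search strategy just described can be sketched in Python as a single pass over runs of equal sort values. It is a model of the behavior, not DB-Dupe's code: comparing every pair within a run is an assumption, records are dictionaries of field text, and find_duplicate_pairs is a hypothetical name.

```python
from itertools import groupby

def find_duplicate_pairs(records, sort_field, search_fields):
    """Yield (i, j) positions of records that match on all search fields.

    records must already be in sort_field order (physically sorted, or read
    through an index). Only records sharing one sort value are compared;
    the scan moves on as soon as the sort value changes.
    """
    numbered = list(enumerate(records))
    for _, run in groupby(numbered, key=lambda item: item[1][sort_field]):
        run = list(run)                        # records with one sort value
        for a in range(len(run)):
            i, rec_i = run[a]
            for b in range(a + 1, len(run)):
                j, rec_j = run[b]
                if all(rec_i[f] == rec_j[f] for f in search_fields):
                    yield i, j
```

If the list is not in sort-field order, equal values are split into separate runs and their records are never compared, which is why a sort field is required.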
Although you could do a physical sort of your file by zip
code (or some other field), the program searches much more
quickly when an index is loaded. The reasons for this are
explained in the next section.
When you specify an index, you cannot change the sort
order of the database unless you unload the index, or load
another index.
II.E.3.3. Index
Along with using a database file, you will usually want
to load an index file (SET INDEX TO in DBase lingo). Searching
using an index is astronomically faster than not using an index.
This is because the DB-Dupe program compares index keys directly,
a function not available in Dbase.
To compare database records themselves, the DB-Dupe
program must read the raw data from disk, then break it up into
its component fields, and place it into an array. It performs
this operation very quickly, but the more fields a
database has, the longer it takes to break the raw data into
fields.
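Breaking raw data into fields might look like this in Python, assuming the standard DBF record layout (a one-byte deletion flag followed by the fields back to back, in header order). Decoding as ASCII is a simplification, since files of this era often use a DOS code page; split_record is a hypothetical name.

```python
def split_record(raw: bytes, fields):
    """Split one raw DBF record into a dict of field values.

    raw:    the record bytes, starting with the one-byte deletion flag.
    fields: (name, length) pairs in the order the header lists them.
    """
    values = {}
    pos = 1                                  # skip the deletion flag byte
    for name, length in fields:
        values[name] = raw[pos:pos + length].decode("ascii", errors="replace")
        pos += length
    return values

raw = b" " + b"SMITH".ljust(10) + b"22630"
print(split_record(raw, [("LASTNAME", 10), ("ZIP", 5)]))
# {'LASTNAME': 'SMITH     ', 'ZIP': '22630'}
```

The loop does one slice per field, which is why the work grows with the number of fields in the database.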
When the DB-Dupe program has an index loaded, however, it
compares only index keys until it finds an index key match. Only
when it finds such a match does it access the database itself,
and load the associated records. Since indexes are usually
small, they are faster to access (the hard drive heads don't need
to move as far) and they can usually be held entirely within a
RAM cache.
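The index-first approach can be sketched in Python as follows. Index entries are modeled as (key, record number) pairs already in key order, and load_record stands for whatever reads a record from the DBF file; both names and the grouping logic are assumptions for illustration, not DB-Dupe's implementation.

```python
from itertools import groupby

def candidate_runs(index_entries, load_record):
    """Group index entries by key and read records only for repeated keys.

    index_entries: (key, recno) pairs already in index (key) order.
    load_record:   callable that reads one record from the DBF by number.
    Returns one list of loaded records per key that occurs more than once.
    """
    runs = []
    for _key, group in groupby(index_entries, key=lambda entry: entry[0]):
        recnos = [recno for _, recno in group]
        if len(recnos) > 1:
            # Only a key match justifies touching the (slower) data file.
            runs.append([load_record(n) for n in recnos])
    return runs
```

A loader that counts its calls would show that records whose keys occur only once are never read from the database at all.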
The index file you specify to load can be one of three
types: a FoxBase/FoxPro .IDX index, a Dbase III/IV .NDX index, or
a DB-Dupe .DLX index. DB-Dupe does not currently support Dbase
IV MDX indices, or FoxPro 2.x compact indices or compound
indices.
a. DLX indexes
The "native" index format of the DB-Dupe program is
the DLX index. Whereas NDX and IDX indexes are
optimized for finding records quickly, a DLX index is
optimized for moving quickly through the index itself
sequentially. Duplicate record searches using the DLX
index are generally substantially faster than searches
using the FoxBase format IDX indexes, and several
times faster than using no index at all.
You can create a DLX index by using the Index command
from the File sub-menu, or the DB-Dupe program
automatically creates one when you load a Dbase
format NDX file. Note that DLX indexes created from
the Index command are indexed solely on the contents
of one database field (although "Soundex" or
"Alphanumeric only" can be specified). They cannot
index on multiple fields or use Dbase expressions.
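The manual does not spell out DB-Dupe's Soundex rules or exactly what "Alphanumeric only" removes. As a hedged sketch, the Python below uses the standard American Soundex algorithm and keeps only ASCII letters and digits; either may differ from what DB-Dupe actually does, and both function names are hypothetical.

```python
SOUNDEX_DIGITS = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}

def soundex(text: str) -> str:
    """Standard American Soundex: first letter plus three digits."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = SOUNDEX_DIGITS.get(letters[0], "")
    for ch in letters[1:]:
        code = SOUNDEX_DIGITS.get(ch, "")     # vowels, H, W, Y have no digit
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        if ch not in "HW":                     # H and W do not separate codes
            prev = code
    return result.ljust(4, "0")

def alnum_only(text: str) -> str:
    """Drop everything except ASCII letters and digits."""
    return "".join(ch for ch in text if ch.isascii() and ch.isalnum())
```

Under these rules "Smith" and "Smyth" share the key S530, and "12 Oak St." reduces to "12OakSt", so records that differ only in spelling or punctuation can land next to each other in the index.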
b. Dbase NDX
Dbase NDX files are not used directly by the DB-Dupe
program. Rather, they are converted to a simpler DLX
index. When you tell DB-Dupe to load an NDX file, it
automatically converts the index to a file with the
same root, but with the extension .DLX. (The original
NDX file is left unchanged, of course.) If the
program finds that a DLX index with the same root as
the NDX file already exists, it will ask you if you
want to use that, or if it should go ahead and convert
the index.
For example, if you load an index called
D:\DB4\MASTER.NDX, the program will automatically
convert it the first time you load it, and create a
file called D:\DB4\MASTER.DLX. If you tell the DB-
Dupe program to load D:\DB4\MASTER.NDX another time,
it will tell you that D:\DB4\MASTER.DLX already
exists, and ask you if you want to use it. If you
haven't made any changes to your data since the DB-
Dupe made D:\DB4\MASTER.DLX, then you probably will
want to use it. Otherwise, you will want to convert
the NDX index again.
Why convert an existing NDX index instead of just
generating a new DLX index? Two reasons. First, as
mentioned above, the DLX created by the Index command
itself cannot use Dbase functions or expressions, and
it cannot index on multiple fields. DLX's converted
from NDX's can have multiple field values and Dbase
functions. Second, the process of converting an NDX
file usually takes less time than creating a new DLX
file. The DB-Dupe program converts about 4000 NDX
entries per minute (on a 10 MHz AT), and it converts the
same number of entries per minute regardless of the
total index size. Creating a new DLX index, by
contrast, is essentially a record sort, and sort time
grows faster than the number of records (roughly in
proportion to n log n), so large indexes take
disproportionately longer to build.
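As a rough worked example, assuming the 4000-entries-per-minute conversion rate quoted above and an idealized n log n cost for sorting (the constants and function names are illustrative, not measured from DB-Dupe):

```python
import math

NDX_ENTRIES_PER_MINUTE = 4000  # conversion rate quoted above (10 MHz AT)

def ndx_conversion_minutes(entries: int) -> float:
    """Conversion time grows linearly with the number of index entries."""
    return entries / NDX_ENTRIES_PER_MINUTE

def relative_sort_work(n: int) -> float:
    """Idealized comparison-sort work, proportional to n * log2(n)."""
    return n * math.log2(n)

print(ndx_conversion_minutes(20000))                          # 5.0
print(relative_sort_work(20000) / relative_sort_work(10000))  # about 2.15
```

Doubling the number of records doubles the conversion time, but more than doubles the idealized sort work.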
c. FoxBase/FoxPro IDX indexes
The DB-Dupe program has the capability to use
FoxBase+/FoxPro IDX indices directly, without
converting them. They are faster for searches than
NDX's would be. However, they are not as fast as
DLX's. Since IDX indices are directly supported,
there is no facility to convert them to DLX format.
You can, however, use the Index command to create DLX
indices for FoxBase files.
Generally speaking, if you are indexing on a straight
database field, you are better off creating a DLX
index than using the IDX index. However, for a small
database (fewer than 2000 records), it probably
doesn't matter much one way or the other. The choice
comes down to whether the slower searches of an IDX
index outweigh the time it takes to generate a DLX
index, since the IDX index is available immediately.
To unload an index, you can simply load another index.
To unload an index without loading another one, press the
Delete key to erase the name of the current index. This
unloads the index and returns you to physical record number 1.
II.E.3.4. Loading a format file
Normally when you load a database, the fields are
displayed one field per line, along the left side of the screen.
Depending on how many fields your database has, and how many
lines your monitor can display, you may or may not be able to see
the data in all fields.
In order to let you see all fields (or more fields at any
rate), DB-Dupe supports the use of DBase format files. Dbase
format files are files which use "@ say" and "@ get" to display
field data and field prompts. Normally these files have the
extension ".FMT". If you are unfamiliar with format files,
you should look in the documentation for your database, or pick
up a book about Dbase at your local computer store.
The DB-Dupe program does not support all format functions
and commands that Dbase supports. It is limited to straight
"@ say"'s using either field names or quoted text strings, and
"@ get"'s followed by field names. It further ignores any
"Picture" or "Valid" clauses in format files. If the DB-Dupe
program encounters an "@ say" which is not a quoted text string
or field name, it displays an exclamation point. If it
encounters an "@ get" which is not followed by a field name, it
displays a one-character blank field.
The DB-Dupe program supports lines containing both "SAY"
and "GET" on the same line, such as:
@ 1,1 say "First Name" get FIRSTNAME
It also supports the alternate format, often used by program
generators, of:
@ 1,1 say "First Name"
@ 1,10 get FIRSTNAME
The program also supports multiple page format files,
which use the "READ" or "READ SAVE" command in between pages.
Fields on pages after the first page of the format file can be seen by
using the DB-Dupe "Zoom" function (discussed later).
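The rules above (quoted or field-name SAYs, field-name GETs, PICTURE and VALID ignored, "!" for an unsupported SAY, a one-character blank for an unsupported GET, READ as a page break) can be sketched as a small parser. This is a hypothetical illustration, not DB-Dupe's code: the regex and function names are invented, only double-quoted strings are recognized, and GET expressions containing spaces are simply skipped.

```python
import re

# "@ row,col [SAY arg] [GET arg] [PICTURE ...|VALID ...]"; quoted strings are
# matched as whole units so text such as "Please get coffee" is not split.
AT_LINE = re.compile(
    r'@\s*(\d+)\s*,\s*(\d+)'
    r'(?:\s+SAY\s+((?:"[^"]*"|[^"])+?))?'
    r'(?:\s+GET\s+(\S+))?'
    r'(?:\s+(?:PICTURE|VALID)\b.*)?\s*',
    re.IGNORECASE,
)
QUOTED = re.compile(r'"([^"]*)"')
NAME = re.compile(r'\w+')

def parse_format(lines, field_names):
    """Split a .FMT file into pages of (row, col, kind, value) items.

    kind: "text" (quoted SAY, or "!" for an unsupported SAY), "say" (SAY of a
    field), "get" (GET of a field) or "blank" (unsupported GET).  A GET on the
    same line as a SAY gets col None, meaning it is drawn after the SAY.
    """
    fields = {name.upper() for name in field_names}
    pages, items = [], []
    for raw in lines:
        line = raw.strip()
        if line.upper() in ("READ", "READ SAVE"):  # page break
            pages.append(items)
            items = []
            continue
        m = AT_LINE.fullmatch(line)
        if not m:
            continue                               # anything else is ignored
        row, col = int(m.group(1)), int(m.group(2))
        say, get = m.group(3), m.group(4)
        if say:
            say = say.strip()
            quoted = QUOTED.fullmatch(say)
            if quoted:
                items.append((row, col, "text", quoted.group(1)))
            elif NAME.fullmatch(say) and say.upper() in fields:
                items.append((row, col, "say", say.upper()))
            else:
                items.append((row, col, "text", "!"))
        if get:
            get_col = None if say else col
            if get.upper() in fields:
                items.append((row, get_col, "get", get.upper()))
            else:
                items.append((row, get_col, "blank", " "))
    if items:
        pages.append(items)
    return pages
```

Both the combined form (@ 1,1 say "First Name" get FIRSTNAME) and the split form produce a "text" item followed by a "get" item.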
One other limitation on format files (which would
probably never actually come up) is that format files used by
DB-Dupe are limited to 256 lines, or a combination of 384 total
SAY's and GET's. A 256-line format file would be extremely long
and very rare indeed.
The DB-Dupe program does not support the use of compiled
format files. The format files must be straight ASCII text.
Dbase IV Note: DBase IV uses a .SCR source file from
which .FMT files are generated. DB-Dupe cannot load the .SCR
files, but only the .FMT files which are generated from them.
For information on generating .FMT files from .SCR files, see
your Dbase IV manual. FoxPro 2.0 also uses many different screen
files. Why can't people keep things simple?
II.E.3.5. Soundex
The Soundex function of the DB-Dupe program is a very
powerful method of searching for duplicates. This function finds
records which sound alike.
Oftentimes, due to misspellings, bad handwriting, etc., a
person's name may be spelled incorrectly in your database, and,
hence, be entered twice with slightly different spellings. An
exact search will show these two records as being different, but
a Soundex search will often show them as the same.
For example, suppose someone named "Tim Johnson" orders
something from your company and is added to your mailing list.
The next time he orders, his name is entered as "Tim Jonson". A
Soundex search will flag these two records as being the same.
Similarly, it would find records such as "Jerry" and "Gerry", or
"Stephens" and "Stephans".
Setting Soundex on does make searches slightly slower,
and so it is not the default setting. Its effectiveness at
finding duplicates, however, more than offsets the slower speed.
You will be amazed at how many more duplicates you find with
Soundex on than off. Soundex should only be used, however, when
you are comparing fields which are primarily text, such as names
and addresses. Zip codes, being all (or mostly) numeric data,
are not suitable for Soundex searching, since Soundex would show
all zip codes as being the same.
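The manual does not document DB-Dupe's exact Soundex rules. The sketch below is the standard American Soundex algorithm, applied word by word (the per-word key is an assumption, and both function names are hypothetical). It reproduces the "Johnson"/"Jonson" and "Stephens"/"Stephans" matches and shows why all-numeric ZIP codes collapse to the same key. Note one difference: standard Soundex keeps the first letter as-is, so it codes "Jerry" as J600 and "Gerry" as G600. Since DB-Dupe matches those two names, its variant must treat the first letter differently.

```python
# American Soundex digit groups; vowels, H, W and Y get no digit.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit

def soundex(word: str) -> str:
    """Standard American Soundex code, e.g. 'Johnson' -> 'J525'."""
    letters = [ch for ch in word.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""                  # digits and punctuation carry no sound
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "HW":         # H and W do not separate equal codes
            prev = digit
        if len(code) == 4:
            break
    return code.ljust(4, "0")

def soundex_key(field: str) -> str:
    """Hypothetical field key: the Soundex code of each word, space-separated."""
    return " ".join(soundex(word) for word in field.split())
```

With this sketch, soundex_key("Tim Johnson") and soundex_key("Tim Jonson") are both "T500 J525", while every all-digit ZIP code gets the same empty key.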
The Soundex setting is ignored when you are doing an
Automatic search, since records must be identical to be deleted
during an Automatic search.
II.E.3.6. Alphanumeric Only
When this is turned ON, any punctuation characters in
fields are ignored. For example, the following two fields would
be seen as the same:
"123 W. Main St., Apt #452" and "123 W Main St, Apt. 452"
because both would evaluate to: 123WMainStApt452
Using the Alphanumeric Only setting helps you find
records that differ only in punctuation or spacing. This is
especially helpful for comparing Address fields, which usually
contain a good deal of punctuation. It is not useful when
comparing ZIP codes, because checking ZIP codes for punctuation
just slows the search down.
If you have turned Soundex ON, then Alphanumeric Only is
ignored, because Soundex already strips punctuation.
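The Alphanumeric Only comparison amounts to discarding every character that is not a letter or digit before comparing. Here is a minimal Python sketch. The function name is hypothetical, and because the manual does not say whether this setting also ignores case, the sketch keeps case, as the example above does.

```python
def alphanumeric_only(value: str) -> str:
    """Drop punctuation and spaces, keeping letters and digits in order."""
    return "".join(ch for ch in value if ch.isalnum())

a = alphanumeric_only("123 W. Main St., Apt #452")
b = alphanumeric_only("123 W Main St, Apt. 452")
print(a, a == b)  # 123WMainStApt452 True
```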
II.E.3.7. Search Type
Three types of duplicate searches are supported --
"Interactive", "Automatic", and "Auto, Exclude...".
"Interactive" search shows you duplicate records, based
upon Search Fields and Soundex setting, and lets you decide
whether to delete one of the records. Interactive search is much
more effective at deleting duplicates than automatic search, but
it takes much longer.
"Automatic" search compares database records and deletes
exact duplicates without any user intervention. Records must be
EXACT duplicates in order to be deleted, including upper/lower
case, and memo field number (if any). A window in the middle of
the screen shows you how many records have been marked for
deletion.
"Auto, Exclude..." searches allow you to exclude one
field from an automatic duplicate search. You can exclude any
field except memo fields. For example, many databases use date
fields to track when a record was entered. Thus, if a person is
entered into the database twice, the two records will be
exactly the same except for date of creation. Normally, this
would require you to use interactive search, but with automatic
exclude, you can exclude the date field and automatically delete
records which are the same on all other fields. This can be a
real time-saver, since automatic searches are much faster than
interactive searches.
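The Automatic and "Auto, Exclude..." rules can be sketched as follows. This is an illustration under stated assumptions, not DB-Dupe's method. DB-Dupe presumably walks the records in index order, while this sketch uses a hash set, which finds the same exact duplicates. The manual does not say which copy is kept, so the sketch keeps the first and marks later copies. Records are hypothetical dicts of stored field values, where a memo field's value is the memo block number held in the database.

```python
def automatic_search(records, exclude=None):
    """Return positions of records that exactly match an earlier record.

    records: list of dicts mapping field name -> stored value.
    exclude: optional name of one non-memo field to ignore ("Auto, Exclude...").
    Comparison is exact, so upper/lower case and memo numbers must match too.
    """
    seen = set()
    marked = []
    for pos, rec in enumerate(records):
        key = tuple(sorted((name, value) for name, value in rec.items()
                           if name != exclude))
        if key in seen:
            marked.append(pos)      # later copy is marked for deletion
        else:
            seen.add(key)
    return marked
```

With a date-of-entry field excluded, two entries for the same person made on different days are caught. A copy whose name differs only in case is still treated as a different record.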
II.E.3.8. Report
This function allows you to generate a report of which
records have been deleted, or recalled, by DB-Dupe. This is
especially helpful after an automatic search, in case you want to
go through the database and check records just to make sure that
no records you want to keep have been deleted.
Besides sending the report to the printer, you can
generate a text file with the above information. If the file you
specify exists, the new information is appended to the end. That
way, you can keep a running record of all deletions made to a
database over time.
II.E.4. Save
The Save command saves the currently loaded search
parameters to a file that can be loaded later. The name of the
currently loaded database, along with all parameters in the
Params pop-up menu are saved. SPT files are saved only in the
default directory, so you cannot include a path in the name. The
file extension .SPT (Search ParameTers) is added to the name you
specify.
II.E.5. Load
The Load command loads a previously saved search
parameter (.SPT) file. This function automatically loads the
database file, and opens any index or format files associated
with the database.
II.E.6. Index
The Index command creates a new DLX format index file for
the currently loaded database. The new DLX index becomes the
active index.
The index is created based upon a single field of the
database which you specify. Any field except memo fields can be
used in creating the index. Character fields are converted to
all upper case, so the DLX index is the functional equivalent of
indexing on UPPER(fieldname). Also, date fields are indexed
according to their raw data format of YYYYMMDD, which is the
functional equivalent of indexing on DTOC(Datefield,1).
You can specify that a DLX index be created using the
Soundex or Alphanumeric Only options (see above). These options
are appropriate for character fields, but not appropriate for
primarily numeric fields such as ZIP codes. Specifying either of
these options slows down index generation time.
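The key rules above (character fields upper-cased, dates stored as YYYYMMDD, optional punctuation stripping) can be sketched as a key-building function. The function name is hypothetical. Padding keys with spaces to the field length is an assumption, and numeric fields and the Soundex option are left out.

```python
import datetime

def dlx_key(value, width, alnum_only=False):
    """Build a fixed-width DLX-style index key (hypothetical helper)."""
    if isinstance(value, datetime.date):
        key = value.strftime("%Y%m%d")          # like DTOC(Datefield,1)
    else:
        key = str(value).upper()                # like UPPER(fieldname)
        if alnum_only:                          # Alphanumeric Only option
            key = "".join(ch for ch in key if ch.isalnum())
    return key[:width].ljust(width)             # assumed space padding
```

For example, dlx_key(datetime.date(1991, 11, 22), 8) gives "19911122", so date keys sort chronologically as plain strings.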
The indexing process uses Quicksort, one of the fastest
general-purpose sorting algorithms on average. After
writing index keys to the index file, the process sorts them in
place, so it does not create lots of temporary files, and does
not require extensive disk space, as a sort using Dbase often
can. Furthermore, the streamlined DLX format only takes up about
50%-70% of the disk space that a comparable NDX or IDX index
would take.
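For illustration, here is a minimal in-place Quicksort over (key, record number) pairs, using the classic middle-element pivot with Hoare-style partitioning. It works on an in-memory list, whereas DB-Dupe sorts the entries where they sit in the index file. The function name is hypothetical. Recursing on the smaller partition keeps the recursion depth logarithmic.

```python
def quicksort_entries(entries, lo=0, hi=None):
    """Sort a list of (key, record_number) pairs in place with Quicksort."""
    if hi is None:
        hi = len(entries) - 1
    while lo < hi:
        pivot = entries[(lo + hi) // 2]
        i, j = lo, hi
        while i <= j:                       # partition around the pivot value
            while entries[i] < pivot:
                i += 1
            while entries[j] > pivot:
                j -= 1
            if i <= j:
                entries[i], entries[j] = entries[j], entries[i]
                i += 1
                j -= 1
        # recurse into the smaller part, loop on the larger one
        if j - lo < hi - i:
            quicksort_entries(entries, lo, j)
            lo = i
        else:
            quicksort_entries(entries, i, hi)
            hi = j
```

Because the pairs compare key first and record number second, equal keys end up in record-number order.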
Because the indexing is performed primarily on disk, and
not in RAM, it can take a while to index a very large file. You
can cut down on the time somewhat if you index onto a RAM disk.
If you are working over a network, you should always index onto a
local hard drive, whenever possible. If you do not specify a
path for the index, the current default directory and drive are
used.
This version of the program (1.1) comes with an external
indexing program called SORTDLX.EXE. If the DLX index has a file
length of more than 30000 bytes, this external program is used.
It is generally much faster than the internal indexing routine,
because it makes much better use of RAM. The internal indexing
routines use a RAM buffer of about 12K, but the external routine
uses a RAM buffer of 128K. If the external indexing routine
fails for any reason, the internal routine automatically takes
over.
The format of the DLX index is very simple. See Appendix
A for more information.
II.E.7. Recall
Recall undeletes records in the loaded database. It is
the equivalent of issuing the RECALL ALL command in DBase. DB-
Dupe may perform this function faster than your regular database,
depending on which database you use. You can interrupt the
process by hitting -Esc- during the recall.
The Recall function does not recall in index order, but
rather in true record order. This makes it a lot faster.
II.E.8. Return
Return takes you to the previous menu.
II.F. The Search Menu
The Search command on the main menu immediately begins a
search, as long as you have specified the required search
parameters. During this search, the record numbers of the
records being read are displayed at the top of the screen. These
will go in numerical order if there is no index, and will jump
around if there is an index loaded.
You may hit -Esc- to temporarily halt the search, and
return to the main menu. You can start the search up again
later from where it left off, unless you specify a new sort order
or load a new index.
Once a duplicate record has been detected, the Search
sub-menu comes up. This sub-menu contains several functions and
several sub-menus. These are: Delete 1, Delete 2, Zoom, Memo,
Undelete, Continue, and Stop.
II.F.1. Delete 1 & Delete 2
Delete 1 and Delete 2 mark the record in window 1 or
window 2, respectively, as deleted. After you choose one of
these, the program waits approximately one second, and then
continues searching. If you want to delete both records shown,
use the function keys, and then hit "C" for continue.
If you delete record #2, the program continues to search
for records which duplicate record #1. However, if you delete
record #1, then the program immediately skips to the next record,
and starts looking for duplicates of it.
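The flow described above can be sketched as a loop. Everything here is hypothetical: is_duplicate stands in for the Search Fields, Soundex, and Alphanumeric Only comparison, and decide stands in for your menu choice ("1", "2", or "C"). This simple sketch compares each record with every later one. With an index loaded, a real search would presumably stop scanning as soon as the keys differ.

```python
def interactive_search(records, is_duplicate, decide):
    """Return the set of positions marked for deletion."""
    deleted = set()
    for i in range(len(records)):
        if i in deleted:
            continue
        for j in range(i + 1, len(records)):
            if j in deleted or not is_duplicate(records[i], records[j]):
                continue
            choice = decide(records[i], records[j])   # "1", "2" or "C"
            if choice == "2":
                deleted.add(j)   # keep looking for duplicates of record #1
            elif choice == "1":
                deleted.add(i)   # skip straight to the next record
                break
    return deleted
```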
II.F.2. Zoom
The zoom function allows you to see all of the present
record, even if it will not all fit into its window. This can
happen either because there are more fields than available lines
in a window, or because one or more database fields are too long
(more than 65 characters) to display in full. (You can also use
the function keys to zoom.)
If no format file is loaded, the zoom function pops up
the record data in a window in the middle of the screen. If
there are more fields than can fit in this window, then you can
move the field highlighter up and down to see the remaining
fields. Fields longer than 65 characters are wrapped onto
multiple lines in the window, so you will be able to see all
data in the record.
If a format file is loaded, then the record is shown
using the entire screen, pretty much as it would appear in Dbase.
You can use the PgUp and PgDn keys to see subsequent pages of
multi-page format files, just as you would in Dbase. You can
also hit -R- to remove the little instruction box from the lower
right corner, if it is getting in the way of your data.
When you are through using the zoom function, push -Esc-
to clear it.
II.F.3. Memo
The memo function allows you to view the contents of
memos attached to database records. When a record does have a
memo attached, the DB-Dupe program displays a notice saying
"Memo" beneath the record.
The DB-Dupe program supports the memo formats of DBase
III Plus, DBase IV, FoxPlus, and FoxPro. It also supports the
use of memos with any program which has a compatible memo format.
To display a memo file, the DB-Dupe program pops up a
window in the upper half of the screen. If a memo is more than
13 lines long then the up and down arrow keys can be used to
scroll the memo.
Just as when using Dbase/FoxBase, a memo file must be in
the same directory as its host database. Dbase III, Dbase IV,
and FoxPlus memo files must have the extension ".DBT". FoxPro
memo files have the extension ".FPT". DB-Dupe determines what
type of memo file it is dealing with based upon information
contained in the header of the database.
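The manual does not say which header bytes DB-Dupe checks. One common approach is to look at the DBF version byte, which is the first byte of the database file. The values below are the ones commonly documented for these products, but the table and the function name are assumptions and should be checked against your own files.

```python
# Commonly documented DBF version bytes for tables that have a memo file
MEMO_EXTENSION_BY_VERSION = {
    0x83: ".DBT",  # dBASE III PLUS / FoxBASE+ table with memo
    0x8B: ".DBT",  # dBASE IV table with memo
    0xF5: ".FPT",  # FoxPro 2.x table with memo
}

def memo_extension(dbf_header: bytes):
    """Return the expected memo file extension, or None if no memo is indicated."""
    if not dbf_header:
        return None
    return MEMO_EXTENSION_BY_VERSION.get(dbf_header[0])
```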
To cut down on RAM used by memos, the DB-Dupe program
limits how much of a memo it will show you to either 100 lines or
5K. If a memo is truncated based upon the number of lines, it
will give the message "*** Memo truncated at line 100 ***". The
program does not give any special message if it cuts a memo off
at 5K, but it should not be too hard to spot such a cut off.
This limitation should not be much of a problem, since few people
use memos of such length. And, in any case, you will be able to
see enough of a memo to determine if you want to delete the
attached record.
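The two display limits can be sketched as follows. The function name is hypothetical, and so is the order of the checks (5K cut first, then the line limit), because the manual does not specify it. The sketch counts characters rather than bytes, which is the same thing for plain ASCII memos.

```python
MAX_MEMO_LINES = 100
MAX_MEMO_CHARS = 5 * 1024   # "5K"; characters stand in for bytes here

def memo_display_lines(text: str) -> list:
    """Apply DB-Dupe-style display limits to a memo (order of checks assumed)."""
    text = text[:MAX_MEMO_CHARS]              # silent cut-off at 5K
    lines = text.splitlines()
    if len(lines) > MAX_MEMO_LINES:
        lines = lines[:MAX_MEMO_LINES]
        lines.append(f"*** Memo truncated at line {MAX_MEMO_LINES} ***")
    return lines
```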
Note about FoxPro Memos: FoxPro memos can contain any
type of information in them, even graphics files. DB-Dupe,
however, treats all memos as text. So, depending on the
type of information stored, a memo may or may not be displayed
properly by the DB-Dupe program.
II.F.4. Undelete
Undelete unmarks a record that had been marked for
deletion. This is equivalent to the RECALL command in Dbase.
II.F.5. Continue
This command continues the duplicate record search
without marking either of the records shown for deletion.
II.F.6. Stop
Stop suspends the current search and returns you to the
main menu. It is functionally equivalent to hitting the -Esc-
key while the program is searching.
II.G. The Goto Menu
The Goto menu allows you to move to a particular point in
your database.
Many times when searching for duplicate records, you may
not want to start searching at the first record. It may be that
you were interrupted in the middle of searching, and want to take
up where you left off. Or perhaps with a zip-code-indexed
database, you have collected several hundred records at the
beginning that have no zip, and hence take a long time to search
through. The Goto menu allows you to begin your search wherever
in your database you want.
The Goto menu is slightly different depending on whether
you have an index loaded or not. If an index IS NOT loaded, then
the menu has the options: Goto #, Next, Previous, and Return. If
an index IS loaded, the menu has the options: Top, Next,
Previous, Advance, and Return.
II.G.1. Goto #
Goto # is only an option when an index is not loaded.
This option allows you to specify an absolute record number to go
to. This is functionally equivalent to the GOTO statement in
Dbase. The program shows you how many records are in the
database. Obviously, you cannot goto a record number higher than
the number of records in the database.
Goto # is not available when an index is loaded because
record positions are relative to the index order. Specifying an
absolute record number would not really do you any good, because
you wouldn't know if you were actually going forward or backward
through the index.
II.G.2. Top
Top is only available when an index is loaded. This
takes you to the first record listed in the index.
II.G.3. Next and Previous
Next and Previous are available whether an index is
loaded or not. Next takes you to either the next physical record
or the next record based on the index. Previous takes you to
either the previous physical record or the previous record based
on the index.
II.G.4. Advance
Advance is only available when an index is loaded. This
option skips through the index a certain number of records. This
is equivalent to the SKIP [#] command in Dbase when an index is
loaded. Advance only skips forward through a database, not
backwards. If you need to go to a previous record, you can
either just use the Previous option, or you can use Top and then
Advance.
The Advance function is somewhat slow when a
FoxBase/FoxPro index is loaded. This is because it must actually
step through the index one entry at a time. To
show you its progress, a box pops up in the middle of the screen.
When you have a DLX index loaded, the Advance function is
instantaneous, no matter how many records you are advancing.
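Every DLX entry has the same length (see Appendix A), so the location of any entry can be calculated directly instead of being found by stepping through the file. This is presumably why Advance is instantaneous with a DLX index. A sketch of that arithmetic (the names are hypothetical):

```python
DLX_DATA_START = 255  # 0-based offset of the first entry (Appendix A)

def dlx_entry_offset(position: int, field_length: int) -> int:
    """Byte offset of the entry at 0-based `position` in a DLX index.
    Each entry is FieldLength bytes of key plus a 4-byte record number."""
    return DLX_DATA_START + position * (field_length + 4)
```

For a 5-character key, advancing 1000 entries means moving 1000 * (5 + 4) = 9000 bytes forward in the file.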
II.G.5. Deleted
This function allows you to advance to the next deleted
record. If you choose this and then find yourself waiting a long
time, you can hit -Esc- to stop.
II.G.6. Return
Return takes you back to the main menu.
II.H. Utils
The Utils menu currently only has one option, setting
program colors.
Setting colors allows you to change all the colors the
DB-Dupe program uses. There are 11 different color settings you
can change. You cannot change all colors directly, but you can
change all colors at least indirectly. (For example, the colors
used for on-line help are based upon the colors you pick for
Dialog Boxes.)
Start changing colors by hitting 1-9, or A-B to specify
which color setting you want to change. Once you do this, a
large block cursor will appear in the color box at the left. You
can move this cursor with the arrow keys to the new color you
want. As you move the cursor, the new color will be shown in the
box at the right. When you have the color you want, push -Enter-.
You can then go on to set the rest of the colors similarly.
When you are done setting colors, you can hit -S- to save
the new colors, -X- to exit without saving, or -R- to restore the
program color defaults.
Your new colors take effect immediately upon exiting the
color setting function.
II.J. Quit
Quit closes all files and exits the program. It is
functionally equivalent to the QUIT statement in Dbase.
III. Suggestions for Use
1. Always make sure you are using a RAM cache with DB-Dupe.
Use a RAM disk also when possible for indexing.
2. When you are creating an index, create it with a field
that will have few duplicates. For example, address databases
are usually best indexed by street address, since there will
likely be few duplicates; but, you could also try indexing by
zip, or last name, or city, etc.
3. If you have a database with lots of blanks in the
beginning, use the advance function to skip them. Checking
through several hundred blanks can take a couple of minutes, and
is usually not worth the time.
4. You can delete all blank records in a file by indexing
on any field, and then letting the program do an automatic
delete. Let the program go until the record counter at the top
of the screen goes from 1 to 2. When the record counter reaches
2, hit escape to stop searching. The program will have deleted
all blanks except the first one, which you can delete manually.
5. Always make backups before doing a duplicate record
search. Really, you should have a program of regular backups,
every day or at least once a week. Such backups are especially
necessary when using a new piece of software.
6. Use the report function, especially when you are using
DB-Dupe for the first time on a database. That way you can make
sure that the program worked properly.
Appendix A
Format of DLX Index
We provide this information on DLX indexes because we
feel that file formats should always be made available.
DLX indexes are somewhat like IDX or NDX indexes in that
they have a header, followed by the actual index values and
record numbers. However, the DLX headers and record entries are
much simpler and, hence, faster.
The DLX header is 255 characters in length. The first 13
bytes contain the string "DBDup 1.1 DLX". The next two bytes are
an integer specifying the length of the database field
(FieldLength) upon which the index was created. The next four
bytes are a long integer specifying the file length of the DLX
index. The next 64 bytes are a string specifying the Key
Expression of the DLX index. The next 2 bytes are an integer
specifying whether the DLX index was successfully completed. Any
value other than 5 means it is an invalid index.
The actual record entries in the DLX are strings of the
length of FieldLength, followed by a long integer representing
the record number. These record entries begin at byte number 256
(255 if you consider the first byte to be 0), and each entry is
FieldLength + 4 bytes long.
A BASIC type representation of the header would be:
Type DLXHeader
ID as String * 13
FieldLength as Integer
FileLength as Long
KeyExpression as String * 64
DlxCode as Integer
Reserved as String * 170
End Type
A BASIC type representation of the record entries would
be:
Type RecordEntry
FieldValue as String * FieldLength
RecordNumber as Long
End Type
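The layout above can be read with Python's struct module. This sketch assumes little-endian integers (as QuickBASIC writes them under DOS), ASCII keys, and a key expression padded with spaces or NUL bytes. It unpacks only the first 85 bytes of the header, takes the first entry offset (255) from the text above, and does not use the FileLength field. The constant and function names are hypothetical.

```python
import struct

DLX_HEADER = struct.Struct("<13shl64sh")  # ID, FieldLength, FileLength, KeyExpression, DlxCode
DLX_ID = b"DBDup 1.1 DLX"
DLX_DATA_START = 255   # entries begin at byte 256 (offset 255)
DLX_VALID_CODE = 5

def read_dlx(data: bytes):
    """Return (key_expression, [(key, record_number), ...]) from a DLX file image."""
    ident, field_len, _file_len, key_expr, code = DLX_HEADER.unpack_from(data, 0)
    if ident != DLX_ID:
        raise ValueError("not a DLX index")
    if code != DLX_VALID_CODE:
        raise ValueError("DLX index was not completed")
    entry_len = field_len + 4
    entries = []
    for off in range(DLX_DATA_START, len(data) - entry_len + 1, entry_len):
        key = data[off:off + field_len].decode("ascii")
        (recno,) = struct.unpack_from("<l", data, off + field_len)
        entries.append((key, recno))
    return key_expr.rstrip(b" \x00").decode("ascii"), entries
```

To read an index from disk, pass the file contents, e.g. read_dlx(open("MASTER.DLX", "rb").read()).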
Appendix B
Errors
There are three types of errors which you might encounter
when using the DB-Dupe program.
The first type of error is a program logic/code error.
This is what is typically called a "bug". When using a program
as complicated as DB-Dupe, which supports multiple file formats
in multiple screen formats, bugs are not terribly uncommon.
We've done our best to catch all the bugs, but no one can ever
catch them all. The bugs that might remain in the program are
probably display bugs, meaning that the screen might not be
updated correctly for some reason. Other bugs might occur when
you are loading unusual combinations of files, or loading and
unloading many different files during one session. If such bugs
do occur, the easiest thing to do is simply to exit the program
and restart. We would appreciate it if you would contact us with
bug reports if you find any bugs. We will fix the bug, and send
a corrected copy to you for free, whether you have registered
this program or not.
The second type of error is an incompatible or corrupted
file error. The program tries to trap such errors and keep
going, but this is not always possible. These errors can occur,
for example, if you attempt to load a bad index or format file
(although the program tries very hard to catch such errors). If
you load a corrupted (or compiled) format file, the program may
try to process it and encounter an error. In the same way, a
corrupted index file could cause an error. Also, if you load an
index file from an unsupported database, an error might occur.
The third type of error is a hardware error. This can occur if
you have a bad sector on your hard drive, or some other hard
drive problem, or a computer memory error.
If the DB-Dupe program encounters an error from which it
cannot recover, it will show you the number of the error, and ask
you if you want to save current file parameters. The numbers the
program shows are equivalent to BASIC error numbers listed in the
BASIC manual which came with your copy of MS-DOS.
Appendix C
Using DB-Dupe with Microsoft Windows
Although DB-Dupe is not a "Windows program," you may find
it convenient to run DB-Dupe from Windows. If you run Windows in
386 enhanced mode, then DB-Dupe can run in a background window
while you do something else in the foreground. This is
especially handy when you are doing a long indexing job, or when
you are using DB-Dupe in automatic mode.
You can only run DB-Dupe in the background when you are
in 386 enhanced mode. If you are not running in enhanced mode,
then you are better off exiting Windows and running DB-Dupe
separately.
We provide a Windows .PIF file (DBDUP.PIF) and a Windows
Icon (DBDUP.ICO) for DB-Dupe on the distribution disk.
DB-Dupe 1.0
Order Form
Name ________________________________________________________
Company _____________________________________________________
Street Address ______________________________________________
City _______________________ State _______ Zip ___________
Date _____________________ Your P.O. # ____________________
I require 3.5 inch _______ 5.25 inch _______ diskettes.
Number of Copies of DB-Dupe (at $59 per copy) _______________
Total Cost _______________
Optional:
What database program are you using? ________________________
How would you rate DB-Dupe on speed? ________________________
How would you rate DB-Dupe on ease-of-use? __________________
Suggestions? ________________________________________________
_____________________________________________________________
Note: Registering ONE copy of DB-Dupe allows you to use it on all
computers at one site (normally one building, or one group of
buildings owned by one owner, such as a college). If you plan to
use it at more than one site, then you must register multiple
copies.
In return for your registration, you will receive the latest
version of DB-Dupe, along with a printed manual for every copy
purchased, free or low-cost updates, and large
discounts on new Dbase-compatible software.
We do accept purchase orders from businesses, schools, and
government agencies.
Please mail this form, along with your check, to:
BC Enterprises
P.O. Box 18
Front Royal, VA 22630