-
-
- (**************** SplitBase Data Management Systems ****************
- * *
- * Copyright (c) 2001 Leon O. Romain *
- * *
- * leon@kafou.com *
- * *
- *******************************************************************)
-
- HISTORY
- I conceived SplitBase about 8 years ago as an alternative to the
- available database management programs and also to alleviate some of
- the burden that Borland's BDE imposes on programmers. Since then, I
- have been involved in projects that either did not require database
- programming or for which the clients requested specific systems. I
- had also hoped that by now the public domain and open-source
- communities would have been flooded with good or even excellent
- alternatives to both the BDE and commercial databases. Although there
- have been many outstanding implementations in that direction, none of
- them addresses all the issues that I am, to this day, still concerned
- with. Now, faced with the prospect of many data management projects
- in my immediate future, I decided to finally start working on my
- ideas, and that is how I came up with SplitBase.
-
- THE CONCEPTS AND PHILOSOPHY
- Simplicity, efficiency and speed are the guiding principles behind
- SplitBase, and the reasons I prefer to program in Delphi. Delphi
- allows the creation of powerful applications with all their
- functionality confined in a single small executable file. In many
- other popular programming environments, a "Hello World" program
- requires a complex, multi-megabyte installation program in order to
- run on a user's computer. Dynamic link libraries are a wonderful
- concept that ran out of control. They often create serious conflicts;
- worse, overwriting them may cause other programs to crash or behave
- erratically. I wanted to develop a structure that was simple enough
- for average programmers to comprehend and modify easily according to
- their needs. That structure should be efficient, fast and also
- powerful enough to be used in real-world situations. By efficient I
- mean it should be able to manage a substantial amount of data (at
- least one million records) safely and accurately. Fast refers to the
- ability to execute available transactions instantaneously (in one
- fifth of a second or less). Efficiency and speed should not rely on
- the power of today's processors and other hardware, but should
- instead be achieved on minimal machines such as the original IBM PC
- XT and AT, with all their relative limitations. I believe that with
- SplitBase I have achieved my goals, and I humbly present it to your
- judgment in the hope that it might be useful to other programmers and
- that they may improve the quality and functionality of SplitBase
- according to the preceding guidelines.
-
- DESIGN
- In its current incarnation, SplitBase is a multilevel indexed file
- structure. Simply put, it is composed of abstract containers of index
- data, or keys as they are more commonly called. Each key is paired
- with a simple pointer that relates it to the corresponding record or
- to a lower-level container. When a container reaches its full
- capacity, it splits into two half-filled containers. The structure is
- very similar to the B-tree used in many database indexes, but it is
- simpler to manage and store. The top-level container never splits;
- when it is full, the database has reached its maximum capacity and
- cannot accommodate any more records.
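The split rule can be illustrated with a small Python model. This is a sketch, not the Pascal code in splinc.pas: the `Container` class and its `split` method are hypothetical names, and pointers are plain integers standing in for file offsets.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """Sorted (key, pointer) pairs with a fixed capacity (hypothetical model)."""
    capacity: int
    entries: list = field(default_factory=list)   # kept sorted by key

    def is_full(self) -> bool:
        return len(self.entries) >= self.capacity

    def split(self) -> "Container":
        """Move the upper half of the entries into a new container and return it.

        Both containers end up about half full. The caller is then expected to
        write the new container to the end of the file and register its first
        key in the top-level container.
        """
        middle = len(self.entries) // 2
        upper = Container(self.capacity, self.entries[middle:])
        self.entries = self.entries[:middle]
        return upper


full = Container(4, [("ant", 10), ("bee", 20), ("cat", 30), ("dog", 40)])
upper = full.split()
print(full.entries)   # [('ant', 10), ('bee', 20)]
print(upper.entries)  # [('cat', 30), ('dog', 40)]
```

Splitting at the midpoint is what guarantees the "half full" lower bound used in the capacity calculations of this guide.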
-
- In its present phase, SplitBase consists of two levels of key
- containers: a top level with a single container and a secondary level
- with multiple containers. The top level contains a duplicate of the
- first key of each secondary container. Attached to that key is a
- pointer to the physical position of the related secondary-level
- container in the database file. When a secondary container splits or
- when its first key changes, the top container is updated accordingly.
- Secondary-container keys are attached to pointers to the location of
- the actual records within the file. The top-level container always
- resides in random access memory (RAM) while a database file is
- active, that is, whenever it has been opened or created. This reduces
- the number of disk accesses needed to locate and retrieve any record
- to two: one to load the secondary container and one to read the
- record. Writing a new record likewise takes an average of two disk
- accesses, or at most three when the top-level container must be
- updated.
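A Python sketch of the two-access lookup follows. It uses hypothetical in-memory dictionaries (`SECONDARY`, `RECORDS`) in place of the data file and counts one simulated disk read per secondary container loaded and per record fetched; the top container (`TOP`) is a plain sorted list kept in RAM. The routing rule, which picks the last container whose first key is less than or equal to the search key, is an assumption consistent with the description above.

```python
from bisect import bisect_left, bisect_right

# Hypothetical stand-ins for the data file: offset -> secondary container / record.
SECONDARY = {
    100: [("adams", 1000), ("baker", 1100), ("clark", 1200)],
    200: [("davis", 1300), ("evans", 1400)],
}
RECORDS = {1000: "Adams", 1100: "Baker", 1200: "Clark", 1300: "Davis", 1400: "Evans"}

# Top container, always in RAM: first key of each secondary container + its offset.
TOP = [("adams", 100), ("davis", 200)]


def find(key):
    """Return (record or None, number of simulated disk reads)."""
    first_keys = [k for k, _ in TOP]
    i = bisect_right(first_keys, key) - 1      # last container whose first key <= key
    if i < 0:
        return None, 0                         # key sorts before every stored key
    container = SECONDARY[TOP[i][1]]           # disk read 1: load the secondary container
    keys = [k for k, _ in container]
    j = bisect_left(keys, key)                 # binary search inside the container
    if j == len(keys) or keys[j] != key:
        return None, 1
    return RECORDS[container[j][1]], 2         # disk read 2: fetch the record


print(find("clark"))  # ('Clark', 2)
print(find("cole"))   # (None, 1)
```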
-
- At this point, it is important to note that the maximum number of
- records that can be entered in a two-level container system as
- described above is equal to the square of the full capacity of a
- container. In the worst-case scenario, when all secondary-level
- containers are only half full and the top-level container reaches
- full capacity, that number is halved, which fixes the guaranteed
- minimum number of records that can be entered in the SplitBase
- system. New data, such as containers generated by a split and all new
- records, are always written to the end of the database file, and a
- pointer to their position within that file is attached to the
- corresponding key. A SplitBase file is therefore a moderately complex
- linked list of containers and records. When the file is first
- created, it contains only an empty top container and a header that
- provides the manager with information about the file structure, such
- as the record and field lengths or the number of deleted records.
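The append-only layout can be modelled in Python with an in-memory file. The header format (`HEADER`, two 32-bit integers for record length and deleted-record count) and the `AppendOnlyFile` class are hypothetical, and the empty top container is omitted for brevity. The sketch only shows that every write goes to the end of the file and that the returned offset becomes the pointer stored next to a key.

```python
import io
import struct

# Hypothetical header: record length and number of deleted records (little-endian int32s).
HEADER = struct.Struct("<ii")


class AppendOnlyFile:
    """Toy model of a SplitBase-style data file: a header, then data appended at the end."""

    def __init__(self, record_length: int):
        self.f = io.BytesIO()
        self.f.write(HEADER.pack(record_length, 0))

    def append(self, data: bytes) -> int:
        """Write data at the end of the file and return its offset (the pointer to store)."""
        offset = self.f.seek(0, io.SEEK_END)
        self.f.write(data)
        return offset

    def read(self, offset: int, size: int) -> bytes:
        """Follow a stored pointer back to the data."""
        self.f.seek(offset)
        return self.f.read(size)

    def header(self):
        return HEADER.unpack(self.read(0, HEADER.size))


db = AppendOnlyFile(record_length=10)
first = db.append(b"hello")
print(first, db.read(first, 5))   # 8 b'hello'
```

A container produced by a split would be appended the same way, with its offset then written into the top container.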
-
- For the sake of simplicity, I have decided to save all fields in a
- record as strings, which in most programming languages can easily be
- converted to other data types such as numbers, currency and dates.
- Data within the containers are automatically kept in ascending order
- with a simple insertion sort. This allows for extremely fast binary
- searches of data in RAM. For every search and insertion request, the
- manager looks up the top container to determine which secondary
- container holds the position for that specific key. It then uses the
- corresponding pointer to locate and load that container into RAM. New
- data are then inserted into their correct position, and found data
- provide the pointers to their actual records for retrieval. The whole
- system is then updated, if necessary, with minimal disk access.
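Keeping a container sorted needs only one step of insertion sort per new key. The Python function below sketches that step; `insert_sorted` is a hypothetical name and entries are modelled as (key, pointer) tuples. A binary search can then run directly on the sorted list.

```python
def insert_sorted(entries, key, pointer):
    """Insert (key, pointer) into a list already sorted by key, insertion-sort style.

    Larger entries are shifted one slot to the right until the new entry's
    slot is found. Returns the index where the entry was placed. Equal keys
    keep their arrival order because the loop only shifts strictly larger keys.
    """
    entries.append((key, pointer))            # grow the list by one slot
    i = len(entries) - 1
    while i > 0 and entries[i - 1][0] > key:
        entries[i] = entries[i - 1]           # shift the larger entry right
        i -= 1
    entries[i] = (key, pointer)
    return i


container = [("b", 2), ("d", 4)]
print(insert_sorted(container, "c", 3))   # 1
print(container)                          # [('b', 2), ('c', 3), ('d', 4)]
```

Each insertion moves at most the whole container once, which is cheap because the container is already in RAM.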
-
- The decisive factor in determining container capacity has been the
- available memory in our targeted minimal computers, the XT and AT
- versions of the IBM PC. Both machines were hampered by the 64 KB
- memory-segment limit, which was therefore chosen as the maximum
- physical size of a container. On the other hand, since only two
- containers are present in memory during data manipulation, this
- leaves more than enough room in the limited memory of these machines
- (640 KB max.) for the records, the database manager, the operating
- system and other programs. In our original model, we propose a
- maximum key length of 26 bytes, which is appropriate for the majority
- of real-life situations where keys are usually simple and short, such
- as a social security number, a last name, a phone number or a zip
- code. This choice provides a capacity of 2000 key/pointer
- combinations per container, yielding databases with a maximum of 4
- million records (2000 x 2000) and a minimum of 2 million records. The
- minimum applies when data are entered sequentially, which leaves every
- split container only half full by the time the top container reaches
- its full capacity.
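The arithmetic behind these figures can be checked in a few lines of Python. The per-entry size is an assumption: a Turbo Pascal `string[26]` occupies 27 bytes (26 characters plus a length byte) and a longint pointer 4 bytes, so one key/pointer pair would take 31 bytes. Under that assumption, 2000 entries fit within 64 KB with some room to spare.

```python
SEGMENT_BYTES = 64 * 1024    # 64 KB: maximum physical size of a container
KEY_BYTES = 26 + 1           # assumption: string[26] = 26 characters + 1 length byte
POINTER_BYTES = 4            # assumption: a longint file offset
ENTRY_BYTES = KEY_BYTES + POINTER_BYTES

CAPACITY = 2000              # key/pointer pairs per container, as chosen in the text

fits = SEGMENT_BYTES // ENTRY_BYTES          # how many entries a 64 KB block could hold
max_records = CAPACITY * CAPACITY            # every secondary container full
min_records = CAPACITY * (CAPACITY // 2)     # top container full, secondaries half full

print(ENTRY_BYTES, fits)                     # 31 2114
print(CAPACITY * ENTRY_BYTES)                # 62000
print(max_records, min_records)              # 4000000 2000000
```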
-
- If you did not understand anything in this section, you should not be
- programming databases anyway. However, SplitBase has only a few
- functions, and they are much simpler than those of other databases;
- hence you should be able to use it without too much difficulty.
-
- IMPLEMENTATION
- SplitBase is built around a small set of Boolean functions. It is
- composed of 29 structurally identical functions that return true when
- the function succeeds and false otherwise. Only 20 of those functions
- are necessary for a programmer to know in order to use SplitBase.
- Many of these routines are not structural but complementary, and were
- implemented only for the convenience of the programmer and the
- integrity of the system. All the others are mainly basic but
- optimized file manipulation routines. All functions are written in
- simple, object-free Turbo Pascal syntax with very few exceptions.
- This simplicity makes it extremely easy to port SplitBase to other
- programming environments and to early versions of Turbo Pascal. It
- also makes the system compatible with current and future releases of
- Delphi with practically no modifications. It could even be adapted
- just as easily to run in the original Pascal language, but I would not
- recommend doing so, simply because Wirth's Pascal did not allow the
- programmer to access files and memory directly. The result would be a
- very slow system that would definitely not meet our previously stated
- standards of speed and efficiency.
-
- In this current version, SplitBase is released in two formats: an
- include file (splinc.pas) to be included in Turbo Pascal or Delphi
- programs, and a component (Tspl) that can be added to practically all
- current releases of Delphi, and most likely to all future ones unless
- they introduce some very dramatic change in their basic syntax or
- functionality.
-
- INSTALLATION
- For the include-file version, just add the compiler directive
- {$I Splinc.pas} at the top of your Turbo Pascal programs or at the
- beginning of the implementation section of your Delphi projects. For
- the component version, install the file spl.pas as a new component
- according to the methods available in your version of Delphi. You can
- install it in an existing package or in a new one, preferably named
- Tspl. If you follow the right steps, you should end up with a color
- SplitBase logo in the "Samples" palette of your Delphi environment.
- Drag and drop that logo onto your forms to use the Spl component,
- which provides all the functionality of SplitBase. The file
- Splinc.pas is located in the "\Splbase\Include" subdirectory of the
- original release of SplitBase. The component Spl.pas is located in
- the "\Splbase\Class" subdirectory of the same release.
-
- PROGRAMMING
- SplitBase is very simple to program. The following is a brief
- description of the necessary functions as well as basic principles
- that should be observed to safely and easily implement these
- routines. It is the programmer's responsibility to allocate memory
- for the system by calling InitSpl at the beginning of the program and
- ReleaseSpl at the end. You must also initialize the system variables
- by calling InitBase before creating or opening any database file. You
- have to ensure that your function calls succeed before proceeding
- with any database session; this can simply be achieved by checking
- the return value of every call. By following these principles you
- should enjoy a trouble-free experience while programming and using
- SplitBase. Here are the main functions:
-
- InitSpl: Initializes the SplitBase system by allocating memory to
- accommodate the top and secondary system containers. InitSpl must be
- called at the beginning of your programs. If it fails, all other
- calls to the system must be cancelled.
-
- ReleaseSpl: Releases allocated memory back to the operating system.
- This function must be called at the end of your programs. Failure to
- do so will prevent other programs from using the portion of memory
- reserved for SplitBase.
-
- InitBase: Initializes important variables used by SplitBase. InitBase
- must be called before creating or opening SplitBase files. Failure to
- do so may generate unpredictable errors or otherwise compromise the
- stability of the system.
-
- SetSpl: This is a very important function that you must use to tell
- SplitBase the size of each field in a new SplitBase file. It must be
- called before creating a new database and takes a string as its
- parameter. This string contains the sizes of all the fields that
- define the record structure, starting with the first field in the
- record. Each field size is described with three characters; sizes
- less than 100 must be padded with zeros on the left. For example, a
- field of length 3 followed by a field of length 12 should be written
- as '003012'.
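A small Python helper illustrates the layout string format that SetSpl expects. `make_layout` and `parse_layout` are hypothetical names, not part of SplitBase, and rejecting sizes outside 1 to 999 is an assumption based on the three-digit encoding.

```python
def make_layout(sizes):
    """Encode field sizes as a SetSpl-style string: three zero-padded digits per field."""
    for size in sizes:
        if not 1 <= size <= 999:   # assumption: a size must fit in three digits
            raise ValueError(f"field size {size} cannot be written with three digits")
    return "".join(f"{size:03d}" for size in sizes)


def parse_layout(layout):
    """Decode a layout string back into a list of field sizes."""
    if len(layout) % 3 != 0 or not layout.isdigit():
        raise ValueError("layout must be a run of three-digit groups")
    return [int(layout[i:i + 3]) for i in range(0, len(layout), 3)]


print(make_layout([3, 12]))    # 003012
print(parse_layout("003012"))  # [3, 12]
```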
-
- CreateSpl: This function receives a string parameter to create a new
- SplitBase file. The parameter is the name of the file to be created,
- without any extension. The system adds the proper extension,
- currently .spd, and creates the file if possible. The functions
- InitBase and SetSpl must be called to set the necessary variables
- prior to calling CreateSpl.
-
- OpenSpl: This function requires a string parameter to open a
- SplitBase file. The parameter is the name of the file to be opened,
- without any extension. The system adds the proper extension,
- currently .spd, and opens the file if possible. InitBase must be
- called to set all the necessary variables prior to calling OpenSpl.
-
- AddField: This function uses two parameters, a string and an integer,
- to insert a field into the record structure of the SplitBase system.
- The string contains the value of the field and the integer its index.
- All fields must be inserted before saving a record into the file.
-
- GetField: This function uses an integer parameter to retrieve a field
- from the record structure of SplitBase. The parameter contains the
- index of the field. A record must be retrieved from the data file
- before using this function. GetField must be called once for each
- field in the current record. If the function fails, SplitBase returns
- an empty string.
-
- AddRec: This function saves a record into the SplitBase file. It
- receives the key attached to that record as a string parameter. All
- fields must have been inserted into the record structure prior to
- calling AddRec.
-
- GetRec: This function retrieves a record from the data file. It
- receives the key attached to that record as a string parameter. If
- the record is found, all fields are loaded into the record structure
- of SplitBase.
-
- DelRec: This function deletes the current record from the data file.
- A record must be active in memory in order to use DelRec. DelRec has
- no parameters.
-
- ModRec: This function updates the current record in the data file. It
- actually deletes that record from the file, then immediately adds the
- modified record. It uses a string parameter as its key.
-
- FirstRec: Locates and retrieves the first indexed record.
-
- LastRec: Locates and retrieves the last indexed record.
-
- NextRec: Locates and retrieves the next indexed record.
-
- PrevRec: Locates and retrieves the previous indexed record.
-
- RecCount: Returns the total number of records in the data file.
-
- ActiveDb: Returns true if a data file is active or opened.
-
- ActiveRec: Returns true if a record is active or loaded.
-
- DBEmpty: Returns true if the current data file is empty.
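FirstRec, LastRec, NextRec and PrevRec have to move across container boundaries. The Python model below sketches one way to do that with a cursor made of a container index and a position inside that container. The `Cursor` class and its method names are hypothetical, and it works on keys only, whereas SplitBase would also load the record each key points to.

```python
class Cursor:
    """Walks keys in order across secondary containers (hypothetical model).

    `containers` lists the secondary containers in the order given by the top
    container; each is a sorted list of keys. The cursor is a pair
    (container index, position inside that container).
    """

    def __init__(self, containers):
        self.containers = [c for c in containers if c]   # ignore empty containers
        self.ci = self.pos = None                        # no active record yet

    def _current(self):
        return self.containers[self.ci][self.pos]

    def first_rec(self):
        if not self.containers:
            return None
        self.ci, self.pos = 0, 0
        return self._current()

    def last_rec(self):
        if not self.containers:
            return None
        self.ci = len(self.containers) - 1
        self.pos = len(self.containers[self.ci]) - 1
        return self._current()

    def next_rec(self):
        if self.ci is None:
            return None
        if self.pos + 1 < len(self.containers[self.ci]):
            self.pos += 1
        elif self.ci + 1 < len(self.containers):
            self.ci, self.pos = self.ci + 1, 0              # step into the next container
        else:
            return None                                     # already on the last key
        return self._current()

    def prev_rec(self):
        if self.ci is None:
            return None
        if self.pos > 0:
            self.pos -= 1
        elif self.ci > 0:
            self.ci -= 1                                    # step back into the previous container
            self.pos = len(self.containers[self.ci]) - 1
        else:
            return None                                     # already on the first key
        return self._current()
```

Like the SplitBase functions, each move reports failure (here `None`) instead of raising, and a failed move leaves the cursor where it was.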
-
- Many variables are important in the use of SplitBase. Most of them
- should never be manipulated directly by the programmer. However,
- there are a few, such as Splerr, that the programmer should check
- regularly. Splerr holds the number and description of the last error,
- available as splerr.recnum and splerr.recstr. In the component
- version, use the properties ErrorNumber and ErrorString instead of
- splerr. The variable CurDtb, or its equivalent CurrentDB, contains
- the name of the current data file. The allrec variable contains a lot
- of valuable information, such as the number of fields in the current
- SplitBase record. You may read it with allrec.size or with FieldCount
- from the SplitBase component. Finally, limrec is a number you can set
- to limit the number of records that may be entered into the data
- file. For the component, use Reclimit instead.
-
- RELATED PROGRAMS AND TESTS
- The original release of SplitBase comes with two test programs, Split
- and Split2. These two programs are identical in looks and
- functionality, except that the former uses the include file and the
- latter the component. These programs allow a user to test the major
- functions of SplitBase. They contain buttons to create new data files
- and open existing ones. Other buttons provide for adding new records
- as well as finding and deleting existing data within a SplitBase
- file. They also provide navigation buttons to locate the first, last,
- next and previous records. Finally, a Generate button automatically
- inserts 1 million records into a specific data file. That file, called
- test.spd, holds two-field records. It is automatically created and
- opened if you click the related buttons in the program. Generate adds
- 1 million even numbers, from 2 to 2 million, to that file. For fun
- and to simulate data entry overhead, Generate converts all those
- numbers into their English spelling and puts the actual number in the
- first field and its spelling in the second before saving the record
- in the data file. These two fields are also output in two edit fields
- for added overhead. An algorithm is used that forces SplitBase to
- update the top container every four records, something that in
- real-life applications should happen on average only once every
- thousand records. Tests on a 667 MHz custom-built computer equipped
- with a 5400 RPM Ultra ATA/66 hard drive yielded a rate of 139 records
- per second at the completion of Generate. The function can be stopped
- at any time by clicking the 'Stop' button.
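The Generate test converts each number into English words. A possible Python version of that conversion is sketched below; the `spell` function is hypothetical, not the routine shipped with Split and Split2, and it covers only numbers below one billion, which is ample for the range 2 to 2,000,000. At the reported 139 records per second, a full run of 1 million records works out to roughly 7,200 seconds, or about two hours.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def _below_thousand(n):
    """Words for 0 <= n < 1000; returns an empty list for 0."""
    words = []
    if n >= 100:
        words += [ONES[n // 100], "hundred"]
        n %= 100
    if n >= 20:
        words.append(TENS[n // 10] + ("-" + ONES[n % 10] if n % 10 else ""))
    elif n > 0:
        words.append(ONES[n])
    return words


def spell(n):
    """Spell 0 <= n < 1,000,000,000 in English words."""
    if not 0 <= n < 1_000_000_000:
        raise ValueError("only numbers below one billion are supported")
    if n == 0:
        return "zero"
    words = []
    for value, name in ((1_000_000, "million"), (1_000, "thousand")):
        if n >= value:
            words += _below_thousand(n // value) + [name]
            n %= value
    words += _below_thousand(n)
    return " ".join(words)


# Generate works on the even numbers 2, 4, ..., 2,000,000:
# field 1 holds the number, field 2 its spelling.
sample = [(str(n), spell(n)) for n in range(2, 11, 2)]
print(sample[-1])   # ('10', 'ten')
```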
-
- FUTURE EXPLORATIONS
- Where do we go from here? From the basic functionality of the system,
- many routes may be considered. One of the more obvious may be to
- increase the number of records that the system can handle. The
- easiest way to do this might be to hold the keys of the top and
- secondary containers in an AnsiString, just like 'rechld', which
- holds the fields of any given record. The two-gigabyte theoretical
- limit that Borland claims for such strings is well beyond anything we
- might need in any real-world situation.
-
- The LongInt and Int64 Problem
- The only other hurdle is the 2-gigabyte limit of the longint type
- that is used to determine file sizes and to locate records and
- containers within the data file. This problem may be addressed by
- using a blocking factor greater than 1 when creating and opening data
- files. Two hundred fifty-six (256) and one thousand twenty-four (1024)
- may be good values to experiment with. The only difficulty might be
- in synchronizing container and record sizes with the chosen blocking
- factor. It would be easy to add fillers, as in the good old days when
- Cobol was king. This will certainly add to the waste in a data file,
- but who cares when 80-gigabyte hard drives cost less than 200 dollars
- in most major computer stores? An easier way would be to use the
- Int64 type that appeared with version 4 of Delphi, but Borland is
- silent about using it with the Reset, Rewrite and Seek procedures.
- Int64 can hold integers up to 2^63 - 1.
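The effect of a blocking factor can be quantified with a short Python sketch. It assumes that positions are stored as block numbers in a signed 32-bit longint, which is how Seek behaves on a Turbo Pascal or Delphi untyped file whose record size was passed to Reset or Rewrite; the helper names `max_file_bytes` and `padded_size` are hypothetical.

```python
LONGINT_MAX = 2**31 - 1   # largest value of a signed 32-bit longint


def max_file_bytes(blocking_factor):
    """Bytes reachable when a longint position counts blocks instead of bytes."""
    return (LONGINT_MAX + 1) * blocking_factor   # positions 0 .. LONGINT_MAX


def padded_size(size, blocking_factor):
    """Round a record or container size up to whole blocks (the COBOL-style filler)."""
    blocks = -(-size // blocking_factor)         # ceiling division
    return blocks * blocking_factor


# Prints "1 2 GiB", then "256 512 GiB", then "1024 2048 GiB".
for bf in (1, 256, 1024):
    print(bf, max_file_bytes(bf) // 2**30, "GiB")

print(padded_size(100, 256) - 100)   # 156 bytes of filler for a 100-byte record
```

The filler printed last is the waste mentioned above: the larger the blocking factor, the more a small record has to be padded.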
-
- What about garbage collection? Much like in other databases, deleted
- records and containers are not actually erased from the file. They
- just sit there, useless, with no pointers referring to them. Heavy
- use of delete transactions tends to considerably increase the amount
- of garbage within the data file. A garbage collection or rebuild
- routine can easily be implemented by creating a new data file and
- adding all the live records from the previous file to it.
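The rebuild idea reduces to copying whatever is still reachable through the index. In the Python sketch below, the file is a list and the containers are collapsed into a dictionary from key to position; `rebuild` is a hypothetical name. With the real system, the loop would presumably walk the old file with FirstRec and NextRec and call AddRec on the new one.

```python
def rebuild(index, old_file):
    """Copy only the live records into a fresh file and return (new_index, new_file).

    index: dict mapping key -> position in old_file (stands in for the containers)
    old_file: list of records; positions no key points to are garbage
    """
    new_file, new_index = [], {}
    for key in sorted(index):                  # visit keys in order, as FirstRec/NextRec would
        new_index[key] = len(new_file)         # the record's position in the new file
        new_file.append(old_file[index[key]])
    return new_index, new_file


old = ["Adams", "deleted", "Baker", "old Clark", "Clark"]
idx = {"adams": 0, "baker": 2, "clark": 4}
new_idx, new = rebuild(idx, old)
print(new)      # ['Adams', 'Baker', 'Clark']
print(new_idx)  # {'adams': 0, 'baker': 1, 'clark': 2}
```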
-
- Multiple-index data files may be more easily implemented by saving
- the indexes and the records in different files. With longint offsets,
- this split also leaves room for more records in each file. The same
- approach applies if one would like to emulate the functionality of
- relational databases: the straightforward structure of a data file
- without containers makes it easy to emulate most general functions of
- a relational database system. In the case of a multi-user environment
- and client/server implementation, a separate manager should be
- written to handle calls from the different users. The most important
- issue in multi-user systems is probably the ability to lock records.
- The manager can easily implement this by adding to each record an
- extra field that holds the status of that record. A simple protocol
- should also be implemented between the manager and the client
- programs to properly handle transactions.
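The record-locking idea can be sketched in Python as follows. The `LockManager` class, the status values and the per-record owner are all hypothetical; the sketch handles one request at a time, as a single manager process serving clients in turn would, and follows the SplitBase convention of returning true on success and false otherwise.

```python
FREE, LOCKED = " ", "L"   # hypothetical one-character values of the status field


class LockManager:
    """Central manager that owns each record's status field (toy, single-process model)."""

    def __init__(self):
        self.status = {}    # key -> (status, owning client)

    def lock(self, key, client):
        state, owner = self.status.get(key, (FREE, None))
        if state == LOCKED and owner != client:
            return False                    # another client holds the record
        self.status[key] = (LOCKED, client)
        return True

    def unlock(self, key, client):
        state, owner = self.status.get(key, (FREE, None))
        if state != LOCKED or owner != client:
            return False                    # only the owner may release the lock
        self.status[key] = (FREE, None)
        return True


manager = LockManager()
print(manager.lock("r1", "alice"))   # True
print(manager.lock("r1", "bob"))     # False
```

A real manager would also need the transaction protocol mentioned above, for example to release locks held by a client that disconnects.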
-
- These were a few suggestions for programmers who want to modify the
- SplitBase system to achieve their goals. However, we believe that
- SplitBase is very useful as is and can be adapted to solve a great
- many real-life problems. Particularly in the retail industry,
- SplitBase may be used easily and effectively without modification to
- help manage stock, accounting, customers and so on. How many times
- have you seen a clerk use complex data analysis to locate your record
- or a specific item in the store? Clerks usually type a single key
- string to locate the needed information. Most stores, even the major
- chains, hold fewer than 2 million items (I know, I usually go around
- and count them :-). Many cities in America and around the world have
- fewer than two million citizens, most libraries hold fewer than two
- million books, and the list of real-life situations in which
- SplitBase may be used without modification goes on and on.
-
- If you come up with important additions to SplitBase, you are free to
- publish them along with the original distribution of SplitBase, with
- all files present and unchanged. I would recommend that you follow
- the guidelines stated at the beginning of this file. However, if you
- have a REALLY important addition that uses some arcane procedures or
- an extremely complex routine, go ahead and publish it, with the
- restriction stated above of keeping the original files intact.
-
- COPYRIGHT AND LICENSES
- SplitBase is not based on any previous work other than basic computer
- science and data structure principles. It is not public domain
- material but a copyrighted publication of Leon O. Romain. However, it
- is freeware and is being released under the GNU license agreement.
- See the file licence.txt for more information. You may use the
- SplitBase system as you see fit in your personal, professional or
- commercial programs without paying any royalty to the author,
- provided you relieve him of any and all liabilities that may result
- from the use of SplitBase. In fact, SplitBase is released as is. The
- author declines all responsibility and liability for damages, either
- direct or incidental, that may result from the use of SplitBase and
- its accompanying routines and other software.
-
- KNOWN BUGS
- There are no known bugs. If any are found, they should be easily
- corrected due to the simplicity of SplitBase and its modular
- structure.
-
- ADDENDUM
- I checked the Int64 type and it worked perfectly with the Reset and
- Seek procedures on files much bigger than 4 gigabytes. This opens the
- possibility of creating SplitBase systems capable of accessing
- billions of records with very little modification to the original
- program when using Delphi version 4 or later. Changing the type of
- the necessary variables from longint to Int64 and modifying the index
- field of the splitbox record to work as an AnsiString instead of an
- array is all that it takes for SplitBase to access those billions of
- records. I also added an ActiveX version of SplitBase to the original
- release. It works fine except that its icon does not appear on the
- form when selected. The necessary files are located in the activex
- subdirectory, and the file SplBaseXControl1.ocx must be registered
- before using the control. This opens the door for Visual Basic, C++
- and other programming environments to use SplitBase before native
- implementations of the system are available. OCX controls may be
- registered using the Regsvr32.exe utility, usually located in the
- Windows or System directories.
-
- COMMUNICATION
- For any comments, suggestions, corrections, criticism, bug reports
- and other communications, please email me at leon@kafou.com.
-
- SplitBase and the SplitBase logo are trademarks of Leon O. Romain.
-
- The SplitBase Data Management Systems and this user guide are
- copyrighted materials of Leon O. Romain.
-
- Copyright (c) 2001 Leon O. Romain.
-
-
-
-
- / \
-
- / \
- S
- | \ / |
-
- | \ / |
- P L
- / \ | / \
-
- / \ | / \
- I T
- | \ / | \ / |
-
- | \ / | \ / |
- B A S E
- \ | / \ | /
-
- \ | / \ | /
-
- DATA MANAGEMENT SYSTEMS
- -----------------------
-