In article <1992Aug12.190713.26205@linus.mitre.org>, kalagher@mwunix.mitre.org (Dick Kalagher) writes:
|>
|> Organization: The MITRE Corporation
|>
|> I have a question related to response time of a centralized vs. a
|> distributed database. Suppose I have a database made up of one record
|> type that is fairly simple, consisting of 20 or so data elements. Suppose
|> there are X records, where X is a large number, lets say 10*6. Suppose
|> I am concerned about the response time. If I had all records stored in
|> one database on a single computer, would the response time be better than
|> having the database distributed among 10-20 computers on a wide area
|> network. Assume that network traffic is not a problem and that I am using
|> a modern relational database such as Oracle 7. I may also have thousands
|> of users accessing the database.
|>
|> Any help, either theoretical or operational would be appreciated.
|>
|> Dick Kalagher
|>
|>
In this model, you have not specified the access behavior, i.e. whether it
will be primarily used as a read-only database, etc.
The other factor is that even though the database has a homogeneous data
structure, the data itself may not be homogeneous. For example, let's say
you have a library system, and users belong to different univ. departments.
The chance of a comp. sc student being interested in fine arts may be
slim. Hence you can partition the data horizontally. An index can be
maintained that maps each book category (fine arts, comp. sc) to the
corresponding table.
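
To make the index idea concrete, here is a rough sketch in Python. The
category names come from the library example above; the table names, the
sample rows and the find_books() helper are made up for illustration, and
a real system would keep the partitions in the DBMS rather than in memory.

```python
# Hypothetical illustration: route a library query to one horizontal partition.
# Table names and sample rows are invented for this sketch.

# The index: book category -> name of the table (partition) holding those rows.
PARTITION_INDEX = {
    "fine arts": "books_fine_arts",
    "comp. sc": "books_comp_sc",
}

# Stand-in for the partitioned tables.
TABLES = {
    "books_fine_arts": [
        {"title": "Color Theory", "category": "fine arts"},
    ],
    "books_comp_sc": [
        {"title": "Database Systems", "category": "comp. sc"},
        {"title": "Operating Systems", "category": "comp. sc"},
    ],
}


def find_books(category):
    """Look up the partition for a category and scan only that partition."""
    table_name = PARTITION_INDEX[category]
    return [row["title"] for row in TABLES[table_name]]


if __name__ == "__main__":
    # A comp. sc query never touches the fine arts partition.
    print(find_books("comp. sc"))
```

The point is only that a query for one category reads one partition, so the
other partitions (and the machines holding them) are left alone.
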
In the distributed alternative, you say 10-20 computers on a WAN. Does that
imply that the users get to the centralized database over a WAN? This is not
clearly stated in the model. If users are spread out geographically, then
a distributed solution would make sense in a read situation. In this scenario,
data will be replicated at the different sites. However, a write is going
to be awfully expensive, since it will have to update all the site databases.
If the data is horizontally partitioned, then keeping the data close
to the access point should yield better results.
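
A back-of-the-envelope way to see the read/write tradeoff is to count how
many site databases a workload touches. The sketch below is a deliberately
crude model of my own: sites_touched() and its parameters are hypothetical,
and it ignores network latency, locking and commit protocols. It assumes a
replicated read is served by one local copy, a replicated write must update
every site, and a partitioned read or write touches only the one site that
holds the row.

```python
def sites_touched(reads, writes, n_sites, replicated):
    """Count site-database accesses for a workload under a simplified model.

    replicated=True:  every site holds a full copy; a read touches 1 site,
                      a write touches all n_sites.
    replicated=False: rows are horizontally partitioned; each read or write
                      touches only the 1 site that owns the row.
    """
    if replicated:
        return reads + writes * n_sites
    return reads + writes


if __name__ == "__main__":
    # Example workload (numbers are arbitrary): 1000 reads, 10 writes, 20 sites.
    print(sites_touched(1000, 10, 20, replicated=True))
    print(sites_touched(1000, 10, 20, replicated=False))
```

Under this model the gap between the two layouts grows with the number of
writes times the number of sites, which is why the access behavior matters
so much.
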
Another factor is response time itself. Not all users may need the same
throughput. I do not know if the users themselves can be categorized based