WWC snapshot of http://www.nbs.gov/nbii/distributed.html taken on Mon May 29 0:13:07 1995
Network-based Analysis on the National Biological Information Infrastructure and other Distributed Data Systems
The NBII will be a network of many distributed data bases, information sources, and applications that users can discover, browse, and download over the Internet. To deal effectively with data, as opposed to information, the NBII and its sister National Information Infrastructure (NII) data systems need technologies that let users run applications and perform analyses directly on the network, rather than downloading data sets to their own sites for conventional standalone analyses. The National Biological Service is investigating technologies for network-based analysis and distributed computing on the National Biological Information Infrastructure (NBII) as well as on other NII distributed data systems. These technologies would provide the following capabilities:

Network-based analysis
Distributed computing
Collaborative computing
Data interoperability
Metadata creation
Transaction lineage tracking (and documenting)
Use-profile generation
Capture of user feedback, including failures
Use of a virtual repository
Use of fee-processing procedures

Users would reach these tools through predefined, menu-accessible applications available on the network. Examples of applications include waterfowl production models and associated wetlands habitat management (the suggested testbed for this project), air quality permitting, ecosystem risk assessment, gene flow in populations of endangered species, landscape evaluation, and grazing lands simulation. The applications may use GIS, remote sensing, visualization, statistical analyses, expert and rule-based systems, or other processes that yield a desired result. The range of possible applications is limited only by what is available and what can be put on the network. Some may be as simple as recalling a map of a given area; others may be the result of a complex interaction of many variables. In time, users will have a menu of many applications from which to choose.

The processes may be run singly, in combination with one another, or in some complex iterative feedback stream, in which results generate subsequent results. A high-speed network will connect the sites, creating a virtual analysis space in which distributed tools process distributed data on distributed machines, returning desired results to users wherever they may be.
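To make the three modes concrete (single process, chained processes, and an iterative feedback stream in which each round's results drive the next), here is a minimal sketch. It assumes every process is a function that takes and returns a dictionary of named results; the functions `run_chain`, `run_feedback`, `capacity_step`, and `population_step` are hypothetical illustrations, not part of the NBII design or of the testbed model.

```python
from typing import Callable, Dict, List

# Hypothetical: every process maps a dict of named results to an updated dict.
Process = Callable[[Dict[str, float]], Dict[str, float]]


def run_chain(processes: List[Process], data: Dict[str, float]) -> Dict[str, float]:
    """Run processes in order; each one consumes the previous one's result."""
    for process in processes:
        data = process(data)
    return data


def run_feedback(processes: List[Process], data: Dict[str, float],
                 converged: Callable[[Dict[str, float], Dict[str, float]], bool],
                 max_rounds: int = 200) -> Dict[str, float]:
    """Re-run the chain, feeding each round's output back in, until converged."""
    for _ in range(max_rounds):
        new_data = run_chain(processes, data)
        if converged(data, new_data):
            return new_data
        data = new_data
    return data  # gave up after max_rounds without converging


# Toy stand-ins (hypothetical, not the testbed model): wetland area sets a
# carrying capacity, and a logistic step moves the population toward it.
def capacity_step(d: Dict[str, float]) -> Dict[str, float]:
    return {**d, "capacity": 10.0 * d["wetland_ha"]}


def population_step(d: Dict[str, float]) -> Dict[str, float]:
    p, k = d["population"], d["capacity"]
    return {**d, "population": p + 0.5 * p * (1.0 - p / k)}


if __name__ == "__main__":
    start = {"wetland_ha": 100.0, "population": 100.0}
    print(run_chain([capacity_step], start))  # a single process
    result = run_feedback(
        [capacity_step, population_step], start,
        converged=lambda old, new: abs(new["population"] - old["population"]) < 1e-6,
    )
    print(round(result["population"], 3))  # settles at the capacity, 1000.0
```

In a networked setting each process could run at a different site; the control loop stays the same, only the calls become remote.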

Although the needs of the NBII are the specific impetus for this initiative, the tools that will be developed will be extensible and applicable to data systems other than the NBII.

Distributed computing technologies will let users choose the computing environment in which they conduct their analyses. Cost, location, capabilities, and speed are characteristics users may weigh in choosing their (virtual or actual) compute platform. Dynamic service advertisement allows users to learn about, and select from among, various compute-service providers. Distributed computing can greatly increase available computing power by partitioning computational tasks and distributing them over many machines on the network, so that processes that might otherwise take hours or days could be completed in minutes or seconds. Supercomputing capabilities, which will also be an option, can enhance these performance advantages further.
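The partition-and-distribute idea can be sketched as a scatter/gather pattern: split the task into chunks, process each chunk on a separate worker, then combine the partial results. This minimal sketch assumes the work is divisible (for example, summing per-cell values over a region); local threads from Python's standard `concurrent.futures` module stand in for machines on the network, and `partition` and `scatter_gather` are hypothetical names. Because of Python's global interpreter lock, threads will not speed up CPU-bound work; a real system would send chunks to remote compute providers (or at least use separate processes).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def partition(items: Sequence[T], n_parts: int) -> List[Sequence[T]]:
    """Split items into n_parts contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(items), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def scatter_gather(items: Sequence[T],
                   work: Callable[[Sequence[T]], R],
                   combine: Callable[[List[R]], R],
                   n_workers: int = 4) -> R:
    """Partition the task, run each chunk on a worker, then combine partial results.

    Local threads stand in for machines on the network here; a real system would
    ship each chunk to a remote compute provider instead.
    """
    chunks = partition(items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(work, chunks))  # results come back in chunk order
    return combine(partials)


if __name__ == "__main__":
    # Hypothetical per-cell wetland areas (hectares) for a study region.
    cell_areas = [float(i % 7) for i in range(1_000)]
    total = scatter_gather(cell_areas, work=sum, combine=sum, n_workers=8)
    print(total == sum(cell_areas))  # same answer as the serial computation
```

The pattern only pays off when each chunk's work outweighs the cost of shipping data to and from the worker, which is one reason provider speed and location matter when choosing a platform.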

Data interoperability tools attempt to "normalize" data structures and formats so that data from different sources can be combined in a single, integrated analysis. Examples include interoperating with data from disparate GIS or relational databases. Metadata creation tools will help generate metadata elements automatically for newly produced analytical products. These elements are essential for advertising new products to subsequent analysts and users.
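As an illustration of normalization followed by automatic metadata generation, the sketch below maps records from two hypothetical sources (a GIS export with areas in acres and a relational table with upper-case column names and areas in hectares) onto one common schema, then builds a small metadata record for the integrated product. The field names, source names, and metadata elements are assumptions chosen for the example, not a published NBII schema.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, List

ACRES_TO_HECTARES = 0.40468564224  # one international acre, in hectares


def from_gis_export(rec: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical GIS layer: integer basin IDs, areas in acres."""
    return {"basin_id": str(rec["basin_id"]),
            "area_ha": float(rec["area_acres"]) * ACRES_TO_HECTARES}


def from_relational_table(rec: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical database table: upper-case columns, areas in hectares as text."""
    return {"basin_id": str(rec["ID"]), "area_ha": float(rec["AREA_HA"])}


def integrate(gis_rows: Iterable[Dict[str, Any]],
              db_rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Normalize both sources to one schema so they can be analyzed together."""
    return ([from_gis_export(r) for r in gis_rows] +
            [from_relational_table(r) for r in db_rows])


def make_metadata(product: List[Dict[str, Any]], title: str,
                  sources: Iterable[str], tool: str) -> Dict[str, Any]:
    """Generate basic metadata elements describing a derived product."""
    return {
        "title": title,
        "sources": list(sources),
        "generated_by": tool,
        "record_count": len(product),
        "attributes": sorted({key for rec in product for key in rec}),
        "created": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    basins = integrate([{"basin_id": 7, "area_acres": 10.0}],
                       [{"ID": "B12", "AREA_HA": "3.5"}])
    print(basins)
    print(make_metadata(basins, "Basin areas", ["gis_layer", "basin_db"], "integrate"))
```

Real metadata would follow an agreed content standard; the point here is only that the elements can be derived mechanically from the product and the steps that created it.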

Transaction lineage trackers keep records of analytical processes so that users can learn how results were generated. This is vital for understanding how any given result was arrived at. The same records may also be used to keep a global record of the kinds of analyses users are performing. The use-profile generator keeps records on, and generates reports about, the uses of the system. The virtual repository of generated products makes it easier for users to access, and benefit from, the results of others' analyses. Fee-processing procedures will allow users to purchase and pay for products and services over the network.
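A lineage tracker and a use-profile generator can share one log. In this minimal sketch, every tool run is recorded with the user, tool name, parameters, and content hashes of its input and output; tracing a result walks backward through the log to the chain of steps that produced it, and the use profile simply counts runs per tool. The `LineageLog` class, the `scale` and `total` tools, and the assumption that each step has a single input are all hypothetical simplifications for illustration.

```python
import hashlib
import json
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


def fingerprint(obj: Any) -> str:
    """Short content hash so a lineage record can refer to exact inputs and outputs."""
    blob = json.dumps(obj, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:12]


@dataclass
class LineageLog:
    records: List[Dict[str, Any]] = field(default_factory=list)

    def run(self, user: str, tool: Callable[..., Any], data: Any, **params: Any) -> Any:
        """Run a tool and record who ran it, with what parameters, on what data."""
        result = tool(data, **params)
        self.records.append({
            "user": user,
            "tool": tool.__name__,
            "params": params,
            "input": fingerprint(data),
            "output": fingerprint(result),
        })
        return result

    def trace(self, output_hash: str) -> List[Dict[str, Any]]:
        """Walk backward from a result to the chain of steps that produced it."""
        chain, target = [], output_hash
        for rec in reversed(self.records):
            if rec["output"] == target:
                chain.append(rec)
                target = rec["input"]
        return list(reversed(chain))  # oldest step first

    def use_profile(self) -> Counter:
        """Count how often each tool has been run (a simple use profile)."""
        return Counter(rec["tool"] for rec in self.records)


# Hypothetical analysis tools.
def scale(data: List[float], factor: float) -> List[float]:
    return [x * factor for x in data]


def total(data: List[float]) -> float:
    return sum(data)


if __name__ == "__main__":
    log = LineageLog()
    scaled = log.run("analyst1", scale, [1, 2, 3], factor=2)
    answer = log.run("analyst1", total, scaled)
    log.run("analyst2", total, [5])
    for step in log.trace(fingerprint(answer)):
        print(step["user"], step["tool"], step["params"])
    print(log.use_profile())
```

Hashing inputs and outputs lets a stored product in the virtual repository be linked back to its lineage without copying the data itself.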

Finally, a simple, intuitive graphical interface will give users access to all of these capabilities and services. Users will be able to find, select, and apply analysis tools to data directly on the network, so that data and tools can be brought together easily. With simple point-and-click operations, users will be able to locate data and obtain the analytical products they want by applying the appropriate tools.

Testbed Description

The testbed for the project will be a waterfowl population and habitat management model for the Northern Great Plains. The model has been in use for nearly a decade by the U.S. Fish and Wildlife Service to predict populations of waterfowl and to help determine appropriate habitat management practices. The model will be migrated to the testbed, where it will form the basis for the creation, testing, and implementation of the capabilities listed above.

Potential Resource and User Nodes

The following resource nodes, which will provide data, information, computational tools, analysis tools, or staff, are suggested for this activity:

The two HAPET sites as well as the NPSC, where the actual waterfowl population research and management will take place, will serve as the initial user nodes. Users collocated at the other resource nodes may also participate. In the future, additional ecosystem management user nodes may be added to support "back-end" users.

Data Requirements

The following datasets or databases will support the project:

Wetlands data bases:

Consolidated wetland basins:

Models and Analytic Tools

The following models and analytic tools are currently in use by waterfowl researchers. They will form the basis for the distributed computing and analysis to be done under this project.

Possible Operational Scenario


http://www.nbs.gov/nbii/distributed.html
Last Updated 5/16/95