DOCUMENT ROUTING

In addition to searching for documents, you can use the IAT to route documents into particular categories. For example, say you have a collection of unsorted documents related to transportation that you want to divide into three categories: planes, trains, and automobiles. After setting up the categories, you can use the IAT to compare each document from the unsorted collection with the examples for each category. Each category must already contain one or more example documents that show what it should hold; that is, you would "prime" the trains category with documents closely related to trains, and so on. The category that provides the best fit (the highest-ranked score) is the most appropriate place to put the unsorted document.
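The idea of choosing the category with the highest-ranked score can be sketched outside the IAT. The following is a minimal illustration only: the TermWeights type, the Similarity function (cosine similarity), and the BestFit function are hypothetical names chosen for this sketch, and this document does not state which scoring formula the IAT itself uses.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a term-weight vector: term -> weight.
typedef std::map<std::string, double> TermWeights;

// Cosine similarity between two term-weight vectors (0 if either is empty).
// This formula is an assumption made for the sketch.
double Similarity(const TermWeights& a, const TermWeights& b)
{
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (TermWeights::const_iterator it = a.begin(); it != a.end(); ++it) {
        normA += it->second * it->second;
        TermWeights::const_iterator match = b.find(it->first);
        if (match != b.end())
            dot += it->second * match->second;
    }
    for (TermWeights::const_iterator it = b.begin(); it != b.end(); ++it)
        normB += it->second * it->second;
    if (normA == 0.0 || normB == 0.0)
        return 0.0;
    return dot / (std::sqrt(normA) * std::sqrt(normB));
}

// Returns the index of the category whose example weighting scores highest.
// Ties go to the category that appears first.
std::size_t BestFit(const TermWeights& doc,
                    const std::vector<TermWeights>& categories)
{
    assert(!categories.empty());
    std::size_t best = 0;
    double bestScore = Similarity(doc, categories[0]);
    for (std::size_t i = 1; i < categories.size(); ++i) {
        double score = Similarity(doc, categories[i]);
        if (score > bestScore) {
            bestScore = score;
            best = i;
        }
    }
    return best;
}
```

With categories primed for planes, trains, and automobiles (in that order), a document weighted toward "rail" and "engine" would score highest against the trains category, so BestFit would return 1.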

This chapter describes how to use the IAT to define document categories and route documents among those categories.

Clusters

A category for related documents is called a cluster. Clusters are represented by the IACluster class, which must be subclassed to handle particular document types. For example, the IAT provides the subclass HFSCluster, which represents a cluster of HFS documents (that is, Mac OS files). When subclassing IACluster, you must override the GetNextDoc method, which returns the next document in the cluster, and the Reset method, which resets the iterator.

For a given cluster, you must provide one or more example documents, which are used to establish weighting criteria when comparing the cluster to an unsorted document. For an HFSCluster, these documents are contained in the folder associated with the cluster.
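The shape of such a subclass can be sketched with stand-in types. Everything in the block below (SketchDoc, SketchCluster, InMemoryCluster, CountDocs) is hypothetical and is not part of the IAT; it only mirrors the GetNextDoc and Reset pattern described above. The sketch assumes that GetNextDoc returns NULL when no documents remain and that the caller owns the returned document, and it omits the const qualifier that the IAT declaration carries.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a document; not the IAT's IADoc.
struct SketchDoc {
    std::string name;
    explicit SketchDoc(const std::string& n) : name(n) {}
};

// Hypothetical stand-in for the abstract cluster interface.
class SketchCluster {
public:
    virtual ~SketchCluster() {}
    // Returns the next example document, or NULL when none remain
    // (an assumption of this sketch). The caller owns the result.
    virtual SketchDoc* GetNextDoc() = 0;
    // Rewinds iteration to the first example document.
    virtual void Reset() = 0;
};

// A cluster whose example documents are names held in memory.
class InMemoryCluster : public SketchCluster {
public:
    explicit InMemoryCluster(const std::vector<std::string>& names)
        : fNames(names), fNext(0) {}

    virtual SketchDoc* GetNextDoc()
    {
        assert(fNext <= fNames.size());   // iterator never runs past the end
        if (fNext == fNames.size())
            return NULL;
        return new SketchDoc(fNames[fNext++]);
    }

    virtual void Reset() { fNext = 0; }

private:
    std::vector<std::string> fNames;
    std::size_t fNext;
};

// Counts the example documents in a cluster, starting from the first.
std::size_t CountDocs(SketchCluster& cluster)
{
    std::size_t count = 0;
    cluster.Reset();
    while (SketchDoc* doc = cluster.GetNextDoc()) {
        ++count;
        delete doc;
    }
    return count;
}
```

A real subclass would derive from IACluster and return the document type appropriate to its storage, as HFSCluster does with HFSDoc.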

The Document Router

After you establish your clusters, the IAT router, defined in the class IARouter, lets you identify the cluster that offers the best fit for an unsorted document.

The router does not copy or move the unsorted document; it only identifies the cluster to which the document belongs. Any moving or copying actions must be done by your application.

When seeking the best fit for a document, you can optionally add the weighting of that document (that is, its normalized TWVector) to the weighting of the cluster it matches. In this way, each additional document that fits in the cluster helps define what belongs there.
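To make the accumulation step concrete, here is a minimal sketch of adding a normalized weighting to a cluster weighting. The Weights type and the Normalize and Accumulate functions are hypothetical names for this illustration, and unit-length (Euclidean) normalization is an assumption; this document does not specify how the IAT normalizes a TWVector.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a term-weight vector with fixed term positions.
typedef std::vector<double> Weights;

// Scales a vector to unit length; a zero vector is returned unchanged.
Weights Normalize(const Weights& v)
{
    double sumSquares = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i)
        sumSquares += v[i] * v[i];
    if (sumSquares == 0.0)
        return v;
    double length = std::sqrt(sumSquares);
    Weights unit(v.size());
    for (std::size_t i = 0; i < v.size(); ++i)
        unit[i] = v[i] / length;
    return unit;
}

// Adds the normalized document weighting to the cluster weighting in place.
void Accumulate(Weights& cluster, const Weights& doc)
{
    assert(cluster.size() == doc.size());
    Weights unit = Normalize(doc);
    for (std::size_t i = 0; i < cluster.size(); ++i)
        cluster[i] += unit[i];
}
```

Normalizing first means a long document and a short document with the same term proportions shift the cluster weighting by the same amount.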

The listing "Sorting documents using the router" shows an example that displays the best-fit cluster for each of a number of unsorted documents.

Sorting documents using the router
// Enter the name of the folder containing the items to route
 
	StringPtr unSortedItems = "\pMyDisk:Folders:Disorganized Stuff";
 
// Enter name of the index
	StringPtr singleIndexName = "\pMyDisk:Folders:test.index";
 
// Enter the names of the router folders
	StringPtr clusterFolders[] = {
		"\pMyDisk:Folders:Planes",
		"\pMyDisk:Folders:Trains",
		"\pMyDisk:Folders:Automobiles",
		"\p" // empty string to mark end
		};
 
 
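// Forward declaration; AddItemsToIndex is defined after DemoRouting.
void AddItemsToIndex(StringPtr folderPathName, VectorIndex* inIndex);

void DemoRouting()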
{
	FSSpec fsSpec;
	char str[256];
 
// Create/initialize our index
	(void)FSMakeFSSpec(0, 0, singleIndexName, &fsSpec);
 		
	IAStorage* myStorage = MakeHFSStorage(fsSpec.vRefNum, fsSpec.parID,
								fsSpec.name);
	myStorage->Initialize();
 
	HFSCorpus* myCorpus = new HFSCorpus;
	IAAnalysis* myAnalysis = new SimpleAnalysis();
 
	VectorIndex* myIndex = new VectorIndex(myStorage,myCorpus,
								myAnalysis);
	myIndex->Initialize();
 
 
// Setup clusters
	uint32 clusterCount = 0;
	for (clusterCount = 0; clusterFolders[clusterCount][0] != 0;
			clusterCount++ ) {}
 
	HFSCluster** folders = new HFSCluster*[clusterCount];
 
	for (uint32 i  = 0; i < clusterCount; i++ ) {
		folders[i] = new HFSCluster(myIndex, clusterFolders[i]);
		}
 
 
// Instantiate a router and initialize with the corpuses representing
// our clusters.
	IARouter myRouter (myIndex);
	myRouter.InitializeClusters((IACluster**)folders, clusterCount);
 
	AddItemsToIndex(unSortedItems, myIndex);
	myIndex->Flush();
 
	HFSTextFolderCorpus* source = new HFSTextFolderCorpus(unSortedItems);
 
	IADocIterator* docs = source->GetDocIterator();
	IADoc* doc = docs->GetNextDoc();
 
 
// Now loop through each unsorted document and find the best cluster
	while (doc) {
		uint32 clusterIndex = myRouter.WhichCluster(doc, false);
		printf ("%s belongs in cluster %d\n",
			PToCStr(((HFSDoc*)doc)->GetFileName(), str), ++clusterIndex);
		delete doc;
		doc = docs->GetNextDoc();
		}
 
 
// Cleanup
	delete docs;
	delete source;
 
	myRouter.Store();
	myIndex->Flush();
	myStorage->Commit();
 	
	delete myIndex;
	delete myStorage;
 
	for (uint32 i  = 0; i < clusterCount; i++ ) {
		delete folders[i];
		}
	delete [] folders;
	}
// End DemoRouting
 
 
// This method is called by the DemoRouting method. 
void AddItemsToIndex(StringPtr folderPathName, VectorIndex* inIndex)
{		
	FSSpec myFsSpec;
	OSErr err = FSMakeFSSpec(0, 0, folderPathName, &myFsSpec);
	IAAssertion(err == noErr, "Can't get folder", IAAssertionFailure);	
 
	HFSIterator folderIterator(myFsSpec.vRefNum,FSSpecToDirID(&myFsSpec));
 
	IATry {
		while (folderIterator.Increment()) {
			CInfoPBRec* pb = folderIterator.GetPBRec();
 
			HFSDoc* doc = new HFSDoc((HFSCorpus*)inIndex->GetCorpus(), 
				pb->hFileInfo.ioVRefNum, pb->hFileInfo.ioFlParID,
				pb->hFileInfo.ioNamePtr);
			inIndex->AddDoc(doc);
			}
		}
	IACatch (const IAException& exception) {
		printf("%s, %s\n", exception.What(), exception.GetLocation());
		}
	}
// End AddItemsToIndex

In this example, after creating the index and the corpus and specifying the type of analysis, the DemoRouting method sets up clusters based on the folders listed in clusterFolders[]. Each folder in the array should contain example documents defining the type of document that belongs to the cluster. Documents to be routed should be in the unSortedItems folder.

After initializing the clusters by calling the InitializeClusters method, the example cycles through the corpus representing the contents of unSortedItems and calls the router's WhichCluster method (on myRouter) for each document. WhichCluster returns the index of the best-fit cluster; the example increments that index before printing it. If you set the second parameter in WhichCluster to true, the weighting of the document to be routed is added to the matching cluster when a match is made.

In this example, after the DemoRouting method routes all the documents in unSortedItems, it calls the Store method before deleting the objects it created. Doing so saves the cluster information and weightings so you can retrieve them at some later time. If you specified that the clusters accumulate weightings as documents were routed, the saved settings reflect the additional weightings. If you want to route additional documents later using the stored cluster settings, call the Restore method instead of instantiating clusters and calling InitializeClusters.

The sections that follow describe the classes and methods used for routing documents using the IAT.

The IARouter Class

Ancestors None.

Subclasses None.

Header file IARouter.h

Description

The following methods allow you to use the IAT to sort documents into arbitrary categories. Note that the IAT only specifies which category to put a document in; your application must copy or move the document based on the categorization.

Public Methods

IARouter

Constructor for this class.

IARouter (
				VectorIndex* index,
				TProgressFn* progressFn, 
				clock_t progressFreq, 
				void* appData);

index   A pointer to the vector index.

progressFn   A pointer to an application-defined progress function. If not NULL, the IAT calls this function periodically to give the client application control.

progressFreq   The wait time between callbacks, in clock ticks (as defined by the ANSI CLOCKS_PER_SEC constant).

appData   A pointer to application-specific data that is passed to the client application when the callback occurs.
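The interplay of the three progress parameters can be illustrated with a small stand-alone sketch. The SketchProgressFn type, the ProgressTimer class, and the CountCalls function are hypothetical; in particular, the real TProgressFn signature is not shown in this document, so the single void* parameter used here is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>

// Hypothetical callback type; the IAT's TProgressFn signature may differ.
typedef void SketchProgressFn(void* appData);

// Decides when a callback is due, given a frequency in clock ticks.
class ProgressTimer {
public:
    ProgressTimer(SketchProgressFn* fn, std::clock_t freq,
                  void* appData, std::clock_t now)
        : fFn(fn), fFreq(freq), fAppData(appData), fLast(now) {}

    // Calls the callback if one was supplied and at least fFreq ticks
    // have passed since the last call. Returns true if the callback ran.
    bool Poll(std::clock_t now)
    {
        if (fFn == NULL || now - fLast < fFreq)
            return false;
        fLast = now;
        (*fFn)(fAppData);
        return true;
    }

private:
    SketchProgressFn* fFn;
    std::clock_t fFreq;
    void* fAppData;
    std::clock_t fLast;
};

// Example callback: appData points to an int that counts the calls.
void CountCalls(void* appData)
{
    assert(appData != NULL);
    ++*static_cast<int*>(appData);
}
```

In this sketch, a frequency of CLOCKS_PER_SEC / 2 would request a callback roughly twice per second of processor time; a long-running loop would call Poll(std::clock()) on each pass.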

~IARouter

Destructor for this class.

virtual ~IARouter();

InitializeClusters

Specifies clusters to use in the routing.

void InitializeClusters (
				IACluster** clusters,
				uint32 howManyClusters);

clusters   A pointer to an array of IACluster pointers specifying the clusters to use.

howManyClusters  The number of IACluster pointers in the array.

DISCUSSION

You call InitializeClusters when you begin routing using a new set of clusters. If you want to route documents using an older saved set of clusters, you should call Restore instead.

WhichCluster

Specifies the cluster to which a document belongs.

uint32 WhichCluster (
				IADoc* doc, 
				bool accumulate);

doc   A pointer to a document to be routed.

accumulate   A Boolean value. True adds the normalized weighting of the specified document (that is, its TWVector) to the weighting of the cluster. The default is false.

method result A value specifying the index of the cluster to which the document belongs.

DISCUSSION

The WhichCluster method does not move or copy the document to the indicated cluster. If you want to move the document, your application must do so itself.

Store

Stores the router settings.

void Store (
				IAStorage* storage,
				IABlockID block) const;

storage   A pointer to an IAStorage object. The default (obtained by passing NULL) is the storage instance that contains the index used by the router.

block  The ID of the block in which you want to store the router settings. The default block ID is 0.

Restore

Restores saved router settings.

void Restore (
				IAStorage* storage,
				IABlockID block);

storage   A pointer to an IAStorage object containing the router information. If you specify NULL here, the IAT attempts to restore the settings from the storage instance used by the index associated with the router.

block   The ID of the block containing the router settings you want to retrieve. The default block ID is 0.

StoreSize

Returns the size of the current router.

IABlockSize StoreSize () const;

method result The size of the router.

GetProgressFn

Returns the pointer to the application-defined progress function.

TProgressFn* GetProgressFn () const;

method result A pointer to the application-defined function. The IAT calls back to this function to allow the client time to do other things, if desired.

GetProgressData

Returns the progress function data.

void* GetProgressData () const;

method result A pointer to the data passed to the application at the time of the progress function callback. You specify the location of this data when the IARouter constructor is called.

GetProgressFreq

Returns the time between calls to the progress function callback.

clock_t GetProgressFreq () const;

method result The wait time between callbacks, in clock ticks (as defined by the ANSI CLOCKS_PER_SEC constant).

Protected Methods

BestCluster

Returns the best cluster for a given TWVector.

uint32 BestCluster (TWVector *vector) const;

vector   A pointer to a TWVector object.

method result The index of the cluster to which the TWVector best fits.

DISCUSSION

If you want to subclass IARouter and implement your own best cluster algorithm, you may want to override this method.
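The override pattern can be sketched with stand-in classes. SketchRouter, NearestRouter, Route, and the Weights type below are hypothetical and do not reproduce the IARouter interface; the default policy shown (largest dot product) is an assumption made for this sketch, not the IAT's documented algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<double> Weights;   // hypothetical stand-in for TWVector

// Hypothetical stand-in for a router with an overridable selection step.
class SketchRouter {
public:
    explicit SketchRouter(const std::vector<Weights>& clusters)
        : fClusters(clusters) {}
    virtual ~SketchRouter() {}

    // Public entry point; delegates the choice to BestCluster.
    std::size_t Route(const Weights& doc) const { return BestCluster(doc); }

protected:
    // Default policy (assumed for this sketch): largest dot product wins.
    virtual std::size_t BestCluster(const Weights& doc) const
    {
        assert(!fClusters.empty());
        std::size_t best = 0;
        double bestScore = Dot(doc, fClusters[0]);
        for (std::size_t i = 1; i < fClusters.size(); ++i) {
            double score = Dot(doc, fClusters[i]);
            if (score > bestScore) { bestScore = score; best = i; }
        }
        return best;
    }

    static double Dot(const Weights& a, const Weights& b)
    {
        assert(a.size() == b.size());
        double sum = 0.0;
        for (std::size_t i = 0; i < a.size(); ++i)
            sum += a[i] * b[i];
        return sum;
    }

    std::vector<Weights> fClusters;
};

// Subclass with a custom policy: smallest squared distance wins.
class NearestRouter : public SketchRouter {
public:
    explicit NearestRouter(const std::vector<Weights>& clusters)
        : SketchRouter(clusters) {}

protected:
    virtual std::size_t BestCluster(const Weights& doc) const
    {
        assert(!fClusters.empty());
        std::size_t best = 0;
        double bestDist = SquaredDistance(doc, fClusters[0]);
        for (std::size_t i = 1; i < fClusters.size(); ++i) {
            double dist = SquaredDistance(doc, fClusters[i]);
            if (dist < bestDist) { bestDist = dist; best = i; }
        }
        return best;
    }

    static double SquaredDistance(const Weights& a, const Weights& b)
    {
        assert(a.size() == b.size());
        double sum = 0.0;
        for (std::size_t i = 0; i < a.size(); ++i)
            sum += (a[i] - b[i]) * (a[i] - b[i]);
        return sum;
    }
};
```

Because the public entry point calls the protected virtual method, callers are unaffected by the change of policy; only the selection rule differs between the two classes.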

ClearAccumulator

Clears the accumulator associated with the router.

void ClearAccumulator (void);

AddDocVectorToAccumulator

Adds a document's TWVector to the accumulator.

void AddDocVectorToAccumulator (TWVector* newDocVector);

newDocVector   A pointer to the TWVector object representing the document.

SEE ALSO

The AccumulateDocVector method.

AccumulateDocVector

Adds the TWVector representing a document to the accumulator.

void AccumulateDocVector(IADoc* doc);

doc   A pointer to a document.

SEE ALSO

The AddDocVectorToAccumulator method.

AddToAccumulator

Adds the specified TWVector to the weighting of a given cluster.

void AddToAccumulator (
				uint32 cluster,
				TWVector *docVector);

cluster   The cluster to whose weighting you want to add the TWVector.

docVector   A pointer to the TWVector for a given document.

DISCUSSION

This method adds a TWVector to the weighting of a particular cluster, not the accumulator that is generated during the WhichCluster call.

The IACluster Class

Ancestors None.

Subclass HFSCluster

Header file IARouter.h

Description

This abstract class represents a cluster of documents. You must subclass this class to represent clusters of an actual document format. For example, the HFSCluster class is a subclass of IACluster that you use to represent clusters of HFS format documents (that is, Mac OS files).

Public Methods

IACluster

Constructor for this class.

IACluster (IAIndex* index);

index   The index that contains this cluster.

~IACluster

Destructor for this class.

virtual ~IACluster ();

GetNextDoc

Gets the next document in the cluster.

virtual IADoc* GetNextDoc () const;

method result A pointer to the next document in the cluster.

DISCUSSION

The type of document returned depends on the IACluster subclass. For example, the HFSCluster subclass of IACluster returns documents of type HFSDoc.

Reset

Resets the iterator.

virtual void Reset ();

DISCUSSION

After a reset, the GetNextDoc method begins with the first document in the cluster.

Protected Method

GetCorpus

Retrieves the corpus in the index associated with the cluster.

IACorpus* GetCorpus () const;

method result A pointer to the corpus.

The HFSCluster Class

Ancestor IACluster

Subclasses None.

Header file HFSCluster.h

Description

The HFSCluster class is a subclass of IACluster that handles clusters of HFS documents (that is, Mac OS files).

Public Methods

HFSCluster

Constructor for this class.

HFSCluster (
				IAIndex* index,
				StringPtr clusterName);

index   The index to contain this cluster.

clusterName   A pointer to the pathname of the folder containing document examples for the cluster.

~HFSCluster

Destructor for this class.

virtual ~HFSCluster ();

GetNextDoc

Gets the next HFS document in the cluster.

IADoc* GetNextDoc () const;

method result A pointer to the next HFS document in the cluster.

DISCUSSION

The HFSCluster subclass of IACluster returns documents of type HFSDoc.

Reset

Resets the iterator.

void Reset ();

DISCUSSION

After a reset, the GetNextDoc method begins with the first document in the cluster.