Technote 1180
Sherlock’s Find By Content Library
By John Montbriand
Apple Worldwide Developer Technical Support
CONTENTS
Overview
Find By Content C Summary
This Technote describes the Find By Content libraries used by Sherlock for searching the contents of files. The Find By Content libraries export a full suite of routines and functions allowing applications to perform content-based searches of files. Text Extractor Plug-ins were introduced with Mac OS 8.6. These allow Find By Content to extract textual information from binary files for inclusion in index files. Text Extractor Plug-ins are documented in Technote TN1181, “Find by Content Text Extractor Plug-ins.” This Note is directed at application developers who wish to access the Find By Content library directly from their applications.
Working with Search Sessions
FBC allows client applications to open and close a “search session.” A search session contains all of the information about a search, including the list of matched files after the search is complete. Clients of FBC can obtain references to search sessions, modify them, and query their state using the routines defined in this section. References to search sessions are defined as an opaque pointer type owned by the FBC library.
Developers should only access the search session structure using the routines defined herein. This includes using the appropriate FBC routines for duplicating and disposing of search sessions. Search sessions are complex memory structures that contain pointers to other data that may need to be copied when a search session is duplicated or disposed of when a search session is deallocated. The normal sequence of actions one takes when using the FBC library is to create a search session, configure the search session to target specific volumes, perform the search, query the search results, and dispose of the search. Other possibilities for searches include the ability to reinitialize a search session and use it over again for another search, to provide backtracking by cloning search sessions and performing additional searches using the clones, or to limit search results to files found in particular directories. |
Setting up a Search Session
Creating a new session and preparing it for a search, as shown in Listing 6, requires at least two calls to the FBC library. In this example, a new search session is created and configured to search all local volumes that contain index files. The call to FBCAddAllVolumesToSession in the listing adds those volumes to the session.
|
/* SimpleSetUpSession allocates a new search session
   and returns a FBCSearchSession value in the *session
   parameter. If an error occurs, *session is left
   untouched. */
OSErr SimpleSetUpSession(FBCSearchSession* session) {
    OSErr err;
    FBCSearchSession newsession;

    /* set up our local variables */
    err = noErr;
    newsession = NULL;
    if (session == NULL) return paramErr;

    /* create the new session */
    err = FBCCreateSearchSession(&newsession);
    if (err != noErr) goto bail;

    /* search all available local volumes */
    err = FBCAddAllVolumesToSession(newsession, false);
    if (err != noErr) goto bail;

    /* store our result and leave */
    *session = newsession;
    return noErr;

bail:
    if (newsession != NULL)
        FBCDestroySearchSession(newsession);
    return err;
}
Listing 6. Setting up a search session to search all local, indexed volumes. |
FBC provides a complete set of routines for developers wanting more control over what volumes will be searched by the search session. Listing 7 illustrates how a new search session could be configured to search a particular set of volumes. |
/* SetUpVolumeSession allocates a new search session
   and returns a FBCSearchSession value in the *session
   parameter. If vCount is not zero, then vRefNums points
   to an array of volume reference numbers for volumes
   that are to be searched. If any of the vRefNums refer
   to a volume without an index, paramErr is returned. */
OSErr SetUpVolumeSession(FBCSearchSession* session,
        UInt16 vCount, SInt16 *vRefNums) {
    OSErr err;
    UInt16 i;
    FBCSearchSession newsession;

    /* set up our local variables */
    err = noErr;
    newsession = NULL;
    if (vCount == 0) return paramErr;
    if (session == NULL) return paramErr;
    if (vRefNums == NULL) return paramErr;

    /* create the new session */
    err = FBCCreateSearchSession(&newsession);
    if (err != noErr) goto bail;

    /* search the volumes specified in vRefNums */
    for (i = 0; i < vCount; i++) {
        if (!FBCVolumeIsIndexed(vRefNums[i])) {
            err = paramErr;
            goto bail;
        } else {
            err = FBCAddVolumeToSession(newsession, vRefNums[i]);
            if (err != noErr) goto bail;
        }
    }

    /* store our result and leave */
    *session = newsession;
    return noErr;

bail:
    if (newsession != NULL)
        FBCDestroySearchSession(newsession);
    return err;
}
Listing 7. Setting up a session to search a particular set of volumes. |
In this example, FBCVolumeIsIndexed is called for each volume before it is added to the session, and paramErr is returned if any volume lacks an index. Once a search session has been configured to search a number of volumes, it can be used again after a search has been conducted without having to reconfigure its target volumes: after performing a search and examining the results, the session can be prepared for another search. A search session can also be copied, which, as noted earlier, allows backtracking by performing additional searches on the clone.
|
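The reuse pattern described here (configure the volumes once, then search, examine the hits, and release them before the next search) can be sketched without the FBC library. Everything in the block below is a hypothetical stand-in: StubSession, StubDoQuerySearch, StubReleaseHits, and RunQueries are illustrative names, not FBC routines, and the stub search simply pretends each volume yields three hits. Only the order of operations is meant to carry over.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins so the sketch compiles without the Mac OS headers. */
typedef short OSErr;
enum { noErr = 0, paramErr = -50 };

/* Hypothetical session record; the real search session is opaque. */
typedef struct {
    int volumeCount;   /* target volumes, kept across searches */
    int hitCount;      /* results of the most recent search only */
} StubSession;

/* Stub search: pretends every query matches 3 files per volume. */
static OSErr StubDoQuerySearch(StubSession *s, const char *query)
{
    if (s == NULL || query == NULL || query[0] == 0) return paramErr;
    s->hitCount = 3 * s->volumeCount;
    return noErr;
}

/* Stub release: discards the hits but keeps the volume configuration. */
static OSErr StubReleaseHits(StubSession *s)
{
    if (s == NULL) return paramErr;
    s->hitCount = 0;
    return noErr;
}

/* Runs each query on the same configured session, releasing the hits
   between searches. *totalHits receives the number of hits seen. */
static OSErr RunQueries(StubSession *s, const char *const *queries,
                        int count, int *totalHits)
{
    OSErr err = noErr;
    int i;

    if (s == NULL || queries == NULL || totalHits == NULL) return paramErr;
    *totalHits = 0;
    for (i = 0; i < count; i++) {
        err = StubDoQuerySearch(s, queries[i]);
        if (err != noErr) break;
        *totalHits += s->hitCount;     /* examine the results */
        err = StubReleaseHits(s);      /* prepare for the next search */
        if (err != noErr) break;
        assert(s->hitCount == 0);      /* volumes stay, hits are gone */
    }
    return err;
}
```

In a real client the two stubs would be replaced by the corresponding FBC calls on an FBCSearchSession; the volume list set up in Listing 6 or Listing 7 would not need to be rebuilt between queries.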
Performing Searches
When FBC performs a search, it will generate a list of files that were matched. This list is referred to as the “hits,” and it is stored inside of the search session. FBC can be asked to perform a content-based search using a query string containing a list of words, a similarity search based on one or more hits obtained in a previous search, or a similarity search based on a list of example files. Listing 8 illustrates how a query-based search can be performed. Here, the query is used to search for matching files on all local indexed volumes.
OSErr SimpleFindByQuery(char *query, FBCSearchSession *session) {
    OSErr err;
    FBCSearchSession newsession;

    /* set up locals, check parameters
       (query is tested for NULL before it is dereferenced) */
    if (query == NULL) return paramErr;
    if (query[0] == 0) return paramErr;
    if (session == NULL) return paramErr;
    newsession = NULL;

    /* allocate a new search session */
    err = SimpleSetUpSession(&newsession);
    if (err != noErr) goto bail;

    /* Here is the call that does the actual search,
       storing the results in the search session. */
    err = FBCDoQuerySearch(newsession, query, NULL, 0, 100, 100);
    if (err != noErr) goto bail;

    /* save the results and return */
    *session = newsession;
    return noErr;

bail:
    if (newsession != NULL)
        FBCDestroySearchSession(newsession);
    return err;
}
Listing 8. A query-based search of all local, indexed volumes.
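Searches can also be restricted to a particular set of directories. Listing 9 illustrates this with FBCDoQuerySearch, which is passed a list of target directories along with the query.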
|
enum {
    kMaxVols = 20,
    maxHits = 10,
    maxHitTerms = 10
};

OSErr RestrictedFindByQuery(char *query, UInt16 dirCount,
        FSSpec* dirList, FBCSearchSession* session) {
    OSErr err;
    UInt16 vCount, i, j;
    SInt16 vRefNums[kMaxVols], normalVol;
    FBCSearchSession newsession;

    /* set up locals, check parameters */
    err = noErr;
    vCount = 0;
    newsession = NULL;
    if (dirList == NULL || dirCount == 0) return paramErr;
    if (query == NULL) return paramErr;
    if (*query == 0) return paramErr;
    if (session == NULL) return paramErr;

    /* collect all of the unique volume reference numbers
       from the list of FSSpecs provided in the parameters. */
    for (i = 0; i < dirCount; i++) {
        Boolean found;
        HParamBlockRec pb;

        /* ensure the vRefNum is a volume reference number */
        pb.volumeParam.ioVRefNum = dirList[i].vRefNum;
        pb.volumeParam.ioNamePtr = NULL;
        pb.volumeParam.ioVolIndex = 0;
        if ((err = PBHGetVInfoSync(&pb)) != noErr) goto bail;
        normalVol = pb.volumeParam.ioVRefNum;

        /* make sure it’s not already in the list */
        for (found = false, j = 0; j < vCount; j++)
            if (vRefNums[j] == normalVol) {
                found = true;
                break;
            }

        /* add the volume to the list (volumes beyond
           kMaxVols are skipped) */
        if (!found && vCount < kMaxVols)
            vRefNums[vCount++] = normalVol;
    }

    /* set up a session to use the volumes we found */
    err = SetUpVolumeSession(&newsession, vCount, vRefNums);
    if (err != noErr) goto bail;

    /* Here is the call that does the actual search,
       storing the results in the search session. */
    err = FBCDoQuerySearch(newsession, query,
            dirList, dirCount, maxHits, maxHitTerms);
    if (err != noErr) goto bail;

    /* save the result and return */
    *session = newsession;
    return noErr;

bail:
    if (newsession != NULL)
        FBCDestroySearchSession(newsession);
    return err;
}
Listing 9. Searching a particular set of directories. |
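Here, volume reference numbers extracted from the array of FSSpec records are used to set up the search session before the directory-restricted search is performed.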
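Listing 9 collects unique volume reference numbers inline and skips any volume beyond kMaxVols without reporting it. The sketch below shows one way the collection step could be factored into a helper that tells the caller when the list is full. AddUniqueVolume is a hypothetical helper, not part of FBC, and plain short and unsigned short stand in for SInt16 and UInt16 so the block compiles without the Mac OS headers.

```c
#include <assert.h>
#include <stddef.h>

enum { kMaxVols = 20 };   /* same capacity as in Listing 9 */

/* Adds vRefNum to list unless it is already present.
   Returns 1 if the list contains vRefNum on return,
   or 0 if the list is full and vRefNum could not be added. */
static int AddUniqueVolume(short *list, unsigned short *count, short vRefNum)
{
    unsigned short j;

    assert(list != NULL && count != NULL);

    /* already in the list? then there is nothing to do */
    for (j = 0; j < *count; j++)
        if (list[j] == vRefNum) return 1;

    /* report overflow instead of silently dropping the volume */
    if (*count >= kMaxVols) return 0;

    list[(*count)++] = vRefNum;
    return 1;
}
```

A caller in the style of Listing 9 could treat a zero return as an error and bail out, rather than searching fewer volumes than the directory list implies.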
Retrieving Information from a Search Session
After a search is conducted using a search session, the search session may contain information about one or more matching files. Clients can access information about individual hits, including the file’s FSSpec, its score, and the words matched in it. Listing 10 enumerates the hits in a session this way.
/* the callback receives a pointer to the hit’s FSSpec,
   matching the call made in SampleHandleHits below */
typedef OSErr (*HitProc)(FSSpec *theDoc, float score,
        UInt32 nTerms, FBCWordList hitTerms);

/* SampleHandleHits can be called after a search to enumerate
   the search results. For each search hit, the hitFileProc
   function parameter is called with information describing
   the target. */
OSErr SampleHandleHits(FBCSearchSession session, HitProc hitFileProc) {
    OSErr err;
    UInt32 hitCount, i;
    FSSpec targetDoc;
    float targetScore;
    FBCWordList targetTerms;
    UInt32 numTerms;

    /* set up locals, check parameters */
    targetTerms = NULL;
    if (hitFileProc == NULL) return paramErr;
    if (session == NULL) return paramErr;

    /* count the number of hits in this session */
    err = FBCGetHitCount(session, &hitCount);
    if (err != noErr) goto bail;

    /* iterate through the hits */
    for (i = 0; i < hitCount; i++) {

        /* get the target document’s FSSpec */
        err = FBCGetHitDocument(session, i, &targetDoc);
        if (err != noErr) goto bail;

        /* get the score for this document */
        err = FBCGetHitScore(session, i, &targetScore);
        if (err != noErr) goto bail;

        /* get a list of the words matched in this document */
        numTerms = maxHitTerms;
        err = FBCGetMatchedWords(session, i, &numTerms, &targetTerms);
        if (err != noErr) goto bail;

        /* call the callback routine provided as a parameter
           to do something with the information. */
        err = hitFileProc(&targetDoc, targetScore, numTerms, targetTerms);
        if (err != noErr) goto bail;

        /* clean up before moving to the next iteration. */
        FBCDestroyWordList(targetTerms, numTerms);
        targetTerms = NULL;
    }
    return noErr;

bail:
    if (targetTerms != NULL)
        FBCDestroyWordList(targetTerms, numTerms);
    return err;
}
Listing 10. Enumerating all of the files found in a search session. |
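As an illustration of what a callback passed to SampleHandleHits might do, the sketch below remembers the highest-scoring hit it has seen. It assumes the callback receives a pointer to the FSSpec, as the call site in Listing 10 does. The type definitions are simplified stand-ins so the block compiles without the Mac OS headers, and RememberBestHit and the g-prefixed globals are hypothetical names. Because the callback signature has no user-data parameter, the state is kept in file-scope variables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the Mac OS and FBC types. */
typedef short OSErr;
typedef uint32_t UInt32;
enum { noErr = 0, paramErr = -50 };
typedef struct { short vRefNum; long parID; unsigned char name[64]; } FSSpec;
typedef char **FBCWordList;   /* an array of C strings */

typedef OSErr (*HitProc)(FSSpec *theDoc, float score,
        UInt32 nTerms, FBCWordList hitTerms);

/* State shared between calls to the callback. */
static FSSpec gBestDoc;
static float  gBestScore = -1.0f;
static UInt32 gHitsSeen  = 0;

/* Callback: counts the hits and keeps a copy of the best-scoring one.
   The word list belongs to the caller, so it is not retained here. */
static OSErr RememberBestHit(FSSpec *theDoc, float score,
        UInt32 nTerms, FBCWordList hitTerms)
{
    if (theDoc == NULL) return paramErr;
    assert(nTerms == 0 || hitTerms != NULL);

    gHitsSeen++;
    if (score > gBestScore) {
        gBestScore = score;
        gBestDoc = *theDoc;    /* copy the record, not the pointer */
    }
    return noErr;
}

/* Confirms at compile time that the callback matches HitProc. */
static const HitProc gBestHitProc = RememberBestHit;
```

The FSSpec is copied rather than stored by address because Listing 10 reuses a single local FSSpec for every hit, so a saved pointer would end up referring to the last hit examined.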
Find By Content Reference
This section provides a description of the CFM-based interfaces to the PowerPC FBC library. PowerPC applications using these routines link against the library named “Find By Content” (without the quotes).
Data Types
FBC provides the following data types. Storage management for these types is provided by the FBC library. Clients should not attempt to allocate or deallocate these structures using calls to the Memory Manager.
Search sessions created by FBC are referenced through pointer variables of type FBCSearchSession. The format of the data referred to by this pointer is private to the FBC library. Clients should not attempt to access or modify this data directly.
An ordinary C string. This type is used when retrieving information about hits from a search session.
Word lists, of type FBCWordList, are arrays of these C strings.
|
Allocation and Initialization of Search Sessions
The following routines can be used to allocate and dispose of search sessions. Storage occupied by search sessions is owned by the FBC library, and these are the only routines that should be used to allocate, copy, and dispose of search sessions.
|
Configuring Search Sessions
Search sessions can be configured to limit searches to a particular set of volumes. These routines allow clients access to the set of volumes that will be searched by FBC.
|
Executing a Search
FBC provides three different routines for conducting searches. They are described in this section.
|
Getting Information About Hits
Once a search is complete, a search session will contain a list of hits that were found during the search. The routines described in this section allow clients to access information about hits stored in a search session. Hit records are indexed 0 through count-1.
|
Summarizing Text
This call produces a summary containing the “most relevant” sentences found in the input text.
|
Getting Information About Volumes
FBC provides the following utility routines for accessing information about volumes.
|
FBCVolumeIndexPhysicalSize returns the size of the volume’s index file in bytes.
Indexing Volumes, Folders, and Files
A new API has been added to Find By Content allowing for the immediate indexing of new or altered files. The new routine is declared as follows:
|
Reserving Heap Space
Clients of FBC can reserve space in their heap zone for their callback routine before conducting a search.
|
Application-Defined Routine
Clients can provide a routine that will be called periodically during searches. This routine provides clients with both information about the status of a search and an opportunity to cancel the search before it is complete. Callback routines are defined as follows:
To avoid locking up the system while a search is in progress, the callback should, directly or indirectly, give time to the rest of the system. Whether an ongoing search is canceled is determined by the value the callback function returns.
|
Find By Content C Summary
Constants
Data Types
Allocation and Initialization of Search Sessions
Configuring Search Sessions
Executing a Search
Getting Information About Hits
Summarizing Text
Getting Information About Volumes
Indexing files, folders, and volumes
Reserving Heap Space
Application-Defined Routine
|
Acknowledgments
Thanks to David Casseres, Pete Gontier, Tim Holmes, Ingrid Kelly, Michael J. Kobb, Eric Koebler, Alice Li, and Wayne Loofbourrow.