CONTAINS Predicate

The CONTAINS predicate provides several text matching options for building queries. Typically, you use this predicate to query the Contents property - the text contents of a file. You can also query other textual properties with content, such as DocTitle, DocSubject, or a user-defined property.

Syntax

CONTAINS ( [Column_Reference, ] ' Content_Search_Conditions ') > 0

Column_Reference   

Specifies the column alias. If you do not specify a column reference, the default is the Contents property. For example, the following two clauses are equivalent:

WHERE CONTAINS (Contents, 'European') > 0
WHERE CONTAINS ('European') > 0
  
Content_Search_Conditions  

Specifies one or more search terms. When using more than one search term, connect the terms with AND, AND NOT or OR operators. When specifying boolean operators, both the text ( AND , OR , NOT ) and symbolic (&, |, ! ) forms are acceptable.

Note: Always place spaces around the text boolean operators.

When the content search condition consists of statements enclosed in parentheses, expressions in parentheses are evaluated first. After the parenthetical expressions are evaluated, the following rules apply:

  • NOT is evaluated before AND.
  • NOT can only occur after AND (as in AND NOT; the combination OR NOT is not allowed).
  • AND is evaluated before OR.
  • AND expressions are associative and can be applied in any order. For example, A AND B AND C, is the same as (A AND B) AND C, which is the same as A AND (B AND C).
  • OR expressions are associative and can be applied in any order.

Note: It is illegal to place NOT before content query predicates (CONTAINS and FREETEXT).

Note: The "> 0" part of the expression is required. It causes the expression to evaluate to a Boolean result.

Quotes in Basic String Literals

The CONTAINS predicate uses the following conventions when dealing with single and double quotes in basic string literals:

Types of CONTAINS Matches

The types of matches, along with descriptions of how they work, are listed in the following table. Click a match type for a more detailed explanation and examples.

Match Type Description
Simple Matches the specified word or phrase.
Word Prefix Uses the wildcard character "*" to match words containing the same prefix.
Proximity Searches for words required to be close to one another.
Linguistic Generation Matches various tenses of verbs, plural or singular of nouns.
List Returns rows that match a list of words.

Simple Match

In a simple match, the word or phrase must be an exact match; however, the matching is not case sensitive. Multiple consecutive words are treated as a single phrase, and must appear in exactly the same order in the matching document. If a phrase contains blank spaces between words or punctuation marks, it must be enclosed in double quotes.

Syntax

CONTAINS ([column_reference, ]
 '"Word_or_Phrase"
 [ Boolean_Operator '"Word_or_Phrase" ] ... ') > 0

Elements

Column_Reference  Specifies the column name (alias). Its data type must be compatible with the format of the Word_or_Phrase specified.
  
Word_or_Phrase  

Specifies the literal of type Basic String to match.

  
Boolean_Operator  Specifies the boolean operator to use, following the precedence rules previously stated, that combines the comparison predicates.

Examples

  1. The following example searches the DocAuthor property of all files from the default scope, and retrieves rows consisting of the DocTitle property for any file in which the DocAuthor is Smith or Jones.

    SELECT DocTitle
    FROM SCOPE() 
    WHERE CONTAINS(DocAuthor,'"Smith" OR "Jones"') > 0
  2. The following example searches the Contents of all files from the default scope, and retrieves rows consisting of the DocTitle and DocAuthor properties for any file containing the phrases "telecommunications industry" and "United Kingdom".

    SELECT DocTitle, DocAuthor
    FROM SCOPE() 
    WHERE CONTAINS('"telecommunications industry" AND "United Kingdom"') > 0

Word Prefix Match

A word prefix match is similar to a simple match, except that the wildcard asterisk (*) character is used to represent the variable component of a word or phrase. The matching is not case sensitive. Multiple consecutive words are treated as a single phrase, and must appear in exactly the same order in the matching document. If a phrase contains blank spaces between words or punctuation marks, it must be enclosed in double quotes.

Syntax

CONTAINS ( [ column_reference, ] ' "Word_or_Phrase*"
[ Boolean_Operator "Word_or_Phrase" ] ... ' ) > 0 

Elements

Column_Reference  Specifies the column name (alias). Its data type must be compatible with the format of the Word_or_Phrase specified.
  
Word_or_Phrase  Specifies the literal of type Basic String to use as the prefix.
  
Boolean_Operator  Specifies the boolean operator to use, following the precedence rules previously stated, that combines the comparison predicates.

Example

The following example searches the Contents of all files from the default scope, and retrieves rows consisting of the DocTitle property for any file containing words such as wine, winery, wineries, and so forth.

SELECT DocTitle
FROM SCOPE() 
WHERE CONTAINS('wine*') > 0

Proximity Match

In a proximity match, the query returns a row only if the location of two or more words in a property is close to each other.

Syntax

CONTAINS ([column_reference,]
'"Word_or_Phrase " { NEAR() | ~ }
"Word_or_Phrase " [ { NEAR() | ~ } "Word_or_Phrase " ] ...') > 0

Elements

Column_Reference  Specifies the column name (alias). Its data type must be compatible with the format of the Word_or_Phrase specified.
  
Word_or_Phrase  Specifies the literal of type Basic String to match.
  
NEAR() | ~  Specifies the proximity relationship between adjacent words or phrases. (You can also use the tilde (~) instead of NEAR().) Currently, NEAR() uses WORD as the unit of measurement and a value of approximately 50 words as the proximity distance; in other words, if the two phrases are within approximately 50 words, a row is returned.

Note: You can also use a proximity match to query for more than two words that are near each other.

Examples

  1. The following example returns files where wordA is near wordB, which is near wordC.

    ...WHERE CONTAINS ('"wordA" NEAR() "wordB" NEAR() "wordC"') > 0
  2. The following example returns files where Index is within approximately 50 words of the word Server:

    ...WHERE CONTAINS(DocSubject,'"Index" ~ "Server"') > 0
  3. The following example returns files containing the word Microsoft, and where the word Index is close to the word Server:

    ...WHERE CONTAINS(DocSubject,'"Microsoft" AND "Index" NEAR() "Server"') > 0

Linguistic Generation Match

A linguistic generation match is a type of fuzzy search in which the target of the search is expanded to include variations of the word. Currently, the SQL Extensions support inflectional morphology generation, in which the search expands to include verb tenses, and the singular or plural of nouns.

Syntax

CONTAINS( [column_reference, ]
'FORMSOF(INFLECTIONAL, "Generation_Term ")') > 0

Elements

Column_Reference  Specifies the column name (alias). Its data type must be compatible with the format of the Word_or_Phrase specified.
  
Generation_Term  Specifies the literal of type Basic String to expand.

Example

The following example returns files that contain words such as "driving", "driven", and "drives".

...WHERE CONTAINS('FORMSOF(INFLECTIONAL, "drive")') > 0 

List Match

The SQL Extensions support queries based on matching a group of words or phrases. Also called vector space queries, these queries work by weighting each of the match terms, or vector components. The rank of each returned row indicates how well it matches the query. Rows returned do not have to match every listed word or phrase. The matching algorithm uses all the weighted components together when determining a match.

Syntax

CONTAINS( [column_reference, ]
'ISABOUT(Vector_Term [ WEIGHT( Value ) ] [ , Vector_Term [ WEIGHT( Value ) ] ] ...') > 0

Elements

Column_Reference  Specifies the column name (alias). Its data type must be compatible with the format of the Vector_Term specified.
  
Vector_Term  Specifies the terms to match. The term is one of the following types: Simple, Word Prefix, Proximity, or Linguistic Generation.
  
WEIGHT( Value )  Specifies the weight of the Vector_Term. If you specify a weight for a term, you must enclose the term in double quotes. You can specify Value in the range of 0.0 to 1.0.

Example

The following example queries the Contents property for the words "railroad", "transportation", and "locomotive". Files whose Contents include the word "heavy" near the word "locomotive" will be weighted higher in the results ranking. Since the term "transportation" is ranked lower than the other two terms, files containing only that term are ranked lower than files containing either "railroad" or "locomotive".

...WHERE CONTAINS
    ('ISABOUT (railroad, "transportation" WEIGHT(0.5),
	         "heavy near locomotive" WEIGHT(0.9))') > 0

© 1997 by Microsoft Corporation. All rights reserved.