Tuesday, March 27, 2012

Can I force SQL Server to use the CONTAINS operator first?

If I do the query below, SQL Server does a table scan (thousands of rows) for fn_TestCol(), then evaluates the CONTAINS clause:

SELECT col1, col2
FROM myTable
WHERE CONTAINS((col1, col2), 'foo and bar')
AND fn_TestCol(col1) = 0

How can I force it to evaluate CONTAINS clause, which returns only a few rows, first? The best I've come up with is this:

SELECT sub.col1, sub.col2
FROM (
SELECT col1, col2
FROM myTable
WHERE CONTAINS((col1, col2), 'foo and bar')
) sub
WHERE fn_TestCol(sub.col1) = 0

It's much faster, but still not as fast as if I could just use the first query, but force SQL Server to evaluate CONTAINS first.

Actually CONTAINS is a predicate that can be used with the WHERE clause there are two versions of it defined by ANSI SQL and implemented by Microsoft CONTAINS and CONTAINSTABLE, the other two FULLTEXT predicates are FREETEXT and FREETEXTTBALE, try the link below for details. Hope this helps.

http://technet.microsoft.com/en-us/library/ms187787.aspx

|||

Hi Caddre,

Thanks for the reply -- but that didn't seem to have anything to do with my question!

|||

(It's much faster, but still not as fast as if I could just use the first query, but force SQL Server to evaluate CONTAINS first.)

It does because since CONTAINS is a predicate and not a clause what you can force it to do is determined by the restrictions and limitations listed in the link I provided.

|||no, you will have no control over the freetext functionality as in SQL Server 2005 the query in FT searches still is not connected to the query engine and therefore cannot be optimized and tweaked in your requested way.

Jens K. Suessmeyer.

http://www.sqlserver2005.de
|||

OK, I found a way to do it:

DECLARE @.ftQuery nvarchar(32)

SELECT col1, col2
FROM myTable
WHERE CONTAINS((col1, col2), @.ftQuery)
AND fn_TestCol(col1) = 0

OPTION (OPTIMIZE FOR(@.ftQuery = 'foo'))


This structure allows the optimizer to make assumptions about what might be in @.ftQuery -- so it correctly assumes that the CONTAINS predicate will be a faster starting point than the udf, and runs with the right plan.

|||

_jesse,
Interesting.. this as long been a "by design" problem with SQL FTS since SQL Server 7.0. You never said which version of SQL Server you are using, but I'm assuming it is SQL 2000 and not SQL Server 2005. Correct? Can you provide more details on your UDF fn_TestCol? How many rows do you have in your real table or your myTable example?

Thanks,
John

John T. Kane
Search Evangelist, Intellisearch
email: john.kane@.intellisearch.no

|||

Hi John,

It is SQL 2005.

The problem, as I understand it, is this: when you pass a parameter to CONTAINS, as in

CONTAINS(col1, @.searchText)

the optimizer isn't able to recognize that the CONTAINS operator is a good place to start. The execution plan changes if you replace @.searchText with a literal string, as in:

CONTAINS(col1, 'foo')

In my case, the table being scanned was relatively small (20,000 rows), but the UDF is slow -- it's doing a bunch of string manipulation & comparison. Since the CONTAINS clause is likely to be very selective, the query is very fast if you start with CONTAINS, then run the UDF on the few rows matching the fts, but very slow if you do it the other way around.

Declaring the @.searchText parameter as nvarchar, then adding the OPTIMIZE FOR hint gets it to consistently run in the right order.

No comments:

Post a Comment