lucene-solr-user mailing list archives

From Walter Underwood <wunderw...@netflix.com>
Subject Re: Phrase Query Performance Question
Date Thu, 01 Nov 2007 00:54:53 GMT
"hurricane katrina" is a very expensive query against a collection
focused on Hurricane Katrina. There will be many matches in many
documents. If you want to measure worst-case, this is fine.

I'd try other things, like:

* ninth ward
* Ray Nagin
* Audubon Park
* Canal Street
* French Quarter
* FEMA mistakes
* storm surge
* Jackson Square

Of course, real query logs are the only real test.
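
A rough way to compare the phrases above is to time each one as a quoted phrase query. This is only a sketch under assumptions: the Solr URL, the `rows` value, and the helper names (`build_url`, `fetch`, `time_queries`) are made up for illustration, so adjust them to your own setup.

```python
import time
import urllib.parse
import urllib.request

# Assumption: a Solr select handler at this address; change it for your install.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

PHRASES = [
    "ninth ward", "Ray Nagin", "Audubon Park", "Canal Street",
    "French Quarter", "FEMA mistakes", "storm surge", "Jackson Square",
]

def build_url(phrase, base_url=SOLR_SELECT_URL):
    """Build a select URL that runs `phrase` as a quoted phrase query."""
    params = urllib.parse.urlencode({"q": '"%s"' % phrase, "rows": 10})
    return base_url + "?" + params

def fetch(url):
    """Send the request and read the whole response body."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()

def time_queries(phrases, run=fetch, repeats=3, base_url=SOLR_SELECT_URL):
    """Return {phrase: best wall-clock seconds over `repeats` runs}."""
    timings = {}
    for phrase in phrases:
        url = build_url(phrase, base_url)
        best = None
        for _ in range(repeats):
            start = time.perf_counter()
            run(url)
            elapsed = time.perf_counter() - start
            best = elapsed if best is None else min(best, elapsed)
        timings[phrase] = best
    return timings

def main():
    # Slowest phrases first, so the expensive ones stand out.
    results = time_queries(PHRASES)
    for phrase, secs in sorted(results.items(), key=lambda kv: -kv[1]):
        print("%-16s %8.1f ms" % (phrase, secs * 1000))
```

Calling `main()` prints one line per phrase. Taking the best of several runs mostly measures warm-cache behavior; for cold-cache numbers, look at the first run only. Replaying real query logs through `time_queries` is still the better test.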

wunder

On 10/31/07 3:25 PM, "Mike Klaas" <mike.klaas@gmail.com> wrote:

> On 31-Oct-07, at 2:40 PM, Haishan Chen wrote:
> 
>> 
>> http://mail-archives.apache.org/mod_mbox/lucene-java-user/
>> 200512.mbox/%3c4397F720.9070007@getopt.org%3e
>> It mentioned that http://websearch.archive.org/katrina/ (in Nutch)
>> had 10M documents and a search for "hurricane katrina" was able to
>> return in 1.35 seconds with 600,867 hits. The computer it was
>> using might be more powerful than mine, though. I feel 937 ms for a
>> phrase query on a single field is rather slow. Nutch actually
>> expands a search into more complex queries. My index and the number
>> of hits on my query ("auto repair") are about one fifth of those of
>> websearch.archive.org and its test query. So I feel a reasonable
>> time for my query should be less than 300 ms. I am not sure
>> if I am right on that logic.
> 
> I'm not sure that it is reasonable, but I'm not sure that it isn't.
> However, have you tried other queries?  937ms seems a little high,
> even for phrase queries.
> 
>> Anyway, I will collect the statistics on Linux first and try out
>> other options.
> 
> Have you tried using the performance enhancements present in solr-trunk?
> 
> -Mike

