lucene-solr-user mailing list archives

From Haishan Chen <hais...@msn.com>
Subject RE: Phrase Query Performance Question
Date Thu, 01 Nov 2007 06:54:26 GMT

> Date: Wed, 31 Oct 2007 17:54:53 -0700
> Subject: Re: Phrase Query Performance Question
> From: wunderwood@netflix.com
> To: solr-user@lucene.apache.org
>
> "hurricane katrina" is a very expensive query against a collection
> focused on Hurricane Katrina. There will be many matches in many
> documents. If you want to measure worst-case, this is fine.
>
> I'd try other things, like:
>
> * ninth ward
> * Ray Nagin
> * Audubon Park
> * Canal Street
> * French Quarter
> * FEMA mistakes
> * storm surge
> * Jackson Square
>
> Of course, real query logs are the only real test.
>
> wunder
 
These terms are not frequent in my index, so I believe they will be fast. My concern is
that 2 million documents feels like a small index to me, and 100,000 or 200,000 hits is a
small result set that should always return in under a second. Right now I am querying only
one field and the response takes almost one second. I don't think I can achieve sub-second
performance if I add even a bit more complexity to the query.
 
Many of the category terms in my index appear in more than 5% of the documents, and those
category terms are very popular search terms. So the examples I gave were not extreme cases
for my index.
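
To make the numbers in this thread concrete, here is a minimal sketch of the arithmetic in
Python. It uses only figures quoted in this thread (2 million documents, 5%, and the Nutch
report of 1.35 seconds at roughly five times the scale). The function names are made up for
illustration, and the assumption that query time scales linearly with index size and hit
count is only the rough model behind the "less than 300 ms" guess, not a measured property
of Lucene or Solr.

```python
# Figures quoted in this thread. The linear-scaling model below is an assumption.
INDEX_DOCS = 2_000_000     # documents in the index described in this message
CATEGORY_FRACTION = 0.05   # category terms appear in "more than 5%" of documents
NUTCH_SECONDS = 1.35       # reported time for "hurricane katrina" on the Nutch index
SCALE = 1 / 5              # this index and its hit count are "about one fifth" as large


def hits_for_fraction(total_docs, fraction):
    """Documents matched by a term that appears in `fraction` of the index."""
    return round(total_docs * fraction)


def linear_estimate_ms(reference_seconds, scale):
    """Scale a reference query time linearly, returning whole milliseconds."""
    return round(reference_seconds * scale * 1000)


if __name__ == "__main__":
    print("hits for a 5% term:", hits_for_fraction(INDEX_DOCS, CATEGORY_FRACTION))
    print("linear estimate (ms):", linear_estimate_ms(NUTCH_SECONDS, SCALE))
```

Under that assumption a 5% category term matches about 100,000 documents, and the scaled
estimate comes out below 300 ms, which is what the measured 937 ms is being compared against.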
 
When I started Tomcat I saw this message:
The Apache Tomcat Native library which allows optimal performance in production environments
was not found on the java.library.path
 
Does that mean query performance will be better if I use the Apache Tomcat Native library?
Does anyone have experience with that?
 
 
 
Thanks a lot
-Haishan
 
>
> On 10/31/07 3:25 PM, "Mike Klaas" <mike.klaas@gmail.com> wrote:
>
> > On 31-Oct-07, at 2:40 PM, Haishan Chen wrote:
> >
> >> http://mail-archives.apache.org/mod_mbox/lucene-java-user/200512.mbox/%3c4397F720.9070007@getopt.org%3e
> >> It mentioned that http://websearch.archive.org/katrina/ (in nutch)
> >> had 10M documents and a search of "hurricane katrina" was able to
> >> return in 1.35 seconds with 600,867 hits. Although the computer
> >> it was using might be more powerful than mine. I feel 937ms for a
> >> phrase query on a single field is kind of slower. Nutch actually
> >> expand a search to more complex queries. My index and the number of
> >> hits on my query ("auto repair") is about one fifth of
> >> websearch.archive.org and its testing query. So I feel a reasonable
> >> performance for my query should be less than 300 ms. I am not sure
> >> if I am right on that logic.
> >
> > I'm not sure that it is reasonable, but I'm not sure that it isn't.
> > However, have you tried other queries? 937ms seems a little high,
> > even for phrase queries.
> >
> >> Anyway I will collect the statistic on linux first and try out
> >> other options.
> >
> > Have you tried using the performance enhancements present in solr-trunk?
> >
> > -Mike