lucene-java-user mailing list archives

From Paul Elschot <>
Subject Re: Search performance under high load
Date Thu, 07 Apr 2005 07:24:55 GMT

On Thursday 07 April 2005 00:54, Chris Hostetter wrote:
> : Queries: The query strings are of highly differing complexity, from
> : simple x:y to long queries involving conjunctions, disjunctions and
> : wildcard queries.
> :
> : 90% of the queries run brilliantly. Problem is that 10% of the queries
> : (simple or not) take a long time, on average more than 10 seconds,
> : sometimes several minutes.
> without knowing the nature of the queries, these numbers are not outside
> the realm of possibility.  there have been examples on the list in the
> last few days of how BooleanQueries constructed with deep nesting have
> particularly bad performance.
> I would suggest you add timing logs to your search code so that you get one
> log line per search executed telling you:
> 1) the time of day the search was executed
> 2) the total time taken by the call
> 3) the Query.toString() of the search.
> 4) the Hits.length() of the result.
> 5) any tracking information to help you identify where the search came
>    from (ie: canned search from a category listing page, user entered
>    freeform text, your RSS feed generator, etc...)
> This will help you determine:
>  a) is there a common element to the structure of queries that take more
>     than a certain amount of time?

In general, disjunctions (truncations, fuzzy queries) are slow, and
conjunctions (required terms, filters) are faster.

>  b) are the slow queries clustered by time of day? is anything else
>     happening on that box during that time?
>  c) are the "slow" queries all resulting in a high number of Hits?
>  d) are the slow searches all originating from a single source? (ie: are
>     the queries needed by category listing pages all really slow) can
>     they be re-implemented differently?
>  e) is there anything else the slow queries have in common?
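
The one-line-per-search log described above could be sketched roughly as
follows. This is only an illustration: the class name SearchTimingLog, the
tab-separated layout, and the `search` callback are assumptions, not part of
Lucene. In real code the callback would run the actual search and return
Hits.length(), and the query string would come from Query.toString().

```java
import java.time.Instant;
import java.util.function.ToIntFunction;

// Hypothetical helper, not a Lucene class.
class SearchTimingLog {

    // Formats one log line: time of day, elapsed time, hit count,
    // tracking source, and the query string.
    static String formatLine(Instant when, long elapsedMillis, int hits,
                             String source, String query) {
        return when + "\t" + elapsedMillis + "ms\thits=" + hits
                + "\tsource=" + source + "\tquery=" + query;
    }

    // Times a single search and returns its log line.
    // `search` is a stand-in for the real search call; it takes the
    // query string and returns the number of hits.
    static String timedSearch(String query, String source,
                              ToIntFunction<String> search) {
        Instant when = Instant.now();
        long start = System.nanoTime();
        int hits = search.applyAsInt(query);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return formatLine(when, elapsedMillis, hits, source, query);
    }

    public static void main(String[] args) {
        // Dummy search that always reports 42 hits.
        System.out.println(timedSearch("x:y", "freeform", q -> 42));
    }
}
```

With the lines in one file, questions a) to e) become a matter of sorting and
grouping on the elapsed-time, source, and query columns.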

I think your case is CPU bound, so you have a few options:
- use more CPUs,
- get in touch with the 'power' users (via the logs as suggested by Chris)
  and find out if there are simple measures you can take to help performance
  for them. For example, replacing a range that is used repeatedly with a
  cached filter can be quite effective.
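
The idea behind a cached filter can be sketched without any Lucene code: the
set of documents matching a range is computed once, stored as a bit set, and
reused by every later query that needs the same range. Everything below
(the class CachedRangeFilter, the in-memory fieldValues array, the
computeCount counter) is a hypothetical illustration of that idea, not
Lucene's filter API; Lucene has its own filter classes for this, such as
CachingWrapperFilter.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of caching a range as a bit set of doc ids.
class CachedRangeFilter {
    // Assumption: one numeric field value per document, indexed by doc id.
    private final int[] fieldValues;
    private final Map<String, BitSet> cache = new HashMap<>();
    // Counts how often a range actually had to be evaluated.
    int computeCount = 0;

    CachedRangeFilter(int[] fieldValues) {
        this.fieldValues = fieldValues;
    }

    // Returns the doc ids whose value lies in [lo, hi] (inclusive).
    // The scan over all documents runs at most once per distinct range;
    // later calls return the cached bit set, which callers must not modify.
    BitSet bits(int lo, int hi) {
        return cache.computeIfAbsent(lo + ".." + hi, key -> {
            computeCount++;
            BitSet matching = new BitSet(fieldValues.length);
            for (int doc = 0; doc < fieldValues.length; doc++) {
                if (fieldValues[doc] >= lo && fieldValues[doc] <= hi) {
                    matching.set(doc);
                }
            }
            return matching;
        });
    }
}
```

A query that needs the range then intersects its own matches with the cached
bit set instead of expanding the range into a large disjunction every time.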

Paul Elschot
