lucene-solr-user mailing list archives

From Mike Klaas <mike.kl...@gmail.com>
Subject Re: Big slowdown with phrase queries
Date Thu, 03 Jul 2008 22:30:20 GMT

On 3-Jul-08, at 3:04 PM, Chris Harris wrote:
>
> Now I gather that phrase queries are inherently slower than non-phrase
> queries, but 1-3 orders of magnitude difference seems noteworthy.
>
> This is on Solr r654965, which I don't think is *too* far behind the
> trunk version. 1200MB of RAM allocated to Solr. 8M documents. Lots of
> compressed, stored fields. Most docs are probably around 50KB, but some
> of them might be 10MB or 100MB. The index as a whole is 106GB.
> maxFieldLength=10000. The index was recently optimized. (It has only
> one segment right now.)
>
> I'm thinking that even supposing I've indexed everything in a horribly
> inefficient manner, and even supposing my machine is woefully
> underpowered, that wouldn't really explain why the phrase queries
> would be *that* much slower, would it? Any ideas?

It is simply due to caching effects. Probably the term frequency info
is in the OS cache, but the positions aren't. You are seeing the
difference between disk and non-disk access, which is what accounts
for the multiple orders of magnitude difference.

The important variable here isn't the total index size, but the size of
the .prx (positions) and .frq (term frequencies) files relative to the
total _free/cached_ memory available on the system (i.e., memory not
allocated to the JVM).
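
As a rough way to check this, here is a minimal Python sketch that sums
the on-disk size of each file type in an index directory and compares
the .frq and .prx totals against a cache budget. The function names and
the cache_bytes parameter are hypothetical, made up for illustration;
the budget itself has to be supplied by you (e.g. read off the OS's
free/cached memory report). It also assumes the index is not in
compound-file format, where the per-type files are packed into a single
.cfs file and cannot be sized separately this way.

```python
import os
from collections import defaultdict


def index_size_by_extension(index_dir):
    """Return a dict mapping file extension -> total bytes in index_dir."""
    totals = defaultdict(int)
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if os.path.isfile(path):
            ext = os.path.splitext(name)[1]  # e.g. ".prx", ".frq"
            totals[ext] += os.path.getsize(path)
    return dict(totals)


def fits_in_cache(totals, cache_bytes, extensions=(".frq", ".prx")):
    """True if the listed file types together fit in cache_bytes.

    cache_bytes is an assumption supplied by the caller: the memory the
    OS has free for file caching, not the heap given to the JVM.
    """
    needed = sum(totals.get(ext, 0) for ext in extensions)
    return needed <= cache_bytes
```

For example, calling index_size_by_extension on the index directory and
then fits_in_cache(totals, cache_bytes) with extensions=(".frq",) versus
the default would show whether the frequencies alone fit in cache while
frequencies plus positions do not, which is the situation described
above.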

> Indexing with
> termPositions wouldn't help, would it? (Now I'm not using
> termPositions or termVectors.) Or what if I used an alternative query
> parser, so phrase queries could be implemented in terms of the
> SpanNearQuery class rather than the PhraseQuery class?

There is no way to speed this up other than indexing less, buying more
memory, or distributing the index across more machines.

-Mike
