lucene-solr-user mailing list archives

From "Yonik Seeley" <>
Subject Re: Big slowdown with phrase queries
Date Thu, 03 Jul 2008 22:21:28 GMT
On Thu, Jul 3, 2008 at 6:04 PM, Chris Harris <> wrote:
> Now I gather that phrase queries are inherently slower than non-phrase
> queries, but 1-3 orders of magnitude difference seems noteworthy.

Phrase queries could be a couple of times slower, but normally not to
the degree you show here.

The most likely factor is that phrase queries need to look at term
positions, and those are in a different part of the index that may not
be cached by the OS (especially if phrase queries are rare in your
system). You may not even have enough free system RAM to cache the
positions as well.

Check your index and look at the total size of the .tis files, the
.frq files, and the .prx files.
The .tis and .frq files are used to look up terms and which documents
match those terms.  The .prx files hold the term positions in each
document.
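
One way to get those totals is a small script like the sketch below.
It is only an illustration: the function name index_file_sizes and the
idea of pointing it at the index directory are assumptions, not part of
Solr or Lucene. It simply adds up file sizes per extension.

```python
import os
from collections import defaultdict

# Extensions named in the message: term dictionary (.tis),
# term frequencies (.frq), and term positions (.prx).
EXTENSIONS = (".tis", ".frq", ".prx")


def index_file_sizes(index_dir, extensions=EXTENSIONS):
    """Return total bytes per extension for files directly in index_dir.

    index_dir is assumed to be the directory holding the Lucene index
    files; subdirectories are not searched.
    """
    totals = defaultdict(int)
    for entry in os.scandir(index_dir):
        if not entry.is_file():
            continue
        ext = os.path.splitext(entry.name)[1]
        if ext in extensions:
            totals[ext] += entry.stat().st_size
    # Report every requested extension, even if no file matched it.
    return {ext: totals[ext] for ext in extensions}
```

Calling index_file_sizes on the index directory and comparing the .prx
total with the RAM left over for the OS cache gives a rough idea of
whether the positions data could stay cached at all.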

You may also want to test things out in a more controlled manner (a
system with no live traffic, etc) to narrow things down some more.
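
A controlled comparison could be as simple as the timing loop sketched
below. The names time_queries and run_query are hypothetical; run_query
stands for whatever callable sends one query to your test instance
(for example, a function wrapping an HTTP request). All samples are
kept rather than averaged, since the first run of a query may hit
uncached parts of the index while later runs are more likely to be
served from the OS cache.

```python
import time


def time_queries(run_query, queries, repeats=5):
    """Time each query several times.

    run_query: caller-supplied callable taking one query string
               (hypothetical; it should issue the query and wait
               for the response).
    Returns {query: [elapsed seconds for each run, in order]}.
    """
    results = {}
    for q in queries:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_query(q)
            samples.append(time.perf_counter() - start)
        results[q] = samples
    return results
```

Running the same terms once as a phrase query and once as a non-phrase
query, on a machine with no other traffic, should show whether the gap
shrinks after the first run.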

> This is on Solr r654965, which I don't think is *too* far behind the
> trunk version. 1200Mb RAM allocated to Solr. 8M documents. Lots of
> compressed, stored fields. Most docs are probably like 50Kb, but some
> of them might be 10Mb, 100Mb. The index as a whole is 106GB.
> maxFieldLength=10000. The index was recently optimized. (It has only
> one segment right now.)
> I'm thinking that even supposing I've indexed everything in a horrible
> inefficient manner, and even supposing my machine is woefully
> underpowered, that wouldn't really explain why the phrase queries
> would be *that* much slower, would it? Any ideas? Indexing with
> termPositions wouldn't help, would it?

No.  TermVectors are not used for phrase queries.

> (Now I'm not using
> termPositions or termVectors.) Or what if I used an alternative query
> parser, so phrase queries could be implemented in terms of the
> SpanNearQuery class rather than the PhraseQuery class?

Span queries would be slower than phrase queries.

