lucene-java-user mailing list archives

From Trejkaz <trej...@trypticon.org>
Subject Re: Crazy increase of MultiPhraseQuery memory usage in Lucene 5 (compared with 3)
Date Mon, 24 Aug 2015 06:17:13 GMT
I spent some time carving out a quick test of the bits that matter and
put them up here:
https://gist.github.com/trejkaz/a72b87277b1aec800c2e

The tests index 1,000,000 docs with just one instance of the
field/sub-field trick we're using, plus one unique value. So it's a
bit of an artificial test, but benchmarks tend to be like that.

Times for Lucene 3.6:
    Indexing: 3.365 s
    SpanQuery: 20.48 s
    MultiPhraseQuery: 9.641 s

Times for Lucene 5.2:
    Indexing: 4.423 s
    SpanQuery: 31.94 s
    MultiPhraseQuery: (never completes due to OOME)

An aside which is totally a red herring: it seems there is quite a bit
of slowdown on indexing and SpanQuery as well, which makes me wonder
whether I have incorrectly configured the FieldType when compared with
how the same field was indexed for 3.6.
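To put a rough number on that slowdown, here is a throwaway calculation using only the timings listed above. MultiPhraseQuery is left out because the 5.2 run never finished. The class and method names are arbitrary; this just does the arithmetic and has nothing to do with Lucene itself.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class TimingRatios {
    // How many times slower the "after" run is than the "before" run.
    static double slowdown(double beforeSeconds, double afterSeconds) {
        return afterSeconds / beforeSeconds;
    }

    public static void main(String[] args) {
        // Seconds, copied from the 3.6 and 5.2 results above: {3.6, 5.2}.
        Map<String, double[]> timings = new LinkedHashMap<>();
        timings.put("Indexing", new double[] {3.365, 4.423});
        timings.put("SpanQuery", new double[] {20.48, 31.94});

        for (Map.Entry<String, double[]> e : timings.entrySet()) {
            double ratio = slowdown(e.getValue()[0], e.getValue()[1]);
            // Locale.ROOT keeps the decimal point a '.' regardless of platform locale.
            System.out.println(String.format(Locale.ROOT, "%s: %.2fx slower in 5.2", e.getKey(), ratio));
        }
    }
}
```

That works out to about 1.31x for indexing and 1.56x for SpanQuery.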

These numbers also show that MultiPhraseQuery used to be much faster
than SpanQuery (9.641 s vs 20.48 s on 3.6), which is why we stopped
using SpanQuery for this particular query in the first place.

Timings aside, MultiPhraseQuery used to complete, but now it runs out
of memory (OOME) even when given 2GB of RAM for this particular case.

I also tried hacking together a TermAutomatonQuery to see what
happened with that, and it gets an OOME as well.

TX

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

