lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Understanding performance characteristics of the new point types
Date Wed, 02 Nov 2016 21:32:22 GMT
Hi,

FYI, the old NumericRangeQuery is fast here, because it rewrites to a constant score BooleanQuery
for this low-cardinality case! If you have no real range, then it rewrites to a TermQuery!

Points are different: they are not well suited to simple term-based lookups.
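
To make the difference concrete, here is a toy model in plain Java. It is not Lucene code: the class and method names (FilterCostSketch, bitsetFirst, leapfrog) are made up for illustration, and the cost counters only count array reads. It intersects a small clause with a clause that matches most of the index, once by materializing a bitset of every match up front (roughly what a points query has to do, as described in this thread) and once by letting the small clause lead and only advancing into the large sorted list (roughly what a term-based clause inside a BooleanQuery can do):

```java
import java.util.BitSet;

/** Toy model, not Lucene code: two ways to AND a tiny clause with a huge one. */
public class FilterCostSketch {

    /** Build a bitset of ALL docs of the large clause first, then probe it. Returns array reads. */
    public static long bitsetFirst(int[] largeSorted, int[] smallSorted, BitSet hitsOut) {
        long touched = 0;
        BitSet large = new BitSet();
        for (int doc : largeSorted) {      // cost grows with every matching document
            large.set(doc);
            touched++;
        }
        for (int doc : smallSorted) {
            touched++;
            if (large.get(doc)) hitsOut.set(doc);
        }
        return touched;
    }

    /** Let the small clause lead and only advance into the large sorted list. Returns array reads. */
    public static long leapfrog(int[] largeSorted, int[] smallSorted, BitSet hitsOut) {
        long touched = 0;
        int from = 0;                      // never look behind the previous candidate
        for (int target : smallSorted) {
            int lo = from, hi = largeSorted.length;
            while (lo < hi) {              // lower-bound binary search for target
                int mid = (lo + hi) >>> 1;
                touched++;
                if (largeSorted[mid] < target) lo = mid + 1; else hi = mid;
            }
            from = lo;
            if (lo < largeSorted.length && largeSorted[lo] == target) hitsOut.set(target);
        }
        return touched;
    }

    public static void main(String[] args) {
        int n = 1_000_000;                 // assumed index size for the demo
        int[] large = new int[n - n / 100];
        int k = 0;
        for (int doc = 0; doc < n; doc++) {
            if (doc % 100 != 0) large[k++] = doc;   // "everything but 1%" of the docs
        }
        int[] small = {7, 250_001, 500_000, 750_003, 999_999};  // the 5-document clause

        BitSet a = new BitSet();
        BitSet b = new BitSet();
        System.out.println("bitset first: reads=" + bitsetFirst(large, small, a) + " hits=" + a);
        System.out.println("leapfrog:     reads=" + leapfrog(large, small, b) + " hits=" + b);
    }
}
```

Both strategies return the same hits. The first always pays for every document matching the large clause, while the second needs only a logarithmic number of reads per candidate of the small clause, which fits the MUST slowdown reported in this thread.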

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Florian Hopf [mailto:mailinglists@florian-hopf.de]
> Sent: Wednesday, November 2, 2016 8:19 PM
> To: Lucene Users <java-user@lucene.apache.org>
> Subject: Re: Understanding performance characteristics of the new point
> types
> 
> Thank you both for the explanation, we will switch to StringField with a
> TermQuery instead.
> 
> On 02.11.2016 20:09, Michael McCandless wrote:
> > Yeah it's best to use StringField for low-cardinality use cases.
> >
> > When cardinality is low (4 unique values in your case), legacy
> > numerics would rewrite to a BooleanQuery, which is much more
> > performant for MUST clauses, vs dimensional points which will always
> > need to construct an up front bitset for all documents with that
> > value.  Using StringField instead will ensure you always get a
> > BooleanQuery...
> >
> > Mike McCandless
> >
> > http://blog.mikemccandless.com
> >
> >
> > On Wed, Nov 2, 2016 at 2:43 PM, Fuad Efendi <fuad@efendi.ca> wrote:
> >> Hi Florian,
> >>
> >> If my understanding is correct, you are using IntPoint to index 4 different
> >> document types, which is overkill; why not try a classic “non-tokenized”
> >> keyword field (a.k.a. “legacy string”) for document types? Cardinality is
> >> only four for document types.
> >>
> >>
> >> --
> >>
> >> Fuad Efendi
> >>
> >> (416) 993-2060
> >>
> >> http://www.tokenizer.ca
> >> Recommender Systems
> >>
> >>
> >> On November 2, 2016 at 2:10:14 PM, Florian Hopf (
> >> mailinglists@florian-hopf.de) wrote:
> >>
> >> Hi,
> >>
> >> we are indexing different types of documents in one Lucene index. They
> >> have most fields in common but we need to filter some types for certain
> >> queries. We are using numeric values to determine the types of documents
> >> (1-4). Now, when querying these documents we see that the performance
> >> degrades the more documents of a type are in the index.
> >>
> >> Using a simple test that indexes 10 million documents I can see the
> >> following when filtering on everything but 100000 documents:
> >>
> >> * When issuing the query alone the new PointRangeQuery
> >> (IntPoint.newExactQuery) is a lot faster than term and legacy numeric
> >> (in my case around 2x the speed of the others)
> >> * When issuing a bool query that combines a term query selecting 5
> >> documents with a MUST clause on the numeric field, the points are
> >> 5x slower than legacy numeric (LegacyNumericRangeQuery.newIntRange)
> >> and terms (TermQuery)
> >> * When doing the same thing with SHOULD instead of MUST for the
> >> additional term query the PointRangeQuery is fastest as well
> >>
> >> I suspect this to be related to the discussion in
> >> https://issues.apache.org/jira/browse/LUCENE-7254
> >>
> >> Of course there could be something wrong with the way I am measuring the
> >> performance, I'd be happy to share the code. But what I read in the
> >> ticket above seems to hint that the points are not suited for every use
> >> case? Is it recommended to use StringField in a case like this instead?
> >>
> >> Regards
> >> Florian
> >>
> >> --
> >> Florian Hopf
> >> Freelance Software Developer
> >>
> >> http://blog.florian-hopf.de
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> 
> 
> --
> Florian Hopf
> Freelance Software Developer
> 
> http://blog.florian-hopf.de
> 



