lucene-dev mailing list archives

From "Nicholas Knize (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (LUCENE-6685) GeoPointInBBox/Distance queries should have safeguards
Date Tue, 21 Jul 2015 21:42:04 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-6685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635858#comment-14635858 ]

Nicholas Knize edited comment on LUCENE-6685 at 7/21/15 9:41 PM:
-----------------------------------------------------------------

Updated patch includes the following improvements:

* Dynamically compute detail level based on query size (includes min/max bounds on detail level)
* Remove unnecessary ranges from PointDistanceQuery
* Updated javadocs
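
One way to picture the first item, as a rough sketch and not the patch's actual code: pick the level at which a cell is no wider than a fixed fraction of the query box, then clamp the result to a minimum and maximum. The class name, the 1/64 ratio, and the bounds 8 and 16 are all assumptions made for illustration.

```java
final class DetailLevel {
    // Hypothetical bounds; the real limits in the patch are not stated in this message.
    static final int MIN_LEVEL = 8;
    static final int MAX_LEVEL = 16;
    // Hypothetical target: a cell should be at most 1/64 of the shorter bbox side.
    static final double CELLS_PER_SIDE = 64.0;

    /** Returns a detail level for the given bounding box (degrees), clamped to [MIN_LEVEL, MAX_LEVEL]. */
    static int computeDetailLevel(double minLon, double minLat, double maxLon, double maxLat) {
        double side = Math.min(maxLon - minLon, maxLat - minLat);
        double target = side / CELLS_PER_SIDE;
        int level = 0;
        double cell = 360.0; // width in degrees of a level-0 cell
        // Halve the cell until it is small enough or the upper bound is reached.
        while (cell > target && level < MAX_LEVEL) {
            cell /= 2;
            level++;
        }
        return Math.max(MIN_LEVEL, level);
    }
}
```

With these assumed numbers, a whole-world box is held at the minimum level and a very small box is capped at the maximum, so the number of cells visited stays bounded at both extremes.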


was (Author: nknize):
Updated patch includes the following improvements:

* Dynamically compute detail level based on query size (includes min/max bounds on detail level)
* Remove unnecessary ranges from PointDistanceQuery

> GeoPointInBBox/Distance queries should have safeguards
> ------------------------------------------------------
>
>                 Key: LUCENE-6685
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6685
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>             Fix For: 5.3, Trunk
>
>         Attachments: LUCENE-6685.patch, LUCENE-6685.patch, LUCENE-6685.patch
>
>
> These queries build a big list of term ranges, where the size of the list is in proportion to how many cells of the space-filling curve are "crossed" by the perimeter of the query (I think?).
> This can easily be hundreds of MB for a big enough query ... not to mention slow to enumerate (we still do this again for each segment).
> I think the queries should have safeguards, much like we have maxDeterminizedStates for Automaton-based queries, to prevent accidental OOMEs.
> But I think longer term we should either change the ranges to be enumerated on demand and never stored in their entirety (like NumericRangeTermsEnum), or change the query so it has a fixed budget of how many cells it's allowed to visit and then, within a crossing cell, uses doc values to post-filter.
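
The safeguard idea in the description could look roughly like the sketch below: a collector that refuses to hold more than a fixed number of term ranges and fails fast instead of running out of memory. This is only an illustration in the spirit of the maxDeterminizedStates limit mentioned above; `BudgetedRanges`, `TooManyRangesException`, and the `maxRanges` parameter are hypothetical names, not Lucene APIs.

```java
import java.util.ArrayList;
import java.util.List;

final class BudgetedRanges {
    /** Hypothetical exception raised when the range budget is exhausted. */
    static final class TooManyRangesException extends RuntimeException {
        TooManyRangesException(int maxRanges) {
            super("query would need more than " + maxRanges + " term ranges");
        }
    }

    /** A closed range [start, end] of encoded terms. */
    static final class Range {
        final long start;
        final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    private final int maxRanges;
    private final List<Range> ranges = new ArrayList<>();

    BudgetedRanges(int maxRanges) {
        this.maxRanges = maxRanges;
    }

    /** Stores a range, or throws once the budget is used up. */
    void add(long start, long end) {
        if (ranges.size() >= maxRanges) {
            throw new TooManyRangesException(maxRanges);
        }
        ranges.add(new Range(start, end));
    }

    int size() {
        return ranges.size();
    }
}
```

A caller would catch the exception and either reject the query or fall back to a coarser detail level. The longer-term options in the description (on-demand enumeration, or a cell budget with doc-values post-filtering) avoid holding the full list at all and are not shown here.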




