lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-6547) Add dateline crossing support to GeoPointInBBox and GeoPointDistance Queries
Date Wed, 01 Jul 2015 12:49:04 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-6547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610054#comment-14610054
] 

Michael McCandless commented on LUCENE-6547:
--------------------------------------------

Thanks Nick!

bq. Addresses LUCENE-6562 to share ranges across segments. To guard against OOM exceptions
the ranges needed to be purged once all segments have been visited. See GeoPointTermQuery
line 87 for this check.

OK, it looks like this patch also absorbed the latest patch on
LUCENE-6562?  It's clever how you clear ranges after the last segment!

Unfortunately I think it's unsafe ... e.g. if IndexSearcher is using
an executor, multiple threads can call getTermsEnum I think ... and
also, in the cached case, on opening a new NRT reader, I think the
query would only see e.g. the 1 or 2 new segments.

Net/net I'm worried about the complexity of getting this working
correctly; I think for now it's best/simpler to just re-compute the ranges
per segment?

We should anyway try to keep the issue separate from this one...

{quote}
bq. Can we remove the separate bbox from GeoPointInPolygonQuery's ctor?

Yes! I believe so. The intent was to improve performance for detailed polygons (those with
many vertices) by providing precomputed or cached bounding boxes rather than having the query
recompute. Might be worth benchmarking to be certain whether we want to take away this utility.
Sure it can be used maliciously, maybe good documentation can raise awareness?
{quote}

But that cost will be miniscule compared to the overall query execution time right?

E.g. to filter just a single "borderline" hit, we also must walk the full polygon?  Multiplied
by the number of borderline hits ...

It just strikes me as such a trivial optimization that it's not worth polluting the API over
... APIs are hard enough as it is :)  And e.g. I've already screwed it up at least once when
I passed an invalid bbox before (when working on the test).

{quote}
Large ranges were previously discouraged because it takes on the order of ~2secs to correctly
compute the ranges for large search areas. Since this occurred for every segment large search
over large data was quite time consuming. Reusing ranges across segments has brought this
down to the amount of time it takes to compute the ranges.
{quote}

OK that's a good improvement but I think large ranges are still very dangerous for this query?
 E.g. the query will suddenly take ... 10s of MBs heap to hold its ranges as well?  Maybe
(separately!) we could improve how we store the ranges in RAM, e.g. use two PrefixCodedTerms
or something.

I think, like AutomatonQuery, we need to place a default limit on the number of ranges?  User
can increase it if they know what they are doing...

I put this nocommit in my last patch:

bq. // nocommit why on earth are we needing to pass lat1/lon1 here :)  the query doesn't get
these...

Any idea how to fix?  maxLat/maxLon cannot possibly matter here: they are randomly generated
and not used by the query, yet they are used by radiusQueryCanBeWrong.  This must be a test
bug?


> Add dateline crossing support to GeoPointInBBox and GeoPointDistance Queries
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-6547
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6547
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>            Reporter: Nicholas Knize
>         Attachments: LUCENE-6547.patch, LUCENE-6547.patch, LUCENE-6547.patch, LUCENE-6547.patch,
LUCENE-6547.patch, LUCENE-6547.patch, LUCENE-6547.patch, LUCENE-6547.patch
>
>
> The current GeoPointInBBoxQuery only supports bounding boxes that are within the standard
-180:180 longitudinal bounds. While its perfectly fine to require users to split dateline
crossing bounding boxes in two, GeoPointDistanceQuery should support distance queries that
cross the dateline.  Since morton encoding doesn't support unwinding this issue will add dateline
crossing to GeoPointInBBoxQuery and GeoPointDistanceQuery classes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message