lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-7159) improve spatial point/rect vs. polygon performance
Date Mon, 04 Apr 2016 16:32:25 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Robert Muir updated LUCENE-7159:
--------------------------------
    Attachment: LUCENE-7159.patch

I added javadocs, asserts, a direct test, cleaned up the internal API, and re-ran benchmarks
with mike's openstreetmap benchmark:

Point in polygon only becomes a bottleneck for complex polygons, so it really only helps there,
but doesn't hurt simple ones.

More than half the gain here is unrelated to the grid, and only due to very minor optimizations
in Polygon.relates(), which actually is the bigger bottleneck.

I think we can do this as a start, and then as a followup implement relates() too? We should
be able to speed those up in a similar way, most calls would then be rectangle <-> rectangle
operations in integer space.

And of course it would still be nice to speed up the relation operations in Polygon.java (e.g.
compute if convex in ctor or whatever), but that stuff is complex.

{noformat}
n=5 25.9 QPS -> 27.0 QPS
n=50 18.2 QPS -> 18.1 QPS
n=500 8.0 QPS -> 9.5 QPS
n=1000 5.0 QPS -> 6.2 QPS
{noformat}

> improve spatial point/rect vs. polygon performance
> --------------------------------------------------
>
>                 Key: LUCENE-7159
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7159
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>         Attachments: LUCENE-7159.patch, LUCENE-7159.patch
>
>
> Now that we can query on complex polygons without going OOM (LUCENE-7153), we should
do something to address the current 🐢 performance.
> Currently, we use a basic crossings test ({{O\(n)}}) for boundary cases. We defer these
expensive per-doc checks on boundary cases to a two phase iterator (LUCENE-7019, LUCENE-7109),
so that it can be avoided if e.g. excluded by filters, conjunctions, deleted doc, and so on.
This is currently important for performance, but basically its shoving the problem under the
rug and hoping it goes away. At least for point in poly, there are a number of faster techniques
described here: http://erich.realtimerendering.com/ptinpoly/
> Additionally I am not sure how costly our "tree traversal" (rectangle intersection algorithms).
Maybe its nothing to be worried about, but likely it too gets bad if the thing gets complex
enough. These don't need to be perfect but need to behave like java's Shape#contains (can
conservatively return false), and Shape#intersects (can conservatively return true). Of course,
if they are too inaccurate, then things can get slower.
> In cases of precomputed structures we should also consider memory usage: e.g. we shouldn't
make a horrible tradeoff there.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message