lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] [Commented] (LUCENE-7179) GeoPoint and LatLonPoint test data should quantize once
Date Tue, 05 Apr 2016 18:45:25 GMT


Robert Muir commented on LUCENE-7179:

Because it's a 32-bit space, though, data truncation is inevitable.

No, we are choosing to truncate the *user's data*. This is something to be taken seriously.

I care very much about the details of exactly how this truncation happens; that is what
I keep bringing up on this issue:
* stability
* rounding
* overflow

GeoPoint has a much slower bounding box query than LatLonPoint because it can't take advantage
of *exactly how its truncation happens* for these silly reasons. If these were fixed, this
query would be faster.

I just commented on the port of distance sort (LUCENE-7180) about how important an integer-space
bounding box is for reducing CPU in compareBottom.
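
To illustrate why an integer-space bounding box is cheap, here is a minimal sketch. The class
name EncodedBoxSketch and its fields are hypothetical, not Lucene code. It assumes the query
bounds were quantized once up front with the same order-preserving encoding used at index
time, and that the box does not cross the dateline.

```java
/** Hypothetical sketch of a bounding-box test done entirely on encoded ints. */
final class EncodedBoxSketch {
    private final int minLat;
    private final int maxLat;
    private final int minLon;
    private final int maxLon;

    /** Bounds are already encoded; no dateline crossing is handled here. */
    EncodedBoxSketch(int minLat, int maxLat, int minLon, int maxLon) {
        this.minLat = minLat;
        this.maxLat = maxLat;
        this.minLon = minLon;
        this.maxLon = maxLon;
    }

    /** Four int comparisons per candidate, with no decode back to double. */
    boolean contains(int encodedLat, int encodedLon) {
        return encodedLat >= minLat && encodedLat <= maxLat
            && encodedLon >= minLon && encodedLon <= maxLon;
    }
}
```

This is only valid when the encoding is monotonic, which is one reason the rounding direction
has to be well defined.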

And you can see speedups in LUCENE-7177 for GeoPoint's polygon queries, which are based on
operations in integer space (I had to incorporate significant hair to accommodate the current
untamed quantization).

So there are three use cases on the table right now for why we should fix this: faster bounding
box, sorting, and polygon queries. And the code can still be simple, and the quantization "effect"
is easier to reason about: e.g. fixing rounding means the 'double' we treat it as is always
the closest double in our integer space that is <= the user's value, instead of "rounded
haphazardly in an unknown direction". It just requires that we pay attention, fix the bugs,
and write good tests.
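
As a rough sketch of the rounding rule described above (not the actual Lucene encoder), the
following quantizes a latitude into a 32-bit int with floor rounding. The names
LatQuantizerSketch, LAT_STEP, encodeLatitude and decodeLatitude are assumptions made for
illustration. It touches the three concerns listed earlier: rounding always goes toward
negative infinity, 90.0 is pinned so it cannot overflow the int range, and re-encoding a
decoded value is intended to give back the same int.

```java
/** Hypothetical sketch of floor-based latitude quantization; not Lucene's code. */
final class LatQuantizerSketch {
    /** Width in degrees of one integer step: 180 degrees over 2^32 steps. */
    static final double LAT_STEP = 180.0 / (0x1L << 32);

    /** Maps a latitude in [-90, 90] to a signed 32-bit cell index. */
    static int encodeLatitude(double lat) {
        if (Double.isNaN(lat) || lat < -90.0 || lat > 90.0) {
            throw new IllegalArgumentException("invalid latitude: " + lat);
        }
        // Overflow: 90.0 would land on 2^31, one past Integer.MAX_VALUE,
        // so pin it to the top cell explicitly.
        if (lat == 90.0) {
            return Integer.MAX_VALUE;
        }
        // Rounding: floor rounds toward negative infinity for both signs,
        // unlike a plain (int) cast, which rounds toward zero.
        return (int) Math.floor(lat / LAT_STEP);
    }

    /** Returns the lower edge of the cell, so the result is <= the encoded input. */
    static double decodeLatitude(int encoded) {
        return encoded * LAT_STEP;
    }
}
```

With a plain cast, a small negative latitude such as -1e-12 would quantize to cell 0 and
decode to a value greater than the user's; with floor it lands in cell -1, below the user's
value.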

> GeoPoint and LatLonPoint test data should quantize once
> -------------------------------------------------------
>                 Key: LUCENE-7179
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Nicholas Knize
>         Attachments: LUCENE-7179.patch
> {{LatLonPoint}} and {{GeoPointField}} tests pre-quantize test data to ensure consistency
> with indexed (encoded) data. The pre-quantized data then becomes indexed, undergoing another
> quantization. To guarantee numerical stability this should be changed such that the test data
> is quantized after indexing.
