lucene-dev mailing list archives

From "Nicholas Knize (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-6712) GeoPointField should cut over to DocValues for boundary filtering
Date Tue, 04 Aug 2015 20:13:05 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-6712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicholas Knize updated LUCENE-6712:
-----------------------------------
    Attachment: LUCENE-6712.patch

Awesome! Thanks for the review, Mike! An updated patch addressing your comments is attached.

bq. this is mixing up separate changes I think? One change is cutover to doc values for the
point filtering of each lat/lon, and the other is changing the lower detail level and higher
prec step?

Indeed. The former gives search performance improvements, the latter gives indexing performance
improvements. I can split these into two patches if desired. That way we can investigate the
impact of changing the precision step separately.

bq. Shouldn't you iterate through all values and accept the docs if any of them were in-bounds?
Can you add a test case that exposes this?

++ Thanks for pointing that out! I had intended to change that. It is fixed in the attached
patch, and I also added explicit multi-valued documents and tests to cover this. Random
multi-valued documents would be nice, though I don't think that blocks the patch?
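
To make the multi-valued point concrete, here is a minimal sketch of the "accept the doc if any value is in-bounds" check. It is not the code from the patch: the class name, the Point type, and the bounding-box parameters are all made up for illustration, and it works on plain doubles rather than Lucene's encoded doc values.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch, not the patch: any-match bounds check for multi-valued docs. */
final class MultiValuedBoundsCheck {

    /** One lon/lat pair in degrees (illustrative type, not a Lucene class). */
    static final class Point {
        final double lon;
        final double lat;

        Point(double lon, double lat) {
            this.lon = lon;
            this.lat = lat;
        }
    }

    /** True if the point lies inside the (inclusive) bounding box. */
    static boolean inBBox(Point p, double minLon, double minLat, double maxLon, double maxLat) {
        return p.lon >= minLon && p.lon <= maxLon && p.lat >= minLat && p.lat <= maxLat;
    }

    /** Accepts the document as soon as ANY of its points is in-bounds. */
    static boolean acceptDoc(List<Point> docPoints,
                             double minLon, double minLat, double maxLon, double maxLat) {
        for (Point p : docPoints) {
            if (inBBox(p, minLon, minLat, maxLon, maxLat)) {
                return true; // one matching value is enough
            }
        }
        return false; // no value matched (also covers docs with no values)
    }

    public static void main(String[] args) {
        // First point is outside the box, second is inside, so the doc is accepted.
        List<Point> doc = Arrays.asList(new Point(100.0, 50.0), new Point(-96.8, 32.8));
        System.out.println(acceptDoc(doc, -100.0, 30.0, -90.0, 40.0));
    }
}
```

Checking only the first value of a document would reject the example document above, which is the kind of case the explicit multi-valued tests are meant to catch.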

bq. Couldn't GeoPointTermsEnum just have an abstract acceptLatLon method?

++ I had gone back and forth about this a couple of times. With DV post-filtering it now makes
more sense for GeoPointTermsEnum to be abstract with an abstract postFilter method. Before,
most of the logic was shared; only crosses and within were fully overridden in the Poly and
Distance query classes. I went ahead and made the change in the attached patch.
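
For readers following along, the shape of that refactoring is roughly the following. This is only a sketch of the pattern (shared logic in an abstract base, one abstract postFilter hook per query type); the class names are hypothetical, the signature is an assumption, and the distance check uses plain Euclidean distance in degrees as a stand-in rather than a real geodesic calculation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for an abstract terms enum with a per-query post filter. */
abstract class PostFilterSketch {

    /** Query-specific check against one candidate lon/lat (assumed signature). */
    protected abstract boolean postFilter(double lon, double lat);

    /** Shared logic: keep only the candidates {lon, lat} that pass postFilter. */
    final List<double[]> filter(List<double[]> candidates) {
        List<double[]> accepted = new ArrayList<double[]>();
        for (double[] c : candidates) {
            if (postFilter(c[0], c[1])) {
                accepted.add(c);
            }
        }
        return accepted;
    }
}

/** Bounding-box variant: accepts points inside an inclusive rectangle. */
final class BBoxFilterSketch extends PostFilterSketch {
    private final double minLon, minLat, maxLon, maxLat;

    BBoxFilterSketch(double minLon, double minLat, double maxLon, double maxLat) {
        this.minLon = minLon;
        this.minLat = minLat;
        this.maxLon = maxLon;
        this.maxLat = maxLat;
    }

    @Override
    protected boolean postFilter(double lon, double lat) {
        return lon >= minLon && lon <= maxLon && lat >= minLat && lat <= maxLat;
    }
}

/** Distance variant: Euclidean distance in degrees, NOT a geodesic computation. */
final class DistanceFilterSketch extends PostFilterSketch {
    private final double centerLon, centerLat, radiusDegrees;

    DistanceFilterSketch(double centerLon, double centerLat, double radiusDegrees) {
        this.centerLon = centerLon;
        this.centerLat = centerLat;
        this.radiusDegrees = radiusDegrees;
    }

    @Override
    protected boolean postFilter(double lon, double lat) {
        double dLon = lon - centerLon;
        double dLat = lat - centerLat;
        return dLon * dLon + dLat * dLat <= radiusDegrees * radiusDegrees;
    }
}

final class PostFilterDemo {
    public static void main(String[] args) {
        List<double[]> candidates = Arrays.asList(
            new double[] {0.5, 0.5}, new double[] {1.0, 1.0}, new double[] {20.0, 0.0});
        System.out.println(new BBoxFilterSketch(-10, -10, 10, 10).filter(candidates).size());
        System.out.println(new DistanceFilterSketch(0, 0, 1.0).filter(candidates).size());
    }
}
```

The point of the pattern is that each query class supplies only its own geometry test, while the iteration logic lives in one place.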

bq. It looks like you continue using full precision terms to approximate the shape's boundary
right?

No, the Range instances now use lower precision terms for the boundaries (up to PRECISION_STEP
* MAX_SHIFT, which works out to no higher than level 18). GPTQConstantScoreWrapper iterates
the docIds in the postings list. So the full precision terms (32 > level > 18) are never
used and are really just wasting space in the index. I suppose I could modify GeoPointField to
only index up to a shift of PRECISION_STEP * MAX_SHIFT and further reduce the index size?

> GeoPointField should cut over to DocValues for boundary filtering
> -----------------------------------------------------------------
>
>                 Key: LUCENE-6712
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6712
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Nicholas Knize
>         Attachments: LUCENE-6712.patch, LUCENE-6712.patch, LUCENE-6712.patch
>
>
> Currently GeoPointField queries only use the Terms Dictionary for ranges that fall within
> and on the boundary of the query shape.  For boundary ranges the full precision terms are
> iterated, for within ranges the postings list is used.
> Instead of iterating full precision terms for boundary ranges, this enhancement cuts
> over to DocValues for post-filtering boundary terms. This allows us to increase precisionStep
> for GeoPointField thereby reducing the number of terms and the size of the index. This enhancement
> should also provide a boost in query performance since visiting more docs and fewer terms
> should be more efficient than visiting fewer docs and more terms.
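
The two-phase idea in the description can be sketched roughly as below. This is a toy model under stated assumptions, not Lucene code: a Range here just carries its postings (doc ids) and a boundary flag, the "doc values" are a plain map from doc id to one {lon, lat} pair, and Shape is a made-up interface for the query geometry.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Toy model: within ranges accept their postings, boundary ranges are post-filtered. */
final class BoundaryPostFilterSketch {

    /** Hypothetical query geometry. */
    interface Shape {
        boolean contains(double lon, double lat);
    }

    /** A term range reduced to its postings and whether it touches the shape boundary. */
    static final class Range {
        final boolean boundary;
        final int[] postings;

        Range(boolean boundary, int... postings) {
            this.boundary = boundary;
            this.postings = postings;
        }
    }

    /** Collects matching doc ids; docValues maps doc id to {lon, lat}. */
    static SortedSet<Integer> collect(List<Range> ranges,
                                      Map<Integer, double[]> docValues,
                                      Shape shape) {
        SortedSet<Integer> hits = new TreeSet<Integer>();
        for (Range range : ranges) {
            for (int doc : range.postings) {
                if (!range.boundary) {
                    hits.add(doc); // range is fully within the shape: no per-doc check
                    continue;
                }
                double[] point = docValues.get(doc);
                if (point != null && shape.contains(point[0], point[1])) {
                    hits.add(doc); // boundary range: confirm against the stored value
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<Integer, double[]> docValues = new HashMap<Integer, double[]>();
        docValues.put(3, new double[] {5.0, 5.0});
        docValues.put(4, new double[] {50.0, 50.0});
        Shape box = new Shape() {
            @Override
            public boolean contains(double lon, double lat) {
                return lon >= 0 && lon <= 10 && lat >= 0 && lat <= 10;
            }
        };
        List<Range> ranges = Arrays.asList(new Range(false, 1, 2), new Range(true, 3, 4));
        System.out.println(collect(ranges, docValues, box));
    }
}
```

The trade-off described above shows up directly in this model: a coarser boundary range has more postings to check, but there are fewer ranges (terms) to visit overall.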



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

