lucene-dev mailing list archives

From "David Smiley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4942) Indexed non-point shapes index excessive terms
Date Thu, 18 Apr 2013 18:44:20 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13635509#comment-13635509 ]

David Smiley commented on LUCENE-4942:
--------------------------------------

Spatial definitely needs benchmarking, but in this case I'm confident that it'll be well worth
it for RPT; I'm quite familiar with the algorithms in there. For TermQueryStrategy it's an
unquestionable win-win.
                
> Indexed non-point shapes index excessive terms
> ----------------------------------------------
>
>                 Key: LUCENE-4942
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4942
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/spatial
>            Reporter: David Smiley
>
> Indexed non-point shapes are composed of a set of terms that represent grid cells.
> Cells completely within the shape, or cells on the intersecting edge that are at the maximum
> detail depth being indexed for the shape, are denoted as "leaf" cells. Such cells have a
> trailing '+'. _Such tokens are actually indexed twice_, once with the leaf byte and once
> without.
> The TermQuery-based PrefixTree Strategy doesn't consider the notion of "leaf" cells, so
> the tokens with '+' are completely redundant.
> The Recursive [algorithm]-based PrefixTree Strategy supports correct search of indexed
> non-point shapes better than TermQuery does, and for it the leaf distinction is relevant.
> However, the foundational search algorithms used by this strategy (Intersects and Contains;
> the other two are based on these) could each be upgraded to work correctly when a leaf cell
> is indexed only once. Not trivial, but very doable.
> In the end, doing this could probably trim spatial non-point indexes by ~40%.
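
A minimal Java sketch of the term duplication described in the issue, under a deliberately simplified model: each grid cell is a hypothetical Cell record (a token string plus a leaf flag), and LEAF_BYTE stands in for the trailing '+'. This is not Lucene's actual tokenization code or API; the class, method names and toy cell set are illustrative only. recursiveTerms reflects one reading of the proposal, in which a leaf cell is indexed only in its marked form and the search algorithms are assumed to be upgraded to treat "X+" as also covering "X".

```java
import java.util.ArrayList;
import java.util.List;

public class LeafTermSketch {

    // Hypothetical grid cell: its token plus whether it is a "leaf" cell.
    record Cell(String token, boolean leaf) {}

    // Stand-in for the leaf marker appended to leaf-cell tokens.
    static final char LEAF_BYTE = '+';

    // Behaviour described in the issue: every cell token is indexed, and each
    // leaf cell is indexed a second time with the leaf byte appended.
    static List<String> currentTerms(List<Cell> cells) {
        List<String> terms = new ArrayList<>();
        for (Cell c : cells) {
            terms.add(c.token());
            if (c.leaf()) {
                terms.add(c.token() + LEAF_BYTE);
            }
        }
        return terms;
    }

    // TermQuery-style strategy: the leaf distinction is unused, so only the
    // plain tokens are emitted.
    static List<String> termQueryTerms(List<Cell> cells) {
        List<String> terms = new ArrayList<>();
        for (Cell c : cells) {
            terms.add(c.token());
        }
        return terms;
    }

    // Recursive-style strategy (assumed reading): each cell is indexed once;
    // leaf cells carry only the marked form.
    static List<String> recursiveTerms(List<Cell> cells) {
        List<String> terms = new ArrayList<>();
        for (Cell c : cells) {
            terms.add(c.leaf() ? c.token() + LEAF_BYTE : c.token());
        }
        return terms;
    }

    public static void main(String[] args) {
        // Toy shape: "A" and "AC" are partially covered (non-leaf);
        // the other cells are leaves.
        List<Cell> cells = List.of(
                new Cell("A", false),
                new Cell("AA", true),
                new Cell("AB", true),
                new Cell("AC", false),
                new Cell("ACA", true),
                new Cell("ACB", true));
        System.out.println(currentTerms(cells).size());   // 10
        System.out.println(termQueryTerms(cells).size()); // 6
        System.out.println(recursiveTerms(cells).size()); // 6
    }
}
```

In this toy set 4 of the 6 cells are leaves, so the current scheme emits 10 terms versus 6 under either trimmed scheme. Real savings depend on the ratio of leaf to non-leaf cells in an actual index, which is why benchmarking still matters.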

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

