lucene-dev mailing list archives

From "Martijn van Groningen (JIRA)" <j...@apache.org>
Subject [jira] [Created] (LUCENE-7304) Doc values based block join implementation
Date Thu, 26 May 2016 14:03:13 GMT
Martijn van Groningen created LUCENE-7304:
---------------------------------------------

             Summary: Doc values based block join implementation
                 Key: LUCENE-7304
                 URL: https://issues.apache.org/jira/browse/LUCENE-7304
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Martijn van Groningen
            Priority: Minor


At query time the block join relies on a bitset to find the previous parent doc while
advancing the doc id iterator. On large indices these bitsets can consume large amounts of
JVM heap space. Also, because of how these bitsets are populated, the 'FixedBitSet'
implementation is typically used.

The idea I had was to replace the bitset usage with a numeric doc values field that stores offsets.
Each child doc stores how many docids it is away from its parent doc, and each parent stores how
many docids it is away from its first child. At query time this information can be used to
perform the block join.
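
A minimal sketch of the offset idea, using a plain long[] in place of a real numeric
doc values field. Everything here is an assumption for illustration, not the proposed
patch: the class and method names are hypothetical, children are assumed to be indexed
immediately before their parent within a block, and the caller is assumed to already
know whether a given doc is a child or a parent.

```java
import java.util.Arrays;

// Hypothetical illustration only; this does not use or mirror any Lucene API.
final class OffsetBlockJoinSketch {

    // Builds one offset per doc for consecutive blocks.
    // childCounts[i] is the number of children in block i; each block is laid out
    // as [child, child, ..., parent] (assumed layout).
    static long[] buildOffsets(int[] childCounts) {
        int maxDoc = 0;
        for (int count : childCounts) {
            maxDoc += count + 1; // children plus the parent itself
        }
        long[] offsets = new long[maxDoc];
        int firstDocInBlock = 0;
        for (int count : childCounts) {
            int parent = firstDocInBlock + count;
            for (int child = firstDocInBlock; child < parent; child++) {
                offsets[child] = parent - child; // distance from child to its parent
            }
            offsets[parent] = count; // distance from parent back to its first child
            firstDocInBlock = parent + 1;
        }
        return offsets;
    }

    // For a child doc: jump forward to its parent.
    static int parentOf(long[] offsets, int childDoc) {
        return childDoc + (int) offsets[childDoc];
    }

    // For a parent doc: jump back to its first child.
    // A parent without children has offset 0 and maps to itself.
    static int firstChildOf(long[] offsets, int parentDoc) {
        return parentDoc - (int) offsets[parentDoc];
    }

    public static void main(String[] args) {
        long[] offsets = buildOffsets(new int[] {2, 0, 3});
        System.out.println(Arrays.toString(offsets));
        System.out.println("parent of doc 5: " + parentOf(offsets, 5));
        System.out.println("first child of doc 7: " + firstChildOf(offsets, 7));
    }
}
```

In this sketch both lookups are a single read of the current doc's value, so no
per-segment bitset has to be held on the heap to locate block boundaries.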

I think another benefit of this approach is that external tools can now easily determine whether
a doc is part of a block of documents, and perhaps this also helps index-time sorting?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

