lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-6645) BKD tree queries should use BitDocIdSet.Builder
Date Tue, 30 Jun 2015 19:27:04 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-6645:
---------------------------------------
    Attachment: LUCENE-6645.patch

Here's a patch, just restoring what I had in earlier iterations on the original BKD tree issue
(LUCENE-6477) ... maybe I am doing something silly?

The BKD test passes ... I'll compare performance vs current trunk (FixedBitSet every time).

I had to make little reusable DISI classes to pass to the BitDocIdSet.Builder.or method.
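
As an illustration of what such a reusable iterator might look like, here is a minimal standalone sketch. It is not the code in the attached patch and does not depend on Lucene: the class name `LeafDocIdIterator` and its `reset` method are hypothetical, and the class only mirrors the general shape of Lucene's `DocIdSetIterator` (`docID`, `nextDoc`, `advance`, `cost`, and a `NO_MORE_DOCS` sentinel). It assumes the doc IDs of a leaf cell are handed over in increasing order.

```java
/**
 * Hypothetical, standalone sketch of a reusable iterator over the doc IDs
 * of one leaf cell. It mirrors the shape of Lucene's DocIdSetIterator but
 * does not extend it, so it compiles without Lucene on the classpath.
 */
final class LeafDocIdIterator {
    /** Sentinel returned once the iterator is exhausted. */
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private int[] docIds;   // backing array, owned by the caller
    private int count;      // number of valid entries in docIds
    private int index = -1; // -1 means "not yet positioned"

    /** Points the iterator at a new leaf; assumes docIds[0..count) is increasing. */
    void reset(int[] docIds, int count) {
        this.docIds = docIds;
        this.count = count;
        this.index = -1;
    }

    /** Current doc ID, -1 before the first nextDoc(), NO_MORE_DOCS when exhausted. */
    int docID() {
        if (index < 0) {
            return -1;
        }
        return index < count ? docIds[index] : NO_MORE_DOCS;
    }

    /** Moves to the next doc ID and returns it. */
    int nextDoc() {
        if (index < count) {
            index++;
        }
        return docID();
    }

    /** Moves to the first doc ID >= target that lies beyond the current position. */
    int advance(int target) {
        int doc;
        do {
            doc = nextDoc();
        } while (doc < target);
        return doc;
    }

    /** Upper bound on the number of doc IDs this iterator will return. */
    long cost() {
        return count;
    }
}
```

Reusing one instance and calling `reset` per leaf avoids allocating a new iterator object for every leaf cell that gets OR'ed into the builder.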

It could be that if we made the BKD tree more wasteful by indexing prefix terms, so that the
number of DISIs we need to OR together is small, the perf hit wouldn't be so large ...

> BKD tree queries should use BitDocIdSet.Builder
> -----------------------------------------------
>
>                 Key: LUCENE-6645
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6645
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: LUCENE-6645.patch
>
>
> When I was iterating on BKD tree originally I remember trying to use this builder (which makes a sparse bit set at first and then upgrades to dense if enough bits get set) and being disappointed with its performance.
> I wound up just making a FixedBitSet every time, but this is obviously wasteful for small queries.
> It could be the perf was poor because I was always .or'ing in DISIs that had 512 - 1024 hits each time (the size of each leaf cell in the BKD tree)?  I also had to make my own DISI wrapper around each leaf cell... maybe that was the source of the slowness, not sure.
> I also sort of wondered whether the SmallDocSet in spatial module (backed by a SentinelIntSet) might be faster ... though it'd need to be sorted at the end, after building and before returning to Lucene.
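
The trade-off described in the issue (start sparse, upgrade to dense once enough bits are set) can be sketched in plain Java. This is only an illustration under stated assumptions, not Lucene's `BitDocIdSet.Builder`: the class name `SparseThenDenseDocIdBuilder`, the `threshold` parameter, and the use of `java.util.BitSet` in place of `FixedBitSet` are all hypothetical choices made for the example. The sparse path also shows the "sort at the end" step that an unsorted small set would need before its doc IDs could be returned in order.

```java
import java.util.Arrays;
import java.util.BitSet;

/**
 * Hypothetical sketch of a "sparse first, dense later" doc ID collector.
 * It is NOT Lucene's BitDocIdSet.Builder; it only illustrates the idea.
 */
final class SparseThenDenseDocIdBuilder {
    private final int maxDoc;     // exclusive upper bound on doc IDs
    private final int threshold;  // max buffered entries before upgrading (assumed tunable)
    private int[] sparse = new int[16];
    private int sparseCount = 0;
    private BitSet dense = null;  // allocated only after the upgrade

    SparseThenDenseDocIdBuilder(int maxDoc, int threshold) {
        this.maxDoc = maxDoc;
        this.threshold = threshold;
    }

    /** Adds one doc ID; duplicates are allowed and count toward the threshold. */
    void add(int docId) {
        if (dense != null) {
            dense.set(docId);
            return;
        }
        if (sparseCount == threshold) {
            upgrade();
            dense.set(docId);
            return;
        }
        if (sparseCount == sparse.length) {
            sparse = Arrays.copyOf(sparse, sparse.length * 2);
        }
        sparse[sparseCount++] = docId;
    }

    /** Copies the buffered doc IDs into a bit set sized for maxDoc. */
    private void upgrade() {
        dense = new BitSet(maxDoc);
        for (int i = 0; i < sparseCount; i++) {
            dense.set(sparse[i]);
        }
        sparse = null;
    }

    boolean isDense() {
        return dense != null;
    }

    /** Returns the collected doc IDs sorted and de-duplicated. */
    int[] build() {
        if (dense != null) {
            return dense.stream().toArray(); // BitSet iterates in increasing order
        }
        int[] out = Arrays.copyOf(sparse, sparseCount);
        Arrays.sort(out); // the "sort at the end" cost of the sparse path
        int n = 0;
        for (int i = 0; i < out.length; i++) {
            if (i == 0 || out[i] != out[i - 1]) {
                out[n++] = out[i];
            }
        }
        return Arrays.copyOf(out, n);
    }
}
```

With a small query the builder never allocates the `maxDoc`-sized bit set, which is the waste the issue calls out; with 512 - 1024 hits arriving per leaf, a low threshold would be crossed almost immediately, which may be one reason the sparse phase buys little in that workload.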



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

