lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen
Date Tue, 10 May 2016 09:42:12 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15277866#comment-15277866
] 

Michael McCandless commented on LUCENE-6766:
--------------------------------------------

Thanks [~jpountz]!

I folded in most of your feedback, except:

bq. The only thing I am slightly worried about is how all optimized bulk mergers need to opt
out if a sort order is configured. I am wondering if our base consumer classes should have
two merge methods so that you would not have to check the sort order when overriding the method
for regular merges? This is just an idea, it has drawbacks too since there would not be a
single entry point to merging anymore and we would need another method in our API, but I'm
suggesting it anyway hoping that it might give somebody a better idea.

I think it's OK to keep a single merge method?  This merge method
already must deal with wild per-segment variabilities, e.g. different
fields across segments, some have deletions some don't, etc., so I
don't think we need to single out "has an index sort" into a separate
method?

Also, implementing merge methods is really an uber-expert thing to
do, so such devs should be up to the task of handling an incoming
index sort, I think.

bq. I think this is buggy since it ignores null sorts at the beginning of the list but not
at the end,

Nice catch!  I added test showing the bug, and then fixed it (pushed).

bq. Let's remove it for now and later see whether this is something that could be added back?

OK I did that.  I think at least there is a simple solution for doc-block
users: just index a doc values field with the "id" for each block, and
then sort on that.

bq.  but leveraging index sorting at search time looks like a big task to me so maybe we should
defer it to a follow-up issue like sorting on flush?

I did move the early terminating to core, and I do think going forward
we should make it easier to use this ... it should somehow be the
default, and not a "make your own Collector" situation ...

As Rob has pointed out, even today (before promoting index sorting)
we could early-terminate in cases where the query is sorting on
index order, such as collecting first N hits for a filter.

But I agree we should do this separately.  I will open follow-on issues
for "can we sort on flush too" and "searching should take advantage
of index sort by default".

bq. Should DocIdMerger.Sub.nextDoc throw an IOException?

I tried this out, but it started to sprawl: the doc values all wrap
`DocIdMerger` under a java `Iterator` which cannot throw `IOException`
... I could move the `try/except` up there, but there are many places
I'd have to move this to, so leaving it where it is seemed like the
lesser evil.


> Make index sorting a first-class citizen
> ----------------------------------------
>
>                 Key: LUCENE-6766
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6766
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-6766.patch, LUCENE-6766.patch
>
>
> Today index sorting is a very expert feature. You need to use a custom merge policy,
custom collectors, etc. I would like to explore making it a first-class citizen so that:
>  - the sort order could be configured on IndexWriterConfig
>  - segments would record the sort order that was used to write them
>  - IndexSearcher could automatically early terminate when computing top docs on a sort
order that is a prefix of the sort order of a segment (and if the user is not interested in
totalHits).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message