lucene-dev mailing list archives

From "Shai Erera (JIRA)" <>
Subject [jira] Commented: (LUCENE-2455) Some house cleaning in addIndexes*
Date Wed, 12 May 2010 10:58:41 GMT


Shai Erera commented on LUCENE-2455:

bq. Adding indexes using FilterIndexReader is useful 

I'm not against that, Mike. addIndexes should allow for both IndexReader and Directory. It's
the registerIndexes (or whatever name we come up with) which should work with Directory only.
Then, even if the app calls addIndexes with its own custom IR, it can still call registerIndexes
w/ the Directory only to do that fast copy/registration, since no IR method will be involved
in the process.

So let's not confuse the two - the addIndexes methods will stay and work as they do today.
registerIndexes will be a new method.

bq. assuming the codecs are identical (the "write" codec equals the codec used to write the
external segment), and assuming the doc stores of the external segment are private to it

Right. Thanks for pointing that out, as it will become an important NOTE in the documentation.
This method (registerIndexes) is definitely for advanced users who know *exactly*
what's in the foreign indexes. For example, I need this because I'm building several indexes
on several nodes and then I want to add them to a central/master one. I know they don't have
deletions, and each is already optimized. Therefore traversing the posting lists (however fast
that would be) is completely unnecessary.

bq. but renaming the segment in the process?

Sure! I think we should really 'register' them in the Directory, as if they were newly
flushed segments. I assume you have a general idea of how this can be done? Through
SegmentInfos or something?
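
To make the idea concrete, here is a rough sketch of what "registering" could mean:
check the two assumptions quoted above (same codec, private doc stores), then copy each
foreign segment's files under a fresh segment name without touching their contents. This
is a toy model only. ToyIndex, Segment and this registerIndexes are hypothetical names
made up for illustration; none of them is a Lucene class, and the real thing would
presumably go through SegmentInfos.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: none of these types are Lucene classes. */
final class RegisterIndexesSketch {

    /** Hypothetical stand-in for the metadata of one segment. */
    static final class Segment {
        final String name;              // e.g. "_0"
        final boolean privateDocStores; // false if doc stores are shared with another segment
        final List<String> suffixes;    // file extensions that belong to this segment

        Segment(String name, boolean privateDocStores, List<String> suffixes) {
            this.name = name;
            this.privateDocStores = privateDocStores;
            this.suffixes = suffixes;
        }
    }

    /** Hypothetical stand-in for a Directory together with its list of segments. */
    static final class ToyIndex {
        final String codec;
        final Map<String, String> files = new LinkedHashMap<>(); // file name -> content
        final List<Segment> segments = new ArrayList<>();
        private int counter = 0;

        ToyIndex(String codec) {
            this.codec = codec;
        }

        /** Hands out the next unused segment name in this index. */
        String newSegmentName() {
            return "_" + counter++;
        }

        /** Simulates flushing a new segment with the given file extensions. */
        Segment addSegment(boolean privateDocStores, String... suffixes) {
            Segment s = new Segment(newSegmentName(), privateDocStores,
                    new ArrayList<>(Arrays.asList(suffixes)));
            for (String suffix : suffixes) {
                files.put(s.name + suffix, "bytes of " + s.name + suffix);
            }
            segments.add(s);
            return s;
        }
    }

    /** Copies every source segment into the target under a new name; no merging. */
    static void registerIndexes(ToyIndex target, ToyIndex... sources) {
        // Validate everything first, so a rejected source leaves the target untouched.
        for (ToyIndex source : sources) {
            if (!source.codec.equals(target.codec)) {
                throw new IllegalArgumentException("codec mismatch: " + source.codec);
            }
            for (Segment s : source.segments) {
                if (!s.privateDocStores) {
                    throw new IllegalArgumentException(
                            "segment " + s.name + " shares its doc stores");
                }
            }
        }
        for (ToyIndex source : sources) {
            for (Segment s : source.segments) {
                String newName = target.newSegmentName();
                for (String suffix : s.suffixes) {
                    // Plain file copy under the new name; the content is not rewritten.
                    target.files.put(newName + suffix, source.files.get(s.name + suffix));
                }
                target.segments.add(new Segment(newName, true, s.suffixes));
            }
        }
    }
}
```

In this toy model, registering a node index whose only segment is _0 into a master that
already holds its own _0 yields a new segment _1 in the master, with the node's file
contents unchanged. The checks run before any copy so that a foreign index which violates
an assumption is rejected without leaving the target half-updated.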

> Some house cleaning in addIndexes*
> ----------------------------------
>                 Key: LUCENE-2455
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Trivial
>             Fix For: 3.1, 4.0
> Today, the use of addIndexes and addIndexesNoOptimize is confusing - 
> especially regarding when to invoke each. Also, addIndexes calls optimize() 
> at the beginning, but only on the target index. It also includes the 
> following jdoc statement, which, from how I understand the code, is 
> wrong: _After this completes, the index is optimized._ -- optimize() is 
> called at the beginning and not at the end. 
> On the other hand, addIndexesNoOptimize does not call optimize(), and 
> relies on the MergeScheduler and MergePolicy to handle the merges. 
> After a short discussion about that on the list (Thanks Mike for the 
> clarifications!) I understand that there are really two core differences 
> between the two: 
> * addIndexes supports IndexReader extensions
> * addIndexesNoOptimize performs better
> This issue proposes the following:
> # Clear up the documentation of each, spelling out the pros/cons of 
>   calling them clearly in the javadocs.
> # Rename addIndexesNoOptimize to addIndexes
> # Remove optimize() call from addIndexes(IndexReader...)
> # Document that clearly in both, w/ a recommendation to call optimize() 
>   beforehand on any of the Directories/Indexes if it's a concern. 
> That way, we maintain all the flexibility in the API - 
> addIndexes(IndexReader...) allows for using IR extensions, while 
> addIndexes(Directory...) is considered more efficient, since it allows the 
> merges to happen concurrently (depending on the MergeScheduler) and also 
> factors in the MergePolicy. So unless you have an IR extension, 
> addIndexes(Directory...) is really the one you should be using. And you 
> have the freedom to call optimize() before each if you care about it, or 
> not if you don't. Either way, incurring the cost of optimize() is 
> entirely in the user's hands. 
> BTW, addIndexes(IndexReader...) uses neither the MergeScheduler 
> nor the MergePolicy, but rather calls SegmentMerger directly. This might be 
> another place for improvement. I'll look into it, and if it's not too 
> complicated, I may cover it in this issue as well. If you have any hints 
> that can give me a good head start on that, please don't be shy :). 

