lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Hostetter <>
Subject Re: Performance guarantees and index format
Date Mon, 04 Feb 2008 23:07:24 GMT

: What this issue doesn't discuss is what to do with partial results obtained
: when a timeout occurred. As the original poster points out, document lists are
: traversed in the order they were added and not the order of their importance,
: which introduces a bias to partial results in that they reflect results from a
: random sample (which is likely not the most relevant, i.e. there could have
: been more relevant results later in the traversal order).
: The answer to this issue is org.apache.nutch.indexer.IndexSorter, which

skimming this it doesn't seem like a refactored version that was less 
nutch specific cold make a handy contrib ... but it also seems like there 
may be a simpler approach for the (i assume) common case of prefering docs 
that were indexed later....

if we eliminate the requirement for *strict* preference of recent 
documents and make that a more loose desire, then we coulnd't we do a 
pretty good job if we just changed Segment merging to reorder reverse the 
order of the segments before each merge?  it wouldn't be very useful to 
start doing this on an index that's already a decent size, but if this was 
happening on every merge right from the very begining, then the most 
recent documents would percollate to the front of the index right?

The only downside i can think of would be that docids would frequently 
(not not very predictably) change even if there were no deletions .. but 
you'd pay that same penalty with something like the nutch's IndexSorter.

I'm not much of an expert on segment merging.. but with the exception of 
docid order i can'tthink of many reasons why there couldn't be a merger 
that revesed the order of hte segments.


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message