lucene-dev mailing list archives

From robert engels <>
Subject Re: Back Compatibility
Date Wed, 23 Jan 2008 03:18:26 GMT
A specific example:

You have a criminal justice system that indexes past court cases.

You do a search for cases involving Joe Smith because you are a judge  
and you want to review priors before sentencing. Similar issues with  
related cases, case history, etc.

Is it better to return something that may not be correct, or return  
an error saying the index is offline and is being rebuilt - please  
perform your search later?  In this case old false positives are just  
as bad as missing new records. I hope that demonstrates the position  

As I stated, there are several classes of applications where "any
data", whether or not it is current or valid, is acceptable, but I
would argue that in MOST cases it is not, and if the interested
parties fully reviewed their requirements they would not accept that
solution. It is easily summarized with the old adage "garbage in,
garbage out".

The only reason that corruption is ok is that you need to reindex  
anyway, and rebuilding from scratch is often faster than determining  
the affected documents and updating (especially if corruption is a  

It was in fact I who raised the issue that none of the
"lockless commits" code fixed anything related to corruption.  The
only way to ensure non-corruption is to sync all data files, then
write and sync the segments file.  I think this change could have
been accomplished in about 10 lines of code, and is completely
independent of lockless commits, and in most cases makes lockless
commits obsolete.  But to be honest, I am not really certain how
lockless commits can actually work in an environment that allows
updates to the documents (and/or related resources), so I am sure
there are aspects I am just ignorant of.
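
To make the ordering concrete, here is a minimal sketch of "sync all data files, then write and sync the segments file". It is not Lucene code: the class DurableCommit, its methods, and the plain-text segments listing are all hypothetical, and it uses the standard java.nio FileChannel.force call as the sync primitive.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: illustrates the commit ordering only.
final class DurableCommit {

    // Flush one already-written file (contents and metadata) to stable storage.
    static void sync(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ch.force(true);
        }
    }

    // Step 1: sync every data file. Step 2: write the segments file that
    // names them. Step 3: sync the segments file. The segments file is only
    // written once everything it refers to has been synced.
    static void commit(Path dir, List<Path> dataFiles, String segmentsName) throws IOException {
        List<String> names = new ArrayList<>();
        for (Path f : dataFiles) {
            sync(f);
            names.add(f.getFileName().toString());
        }
        // Assumed format for this sketch: one data file name per line.
        byte[] listing = (String.join("\n", names) + "\n").getBytes(StandardCharsets.UTF_8);
        try (FileChannel ch = FileChannel.open(dir.resolve(segmentsName),
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap(listing);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("index");
        Path data = Files.write(dir.resolve("_0.cfs"), new byte[] {1, 2, 3});
        commit(dir, Arrays.asList(data), "segments");
        System.out.println(Files.readAllLines(dir.resolve("segments")));
    }
}
```

The sketch overwrites the segments file in place for brevity; a real implementation would also have to decide how readers see the old file while the new one is being written, which this example does not address.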

As an aside, we engineered our software years ago to work around
these issues, which is why we still use a 1.9 derivative, and monitor
the trunk for important fixes and enhancements.

On Jan 22, 2008, at 8:35 PM, Mark Miller wrote:

> robert engels wrote:
>> I think there are a lot of applications using Lucene where
>> "whether it's lost a bit of data or not" is not acceptable.
> Yeah, and I have one of them. Which is why I would love the support
> you're talking about. But it's not there yet and I am just grateful
> that I can get my customers back up and searching as quickly as
> possible rather than experience an index corruption. Access to the
> data is more important than complete access to the data for my
> customers (though they'd say they certainly want both). After such
> an experience I have to run through the database and check if
> anything from the index is missing, and if it is, reindex. Not
> ideal, but what can you do? I find it odd that you don't think
> non-corruption is better than nothing. It's a big feature for me. If the
> server reboots at night and causes a corruption, I have customers  
> that will be SOL for some prefer when the server reboots,  
> my index - whatever is left, is searchable. My customers need to  
> work. Can't get behind on a daily product :)
> I'd prefer what you're talking about, but there are tons of other
> things I'd love to see in Lucene as well. It just seems odd to
> complain about them. I'd think that instead, I might spearhead the
> development. Just not experienced enough myself to do a lot of the
> deeper work. You don't appear so limited. How about helping out
> with some transactional support :)
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail: