lucene-dev mailing list archives

From Daniel Shane <sha...@LEXUM.UMontreal.CA>
Subject Re: Lucene 2.9.0-rc5 : Reader stays open after IndexWriter.updateDocument(), is that possible?
Date Mon, 28 Sep 2009 16:25:25 GMT
Oh boy!

It seems I have found the problem in my case, which, as far as I know, 
has nothing to do with Lucene but rather with the library we use to 
tokenize HTML documents. We changed our HTML parser at the same time as 
our version of Lucene, and NekoHTML (cyberneko) does not close its HTML 
reader even when we call parser.abort()/parser.close() (which we do in 
the close() method of our Lucene Tokenizer).

Before that, the old HTML parser closed the reader itself, so I wrongly 
concluded that the Lucene upgrade had caused this.

Bad news is that I had you all worked up for nothing, but good news is 
you don't have any bugs here.

However, there may be an issue with the fact that Lucene's Analyzers 
automatically close the reader when they are done analyzing. I think 
this encourages people not to close readers explicitly, and creates the 
potential for leaked file descriptors if an exception is thrown in the 
middle of the analysis or before addDocument/updateDocument is called.
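
To illustrate what I mean by closing explicitly, here is a minimal 
sketch using only java.io, not Lucene itself. The TrackingReader class 
and the indexOrFail() method are hypothetical stand-ins I made up for 
the example: indexOrFail() plays the role of analysis/addDocument and 
can be told to fail part-way through. The caller closes the reader in a 
finally block, so it is released whether or not the consumer got as far 
as closing it.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ExplicitReaderClose {

    /** Hypothetical wrapper that records whether close() was called. */
    static final class TrackingReader extends Reader {
        private final Reader in;
        private boolean closed = false;

        TrackingReader(Reader in) {
            this.in = in;
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            return in.read(buf, off, len);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            in.close();
        }

        boolean isClosed() {
            return closed;
        }
    }

    /**
     * Hypothetical stand-in for analysis/indexing (not a Lucene call).
     * It never closes the reader itself, and optionally fails mid-way.
     */
    static void indexOrFail(Reader reader, boolean fail) throws IOException {
        reader.read(); // consume one character
        if (fail) {
            throw new IOException("simulated failure during analysis");
        }
        while (reader.read() != -1) {
            // drain the rest of the input
        }
    }

    /** Returns true if the reader ended up closed after the attempt. */
    static boolean closedAfterIndexing(boolean fail) {
        TrackingReader reader =
                new TrackingReader(new StringReader("<html>...</html>"));
        try {
            indexOrFail(reader, fail);
        } catch (IOException e) {
            // A real application would log or rethrow here.
        } finally {
            try {
                reader.close(); // the caller owns the reader and closes it
            } catch (IOException ignored) {
                // nothing sensible to do if close itself fails
            }
        }
        return reader.isClosed();
    }

    public static void main(String[] args) {
        System.out.println(closedAfterIndexing(false)); // normal path
        System.out.println(closedAfterIndexing(true));  // failure path
    }
}
```

Closing a java.io reader a second time is normally harmless, so doing 
this in addition to whatever the consumer does should be safe for the 
standard reader classes.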

I don't think changing the API of Field to accept a "ReaderFactory" 
would solve anything because there are cases where you must index a 
reader that is already opened (like a network connection) and wrapping 
it with a dummy readerFactory does not look very good.
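
To show what I mean by a dummy factory, here is a rough sketch. The 
ReaderFactory interface below is purely hypothetical (it is only my 
guess at what such an API might look like, and it is not part of 
Lucene), and the StringReader stands in for a reader over something 
like a network connection.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class DummyReaderFactoryDemo {

    /** Hypothetical interface; NOT an existing Lucene API. */
    interface ReaderFactory {
        Reader open() throws IOException;
    }

    /** Wraps a reader that is already open; the factory opens nothing. */
    static ReaderFactory wrap(final Reader alreadyOpen) {
        return new ReaderFactory() {
            public Reader open() {
                return alreadyOpen; // just hands back the same instance
            }
        };
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a reader over an already-open network connection.
        Reader live = new StringReader("data from the network");
        ReaderFactory factory = wrap(live);

        // Every call returns the same, already-open reader.
        System.out.println(factory.open() == live);
        System.out.println(factory.open() == factory.open());
    }
}
```

The wrapper cannot really "open" anything, so whoever created the 
underlying reader is still the one responsible for closing it.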

Daniel Shane
