lucene-dev mailing list archives

From "Grant Ingersoll" <>
Subject Re: incorrect OO in lucene source?
Date Tue, 20 Apr 2004 15:26:38 GMT
The thread safety issues are on the search side usage of Analyzer, not indexing.

>>> 04/20/04 10:52AM >>>
Grant Ingersoll wrote:

>I agree with Robert, as I have had similar wishes about more interface capabilities, but
also agree with Eric in that Lucene works great in a lot of ways. I have found the current
design causes you to have to hard code things that shouldn't need to be hard coded, especially
in the TokenStream area.  The idea of writing a new Analyzer every time you want to change
a Tokenizer or TokenFilter is very limiting.  In my application I need the flexibility to
re-index and evaluate fairly often.  The current Analyzer implementation would require me
to write a new Analyzer for every experiment and that is not manageable.  Do others have this
I had this issue. I have solved this by rewriting the API around 
TokenStream (mainly introducing an interface that allows resetting the 
source stream) and creating a generalized analyzer class. This analyzer 
class holds a reference to the TokenStream pipeline to which it 
delegates. A PerField analyzer is populated with Analyzers configured 
from JNDI (essentially Tokenizer and TokenStreamDecorator compositions). 
When tokenStream(String fieldName, Reader reader) is called, the analyzer 
resets its TokenStream reference before returning it.
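
A minimal sketch of the design described above, not the actual code. It does not use the Lucene classes; every name in it (ResettableTokenStream, WhitespaceSource, LowerCaseDecorator, PipelineAnalyzer, PipelineDemo) is hypothetical, tokens are plain Strings to keep it short, and the JNDI configuration and PerField wiring are left out.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: a token source that can be re-pointed at a new Reader.
interface ResettableTokenStream {
    String next() throws IOException; // next token, or null at end of input
    void reset(Reader reader);        // rebind the whole pipeline to a new source
}

// Hypothetical tokenizer: splits the input on whitespace.
class WhitespaceSource implements ResettableTokenStream {
    private Reader in;

    public void reset(Reader reader) { this.in = reader; }

    public String next() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (Character.isWhitespace((char) c)) {
                if (sb.length() > 0) break; // end of the current token
            } else {
                sb.append((char) c);
            }
        }
        return sb.length() > 0 ? sb.toString() : null;
    }
}

// Hypothetical decorator: lower-cases tokens and forwards reset() upstream.
class LowerCaseDecorator implements ResettableTokenStream {
    private final ResettableTokenStream input;

    LowerCaseDecorator(ResettableTokenStream input) { this.input = input; }

    public void reset(Reader reader) { input.reset(reader); }

    public String next() throws IOException {
        String token = input.next();
        return token == null ? null : token.toLowerCase();
    }
}

// Generalized analyzer: holds one pipeline and resets it on every request.
class PipelineAnalyzer {
    private final ResettableTokenStream pipeline;

    PipelineAnalyzer(ResettableTokenStream pipeline) { this.pipeline = pipeline; }

    ResettableTokenStream tokenStream(String fieldName, Reader reader) {
        pipeline.reset(reader); // reset before returning, as described above
        return pipeline;
    }
}

class PipelineDemo {
    // Drains a stream into a list; wraps IOException to keep callers simple.
    static List<String> collect(ResettableTokenStream ts) {
        List<String> out = new ArrayList<>();
        try {
            for (String t = ts.next(); t != null; t = ts.next()) out.add(t);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        PipelineAnalyzer analyzer =
            new PipelineAnalyzer(new LowerCaseDecorator(new WhitespaceSource()));
        System.out.println(collect(analyzer.tokenStream("body", new StringReader("Hello Lucene"))));
        System.out.println(collect(analyzer.tokenStream("body", new StringReader("Second DOC"))));
    }
}
```

Changing the experiment then means composing a different pipeline rather than writing a new analyzer class. Note that every call hands back the same pipeline instance, so this sketch is only safe when one thread uses the analyzer at a time.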

>I submitted a "broken" patch that converts the analyzers and token streams to interfaces,
but as Doug pointed out, it is not currently thread safe (I have another version that uses
reflection that is thread safe).  I intend to go back and make it thread-safe, but haven't
had the time.  Anyway, this patch contains an interface implementation of Analyzer and TokenStream
that we may find useful in the future and if someone else wants to take up the ball and make
it thread-safe, I don't think it would take too long.
What were the issues related to thread safety? Are invocations of an 
analyzer within an IndexWriter not single threaded? I was unsure of 
this, but planned to object pool my TokenStream compositions if needed.
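
If pooling did turn out to be needed, it could look roughly like the sketch below. This is an illustration only: PipelinePool and PipelinePoolDemo are hypothetical names, nothing here is Lucene API, and it assumes a current JDK (java.util.function.Supplier) with an unbounded pool that creates a new instance whenever none is idle.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Supplier;

// Hypothetical pool: each caller borrows a private instance, so no two
// threads ever share one stateful TokenStream composition.
class PipelinePool<T> {
    private final ConcurrentLinkedQueue<T> idle = new ConcurrentLinkedQueue<>();
    private final Supplier<T> factory;

    PipelinePool(Supplier<T> factory) { this.factory = factory; }

    // Reuse an idle instance if one exists, otherwise build a new one.
    T borrow() {
        T instance = idle.poll();
        return instance != null ? instance : factory.get();
    }

    // Hand an instance back once the caller has finished consuming it.
    void release(T instance) { idle.offer(instance); }

    int idleCount() { return idle.size(); }
}

class PipelinePoolDemo {
    public static void main(String[] args) {
        // StringBuilder stands in for a pipeline object in this demo.
        PipelinePool<StringBuilder> pool = new PipelinePool<>(StringBuilder::new);
        StringBuilder a = pool.borrow();
        StringBuilder b = pool.borrow();
        System.out.println("distinct while both are in use: " + (a != b));
        pool.release(a);
        System.out.println("reused after release: " + (pool.borrow() == a));
    }
}
```

The caller has to release an instance only after it has finished reading tokens from it; releasing earlier would let another thread reset a stream that is still in use.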

To unsubscribe, e-mail: 
For additional commands, e-mail: 
