lucene-dev mailing list archives

From "Uwe Schindler" <>
Subject RE: [jira] Commented: (LUCENE-1693) AttributeSource/TokenStream API improvements
Date Wed, 01 Jul 2009 13:55:00 GMT
After thinking one more time about it, there are still possibilities to
break BW in some corner cases with non-final classes. See my last comment in
the issue!


This new API is hard to integrate in a BW compatible way. But in any case,
this issue and the new TokenStream API are a show stopper for 2.9.




Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen


From: Uwe Schindler [] 
Sent: Wednesday, July 01, 2009 3:34 PM
Subject: RE: [jira] Commented: (LUCENE-1693) AttributeSource/TokenStream API


I do not work on this at the moment. The current status is:

- All final token stream/filter classes only need to implement
incrementToken() as they have no backwards compatibility problem.

- If it is possible (or rather, likely) that somebody overrides one of the
two methods while only one of them is implemented in Lucene, the overridden
code may never be called (because incrementToken() is always preferred).


So: For most analyzers from Lucene core this is no problem, as they are
final. Only classes like CharTokenizer, which are written explicitly for
subclassing, must implement both. I will revert my changes (remove all
next(Token) from all streams) soon (post a new patch). But this problem is
the same one we had before: If one overrides only next() in one of the
non-final streams, this method is also never called. So this was a break in
backwards compatibility in the past, too.
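
The never-called override described above can be illustrated with a minimal
Java sketch. All class and method names here are simplified, hypothetical
stand-ins and not Lucene's actual TokenStream API: a non-final library stream
implements the new-style incrementToken(), a user subclass customizes only the
old-style next(), and a consumer that always prefers incrementToken() silently
bypasses the user's code.

```java
// Simplified stand-ins for illustration only; these are NOT Lucene classes.
final class BwBreakDemo {

    /** Hypothetical non-final library stream that implements the new method. */
    static class LibraryTokenizer {
        private final String[] words;
        private int pos = 0;
        protected String term; // stand-in for the stream's attribute state

        LibraryTokenizer(String... words) {
            this.words = words;
        }

        /** New-style method: advances the stream, returns false at the end. */
        public boolean incrementToken() {
            if (pos >= words.length) {
                return false;
            }
            term = words[pos++];
            return true;
        }

        /** Old-style method, kept for compatibility; delegates to the new one. */
        public String next() {
            return incrementToken() ? term : null;
        }
    }

    /** User code written against the old API: it customizes next() only. */
    static class UpperCasingTokenizer extends LibraryTokenizer {
        UpperCasingTokenizer(String... words) {
            super(words);
        }

        @Override
        public String next() {
            String t = super.next();
            return t == null ? null : t.toUpperCase(java.util.Locale.ROOT);
        }
    }

    /** Consumer that always prefers the new method. */
    static String consumeNewApi(LibraryTokenizer ts) {
        StringBuilder sb = new StringBuilder();
        while (ts.incrementToken()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(ts.term);
        }
        return sb.toString();
    }

    /** Consumer that still calls the old method. */
    static String consumeOldApi(LibraryTokenizer ts) {
        StringBuilder sb = new StringBuilder();
        for (String t = ts.next(); t != null; t = ts.next()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(t);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Old consumer reaches the user's override.
        System.out.println(consumeOldApi(new UpperCasingTokenizer("foo", "bar"))); // FOO BAR
        // New consumer skips it: the customization is never called.
        System.out.println(consumeNewApi(new UpperCasingTokenizer("foo", "bar"))); // foo bar
    }
}
```

In this sketch the subclass compiles and behaves as before for old-style
callers, which is what makes the break easy to miss: nothing fails, the
customization just stops taking effect once the consumer switches methods.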


My recommendation: Even if classes are not final, we could convert
next(Token) to incrementToken() and remove the old methods, provided it is
unlikely that somebody has tried to customize one of the deprecated methods,
and put a clear note in Changes.txt.


In my opinion, this small backwards break is better than creating a switch
like useNewAPI that is never turned on, so the code is never used in Lucene
2.9 and everybody breaks when 3.0 comes out.


In my opinion, this patch can be extensively tested; the questions about
backwards compatibility are solved. What is still broken is Sink and
TeeTokenizer: the old classes must be deprecated, because storing attribute
states in the map is different from storing Tokens, and both cannot be
implemented in one class (it is already broken in trunk, as the javadocs
always talk about Token instances in the Set, but when useNewAPI is on they
are suddenly Attribute states). I hope Michael will have time to review this,
too. It seems that we are the only ones who can follow the thread :-) and all
this "hey, here is a BW break" back-and-forth.




Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen


From: Mark Miller [] 
Sent: Wednesday, July 01, 2009 3:14 PM
Subject: Re: [jira] Commented: (LUCENE-1693) AttributeSource/TokenStream API


How's the progress here, guys? I have 2 or 3 issues that relate to this, and I
really don't want to commit/finish them until this is done ...
