lucene-dev mailing list archives

From Michael Busch <busch...@gmail.com>
Subject Re: New Token API was Re: Payloads and TrieRangeQuery
Date Mon, 15 Jun 2009 00:05:00 GMT
On 6/14/09 5:17 AM, Grant Ingersoll wrote:
> Agreed.  I've been bringing it up for a while now and made the same 
> comments when it was first introduced, but felt like the lone voice in 
> the wilderness on it and gave way [1], [2], [3].  Now that others are 
> writing/converting, I think it is worth revisiting.
>

I am, and always have been, open to constructive suggestions about how to 
design this API.  I know these new APIs currently don't seem to have many 
advantages over the previous ones, but they lay the groundwork for future 
features like flexible indexing. Some of the concerns you mentioned were 
aimed at the first version of the patch in LUCENE-1422, but you later 
said you liked how the next patch looked (in thread [2] that you 
mentioned).

> That being said, I did just write my first TokenFilter with it, and 
> didn't think it was that hard.  There are some gains in it and the API 
> can be simpler if you just need one or two attributes (see 
> DelimitedPayloadTokenFilter), although, just like the move to using 
> char [] in Token, as soon as you do something like store a Token, you 
> lose most of the benefit, I think (for the char [] case, as soon as 
> you need a String in one of your filters, you lose the perf. gain).  
> The annoying parts are that you still have to implement the deprecated 
> next() part, otherwise chances are the thing is unusable by everyone 
> at this point anyway.
>
I'm not sure why this (currently having to implement next() too) is such 
an issue for you. You brought it up at the Lucene meetup too. No user 
will ever have to implement both (the new API and the old) in their 
streams/filters. The only reason why we did it this way is to not 
sacrifice performance for existing streams/filters when people switch to 
Lucene 2.9. I explained this point in the jira issue:

http://issues.apache.org/jira/browse/LUCENE-1422?focusedCommentId=12644881&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12644881

The only time anyone has to implement both APIs is between now and 2.9, 
and only for new streams and filters that we add before 2.9 is released. 
I don't think it's reasonable to treat this disadvantage as a show 
stopper.

> Add on top of it, that the whole point of customizing the chain is to 
> use it in search and, frankly speaking, somehow I think that part of 
> the patch was held back.
>
I'm not sure what you're implying. Could you elaborate?

The search side of the API is currently being developed in LUCENE-1458, 
which will not make it into 2.9. I therefore agree that switching to the 
new API right now is not very advantageous for Lucene users. On the 
other hand, I don't think it hurts either.

> I personally would vote for reverting until a complete patch that 
> addresses both sides of the problem is submitted and a better solution 
> to cloning is put forth.
>
If we revert now and put a new flexible API like this into 3.x, which I 
think is necessary to utilize flexible indexing, then we'll have to wait 
until 4.0 before we can remove the old API. Disadvantages like the one 
you mentioned above will then probably be present much longer.

I mentioned in the following thread that I have started working on a 
better way of cloning, which will actually be faster compared to the old 
API. I'll try to get the code out asap.
http://markmail.org/message/q7pgh2qlm2w7cxfx

I'd be happy to discuss any other API proposals brought up here that 
have the same advantages and are more intuitive. We could also beef up 
the documentation and give a better example of how to convert a 
stream/filter from the old to the new API, a constructive suggestion 
that Uwe made at ApacheCon.
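
To make the old-versus-new contrast in this thread concrete, here is a 
minimal stand-alone sketch of the two styles. It is not Lucene code: 
Token, OldStream, TermAttr, NewStream and the helper methods are 
simplified, hypothetical stand-ins, and only the method names next() and 
incrementToken() are meant to echo the real APIs. The old style returns a 
Token object from each call; the new style advances the stream and leaves 
the current term in a shared attribute that every stage of the chain 
reads and writes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

public class TokenApiSketch {

    // --- Old style (hypothetical model): next() returns a Token per term ---
    static final class Token {
        final String term;
        Token(String term) { this.term = term; }
    }

    interface OldStream {
        Token next(); // returns null when the stream is exhausted
    }

    static OldStream oldTokenizer(String text) {
        Iterator<String> it = split(text).iterator();
        return () -> it.hasNext() ? new Token(it.next()) : null;
    }

    static OldStream oldLowerCase(OldStream in) {
        return () -> {
            Token t = in.next();
            return t == null ? null : new Token(t.term.toLowerCase(Locale.ROOT));
        };
    }

    // --- New style (hypothetical model): a shared attribute holds the state ---
    static final class TermAttr {
        String term; // overwritten on every incrementToken() call
    }

    interface NewStream {
        boolean incrementToken(); // false when the stream is exhausted
    }

    static NewStream newTokenizer(String text, TermAttr attr) {
        Iterator<String> it = split(text).iterator();
        return () -> {
            if (!it.hasNext()) return false;
            attr.term = it.next();
            return true;
        };
    }

    static NewStream newLowerCase(NewStream in, TermAttr attr) {
        return () -> {
            if (!in.incrementToken()) return false;
            attr.term = attr.term.toLowerCase(Locale.ROOT); // modify in place
            return true;
        };
    }

    // Whitespace splitting shared by both tokenizers.
    static List<String> split(String text) {
        List<String> out = new ArrayList<>();
        for (String s : text.trim().split("\\s+")) {
            if (!s.isEmpty()) out.add(s);
        }
        return out;
    }

    static List<String> collectOld(String text) {
        OldStream stream = oldLowerCase(oldTokenizer(text));
        List<String> terms = new ArrayList<>();
        for (Token t = stream.next(); t != null; t = stream.next()) {
            terms.add(t.term);
        }
        return terms;
    }

    static List<String> collectNew(String text) {
        TermAttr attr = new TermAttr(); // one instance shared by the chain
        NewStream stream = newLowerCase(newTokenizer(text, attr), attr);
        List<String> terms = new ArrayList<>();
        while (stream.incrementToken()) {
            terms.add(attr.term); // copy the value out before the next call
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(collectOld("New Token API"));
        System.out.println(collectNew("New Token API"));
    }
}
```

In this model a filter that needs only the term touches a single 
attribute and nothing else, which is the simplification mentioned for 
DelimitedPayloadTokenFilter. The cost shows up when a consumer wants to 
keep a token: the shared state is overwritten on the next call, so it 
has to be copied out first, which is where the cloning question comes 
from.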

-Michael

> -Grant
>
> [1] http://issues.apache.org/jira/browse/LUCENE-1422
> [2] http://www.lucidimagination.com/search/document/5daf6d7b8027b4d3/tokenstream_and_token_apis#9e2d0d2b5dc118d4
>     and the rest of the discussion on that thread.
> [3] http://www.lucidimagination.com/search/document/4274335abcf31926/new_tokenstream_api_usage
>
>
> On Jun 13, 2009, at 10:32 PM, Mark Miller wrote:
>
>> What was the big improvement with it again? Advanced, expert custom 
>> indexing chains require less casting or something right?



