lucene-solr-user mailing list archives

From Erlend Garåsen <e.f.gara...@usit.uio.no>
Subject Re: My new lemmatizer interfers with the highlighter
Date Tue, 16 Dec 2014 15:08:50 GMT

Thanks Ahmet,

I think I have solved the problem, but I didn't replace the line you 
suggested. Instead I added the createToken method with 
AttributeSource.State as a parameter and overrode the reset method. I 
cannot reproduce the problem anymore.

BTW, what's the purpose of AttributeSource.State? Perhaps that alone has 
solved the problem.

Erlend
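
On the AttributeSource.State question above: in Lucene, AttributeSource.State is the snapshot object returned by captureState() and accepted by restoreState(). It holds copies of the stream's attribute values at the moment of capture. A filter that emits extra tokens at the same position commonly captures the state of the original token and restores it before each stacked token, so that offsets and other attributes carry over. The sketch below models only that idea in plain Java. Token, expand and the sample lemma map are hypothetical stand-ins, not Lucene classes, and this is not the filter discussed in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StateSketch {
    // Hypothetical stand-in for the attributes of one token:
    // term text, start/end offsets, and position increment.
    public static final class Token {
        public final String term;
        public final int start;
        public final int end;
        public final int posInc;

        public Token(String term, int start, int end, int posInc) {
            this.term = term;
            this.start = start;
            this.end = end;
            this.posInc = posInc;
        }

        @Override
        public String toString() {
            return term + "[" + start + "-" + end + ",+" + posInc + "]";
        }
    }

    // Emits each original token followed by its lemmas. Every lemma is built
    // from the saved copy of the original token (the "restore" step), so it
    // keeps the original offsets and only changes the term and the increment.
    public static List<Token> expand(List<Token> input, Map<String, List<String>> lemmas) {
        List<Token> out = new ArrayList<>();
        for (Token original : input) {
            out.add(original);
            Token saved = original; // plays the role of a captured state
            List<String> found = lemmas.getOrDefault(original.term, Collections.<String>emptyList());
            for (String lemma : found) {
                // Same offsets as the original, position increment 0 (stacked).
                out.add(new Token(lemma, saved.start, saved.end, 0));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> input = Arrays.asList(
                new Token("cars", 0, 4, 1),
                new Token("drive", 5, 10, 1));
        Map<String, List<String>> lemmas = new HashMap<>();
        lemmas.put("cars", Arrays.asList("car"));
        for (Token t : expand(input, lemmas)) {
            System.out.println(t);
        }
    }
}
```

In this model the stacked token "car" reports the same offsets as "cars", which is the property a highlighter depends on when it maps a matched term back to the source text.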

On 15.12.14 16:13, Ahmet Arslan wrote:
> Hi Erlend,
>
> I have written a similar token filter. Please see:
>
> https://github.com/iorixxx/lucene-solr-analysis-turkish/blob/master/src/main/java/org/apache/lucene/analysis/tr/Zemberek2DeasciifyFilterFactory.java
>
> replace
>
> final String[] values = stemmer.stem(tokenTerm);
>
> with
>
> stack = stemmer.stem(tokenTerm);
>
> Ahmet
>
>
>
>
> On Monday, December 15, 2014 4:53 PM, Michael Sokolov <msokolov@safaribooksonline.com> wrote:
> Well I think your first step should be finding a reproducible test case
> and encoding it as a unit test.  But I suspect ultimately the fix will
> be something to do with positionIncrement ...
>
> -Mike
>
>
> On 12/15/2014 09:08 AM, Erlend Garåsen wrote:
>> On 15.12.14 14:11, Michael Sokolov wrote:
>>> I'm not sure, but is it necessary to set positionIncAttr to 1 when there
>>> are *not* any lemmas found?  I think the usual pattern is to call
>>> clearAttributes() at the start of incrementToken
>>
>> It is set to 0 only if there are stems/lemmas found:
>> if (!terms.isEmpty()) {
>>    positionAttr.setPositionIncrement(0);
>>
>> The terms list will only contain entries if there are lemmas found.
>>
>> But maybe I should empty this list before I return true, just like this?
>>
>> if (!terms.isEmpty()) {
>>    termAtt.setEmpty().append(terms.poll());
>>    positionAttr.setPositionIncrement(0);
>>    terms.clear();
>>    return true;
>> } else if ...
>>
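
A note on the poll-then-clear variant quoted above: assuming terms behaves like a java.util.Queue (the thread only shows isEmpty(), poll() and clear() being called on it), clearing right after poll() discards any lemmas still waiting in the queue. A word with two lemmas would then yield only the first. The sketch below shows that effect in isolation. The drain helper and the sample lemmas are hypothetical and are not taken from the filter in the thread.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

public class PollThenClearSketch {
    // Drains a queue of lemmas the way the quoted branch would across
    // repeated calls, optionally clearing the queue after each poll().
    public static List<String> drain(List<String> lemmas, boolean clearAfterPoll) {
        Queue<String> terms = new ArrayDeque<>(lemmas);
        List<String> emitted = new ArrayList<>();
        while (!terms.isEmpty()) {
            emitted.add(terms.poll()); // removes and returns the head
            if (clearAfterPoll) {
                terms.clear();         // drops every lemma still queued
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<String> lemmas = Arrays.asList("lemmaA", "lemmaB");
        System.out.println(drain(lemmas, false));
        System.out.println(drain(lemmas, true));
    }
}
```

Since poll() already removes the element it returns, the queue empties on its own once every lemma has been emitted, so in this model the extra clear() is not needed.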

