lucene-solr-user mailing list archives

From Joe Calderon <calderon....@gmail.com>
Subject Re: highlighter issue
Date Fri, 02 Apr 2010 21:19:11 GMT
i had tried it earlier with no effect. when i looked at the source, it
doesn't look at offsets at all, just position increments, so short of
somebody finding a better way i'm going to create a similar filter that
compares offsets...
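
A minimal sketch of that idea, written in Python only to show the logic; it is not Lucene code. The `Token` type and the `drop_duplicate_offsets` helper are hypothetical names, and the sample offsets are assumed from the "the ex-girlfriend" example quoted below. A real version would be a Java token filter in the analysis chain.

```python
from typing import Iterable, Iterator, NamedTuple


class Token(NamedTuple):
    """Hypothetical stand-in for an analyzed token; not a Lucene class."""
    text: str
    start: int  # start offset into the original string
    end: int    # end offset into the original string


def drop_duplicate_offsets(tokens: Iterable[Token]) -> Iterator[Token]:
    """Yield each (text, start, end) combination once, in first-seen order."""
    seen = set()
    for token in tokens:
        if token in seen:
            # Same text at the same offsets: emitting it again would make
            # the highlighter wrap the same span twice.
            continue
        seen.add(token)
        yield token


# Assumed edge n-gram output for "the ex-girlfriend" (illustrative only).
tokens = [
    Token("e", 4, 5), Token("ex", 4, 6),   # grams of "exgirlfriend"
    Token("e", 4, 5), Token("ex", 4, 6),   # grams of "ex", same offsets
    Token("g", 7, 8),                      # first gram of "girlfriend"
]
unique = list(drop_duplicate_offsets(tokens))
```

Comparing text and offsets together, rather than text and position increment, is the point: two tokens with the same text at different offsets are both kept.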

On Fri, Apr 2, 2010 at 2:07 PM, Erik Hatcher <erik.hatcher@gmail.com> wrote:
> Will adding the RemoveDuplicatesTokenFilter(Factory) do the trick here?
>
>        Erik
>
> On Apr 2, 2010, at 4:13 PM, Joe Calderon wrote:
>
>> hello *, i have a field that is indexing the string "the
>> ex-girlfriend" as these tokens: [the, exgirlfriend, ex, girlfriend]
>> then they are passed to the edgengram filter, this allows me to match
>> different user spellings and allows for partial highlighting, however
>> a token like 'ex' would get generated twice which should be fine
>> except the highlighter seems to highlight that token twice even though
>> it has the same offsets (4,6)
>>
>> is there a way to make the highlighter not highlight the same token
>> twice, or do i have to create a token filter that would dump tokens
>> with equal text and offsets ?
>>
>>
>> basically what's happening now is if i search
>>
>> 'the e', i get:
>> '<em>Seinfeld</em> The <em>E</em><em>E</em>x-Girlfriend'
>>
>> for 'the ex', i get:
>> '<em>Seinfeld</em> The <em>Ex</em><em>Ex</em>-Girlfriend'
>>
>> and so on
>>
>>
>> thx much
>>
>> --joe
>
>
