lucene-solr-user mailing list archives

From Lasitha Wattaladeniya <watt...@gmail.com>
Subject Re: Highlighting words with special characters
Date Thu, 20 Jul 2017 02:31:48 GMT
Hi Ahmet,

But I have NGramTokenizerFactory at the end of the indexing analyzer chain,
so the email address should still be tokenized. How does this affect
the highlighting? That's what I'm struggling to understand.

Solr version : 4.10.4

Regards,
Lasitha

On 20 Jul 2017 08:28, "Ahmet Arslan" <iorixxx@yahoo.com.invalid> wrote:

Hi,
Maybe the name of UAX29URLEmailTokenizer is deceiving you? It does *not*
tokenize URLs and emails. It recognises them and emits each one as a
single token.
Ahmet
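
To make the point above concrete, here is a toy sketch in plain Python. It is
not Lucene or Solr code: both regex "tokenizers" are hypothetical stand-ins,
and the gram sizes (2 to 10) are assumed because the thread does not give
them. It only shows which parent token the n-gram "fsdg" comes from in each
case; it does not reproduce the highlighter.

```python
import re

def email_aware_tokens(text):
    """Toy stand-in for an email-aware tokenizer: an address stays one token."""
    # Hypothetical, simplified pattern; real tokenizers use far more detailed rules.
    pattern = r"[\w.]+@[\w.]+|\w+"
    return [(m.group(), m.start(), m.end()) for m in re.finditer(pattern, text)]

def word_tokens(text):
    """Toy stand-in for a tokenizer that breaks at '@' but keeps dotted words."""
    pattern = r"\w+(?:\.\w+)*"
    return [(m.group(), m.start(), m.end()) for m in re.finditer(pattern, text)]

def ngrams(token, min_n, max_n):
    """All character n-grams of token with min_n <= length <= max_n."""
    return {
        token[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(token) - n + 1)
    }

def parents_of(gram, tokens, min_n=2, max_n=10):
    """Tokens (with start/end offsets) whose n-grams contain gram."""
    return [t for t in tokens if gram in ngrams(t[0], min_n, max_n)]

text = "test.fsdg@ran.com"
print(parents_of("fsdg", email_aware_tokens(text)))
print(parents_of("fsdg", word_tokens(text)))
```

In this model the partial term "fsdg" is found either way, but with the
email-aware stand-in its parent token is the whole address, while with the
word-level stand-in the parent is only "test.fsdg".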

On Wednesday, July 19, 2017, 12:00:05 PM GMT+3, Lasitha Wattaladeniya <
wattale@gmail.com> wrote:

Update,

I changed the UAX29URLEmailTokenizerFactory to StandardTokenizerFactory and
now it shows highlighted text fragments in the indexed email text.

But I don't understand this behavior. Can someone shed some light on it, please?

On 18 Jul 2017 14:18, "Lasitha Wattaladeniya" <wattale@gmail.com> wrote:

> Furthermore, the ngram field has the following tokenizer/filter chain in
> index and query:
>
> UAX29URLEmailTokenizerFactory (only in index)
> StopFilterFactory
> LowerCaseFilterFactory
> ASCIIFoldingFilterFactory
> EnglishPossessiveFilterFactory
> StemmerOverrideFilterFactory (only in query)
> NGramTokenizerFactory (only in index)
>
> Regards,
> Lasitha
>
> On 18 Jul 2017 14:11, "Lasitha Wattaladeniya" <wattale@gmail.com> wrote:
>
>> Hi devs,
>>
>> I have set up Solr highlighting with the default configuration (I only
>> changed fragsize to 0 to match any field length). It worked fine, but I
>> recently discovered that it doesn't highlight words with special
>> characters in the middle.
>>
>> For example, let's say I have indexed the email address test.fsdg@ran.com
>> into an ngram field. When I search for the partial text fsdg, I get the
>> results, but the match is not highlighted. It works as expected in all
>> other scenarios.
>>
>> The ngram field has termVectors, termPositions, termOffsets set to true.
>>
>> Can somebody please suggest what may be wrong here?
>>
>> (Sorry for the unstructured text; typed on a mobile phone.)
>>
>> Regards
>> Lasitha
>>
>
