lucene-solr-user mailing list archives

From Rick Leir <>
Subject Re: Difference between UAX29URLEmailTokenizerFactory and ClassicTokenizerFactory
Date Fri, 24 Nov 2017 12:19:57 GMT
There is a spec for which characters are acceptable in the name part of an email address, and
another spec for characters in a domain name. I suspect you will have more success with a
tokenizer that is specialized for email, but I have not looked at UAX29URLEmailTokenizerFactory.
Does ClassicTokenizerFactory split on hyphens?
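
One way to answer the hyphen question empirically is to define both analyzers side by side and run a few sample addresses through the Analysis screen in the Solr admin UI. The fragment below is only a sketch of what that could look like in the schema: the field type names text_email_uax29 and text_email_classic are made up for illustration, and pairing each tokenizer with LowerCaseFilterFactory is an assumption based on the setup described in the quoted message.

```xml
<!-- Hypothetical field type names; only the tokenizer differs between the two. -->
<fieldType name="text_email_uax29" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Tokenizer that treats URLs and email addresses as token types of their own -->
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_email_classic" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- The tokenizer currently in use, kept for comparison -->
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Pasting the same address (for example one with a hyphen in the name or the domain) into the Analysis screen for each field type shows exactly where each tokenizer splits, which is a safer basis for choosing than guessing.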
Cheers --Rick

On November 24, 2017 3:46:46 AM EST, Zheng Lin Edwin Yeo <> wrote:
>I am indexing email addresses into Solr via EML files. Currently, I am
>using ClassicTokenizerFactory with LowerCaseFilterFactory. However, I
>found that we can also use UAX29URLEmailTokenizerFactory with
>Does anyone have any recommendation on which Tokenizer is better?
>I am currently using Solr 6.5.1, and planning to upgrade to Solr 7.1.0.

Sorry for being brief. Alternate email is rickleir at yahoo dot com 