lucene-solr-user mailing list archives

From Zheng Lin Edwin Yeo <edwinye...@gmail.com>
Subject Re: Difference between UAX29URLEmailTokenizerFactory and ClassicTokenizerFactory
Date Sat, 25 Nov 2017 04:38:40 GMT
Hi Ahmet,

Ok. Thanks for your advice.

Regards,
Edwin

On 25 November 2017 at 10:23, Ahmet Arslan <iorixxx@yahoo.com> wrote:

>
>
> Hi Zheng,
>
> UAX29URLEmailTokenizer recognizes URLs and e-mail addresses. It does not
> split them; it keeps each one as a single token.
>
> StandardTokenizer produces two or more tokens for such an entity.
>
> Please try both on the Analysis page and use whichever one suits your
> requirements.
>
> Ahmet
>
>
>
> On Friday, November 24, 2017, 11:46:57 AM GMT+3, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
>
> Hi,
>
> I am indexing email addresses into Solr via EML files. Currently, I am
> using ClassicTokenizerFactory with LowerCaseFilterFactory. However, I
> found that we can also use UAX29URLEmailTokenizerFactory with
> LowerCaseFilterFactory.
>
> Does anyone have a recommendation on which tokenizer is better?
>
> I am currently using Solr 6.5.1, and planning to upgrade to Solr 7.1.0.
>
> Regards,
> Edwin
>
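To make the difference Ahmet describes concrete without a running Solr instance, here is a minimal Python sketch of an analysis chain: a tokenizer followed by a lowercase step. The two regex tokenizers are hypothetical toys, not Lucene's UAX#29 implementation. `email_aware_tokenize` stands in for a tokenizer that keeps e-mail addresses whole (like UAX29URLEmailTokenizerFactory), and `word_tokenize` stands in for one that breaks at "@" (like StandardTokenizer). The function and pattern names are illustrative only.

```python
import re

# Toy patterns only (assumption): real Lucene tokenizers use Unicode UAX#29
# word-break rules, plus URL/e-mail grammars in UAX29URLEmailTokenizer.
EMAIL = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"
WORD = r"[A-Za-z0-9]+(?:[._][A-Za-z0-9]+)*"


def email_aware_tokenize(text):
    """Hypothetical: emit each e-mail address as one token, else plain words."""
    # The EMAIL alternative is tried first at every position.
    return re.findall(f"{EMAIL}|{WORD}", text)


def word_tokenize(text):
    """Hypothetical: break at '@' and other punctuation, keep internal dots."""
    return re.findall(WORD, text)


def lowercase_filter(tokens):
    """Rough analogue of LowerCaseFilterFactory: lowercase every token."""
    return [t.lower() for t in tokens]


def analyze(text, tokenizer):
    """Run the toy chain: tokenizer first, then the lowercase filter."""
    return lowercase_filter(tokenizer(text))


if __name__ == "__main__":
    sample = "Mail John.Doe@Example.com today"
    print(analyze(sample, email_aware_tokenize))
    # → ['mail', 'john.doe@example.com', 'today']
    print(analyze(sample, word_tokenize))
    # → ['mail', 'john.doe', 'example.com', 'today']
```

With the e-mail-aware tokenizer a search for the full address can match one indexed token; with the splitting tokenizer the address becomes two tokens around the "@". Checking real behaviour on the Solr Analysis page, as Ahmet suggests, remains the reliable test.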
