lucene-java-user mailing list archives

From suriya prakash <suriy...@gmail.com>
Subject Re: Email id tokenizer (actual email id & multiple terms)
Date Wed, 21 Dec 2016 12:23:58 GMT
Hi,

Thanks for your reply.

A single record may contain one or more email IDs. So I have to extract the
email IDs alone (maybe using an email ID tokenizer) and index them with the
whitespace analyzer.

Tokenization would then happen twice (once for normal indexing and once for
the special email ID field), which is costly for the content field.

Is there any way to do this efficiently? Will TeeSinkTokenFilter help in my
case?
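
To make the single-pass idea concrete, here is a minimal sketch in plain Java
(standard library only, no Lucene classes). It walks the text once and, for
each email ID, emits the whole address plus its parts, with the first part
stacked on the same position as the whole address. The names EmailTokenSketch,
Token, and tokenize are hypothetical, and the email pattern is deliberately
simplified; this is an illustration under those assumptions, not how
StandardAnalyzer or TeeSinkTokenFilter works.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration; none of these names come from Lucene.
class EmailTokenSketch {

    // A term plus a Lucene-style position increment (0 = same position as the previous token).
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return term + "(+" + positionIncrement + ")";
        }
    }

    // Simplified email pattern, for illustration only (assumption, not a full RFC 5322 matcher).
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)+");

    // One pass over whitespace-separated words; emails yield the full address and its parts.
    static List<Token> tokenize(String text) {
        List<Token> out = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            out.add(new Token(word, 1)); // the word itself (for an email: the full address)
            if (EMAIL.matcher(word).matches()) {
                boolean first = true;
                for (String part : word.split("[@.]")) {
                    if (part.isEmpty()) {
                        continue;
                    }
                    // First part shares the position of the full address; later parts advance.
                    out.add(new Token(part, first ? 0 : 1));
                    first = false;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("mail lucene@gmail.com now"));
    }
}
```

If this were ported to Lucene, the same logic would presumably live in a
custom TokenFilter that sets PositionIncrementAttribute on each emitted token,
so the content field is tokenized only once. That port is an assumption and is
not shown here.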



On Tue, Dec 20, 2016 at 7:45 PM, suriya prakash <suriya3x@gmail.com> wrote:

> Hi,
>
> I am using the standard analyzer and want to split the email ID
> "lucene@gmail.com" into the tokens "lucene", "gmail", "com", and
> "lucene@gmail.com" in a single pass.
>
> I have already changed the JFlex grammar to split an email ID into separate
> words (lucene, gmail, com). But then we need to do a phrase search, which
> will not be efficient. So I want to index both the actual email ID and the
> split words.
>
> Can you please help me achieve this? Or let me know whether phrase
> search is efficient for this case.
>
>
> Regards,
> Suriya
>
