lucene-java-user mailing list archives

From: Ian Lea <ian....@gmail.com>
Subject: Re: URL/Email tokenizer
Date: Tue, 17 Feb 2015 09:38:12 GMT

Sounds like a job for
org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper.
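
To make the idea concrete: PerFieldAnalyzerWrapper is constructed with a
default Analyzer plus a Map from field name to Analyzer, and it picks the
analyzer by field name at index and query time. The sketch below only
imitates that routing in plain Java, with no Lucene dependency, so it can
be run as-is. The class PerFieldSketch, its methods, and the simplified
e-mail regex are all hypothetical and are not Lucene API. In a real setup
the two functions would be Analyzer instances (Lucene's
UAX29URLEmailTokenizer, which keeps e-mail addresses as single tokens,
might be a starting point for body-addr), and the application would still
have to add the same text to both fields itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of per-field analysis; NOT Lucene API.
public class PerFieldSketch {

    // Deliberately simplified e-mail pattern, for illustration only.
    private static final Pattern EMAIL =
        Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)+");

    // Default "analyzer": split on every run of non-alphanumeric characters.
    static List<String> splitOnNonAlphanumerics(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^A-Za-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // "Analyzer" for body-addr: keep whole e-mail addresses, drop the rest.
    static List<String> emailsOnly(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Field name -> tokenizer; fields not listed fall back to the default.
    private static final Map<String, Function<String, List<String>>> PER_FIELD =
        new HashMap<>();
    static {
        PER_FIELD.put("body-addr", PerFieldSketch::emailsOnly);
    }

    // Route the text through whichever tokenizer is registered for the field.
    static List<String> analyze(String field, String text) {
        return PER_FIELD
            .getOrDefault(field, PerFieldSketch::splitOnNonAlphanumerics)
            .apply(text);
    }

    public static void main(String[] args) {
        String text = "I have mailed abc@xyz.com";
        // prints: body      = [I, have, mailed, abc, xyz, com]
        System.out.println("body      = " + analyze("body", text));
        // prints: body-addr = [abc@xyz.com]
        System.out.println("body-addr = " + analyze("body-addr", text));
    }
}
```

Note that the same input string is passed once per field; the wrapper
only decides how each field's text is tokenized, it does not copy text
from one field into another.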


--
Ian.


On Tue, Feb 17, 2015 at 8:51 AM, Ravikumar Govindarajan
<ravikumar.govindarajan@gmail.com> wrote:
> We have a requirement that e-mail addresses be added in tokenized form
> to one field and in untokenized form to another field.
>
> Ex:
>
> "I have mailed abc@xyz.com". It should be tokenized as below:
>
> body = {"I", "have", "mailed", "abc", "xyz", "com"};
>
> I also have a body-addr field. The tokenizer needs to extract e-mail
> addresses from the body field and add them as below:
>
> body-addr = {"abc@xyz.com"}
>
> How can this be achieved via a tokenizer chain?
>
> --
> Ravi
