lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: URL/Email tokenizer
Date Tue, 17 Feb 2015 11:57:58 GMT
Ah, you want to do it the hard way.  Sorry, can't help you there - I
prefer to do things the simple way - easier to write and to maintain
and, in my experience, usually more robust in the long run.


--
Ian.


On Tue, Feb 17, 2015 at 11:42 AM, Ravikumar Govindarajan
<ravikumar.govindarajan@gmail.com> wrote:
> Thanks Ian
>
> What I am currently doing is duplicating the data into 2 different fields
> and using my own PerFieldAnalyzerWrapper, just as you pointed out.
>
> Is there a good way to do this in a single pass? Like how Bi-Grams or
> Common-Grams do…
>
> --
> Ravi
>
> On Tue, Feb 17, 2015 at 3:08 PM, Ian Lea <ian.lea@gmail.com> wrote:
>
>> Sounds like a job for
>> org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper.
>>
>>
>> --
>> Ian.
>>
>>
>> On Tue, Feb 17, 2015 at 8:51 AM, Ravikumar Govindarajan
>> <ravikumar.govindarajan@gmail.com> wrote:
>> > We have a requirement that e-mail addresses be added in tokenized form to
>> > one field, while the untokenized form is added to another field.
>> >
>> > Ex:
>> >
>> > "I have mailed abc@xyz.com". It should be tokenized as below:
>> >
>> > body = {"I", "have", "mailed", "abc", "xyz", "com"};
>> >
>> > I also have a body-addr field. The tokenizer needs to extract e-mail
>> > addresses from the body field and add them as below:
>> >
>> > body-addr = {"abc@xyz.com"}
>> >
>> > How can this be achieved via a tokenizer chain?
>> >
>> > --
>> > Ravi
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
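
For reference, the two token sets asked for in the original question can be sketched in plain Java. This is not Lucene code and not what either poster wrote: the class EmailFieldSplit, its two methods, and both regular expressions are hypothetical names and assumptions made up for illustration. In a real index, each method would correspond to the analyzer configured for one field through PerFieldAnalyzerWrapper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: it only illustrates the two
// token sets described in the question (body and body-addr).
class EmailFieldSplit {
    // Deliberately simple address pattern (an assumption, not RFC 5322 complete).
    private static final Pattern EMAIL =
        Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)+");

    // A "word" here is any run of ASCII letters or digits, so an address
    // is split at '@' and '.' into its parts.
    private static final Pattern WORD = Pattern.compile("[A-Za-z0-9]+");

    // Tokens for the body field: every word, with addresses broken apart.
    static List<String> bodyTokens(String text) {
        return findAll(WORD, text);
    }

    // Tokens for the body-addr field: whole e-mail addresses only.
    static List<String> addressTokens(String text) {
        return findAll(EMAIL, text);
    }

    // Collects every match of the pattern, in order of appearance.
    private static List<String> findAll(Pattern pattern, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "I have mailed abc@xyz.com";
        System.out.println("body      = " + bodyTokens(text));
        System.out.println("body-addr = " + addressTokens(text));
    }
}
```

Run on the example sentence from the question, bodyTokens yields I, have, mailed, abc, xyz, com and addressTokens yields abc@xyz.com. The text is scanned twice here, once per field, which mirrors the two-field approach rather than the single-pass filter that was asked about.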
