lucene-java-user mailing list archives

From Jack Krupansky <jack.krupan...@gmail.com>
Subject Re: Custom indexing
Date Tue, 12 Apr 2016 15:15:12 GMT
The standard analyzer/tokenizer should do a decent job of splitting on dot,
hyphen, and underscore, in addition to whitespace and other punctuation.

Can you post some specific test cases you are concerned with? (You should
always run some test cases.)
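
One way to write such test cases down is as a table of input strings and the tokens you expect. The following is only a plain-JDK sketch, not Lucene code: the class `ExpectedTokens` and the method `expectedTokens` are hypothetical names, and the rule it encodes (split on ".", "_", and whitespace, nothing else) is an assumption about what is wanted. It is meant for comparing against the output of whichever analyzer is chosen. That comparison is worth doing for the dot and underscore cases in particular, since StandardTokenizer follows the Unicode UAX #29 word-break rules rather than a fixed delimiter list.

```java
import java.util.ArrayList;
import java.util.List;

public class ExpectedTokens {

    // Hypothetical helper (not part of Lucene): returns the tokens we would
    // LIKE an analyzer to produce, assuming the only delimiters that matter
    // are '.', '_' and whitespace.
    public static List<String> expectedTokens(String text) {
        List<String> tokens = new ArrayList<>();
        // One or more delimiters in a row count as a single split point.
        for (String part : text.split("[._\\s]+")) {
            // String.split can yield an empty first element when the input
            // starts with a delimiter (or is empty), so skip empties.
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Sample inputs are illustrative; replace them with real field values.
        String[] cases = {"foo.bar", "foo_bar", "com.example.My_Class"};
        for (String c : cases) {
            System.out.println(c + " -> " + expectedTokens(c));
        }
    }
}
```

Printing the real analyzer's tokens next to these expected lists for the same inputs would show quickly whether a given tokenizer or filter chain fits.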

-- Jack Krupansky

On Tue, Apr 12, 2016 at 10:35 AM, Ahmet Arslan <iorixxx@yahoo.com.invalid>
wrote:

> Hi Chamarty,
>
> Well, there are a lot of options here.
>
> 1) Use LetterTokenizer
> 2) Use WordDelimiterFilter combined with WhitespaceTokenizer
> 3) Use MappingCharFilter to replace those characters with spaces
> ...
>
> Ahmet
>
>
> On Tuesday, April 12, 2016 3:58 PM, PrasannaKumar Chamarty <
> tech.kumarpch@gmail.com> wrote:
>
>
>
> Hi,
>
> What is the best way (in terms of the maintenance required with new Lucene
> releases) to split words on "." and "_" when indexing? Thank
> you.
>
