lucene-solr-user mailing list archives

From Shawn Heisey <>
Subject Re: SOLR Tokenizer “solr.SimplePatternSplitTokenizerFactory” splits at unexpected characters
Date Tue, 26 Feb 2019 16:56:50 GMT
On 2/26/2019 12:18 AM, Stephan Damson wrote:
> If we take the example input "operative", the analyzer shows that during indexing, the
> input gets split into the tokens "ope", "a" and "ive"; that is, the tokenizer splits at the
> characters "r" and "t", and not at the expected whitespace characters (CR, TAB). Just to be
> sure, I also tried using more than one backslash in the pattern (e.g. \t and \\t),
> but this did not change how the input is tokenized during indexing.

I tried your fieldType on 7.5.0 and I see the same problem.  I couldn't 
get it working no matter what I tried.

I then tested it on 7.7.0 and it works properly in that version.
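
For illustration only, here is a small Python sketch of the symptom described in the quoted message. It does not use Solr or Lucene. It assumes, without confirmation from this thread, that the pattern's escape sequences ended up being read as the literal letters "t" and "r" rather than as TAB and CR. The helper name split_on_chars is made up for this example.

```python
import re


def split_on_chars(text, chars):
    """Split text on runs of any character in chars, dropping empty tokens.

    Hypothetical helper for illustration; this is not Solr's tokenizer.
    """
    pattern = "[" + re.escape(chars) + "]+"
    return [token for token in re.split(pattern, text) if token]


# Intended delimiters: a real TAB and a real CR character.
intended = "\t\r"

# Assumed misreading: the escapes collapse to the plain letters 't' and 'r'.
misread = "tr"

print(split_on_chars("operative", intended))  # no TAB or CR in the input
print(split_on_chars("operative", misread))   # splits at 'r' and 't'
```

Under that assumption, the second call yields the same three tokens as the analyzer output reported above, while the first leaves "operative" intact.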

