lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: AW: plus sign in request / looking for + in title
Date Fri, 04 Aug 2017 15:33:16 GMT
Glad to hear it. Two things:

1> you might have to do some additional filtering when using
WhitespaceTokenizer. It, well, splits on whitespace so things like
punctuation will come through as part of the token. So "My dog has
fleas." (note the period after fleas) would have the period included
in the token "fleas.".

2> getting the plus sign through URL encoding and the parser may be
fun, you may have to escape it to keep it from being interpreted as an
operator....
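
To make both points concrete, here is a rough Python sketch. It is not Solr code: splitting on whitespace only imitates what a whitespace tokenizer does, and the helper names (whitespace_tokens, escape_term, build_query_string) and the exact set of escaped characters are assumptions made up for this example. It shows the period staying attached to "fleas.", and a plus sign being backslash-escaped for the query parser and then percent-encoded for the URL.

```python
from urllib.parse import urlencode, unquote_plus

# Characters the Lucene/Solr standard query parser treats as syntax.
# This set is an assumption for illustration; check the docs for your version.
SPECIAL_CHARS = set('+-!(){}[]^"~*?:\\/&|')


def whitespace_tokens(text):
    """Imitate whitespace-only splitting: punctuation stays on the token."""
    return text.split()


def escape_term(term):
    """Backslash-escape query-syntax characters inside a single term."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in term)


def build_query_string(field, term):
    """Escape the term for the parser, then percent-encode it for the URL."""
    return urlencode({"q": f"{field}:{escape_term(term)}"})


print(whitespace_tokens("My dog has fleas."))
print(escape_term("C++"))
print(build_query_string("title", "C++"))
print(repr(unquote_plus("C++")))
```

The last line is the reason for the encoding step: in a URL query string an unencoded "+" is decoded as a space, so the plus signs never reach the parser unless they are sent as %2B.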

Best,
Erick

On Fri, Aug 4, 2017 at 5:55 AM, d.kumar@technisat.de
<d.kumar@technisat.de> wrote:
> Hey, thanks.
>
> Yeah, I found a way.
> I used my own fieldtype for these fields. In it I'm using the WhitespaceTokenizerFactory
> for both query and index, and now everything works as it should.
>
> :-)
>
> Thanks
>
> David
>
> -----Original Message-----
> From: Shawn Heisey [mailto:apache@elyograg.org]
> Sent: Friday, August 4, 2017 14:53
> To: solr-user@lucene.apache.org
> Subject: Re: AW: plus sign in request / looking for + in title
>
> On 8/4/2017 2:15 AM, d.kumar@technisat.de wrote:
>> So how can I prevent e.g. the ST (StandardTokenizer) from removing the plus sign? Any
>> suggestions?
>
> You can't.  The standard tokenizer really isn't configurable at all.
>
> You'd need to change your analysis chain (tokenizer and filters) to produce the results
> you want.
>
> Thanks,
> Shawn
>
