lucene-java-user mailing list archives

From Martin Grigorov <mgrigo...@apache.org>
Subject Re: AlphaNumeric analyzer/tokenizer
Date Mon, 19 Aug 2019 07:32:58 GMT
Hi,


On Mon, Aug 19, 2019 at 9:31 AM Uwe Schindler <uwe@thetaphi.de> wrote:

> You already got many responses. Check your inbox.
>

"many" made me think that I've also missed something.
https://markmail.org/message/ohv5qcvxilj3n3fb


>
> Uwe
>
> On August 19, 2019 6:23:20 AM UTC, Abhishek Chauhan <abhishek.chauhan792@gmail.com> wrote:
> >Hi,
> >
> >Can someone please check the above mail and provide some feedback?
> >
> >Thanks and Regards,
> >Abhishek
> >
> >On Fri, Aug 16, 2019 at 2:52 PM Abhishek Chauhan <abhishek.chauhan792@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> We have been using SimpleAnalyzer, which keeps only letters in its tokens.
> >> This prevents us from searching strings that contain both letters and
> >> numbers, e.g. "axt1234". With SimpleAnalyzer we can only search for "axt"
> >> successfully; search strings like "axt1", "axt123", etc. give no results
> >> because the numbers were dropped at indexing time.
> >>
> >> I can use StandardAnalyzer or WhitespaceAnalyzer, but I also want to
> >> tokenize on underscores, which these analyzers don't do. I have also
> >> looked at WordDelimiterFilter, which will split "axt1234" into "axt" and
> >> "1234". However, with this I still cannot search for "axt12" etc.
> >>
> >> Is there something like an alphanumeric analyzer that would be very
> >> similar to SimpleAnalyzer but would keep digits as well as letters in its
> >> tokens? I am willing to contribute such an analyzer if one is not
> >> available.
> >>
> >> Thanks and Regards,
> >> Abhishek
> >>
> >>
> >>
>
> --
> Uwe Schindler
> Achterdiek 19, 28357 Bremen
> https://www.thetaphi.de
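
A minimal sketch of the token rule asked about in the quoted message, written in plain Java rather than against Lucene's API: runs of letters and digits are kept as lowercase tokens, and everything else (underscores, whitespace, punctuation) acts as a separator. The class and method names are hypothetical and exist only for illustration. In Lucene itself, I believe the same rule can be expressed with CharTokenizer.fromTokenCharPredicate(Character::isLetterOrDigit) followed by a LowerCaseFilter, but treat that as an assumption to check against the version in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: it only illustrates the token rule.
final class AlnumTokenizerSketch {

    // Splits text into lowercase tokens made of letters and digits.
    // Any other character (underscore, whitespace, punctuation) ends a token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isLetterOrDigit(cp)) {
                current.appendCodePoint(Character.toLowerCase(cp));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Returns true if any token starts with the lowercased query,
    // mimicking what a prefix query does against whole indexed tokens.
    static boolean prefixMatches(List<String> tokens, String query) {
        String q = query.toLowerCase(Locale.ROOT);
        for (String token : tokens) {
            if (token.startsWith(q)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("Part_axt1234 rev-2");
        System.out.println(tokens);
        System.out.println(prefixMatches(tokens, "axt12"));
    }
}
```

Note that keeping "axt1234" as one token is only half of the problem: matching "axt12" against it is a prefix match, which the analyzer alone does not provide. In the sketch this is the separate prefixMatches step; in Lucene it would presumably be a prefix or wildcard query, or edge n-grams added at index time.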
