lucene-java-user mailing list archives

From Abhishek Chauhan <abhishek.chauhan...@gmail.com>
Subject Re: AlphaNumeric analyzer/tokenizer
Date Mon, 19 Aug 2019 06:23:20 GMT
Hi,

Can someone please check the above mail and provide some feedback?

Thanks and Regards,
Abhishek

On Fri, Aug 16, 2019 at 2:52 PM Abhishek Chauhan <
abhishek.chauhan792@gmail.com> wrote:

> Hi,
>
> We have been using SimpleAnalyzer, which keeps only letters in its tokens.
> This prevents us from searching strings that contain both letters and
> numbers, for example "axt1234". With SimpleAnalyzer we can search for "axt"
> successfully, but search strings like "axt1" or "axt123" return no
> results because the digits are discarded during indexing.
>
> I can use StandardAnalyzer or WhitespaceAnalyzer, but I also want to
> tokenize on underscores, which these analyzers don't do. I have also looked
> at WordDelimiterFilter, which splits "axt1234" into "axt" and "1234". Even
> with that, however, I cannot search for "axt12" etc.
>
> Is there something like an alphanumeric analyzer that is very similar to
> SimpleAnalyzer but also keeps digits, in addition to letters, in its
> tokens? I am willing to contribute such an analyzer if one is not
> available.
>
> Thanks and Regards,
> Abhishek
>
>
>
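
The tokenization rule asked for in the quoted message (keep letters and digits, split on everything else, including underscores) can be sketched in plain Java. This is only an illustration and not Lucene code: the class name AlphaNumericSketch and the methods tokenize and matchesPrefix are hypothetical. In Lucene itself, the same rule would presumably go into a CharTokenizer subclass whose isTokenChar returns Character.isLetterOrDigit(c), followed by lowercasing. Note also that even with digits kept, "axt12" is not equal to the indexed token "axt1234", so partial strings would most likely still need a prefix-style query, which the sketch imitates with startsWith.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, Lucene-free sketch of an "alphanumeric" tokenization rule.
class AlphaNumericSketch {

    // Splits on every code point that is not a letter or digit
    // (so '_', '-', and whitespace all end a token) and lowercases tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isLetterOrDigit(cp)) {
                current.appendCodePoint(Character.toLowerCase(cp));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Imitates a prefix search: true if any token starts with the
    // lowercased search string.
    static boolean matchesPrefix(List<String> tokens, String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        for (String token : tokens) {
            if (token.startsWith(p)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("Part_AXT1234 rev-2");
        System.out.println(tokens);                          // [part, axt1234, rev, 2]
        System.out.println(matchesPrefix(tokens, "axt12"));  // true
        System.out.println(matchesPrefix(tokens, "1234"));   // false
    }
}
```

Because digits stay inside the token, "axt1234" is kept as one unit rather than being cut down to "axt" or split into "axt" and "1234", which is what makes a prefix lookup such as "axt12" possible.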
