lucene-java-user mailing list archives

From Uwe Schindler <...@thetaphi.de>
Subject Re: AlphaNumeric analyzer/tokenizer
Date Mon, 19 Aug 2019 06:30:50 GMT
You already got many responses. Check your inbox.

Uwe

On August 19, 2019 6:23:20 AM UTC, Abhishek Chauhan <abhishek.chauhan792@gmail.com> wrote:
>Hi,
>
>Can someone please check the above mail and provide some feedback?
>
>Thanks and Regards,
>Abhishek
>
>On Fri, Aug 16, 2019 at 2:52 PM Abhishek Chauhan <abhishek.chauhan792@gmail.com> wrote:
>
>> Hi,
>>
>> We have been using SimpleAnalyzer, which keeps only letters in its tokens.
>> This prevents us from searching for strings that contain both letters and
>> numbers, e.g. "axt1234". SimpleAnalyzer only lets us search for "axt"
>> successfully; search strings like "axt1", "axt123", etc. give no results
>> because the numbers were dropped at indexing time.
>>
>> I can use StandardAnalyzer or WhitespaceAnalyzer, but I also want to
>> tokenize on underscores, which these analyzers don't do. I have also looked
>> at WordDelimiterFilter, which splits "axt1234" into "axt" and "1234".
>> However, even with this, I cannot search for "axt12" etc.
>>
>> Is there something like an alphanumeric analyzer that would be very
>> similar to SimpleAnalyzer but would keep digits as well as letters in its
>> tokens? I am willing to contribute such an analyzer if one is not
>> available.
>>
>> Thanks and Regards,
>> Abhishek
>>
>>
>>

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de
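
The tokenization difference described in the quoted question can be illustrated with a small sketch. This is plain Java, not Lucene API: the class name AlnumTokenSketch and the method tokenize are hypothetical names made up for illustration. It only mimics the behaviour the question describes (split wherever a character is rejected, lowercase what is kept), first with a letters-only test and then with a letters-or-digits test. In Lucene itself the equivalent decision is made by a tokenizer's per-character test, such as CharTokenizer's isTokenChar; how that is wired into an analyzer depends on the Lucene version and is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical illustration class; not part of Lucene.
class AlnumTokenSketch {

    // Splits text into lowercased runs of code points accepted by isTokenChar.
    // Any rejected code point (space, underscore, ...) ends the current token.
    static List<String> tokenize(String text, IntPredicate isTokenChar) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (isTokenChar.test(cp)) {
                current.appendCodePoint(Character.toLowerCase(cp));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Part_axt1234 rev2";
        // Letters only: digits are dropped, as described for SimpleAnalyzer.
        System.out.println(tokenize(text, Character::isLetter));        // [part, axt, rev]
        // Letters or digits: "axt1234" survives as one token, "_" still splits.
        System.out.println(tokenize(text, Character::isLetterOrDigit)); // [part, axt1234, rev2]
    }
}
```

Note that keeping "axt1234" as a single token does not by itself make a partial term such as "axt12" match; that would presumably still need a prefix-style query or a similar mechanism on the search side.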