lucene-solr-user mailing list archives

From Sergio García Maroto <marot...@gmail.com>
Subject Re: Strip out punctuation at the end of token
Date Fri, 24 Nov 2017 15:13:26 GMT
Yes, you are right. I understand now.
Let me explain my issue a bit better with the exact problem I have.

I have this text: "Information number  61149-008."
Using the tokenizers and filters described previously, I get this list of
tokens:
information
number
61149-008.
61149
008

Basically, the last token, "61149-008.", gets tokenized as
61149-008.
61149
008
The user is searching for "61149-008" without the dot, so this is not a match.
I don't want to change the tokenization on the query side, to avoid altering
the matches for other cases.

I would like to delete the dot at the end. That is, I want to generate one
extra token, "61149-008", in addition to the existing ones:
information
number
61149-008.
61149
008
61149-008
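
To make the desired behaviour concrete, here is a minimal Python sketch. It is
not Solr configuration and does not reproduce any real Solr filter; the
function name add_stripped_variants is hypothetical, and the input list is
simply the token list shown above. It keeps every existing token and appends a
copy with trailing punctuation removed whenever that copy is not already
present.

```python
import string


def add_stripped_variants(tokens):
    """Return the tokens plus, for any token that ends in punctuation,
    an extra copy with the trailing punctuation removed.

    Hypothetical helper for illustration only; it models the desired
    output, not the behaviour of an actual Solr filter.
    """
    result = list(tokens)
    seen = set(tokens)
    for token in tokens:
        # rstrip only removes characters from the end of the token,
        # so the hyphen inside "61149-008." is left untouched.
        stripped = token.rstrip(string.punctuation)
        if stripped and stripped not in seen:
            result.append(stripped)
            seen.add(stripped)
    return result


# Tokens currently produced at index time, as listed in this message.
indexed = ["information", "number", "61149-008.", "61149", "008"]
expanded = add_stripped_variants(indexed)
print(expanded)
# ['information', 'number', '61149-008.', '61149', '008', '61149-008']
```

With the extra token in the index, a query for "61149-008" that is kept as a
single token would find a matching term, and the existing tokens are
unchanged.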

I am not sure whether what I am saying makes sense, or whether there is
another way to do this properly.

Thanks a lot
Sergio


On 24 November 2017 at 15:31, Shawn Heisey <apache@elyograg.org> wrote:

> On 11/24/2017 2:32 AM, marotosg wrote:
>
>> Hi Shawn.
>> Thanks for your reply. Actually my issue is with the last token. It looks
>> like, for the last token of a string, it keeps the dot.
>>
>> In your case: Testing. This is a test. Test.
>>
>> Keeps the "Test."
>>
>> Is there any reason I can't see for that behaviour?
>>
>
> I am really not sure what you're saying here.
>
> Every token is duplicated, one has the dot and one doesn't.  This is what
> you wanted based on what I read in your initial email.
>
> Making a guess as to what you're asking about this time: If you're
> noticing that there isn't a "Test" as the last token on the line for WDF,
> then I have to tell you that it actually is there, the display was simply
> too wide for the browser window. Scrolling horizontally would be required
> to see the whole thing.
>
> Thanks,
> Shawn
>
>
