lucene-java-user mailing list archives

From Robert Muir
Subject Re: WordDelimiterGraphFilter swallows emojis
Date Tue, 03 Jul 2018 13:27:43 GMT
> Any thoughts?

The best idea I have is to tokenize with ICUTokenizer, which tags
emoji sequences with the "<EMOJI>" token type, and then use
ConditionalTokenFilter to send all tokens EXCEPT those with token type
"<EMOJI>" to your WordDelimiterFilter. This way
WordDelimiterFilter never sees the emoji at all and can't screw them
up.
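
A rough sketch of that analysis chain, in case it helps. It is not
tested against any particular Lucene release. It assumes a version that
ships ConditionalTokenFilter and the ICU analysis module, and it uses
WordDelimiterGraphFilter as named in the subject line. The class name
EmojiSafeAnalyzer and the flag choice are made up for illustration.

```java
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.icu.segmentation.ICUTokenizer;
import org.apache.lucene.analysis.miscellaneous.ConditionalTokenFilter;
import org.apache.lucene.analysis.miscellaneous.WordDelimiterGraphFilter;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

// Hypothetical analyzer name; only the Lucene classes are real.
public class EmojiSafeAnalyzer extends Analyzer {

  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    // ICUTokenizer sets the token type of emoji sequences to "<EMOJI>".
    Tokenizer tokenizer = new ICUTokenizer();

    // Illustrative flags only; use whatever your existing chain uses.
    final int flags = WordDelimiterGraphFilter.GENERATE_WORD_PARTS
        | WordDelimiterGraphFilter.GENERATE_NUMBER_PARTS;

    // The wrapped filter is applied only to tokens for which
    // shouldFilter() returns true; all other tokens pass through as-is.
    TokenStream stream = new ConditionalTokenFilter(
        tokenizer, in -> new WordDelimiterGraphFilter(in, flags, null)) {

      private final TypeAttribute typeAtt = addAttribute(TypeAttribute.class);

      @Override
      protected boolean shouldFilter() throws IOException {
        // Skip the word delimiter filter for emoji tokens.
        return !"<EMOJI>".equals(typeAtt.type());
      }
    };

    return new TokenStreamComponents(tokenizer, stream);
  }
}
```

The null third argument means no protected-words set. If you already
pass one to your word delimiter filter, keep passing it here.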
