lucene-solr-user mailing list archives

From Varun Rajput <>
Subject A field-wide remove duplicate tokens filter
Date Wed, 17 Dec 2014 22:40:55 GMT
The org.apache.solr.analysis.RemoveDuplicatesTokenFilter, as per its description, "Filters
out any tokens which are at the same logical position in the tokenstream as a previous token
with the same text."
A very useful filter would be one that removes duplicate tokens across the whole field,
irrespective of each token's logical position. Does something like this already exist,
or is one planned for inclusion in a coming release?
I have an implementation of this in one of my projects and can contribute it if the community
finds it useful as well.
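
To make the requested behaviour concrete, here is a minimal sketch of the core logic in plain
Java. It is not the implementation mentioned above and it is not a Lucene TokenFilter; the
class name FieldWideDedup and the method dedupe are hypothetical, and a field's tokens are
assumed to be available as a simple list of strings. The first occurrence of each term is
kept and later repeats are dropped, wherever in the field they appear.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Lucene or Solr.
final class FieldWideDedup {

    private FieldWideDedup() {
    }

    /**
     * Returns the tokens in their original order, dropping every term that
     * has already been seen earlier in the same field.
     */
    static List<String> dedupe(List<String> tokens) {
        Set<String> seen = new HashSet<>();
        List<String> unique = new ArrayList<>();
        for (String token : tokens) {
            // Set.add returns false when the term was already present.
            if (seen.add(token)) {
                unique.add(token);
            }
        }
        return unique;
    }
}
```

For the tokens "to be or not to be", dedupe returns [to, be, or, not]. An actual filter would
presumably keep a similar set of seen terms inside a TokenFilter subclass, consult it in
incrementToken(), and clear it in reset() so that each field value starts fresh. It would also
have to decide how to treat position increments for the dropped tokens, which this sketch
ignores.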