lucene-solr-user mailing list archives

From Michael Ryan <mr...@moreover.com>
Subject Best way to fix "Document contains at least one immense term"?
Date Tue, 01 Jul 2014 13:49:13 GMT
In LUCENE-5472, Lucene was changed to throw an error if a term is too long, rather than just
logging a message. I have fields with terms that are too long, but I don't care - I just want
to ignore them and move on.

The recommended solution in the docs is to use LengthFilterFactory, but this limits the terms
by the number of characters, rather than the number of UTF-8 bytes. So you can't just do something
clever like set max=32766, due to the possibility of multibyte characters.

So, is there a way to use LengthFilterFactory such that this error will never be
thrown? I'm thinking I could use a max below 32766 / 3, but I want to be absolutely sure
that there isn't some edge case that will still break. I guess I could just set it to something
sane like 1000. Or is there a more direct solution to this problem?
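To make the 32766 / 3 arithmetic concrete, here is a rough sketch in Python (not Lucene code). It assumes that LengthFilterFactory counts Java chars, i.e. UTF-16 code units, which I have not verified for every version. The helper names are made up for illustration. Under that assumption, one code unit never needs more than 3 UTF-8 bytes, because a 4-byte character occupies two code units.

```python
# Sketch only: illustrates the byte/char bound, not Lucene's actual implementation.
MAX_TERM_BYTES = 32766  # byte limit quoted in this message


def utf16_units(s: str) -> int:
    """Number of UTF-16 code units (what a Java String.length() would report)."""
    return len(s.encode("utf-16-le")) // 2


def utf8_bytes(s: str) -> int:
    """Number of bytes in the UTF-8 encoding of s."""
    return len(s.encode("utf-8"))


def safe_max_chars(limit_bytes: int = MAX_TERM_BYTES, worst_bytes_per_unit: int = 3) -> int:
    """Largest length (in UTF-16 code units) that cannot exceed limit_bytes.

    ASSUMPTION: the length filter measures UTF-16 code units.
    """
    return limit_bytes // worst_bytes_per_unit


# 1-, 2-, 3- and 4-byte characters; the last one takes two UTF-16 code units.
for sample in ["a", "\u00e9", "\u20ac", "\U0001F600"]:
    assert utf8_bytes(sample) <= 3 * utf16_units(sample)

print(safe_max_chars())
```

If that assumption holds, a max at or below the printed value stays within the byte limit even for a term made entirely of 3-byte characters, and a round number like 1000 leaves a wide margin.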

-Michael