lucene-solr-user mailing list archives

From Angel Todorov <>
Subject Re: FreeTextSuggester throwing error "token must not contain separator byte"
Date Tue, 25 Jul 2017 09:32:09 GMT
Hi guys,

Thank you very much for the help. I think I see what is going on. Yes, it is
related to the Shingle filter added to the analyzer. It shouldn't be there
if a FreeTextLookup factory is used in the suggester, because it creates a
conflict. The StandardTokenizer removes punctuation, including spaces, but
then after the shingles are generated extra whitespace is added in between
the shingles, and this makes the freetext analyzer / lookup throw an
error.
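
To make the double-shingling conflict concrete, here is a minimal toy model in
Python. It is not Lucene or Solr code: the function names, the space separator,
and the gram-count check are all assumptions made for illustration, loosely
modelled on the behaviour discussed in this thread.

```python
# Toy model only: none of these names come from Lucene/Solr.
SEPARATOR = " "  # assumption: the suggester joins grams with a space


def shingles(tokens, max_size, sep=SEPARATOR):
    """Return every run of 1..max_size consecutive tokens, joined by sep."""
    out = []
    for i in range(len(tokens)):
        for n in range(1, max_size + 1):
            if i + n <= len(tokens):
                out.append(sep.join(tokens[i:i + n]))
    return out


def toy_suggester_build(analyzed_tokens, ngrams=2, sep=SEPARATOR):
    """Hypothetical stand-in for a free-text suggester build step.

    It shingles the analyzer output itself, then rejects any resulting
    token that contains more grams than it asked for.
    """
    built = shingles(analyzed_tokens, ngrams, sep)
    for token in built:
        gram_count = token.count(sep) + 1
        if gram_count > ngrams:
            raise ValueError(f"token must not contain separator byte: {token!r}")
    return built


words = ["donald", "trump", "speech"]

# Analyzer without a shingle filter: the suggester's own shingling is fine.
ok = toy_suggester_build(words)

# Analyzer that already shingles: the suggester shingles the shingles.
try:
    toy_suggester_build(shingles(words, 2))
    double_shingled_ok = True
except ValueError:
    double_shingled_ok = False
```

In this sketch the plain token stream builds cleanly, while the pre-shingled
stream fails because a token such as "donald donald trump" already carries the
separator before the suggester adds its own.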

Unfortunately, I tried the approach without shingles and got it working
some time ago, but it doesn't produce the expected results. I mean, it
doesn't do what Google's autosuggest does, so to speak. Let me give you
a couple of examples:

Input: don (without the quotes)
Output: only single terms. "donald", but not "donald trump", for example

Input: "don" (with quotes)
Output: multi-terms only, but the first term must start with don. So it
still doesn't output "donald trump".

Input: "donald t" (with quotes)
Output: I also get all terms starting with "t", which I don't want
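
For comparison, the behaviour I would expect can be sketched with a toy
phrase-prefix matcher. This is only an illustration under my own assumptions:
the sample documents, the function names, and the ranking by frequency are
hypothetical, and it has nothing to do with how Solr's suggesters are
implemented.

```python
from collections import Counter


def build_phrases(docs, max_words=2):
    """Count every run of 1..max_words consecutive words in each document."""
    counts = Counter()
    for doc in docs:
        words = doc.lower().split()
        for i in range(len(words)):
            for n in range(1, max_words + 1):
                if i + n <= len(words):
                    counts[" ".join(words[i:i + n])] += 1
    return counts


def suggest(counts, query, limit=5):
    """Return phrases that start with the typed text, most frequent first."""
    q = query.lower()
    hits = [(phrase, c) for phrase, c in counts.items() if phrase.startswith(q)]
    # Sort by descending frequency, then alphabetically for stable output.
    hits.sort(key=lambda pc: (-pc[1], pc[0]))
    return [phrase for phrase, _ in hits[:limit]]


# Hypothetical sample data, made up for this sketch.
docs = ["Donald Trump tweets", "Donald Duck cartoon", "Tax reform"]
counts = build_phrases(docs)

print(suggest(counts, "don"))       # single terms and multi-term phrases together
print(suggest(counts, "donald t"))  # only phrases continuing "donald t"
```

With this toy data, typing don returns "donald" along with "donald duck" and
"donald trump", and typing donald t returns only "donald trump", without the
unrelated terms that start with "t".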

So I think Solr / Elasticsearch really needs a brand new suggester
implementation. Since most people use Google as the "example", it
should work the way it works there.

Thanks again,

On Tue, Jul 25, 2017 at 12:00 PM, alessandro.benedetti wrote:

> I think this bit is the problem:
> "I am using a Shingle filter right after the StandardTokenizer, not sure if
> that has anything to do with it."
> When using the FreeTextLookup approach, you don't need to use shingles in
> your analyser; shingles are added by the suggester itself.
> As Erick mentioned, the reason spaces come back is that you produce
> shingles on your own and then the Lookup approach adds additional
> shingles.
> I recommend reading this section of my blog [1] (you may have read it, as
> there is one comment describing a problem similar to yours).
> [1]
> -----
> ---------------
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. -
