lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] [Commented] (SOLR-13190) Fuzzy search treated as server error instead of client error when terms are too complex
Date Sun, 03 Feb 2019 14:35:00 GMT


Michael McCandless commented on SOLR-13190:

+1 to improve the exception message to include the field and fuzzy term that led to this.
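The sketch below is a rough illustration of that suggestion. It wraps the automaton-building step and, if that step fails, rethrows with the field and fuzzy term attached, keeping the original exception as the cause. `TooComplexException`, `FuzzyTermTooComplexException`, and `FuzzyContextSketch.withFieldAndTerm` are hypothetical names invented for this illustration. The real exception is Lucene's `TooComplexToDeterminizeException`, and this thread does not settle where such a wrapper would live (FuzzyQuery or Solr's query parsing).

```java
import java.util.function.Supplier;

// Hypothetical stand-in for Lucene's TooComplexToDeterminizeException,
// defined here only so the sketch compiles without Lucene on the classpath.
class TooComplexException extends RuntimeException {
    TooComplexException(String message) { super(message); }
}

// Hypothetical wrapper exception that records the offending field and fuzzy term.
class FuzzyTermTooComplexException extends RuntimeException {
    final String field;
    final String term;

    FuzzyTermTooComplexException(String field, String term, Throwable cause) {
        super("Fuzzy term '" + term + "' on field '" + field + "' is too complex: "
                + cause.getMessage(), cause);
        this.field = field;
        this.term = term;
    }
}

class FuzzyContextSketch {
    // Runs the automaton-building step. If it fails as too complex, rethrows
    // with the field and term attached and the original exception as the cause.
    static <T> T withFieldAndTerm(String field, String term, Supplier<T> build) {
        try {
            return build.get();
        } catch (TooComplexException e) {
            throw new FuzzyTermTooComplexException(field, term, e);
        }
    }
}
```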

However, this exception is baffling, because our FuzzyQuery directly produces an already
determinized and minimized automaton; that's the beauty of the (efficient) Levenshtein
automaton construction algorithm.
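The following minimal sketch makes the "already determinized" point concrete. It deliberately uses a simpler technique than Lucene's. Each row of the edit-distance DP table, with entries capped at k + 1, is treated as a DFA state, so every input character leads to exactly one successor and no subset construction is ever needed. Lucene's `LevenshteinAutomata` instead uses precomputed parametric tables (after Schulz and Mihov) and also yields a minimal automaton. `LevenshteinDfaSketch` is a hypothetical class that does neither of those things, and it maps every character absent from the term to one shared "other" symbol.

```java
import java.util.*;

// Sketch: a DFA for "within k edits of `term`", built by treating each capped
// row of the edit-distance DP table as a state. NOT Lucene's LevenshteinAutomata;
// it only shows that such an automaton can be built deterministically.
class LevenshteinDfaSketch {
    static final char OTHER = '\uFFFF'; // stands for any char not in the term

    final String term;
    final int k;
    final List<int[]> states = new ArrayList<>();
    final List<Map<Character, Integer>> delta = new ArrayList<>();

    LevenshteinDfaSketch(String term, int k) {
        this.term = term;
        this.k = k;
        Set<Character> alphabet = new TreeSet<>();
        for (char c : term.toCharArray()) alphabet.add(c);
        alphabet.add(OTHER);

        Map<String, Integer> ids = new HashMap<>();
        Deque<Integer> work = new ArrayDeque<>();
        int[] start = new int[term.length() + 1];
        for (int i = 0; i <= term.length(); i++) start[i] = Math.min(i, k + 1);
        addState(start, ids, work);
        while (!work.isEmpty()) {
            int s = work.poll();
            for (char c : alphabet) {
                int[] next = step(states.get(s), c);
                if (min(next) > k) continue; // dead state: omit the transition
                Integer t = ids.get(Arrays.toString(next));
                if (t == null) t = addState(next, ids, work);
                delta.get(s).put(c, t); // exactly one successor per symbol
            }
        }
    }

    private int addState(int[] row, Map<String, Integer> ids, Deque<Integer> work) {
        int id = states.size();
        states.add(row);
        delta.add(new HashMap<>());
        ids.put(Arrays.toString(row), id);
        work.add(id);
        return id;
    }

    // One DP row update for input char c, with values capped at k + 1.
    private int[] step(int[] row, char c) {
        int[] next = new int[row.length];
        next[0] = Math.min(row[0] + 1, k + 1);
        for (int i = 1; i < row.length; i++) {
            int sub = row[i - 1] + (term.charAt(i - 1) == c ? 0 : 1);
            int v = Math.min(sub, Math.min(row[i] + 1, next[i - 1] + 1));
            next[i] = Math.min(v, k + 1);
        }
        return next;
    }

    private static int min(int[] a) {
        int m = Integer.MAX_VALUE;
        for (int v : a) m = Math.min(m, v);
        return m;
    }

    // Runs the DFA: at most one transition per input char, no backtracking.
    boolean accepts(String input) {
        int s = 0;
        for (char c : input.toCharArray()) {
            char sym = term.indexOf(c) >= 0 ? c : OTHER;
            Integer t = delta.get(s).get(sym);
            if (t == null) return false;
            s = t;
        }
        return states.get(s)[term.length()] <= k;
    }

    int stateCount() { return states.size(); }
}
```

Pruning rows whose minimum exceeds k is safe because the row minimum never decreases, so such a row can never lead to acceptance.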

So why are we then trying to determinize it again?  Something bad is lurking here – somehow
we lost track that the automaton is already determinized?
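A toy automaton can illustrate the suspicion. The sketch below performs a subset construction capped at `maxStates`, mirroring the 10000-state limit in the reported stack trace, and returns immediately when the input is already flagged deterministic. `SketchAutomaton`, its `deterministic` field, and `TooComplexToDeterminizeSketch` are hypothetical stand-ins, not Lucene's `Automaton`/`Operations` API. As we understand it, Lucene's `Operations.determinize` has a similar early exit on `isDeterministic()`. Losing that flag would therefore be enough to push an already deterministic fuzzy automaton into the expensive path.

```java
import java.util.*;

// Hypothetical stand-in for Lucene's TooComplexToDeterminizeException.
class TooComplexToDeterminizeSketch extends RuntimeException {
    TooComplexToDeterminizeSketch(String msg) { super(msg); }
}

// Hypothetical minimal automaton; state 0 is the start state.
class SketchAutomaton {
    final int numStates;
    final Set<Integer> accept = new HashSet<>();
    final Map<Integer, Map<Character, Set<Integer>>> edges = new HashMap<>();
    boolean deterministic; // if this flag is lost, determinize() redoes the work

    SketchAutomaton(int numStates) { this.numStates = numStates; }

    void addEdge(int from, char c, int to) {
        edges.computeIfAbsent(from, x -> new HashMap<>())
             .computeIfAbsent(c, x -> new HashSet<>()).add(to);
    }

    Map<Character, Set<Integer>> out(int s) {
        return edges.getOrDefault(s, Collections.emptyMap());
    }

    // Subset construction capped at maxStates; skipped if already deterministic.
    static SketchAutomaton determinize(SketchAutomaton a, int maxStates) {
        if (a.deterministic) return a; // early exit: nothing to determinize
        Map<Set<Integer>, Integer> ids = new HashMap<>();
        List<Set<Integer>> subsets = new ArrayList<>();
        Deque<Set<Integer>> work = new ArrayDeque<>();
        Set<Integer> start = new TreeSet<>(List.of(0));
        ids.put(start, 0);
        subsets.add(start);
        work.add(start);
        List<int[]> dfaEdges = new ArrayList<>(); // {from, char, to}
        while (!work.isEmpty()) {
            Set<Integer> cur = work.poll();
            int from = ids.get(cur);
            // Union of NFA successors per symbol gives one DFA successor per symbol.
            Map<Character, Set<Integer>> next = new TreeMap<>();
            for (int s : cur)
                for (Map.Entry<Character, Set<Integer>> e : a.out(s).entrySet())
                    next.computeIfAbsent(e.getKey(), x -> new TreeSet<>()).addAll(e.getValue());
            for (Map.Entry<Character, Set<Integer>> e : next.entrySet()) {
                Integer to = ids.get(e.getValue());
                if (to == null) {
                    if (subsets.size() >= maxStates)
                        throw new TooComplexToDeterminizeSketch(
                            "Determinizing would result in more than " + maxStates + " states.");
                    to = subsets.size();
                    ids.put(e.getValue(), to);
                    subsets.add(e.getValue());
                    work.add(e.getValue());
                }
                dfaEdges.add(new int[] {from, e.getKey(), to});
            }
        }
        SketchAutomaton dfa = new SketchAutomaton(subsets.size());
        for (int[] e : dfaEdges) dfa.addEdge(e[0], (char) e[1], e[2]);
        for (int i = 0; i < subsets.size(); i++)
            for (int s : subsets.get(i)) if (a.accept.contains(s)) dfa.accept.add(i);
        dfa.deterministic = true;
        return dfa;
    }
}
```

For instance, an NFA recognizing "the fourth symbol from the end is a" over {a, b} has 5 states, but its DFA needs 2^4 = 16, so a cap of 10 throws. Once flagged deterministic, however, the result passes through `determinize` unchanged even with that cap.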

> Fuzzy search treated as server error instead of client error when terms are too complex
> ---------------------------------------------------------------------------------------
>                 Key: SOLR-13190
>                 URL:
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: search
>    Affects Versions: master (9.0)
>            Reporter: Mike Drob
>            Assignee: Mike Drob
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
> We've seen a fuzzy search end up breaking the automaton and getting reported as a server error. This handling should be improved by
> 1) reporting it as a client error, because an operator should deal with it much as with a query that has too many boolean clauses
> 2) reporting which field is causing the error, since that currently must be deduced from adjacent query logs, which can be difficult if the search contains multiple terms
> This trigger was added to defend against adversarial regexes but somehow hits fuzzy terms as well. I don't understand enough about the automaton mechanisms to know how to approach a fix there, but improving the operability is a good first step.
> relevant stack trace:
> {noformat}
> org.apache.lucene.util.automaton.TooComplexToDeterminizeException: Determinizing automaton with 13632 states and 21348 transitions would result in more than 10000 states.
> 	at org.apache.lucene.util.automaton.Operations.determinize(
> 	at org.apache.lucene.util.automaton.RunAutomaton.<init>(
> 	at org.apache.lucene.util.automaton.ByteRunAutomaton.<init>(
> 	at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(
> 	at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(
> 	at<init>(
> 	at
> 	at$RewriteMethod.getTermsEnum(
> 	at
> 	at
> 	at
> 	at
> 	at
> 	at
> 	at
> 	at
> 	at
> 	at org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(
> 	at org.apache.solr.handler.component.QueryComponent.process(
> 	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(
> 	at org.apache.solr.handler.RequestHandlerBase.handleRequest(
> 	at org.apache.solr.core.SolrCore.execute(
> {noformat}
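One hedged sketch of the reporter's first point classifies failures by walking the exception's cause chain and mapping "query too complex" conditions to HTTP 400 instead of 500. The exception classes here are hypothetical stand-ins for Lucene's `TooComplexToDeterminizeException` and `BooleanQuery.TooManyClauses`, and `ErrorStatusSketch.statusFor` is an invented helper. In Solr itself this would presumably mean throwing a `SolrException` with `ErrorCode.BAD_REQUEST`, but this thread does not settle where that should happen.

```java
// Hypothetical stand-ins for Lucene's TooComplexToDeterminizeException and
// BooleanQuery.TooManyClauses, so the sketch compiles without Lucene.
class TooComplexQueryException extends RuntimeException {
    TooComplexQueryException(String message) { super(message); }
}

class TooManyClausesException extends RuntimeException {
    TooManyClausesException(String message) { super(message); }
}

class ErrorStatusSketch {
    static final int BAD_REQUEST = 400;  // the client should change the query
    static final int SERVER_ERROR = 500; // something is wrong on the server side

    // Walks the cause chain so a wrapped "query too complex" failure is still
    // reported as a client error rather than a server error.
    static int statusFor(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof TooComplexQueryException || c instanceof TooManyClausesException) {
                return BAD_REQUEST;
            }
        }
        return SERVER_ERROR;
    }
}
```

Checking causes rather than only the top-level type keeps the classification working even if the failure was already wrapped, for example to attach the field and fuzzy term as suggested in the comment.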

This message was sent by Atlassian JIRA
