lucene-solr-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kay Wrobel <kwro...@hawkusa.com>
Subject Re: How to debug empty ParsedQuery from Edismax Query Parser
Date Fri, 04 Jan 2019 17:33:07 GMT
I'd like to follow up on this post here because it has become relevant to me now.

I have set up a debugging environment and took a deep-dive into the SOLR 7.6.0 source code
with Eclipse as my IDE of choice for this task. I have isolated the exact line as to where
things fall apart for my two sample queries that I have been testing with, which are "q=a3f*"
and "q=aa3f*. As you can see here, the only visible difference between the two search terms
are that the second search term has two characters in succession before switching to a numerical
portion.

First things first, the Extended Dismax Query Parser hands over portions of the parsing to
the Standard Query Parser early on the the parsing process. Following down the rabbit hole,
I ended up in SolrQueryParserBase.getPrefixQuery() method. On line 1173 of this method, we
have the following statement:

	termStr = analyzeIfMultitermTermText(field, termStr, schema.getFieldType(field));

This statement, when executing with the "a3f" search term, returns "a3f" as a result. However,
when using "aa3f", it throws a SolrException with excatly the same multi-term error as shown
below, only like this:
> analyzer returned too many terms for multiTerm term: aa3f

At this point, I would like to reiterate the purpose of our search: we are a part number house.
We deal with millions of part numbers in our system and on our web site. A customer of ours
typically searches our site with a given part number (or SKU if you will). Some part numbers
are intelligent, and so customers might reduce the part number string to a portion at the
beginning. Either way, it is *not* a typical "word" based search. Yet, the system (Drupal)
does treat those two query fields like standard "Text" search fields. Those who know Drupal
Commerce will recognize the Title field of a node and also possible the Product Variation
or (SKU) field.

With that in mind, multi-term was introduced with SOLR 5, and I think this error (or limitation)
has probably been in SOLR 5 since then. Can anyone closer to the matter or having struggled
with this same issue chime in on the subject?

Kind regards,

Kay

> On Dec 28, 2018, at 9:57 AM, Kay Wrobel <kwrobel@hawkusa.com> wrote:
> 
> Here are my log entries:
> 
> SOLR 7.x (non-working)
> 2018-12-28 15:36:32.786 INFO  (qtp1769193365-20) [   x:collection1] o.a.s.c.S.Request
[collection1]  webapp=/solr path=/select params={q=ac6023*&qf=tm_field_product^21.0&qf=tm_title_field^8.0&EchoParams=all&rows=10&wt=xml&debugQuery=true}
hits=0 status=0 QTime=2
> 
> SOLR 4.x (working)
> INFO  - 2018-12-28 15:43:41.938; org.apache.solr.core.SolrCore; [collection1] webapp=/solr
path=/select params={q=ac6023*&qf=tm_field_product^21.0&qf=tm_title_field^8.0&EchoParams=all&rows=10&wt=xml&debugQuery=true}
hits=32 status=0 QTime=8 
> 
> EchoParams=all did not show anything different in the resulting XML from SOLR 7.x.
> 
> 
> I found out something curious yesterday. When I try to force the Standard query parser
on SOLR 7.x using the same query, but adding "defType=lucene" at the beginning, SOLR 7 raises
a SolrException with this message: "analyzer returned too many terms for multiTerm term: ac6023"
(full response: https://pastebin.com/ijdBj4GF)
> 
> Log entry for that request:
> 2018-12-28 15:50:58.804 ERROR (qtp1769193365-15) [   x:collection1] o.a.s.h.RequestHandlerBase
org.apache.solr.common.SolrException: analyzer returned too many terms for multiTerm term:
ac6023
>        at org.apache.solr.schema.TextField.analyzeMultiTerm(TextField.java:180)
>        at org.apache.solr.parser.SolrQueryParserBase.analyzeIfMultitermTermText(SolrQueryParserBase.java:992)
>        at org.apache.solr.parser.SolrQueryParserBase.getPrefixQuery(SolrQueryParserBase.java:1173)
>        at org.apache.solr.parser.SolrQueryParserBase.handleBareTokenQuery(SolrQueryParserBase.java:781)
>        at org.apache.solr.parser.QueryParser.Term(QueryParser.java:421)
>        at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:278)
>        at org.apache.solr.parser.QueryParser.Query(QueryParser.java:162)
>        at org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:131)
>        at org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:254)
>        at org.apache.solr.search.LuceneQParser.parse(LuceneQParser.java:49)
>        at org.apache.solr.search.QParser.getQuery(QParser.java:173)
>        at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:160)
>        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:279)
>        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:2541)
>        at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:709)
>        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:515)
>        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
>        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
>        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
>        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
>        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
>        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>        at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
>        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
>        at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
>        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)
>        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
>        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
>        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
>        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
>        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)
>        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
>        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
>        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
>        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>        at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
>        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>        at org.eclipse.jetty.server.Server.handle(Server.java:531)
>        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
>        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
>        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
>        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
>        at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
>        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
>        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
>        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
>        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
>        at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
>        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:762)
>        at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:680)
>        at java.lang.Thread.run(Thread.java:748)
> 
> 2018-12-28 15:50:58.809 INFO  (qtp1769193365-15) [   x:collection1] o.a.s.c.S.Request
[collection1]  webapp=/solr path=/select params={q=ac6023*&defType=lucene&qf=tm_field_product^21.0&qf=tm_title_field^8.0&rows=10&wt=xml&debugQuery=true}
status=400 QTime=5
> 
> Could this be a multi-term issue, but EDisMax does not actually raise an exception???
> 
> 
>> On Dec 27, 2018, at 9:32 PM, Alexandre Rafalovitch <arafalov@gmail.com> wrote:
>> 
>> EchoParams=all
>> 
>> May also be helpful to pinpoint differences in params from all sources,
>> including request handler defaults.
>> 
>> Regards,
>>   Alex
>> 
>> On Thu, Dec 27, 2018, 8:25 PM Shawn Heisey <apache@elyograg.org wrote:
>> 
>>> On 12/27/2018 10:47 AM, Kay Wrobel wrote:
>>>> Now starting from SOLR version 5+, I receive zero (0) results back, but
>>> more importantly, the Query Parser produces an empty parsedQuery.
>>>> 
>>>> Here is the same query issued to SOLR 7.6.0 (current version):
>>>> https://pastebin.com/XcNhfdUD <https://pastebin.com/XcNhfdUD>
>>>> 
>>>> Notice how "parsedQuery" now shows "+()"; an empty query string.
>>> 
>>> I can duplicate this result on a 7.5.0 example config by sending an
>>> edismax query with undefined parameters for df and qf. The other
>>> field-related parameters for edismax are also undefined.  The following
>>> URL parameters with the default example config will produce that parsed
>>> query:
>>> 
>>> q=ac6023*&defType=edismax&df=&qf=&debugQuery=on
>>> 
>>> When a query is made and Solr's logging configuration is at its default
>>> setting, Solr will log a line into its logfile containing all of the
>>> parameters in the query, both those provided on the URL and those set by
>>> Solr's configuration (solrconfig.xml).  Can you share this log line from
>>> both the version that works and the version that doesn't?
>>> 
>>> This is the log line created when I ran the query mentioned above:
>>> 
>>> 2018-12-27 23:03:23.199 INFO  (qtp315932542-23) [   x:baz]
>>> o.a.s.c.S.Request [baz]  webapp=/solr path=/select
>>> params={q=ac6023*&defType=edismax&df=&qf=&debugQuery=on} hits=0
status=0
>>> QTime=0
>>> 
>>> What I'm thinking is that there is a difference in the configuration of
>>> the two servers or the actual query being sent is different.  Either
>>> way, there's something different.  The two log lines that I have asked
>>> for are likely to be different from each other in some way that will
>>> explain what you're seeing.
>>> 
>>> Thanks,
>>> Shawn
>>> 
>>> 
> 


-- 

The information in this e-mail is confidential and is intended solely for 
the addressee(s). Access to this email by anyone else is unauthorized. If 
you are not an intended recipient, you may not print, save or otherwise 
store the e-mail or any of the contents thereof in electronic or physical 
form, nor copy, use or disseminate the information contained in the email.  
If you are not an intended recipient,  please notify the sender of this 
email immediately.

Mime
View raw message