lucene-dev mailing list archives

From "Staley, Phil R - DCF" <Phil.Sta...@wisconsin.gov>
Subject Re: strange behavior of solr query parser
Date Mon, 02 Mar 2020 13:37:09 GMT
I believe we are experiencing the same thing.


We recently upgraded our Drupal 8 sites to Solr 8.3.1.  We are now getting reports of certain
patterns of search terms resulting in an error that reads, “The website encountered an unexpected
error. Please try again later.”



Below is a list of example terms that always result in this error and a similar list that
works fine.  The problem pattern seems to be a search term that contains 2 or 3 characters
followed by a space, followed by additional text.



To confirm that the problem is specific to version 8 of Solr, I updated our local and UAT sites
with the latest Drupal updates, which included an update to the Search API Solr module, and tested
the terms below under Solr 7.7.2, 8.3.1, and 8.4.1.  Under version 7.7.2 everything works
fine. Under either version 8 release, the problem returns.



Thoughts?



Search terms that result in error

  *   w-2 agency directory
  *   agency w-2 directory
  *   w-2 agency
  *   w-2 directory
  *   w2 agency directory
  *   w2 agency
  *   w2 directory



Search terms that do not result in error

  *   w-22 agency directory
  *   agency directory w-2
  *   agency w-2directory
  *   agencyw-2 directory
  *   w-2
  *   w2
  *   agency directory
  *   agency
  *   directory
  *   -2 agency directory
  *   2 agency directory
  *   w-2agency directory
  *   w2agency directory
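
As a rough check of the pattern described above, the two lists can be run against a regular expression. This is only a sketch: the regex is an assumed encoding of "2 or 3 characters followed by a space, followed by additional text", and the helper names are hypothetical. It says nothing about why Solr 8 fails on these terms.

```python
import re

# Assumed encoding of the reported pattern: a token of 2 or 3 non-space
# characters, then whitespace, then more text.
SHORT_TOKEN_THEN_TEXT = re.compile(r"(?:^|\s)\S{2,3}\s+\S")

# Search terms reported as resulting in the error.
FAILING = [
    "w-2 agency directory", "agency w-2 directory", "w-2 agency",
    "w-2 directory", "w2 agency directory", "w2 agency", "w2 directory",
]

# Search terms reported as working.
WORKING = [
    "w-22 agency directory", "agency directory w-2", "agency w-2directory",
    "agencyw-2 directory", "w-2", "w2", "agency directory", "agency",
    "directory", "-2 agency directory", "2 agency directory",
    "w-2agency directory", "w2agency directory",
]


def predicts_error(term):
    """True if the assumed pattern says this term should fail."""
    return SHORT_TOKEN_THEN_TEXT.search(term) is not None


def mismatches():
    """Return the terms where the assumed pattern disagrees with the report."""
    wrong = [t for t in FAILING if not predicts_error(t)]
    wrong += [t for t in WORKING if predicts_error(t)]
    return wrong


if __name__ == "__main__":
    print(mismatches())
```

With the lists from this message, the only disagreement is "-2 agency directory", which fits the pattern as encoded here but does not produce the error. The description is therefore close but not exact.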




________________________________
From: Hongtai Xue <hxue@yahoo-corp.jp>
Sent: Monday, March 2, 2020 3:45 AM
To: solr_user lucene_apache <solr-user@lucene.apache.org>
Cc: dev@lucene.apache.org <dev@lucene.apache.org>
Subject: strange behavior of solr query parser


Hi,



Our team found some strange behavior in the Solr query parser.

In some specific cases, conditional clauses on an unindexed field are ignored.



For a query like q=A:1 OR B:1 OR A:2 OR B:2,

if field B is not indexed (but has docValues="true"), "B:1" will be lost.



But if you write the query as q=A:1 OR A:2 OR B:1 OR B:2,

it works perfectly.



The only difference between the two queries is the order in which the clauses are written:

one is ABAB, the other is AABB.
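
To see whether other orderings are affected as well, one could enumerate every ordering of the four clauses and label each by its field pattern. The following is a minimal sketch using the placeholder fields A and B from the example above; the function names are hypothetical, and the generated strings still have to be run against Solr by hand.

```python
from itertools import permutations

# Placeholder clauses from the example above; B is the non-indexed field.
CLAUSES = ["A:1", "B:1", "A:2", "B:2"]


def field_pattern(clauses):
    """Return the field order of a clause sequence, e.g. 'ABAB'."""
    return "".join(clause.split(":", 1)[0] for clause in clauses)


def all_orderings(clauses=CLAUSES):
    """Yield (field pattern, query string) for every ordering of the clauses."""
    for ordering in permutations(clauses):
        yield field_pattern(ordering), " OR ".join(ordering)


if __name__ == "__main__":
    for pattern, query in all_orderings():
        print(pattern, query)
```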



■ Reproduction steps and example

You can easily reproduce this problem on a Solr collection with the _default configset and the
exampledocs/books.csv data.



1. Create a collection using the _default configset.

bin/solr create -c books -s 2 -rf 2



2. Post books.csv.

bin/post -c books example/exampledocs/books.csv



3. Run the following query.

http://localhost:8983/solr/books/select?q=%2B%28name_str%3AFoundation+OR+cat%3Abook+OR+name_str%3AJhereg+OR+cat%3Acd%29&debug=query
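
The URL above can also be built programmatically, which makes it easier to try variations of the query. This is a sketch that assumes the local `books` collection from steps 1 and 2; `debug_url` and `parsed_query` are hypothetical helper names, and `parsed_query` needs a running Solr instance.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Select handler of the collection created in step 1.
SOLR_SELECT = "http://localhost:8983/solr/books/select"


def debug_url(query, base=SOLR_SELECT):
    """Build a select URL that asks Solr for query-parsing debug output."""
    return base + "?" + urlencode({"q": query, "debug": "query"})


def parsed_query(query):
    """Fetch the "parsedquery" debug entry (requires a running Solr)."""
    with urlopen(debug_url(query) + "&wt=json") as response:
        return json.load(response)["debug"]["parsedquery"]


if __name__ == "__main__":
    print(debug_url("+(name_str:Foundation OR cat:book OR name_str:Jhereg OR cat:cd)"))
```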





I printed the query parsing debug information.

You can see that "name_str:Foundation" is lost.



query: "name_str:Foundation OR cat:book OR name_str:Jhereg OR cat:cd"

(Please note that "Jhereg" is "4a 68 65 72 65 67" and "Foundation" is "46 6f 75 6e 64 61 74 69 6f 6e".)

--------
  "debug":{
    "rawquerystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg OR cat:cd)",
    "querystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg OR cat:cd)",
    "parsedquery":"+(cat:book cat:cd (name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]]))",
    "parsedquery_toString":"+(cat:book cat:cd name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]])",
    "QParser":"LuceneQParser"}}
--------



But for the query "name_str:Foundation OR name_str:Jhereg OR cat:book OR cat:cd",

everything is OK: "name_str:Foundation" is not lost.

--------
  "debug":{
    "rawquerystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book OR cat:cd)",
    "querystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book OR cat:cd)",
    "parsedquery":"+(cat:book cat:cd ((name_str:[[46 6f 75 6e 64 61 74 69 6f 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]]) (name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]])))",
    "parsedquery_toString":"+(cat:book cat:cd (name_str:[[46 6f 75 6e 64 61 74 69 6f 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]] name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]]))",
    "QParser":"LuceneQParser"}}
--------

http://localhost:8983/solr/books/select?q=%2B%28name_str%3AFoundation+OR+name_str%3AJhereg+OR+cat%3Abook+OR+cat%3Acd%29&debug=query
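
Comparing the two "parsedquery" strings by eye is error-prone because the name_str terms are printed as hex bytes. The sketch below decodes them and reports which expected clauses are absent. The regular expressions are assumptions fitted to the debug output shown in this message, not a general parser for Solr's debug format, and the function names are hypothetical.

```python
import re

# Range-style clause as printed above: field:[[hex bytes] TO [hex bytes]]
RANGE_TERM = re.compile(r"(\w+):\[\[([0-9a-f ]+)\] TO \[")
# Plain clause as printed above: field:term
PLAIN_TERM = re.compile(r"(\w+):(\w+)")

EXPECTED = [("name_str", "Foundation"), ("cat", "book"),
            ("name_str", "Jhereg"), ("cat", "cd")]

# "parsedquery" values copied from the two debug outputs above.
PARSED_ABAB = "+(cat:book cat:cd (name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]]))"
PARSED_AABB = ("+(cat:book cat:cd ((name_str:[[46 6f 75 6e 64 61 74 69 6f 6e] TO "
               "[46 6f 75 6e 64 61 74 69 6f 6e]]) "
               "(name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]])))")


def clauses_in(parsed_query):
    """Return the set of (field, term) pairs found in a parsedquery string."""
    found = {(field, bytes.fromhex(hex_bytes).decode("utf-8"))
             for field, hex_bytes in RANGE_TERM.findall(parsed_query)}
    found |= set(PLAIN_TERM.findall(parsed_query))
    return found


def missing_clauses(expected, parsed_query):
    """Return the expected clauses that do not appear in the parsed query."""
    present = clauses_in(parsed_query)
    return [clause for clause in expected if clause not in present]


if __name__ == "__main__":
    print("ABAB missing:", missing_clauses(EXPECTED, PARSED_ABAB))
    print("AABB missing:", missing_clauses(EXPECTED, PARSED_AABB))
```

For the two outputs above, the ABAB query is missing ("name_str", "Foundation") and the AABB query is missing nothing.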



We did a little research, and we wonder whether this is a bug in SolrQueryParser.

More specifically, we think the if statement here might be wrong.

https://github.com/apache/lucene-solr/blob/branch_8_4/solr/core/src/java/org/apache/solr/parser/SolrQueryParserBase.java#L711



Could you please tell us whether this is a bug, or whether the query itself is simply written incorrectly?



Thanks,

Hongtai Xue
