lucene-solr-user mailing list archives

From Alexandre Rafalovitch <arafa...@gmail.com>
Subject Re: [SPAM] Re: query parsed in different ways in two identical solr instances
Date Mon, 10 Jun 2019 13:32:28 GMT
Ok, great.

We have now moved from "identical setup breaks things in a bugfix version"
to "strange behavior when a field does not exist". The "identical" part
was actually throwing us off the trail.

And all this leads us to
https://issues.apache.org/jira/browse/SOLR-5163 , fixed in 8.0.

Hope it helps,
    Alex.
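
A minimal sketch of a pre-flight check that would flag a renamed field before the query is sent: it compares the fields named in a qf string against a list of schema field names. The function names are hypothetical, the solr-test field list is an assumption built from the rename described in the quoted message below, and dynamic fields are not handled.

```python
def split_qf(qf):
    """Return the field names in a qf string, dropping boosts such as 'title^2'."""
    return [token.split("^", 1)[0] for token in qf.split()]


def unknown_fields(qf, schema_fields):
    """Return the qf fields that are absent from schema_fields, in qf order."""
    known = set(schema_fields)
    return [name for name in split_qf(qf) if name not in known]


# The f.f1.qf value quoted in the thread.
QF = ("id pmid pmc source_id other_id doi manuscript_id "
      "publication_id secondary_ids")

# Assumed field list for solr-test: same fields, with secondary_ids
# renamed to external_data (illustrative, not taken from the real schema).
SOLR_TEST_FIELDS = [
    "id", "pmid", "pmc", "source_id", "other_id", "doi",
    "manuscript_id", "publication_id", "external_data",
]

if __name__ == "__main__":
    print(unknown_fields(QF, SOLR_TEST_FIELDS))
```

In practice the field names would come from each instance's own schema rather than a hard-coded list.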

On Mon, 10 Jun 2019 at 09:19, Danilo Tomasoni <tomasoni@cosbi.eu> wrote:
>
> Hello, I was able to reproduce this behaviour in an isolated environment
> and performed some differential analysis between the two versions (which have
> different schemas; diff of schemas attached).
>
> With the schema of solr1, the query is parsed as +(+(....) +(....))
> while with the schema of solr-test, the same query is parsed as +((....) (....))
>
> The query is
>
> "q":"(f1:PUBMEDPMID12159614 AND (_query_:\"{!edismax qf='medline_chemical_terms medline_mesh_terms' q.op=OR mm=1 v=$subquery1}\"))"
>
> In both solr1 and solr-test, f1 is defined as
> "f.f1.qf":"id pmid pmc source_id other_id doi manuscript_id publication_id secondary_ids"}}
>
> And then I suddenly remembered that the field secondary_ids was renamed to
> external_data in solr-test (before the bulk import).
>
> So I changed the f1 definition, removing secondary_ids and adding external_data,
> and now the behaviour is the same!
>
> How is that possible? Why can the schema (and in this case a non-existing field)
> influence the behaviour of the query parser in such a profound way?
>
> I think that this is a subtle bug and an error should be raised instead of
> performing an unexpected query.
>
> Danilo Tomasoni
>
> Fondazione The Microsoft Research - University of Trento Centre for Computational and
> Systems Biology (COSBI)
> Piazza Manifattura 1,  38068 Rovereto (TN), Italy
> tomasoni@cosbi.eu
> http://www.cosbi.eu
>
> As for the European General Data Protection Regulation 2016/679 on the protection of
> natural persons with regard to the processing of personal data, we inform you that all the
> data we possess are object of treatment in the respect of the normative provided for by the
> cited GDPR.
> It is your right to be informed on which of your data are used and how; you may ask for
> their correction, cancellation or you may oppose to their use by written request sent by recorded
> delivery to The Microsoft Research – University of Trento Centre for Computational and Systems
> Biology Scarl, Piazza Manifattura 1, 38068 Rovereto (TN), Italy.
> Please don't print this e-mail unless you really need to
>
> ________________________________________
> From: Alexandre Rafalovitch [arafalov@gmail.com]
> Sent: 10 June 2019 12:49
> To: solr-user
> Subject: [SPAM] Re: query parsed in different ways in two identical solr instances
>
> Were you able to simplify it to the simplest use case showing the issue? Or
> reproduce it on the stock Solr with stock example? Because otherwise, we
> would be just as stuck in a Jira as now. It is the same people helping....
>
> For example, is the _query_ part significant?
>
> Also, did you try running both queries with echoParams=all just to
> eliminate stray differences? I know you looked at the debug line, but
> perhaps this is worth a check too.
>
> Regards,
>     Alex
>
>
>
> On Mon, Jun 10, 2019, 5:46 AM Danilo Tomasoni, <tomasoni@cosbi.eu> wrote:
>
> > Hello all,
> > maybe I should consider this as a bug and open an issue?
> >
> > Danilo Tomasoni
> >
> > ________________________________________
> > From: Danilo Tomasoni
> > Sent: 07 June 2019 11:47
> > To: solr-user@lucene.apache.org
> > Subject: RE: query parsed in different ways in two identical solr instances
> >
> > Any thoughts on that difference in the Solr parsing? Is it correct that
> > the first looks like an AND while the second looks like an OR?
> > Thank you
> >
> > Danilo Tomasoni
> >
> > ________________________________________
> > From: Danilo Tomasoni [tomasoni@cosbi.eu]
> > Sent: 06 June 2019 16:21
> > To: solr-user@lucene.apache.org
> > Subject: RE: query parsed in different ways in two identical solr instances
> >
> > The two collections are not identical: many overlapping documents, but with
> > some different field names (test also has extra fields that solr1 didn't have).
> > Currently we have 42,000,000 docs in solr1 and 40,000,000 in solr-test,
> > but I think this shouldn't be relevant because the query is basically like
> >
> > id=x AND mesh=list of phrase queries
> >
> > where the second part of the and is handled through a nested query
> > (_query_ magic keyword).
> >
> > I expect that a query like this one would return 1 document (x) or 0
> > documents.
> >
> > The thing that puzzles me is that on solr1 the engine is returning 1
> > document (x)
> > while on test the engine is returning 68,000 documents.
> > If you look at my first e-mail you will notice that in the correct engine
> > the parsed query is like
> >
> > +(+(...) +(...))
> >
> > That is correct for an AND
> >
> > while in the test engine the query is parsed like
> >
> > +((...) (...))
> >
> > which is more like an OR...
> >
> >
> > Danilo Tomasoni
> >
> > ________________________________________
> > From: Alexandre Rafalovitch [arafalov@gmail.com]
> > Sent: 06 June 2019 15:53
> > To: solr-user
> > Subject: Re: query parsed in different ways in two identical solr instances
> >
> > Those two queries look the same after sorting the parameters, yet the
> > results are clearly different. That means the difference is deeper.
> >
> > 1) Have you checked that both collections have the same number of
> > documents (e.g. no mismatched final commit)? Does a basic "q=*:*"
> > return the same counts in the same initial order?
> > 2) Are you absolutely sure you are comparing 7.3.0 with 7.3.1? There
> > was SOLR-11501 that may be relevant, but it was fixed in 7.2:
> > https://issues.apache.org/jira/browse/SOLR-11501
> >
> > Regards,
> >    Alex.
> >
> > Are you absolutely sure that your instances are 7.3.0 and 7.3.1?
> >
> > On Thu, 6 Jun 2019 at 09:26, Danilo Tomasoni <tomasoni@cosbi.eu> wrote:
> > >
> > > Hello, and thank you for your answer.
> > > Attached you will find the two logs for the working solr1 server and
> > > the non-working solr-test server.
> > >
> > >
> > > Danilo Tomasoni
> > >
> > > ________________________________________
> > > From: Shawn Heisey [apache@elyograg.org]
> > > Sent: 05 June 2019 17:52
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: query parsed in different ways in two identical solr instances
> > >
> > > On 6/5/2019 8:41 AM, Danilo Tomasoni wrote:
> > > > Hello,
> > > > I have two solr instances with exactly the same configuration.
> > > > The only difference that I know of is that the first (the working one)
> > > > is Solr 7.3.0, while the one that's not working is Solr 7.3.1.
> > > >
> > > > If I execute the same query (with debugQuery=on) it gets parsed in
> > > > different ways on the two systems and I don't understand why.
> > >
> > > Look in solr.log.  The full query, including parameters that are used
> > > but not on the URL, will be shown there.  Provide that whole line from
> > > both versions.
> > >
> > > An example of the kind of line you need to find, with a very simple
> > > query, is below:
> > >
> > > 2019-06-05 15:50:23.691 INFO  (qtp1264413185-43) [   x:foo] o.a.s.c.S.Request [foo]  webapp=/solr path=/select params={q=*:*&_=1559749821933} hits=0 status=0 QTime=38
> > >
> > > If your index has multiple shards, there can be multiple lines.  In that
> > > situation, we need the last one, which should be the main query itself
> > > rather than the subqueries.
> > >
> > > Thanks,
> > > Shawn
> >
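
Shawn's request-log line and Alex's remark about sorting the parameters suggest a mechanical comparison of the two requests. The following is a minimal sketch, assuming each request is logged on a single line with a params={...} section as in the example above. The function names are hypothetical, and the second log line is made up for illustration only.

```python
import re
from urllib.parse import parse_qs

# Greedy match up to the last "}" on the line, so values that themselves
# contain braces (e.g. {!edismax ...}) stay inside the captured group.
PARAMS_RE = re.compile(r"params=\{(.*)\}")


def request_params(log_line):
    """Extract the request parameters from one solr.log request line."""
    match = PARAMS_RE.search(log_line)
    if match is None:
        raise ValueError("no params={...} section found")
    return parse_qs(match.group(1), keep_blank_values=True)


def diff_params(line_a, line_b, ignore=("_",)):
    """Return {name: (values_in_a, values_in_b)} for parameters that differ.

    The "_" cache-busting parameter is ignored by default.
    """
    a, b = request_params(line_a), request_params(line_b)
    diff = {}
    for key in sorted(set(a) | set(b)):
        if key in ignore:
            continue
        if a.get(key) != b.get(key):
            diff[key] = (a.get(key), b.get(key))
    return diff


# First line: the example from the thread. Second line: invented for the demo.
LINE_A = ("2019-06-05 15:50:23.691 INFO  (qtp1264413185-43) [   x:foo] "
          "o.a.s.c.S.Request [foo]  webapp=/solr path=/select "
          "params={q=*:*&_=1559749821933} hits=0 status=0 QTime=38")
LINE_B = ("2019-06-05 15:51:02.100 INFO  (qtp1264413185-44) [   x:foo] "
          "o.a.s.c.S.Request [foo]  webapp=/solr path=/select "
          "params={q=*:*&q.op=AND&_=1559749999999} hits=5 status=0 QTime=2")

if __name__ == "__main__":
    print(diff_params(LINE_A, LINE_B))
```

An empty result means the two requests carried the same parameters, which points to a difference in configuration or schema, not in the request itself.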
