lucene-solr-user mailing list archives

From Webster Homer <webster.ho...@sial.com>
Subject Re: NRT replicas miss hits and return duplicate hits when paging solrcloud searches
Date Fri, 02 Mar 2018 18:19:55 GMT
Becky,
This should have been its own question.

SolrCloud is different from standalone Solr: the configurations live in
ZooKeeper and the index is created under SOLR_HOME. You might want to
rethink your solution. What problem are you trying to solve with that
layout? Would it be solved by creating the Parent1 collection with 2 shards?
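
For illustration only, here is a minimal Python sketch of what "creating the Parent1 collection with 2 shards" could look like as a Collections API request. It only builds the URLs and sends nothing. The host is taken from Becky's example; the collection name `myParent1` and the helper names are assumptions made for this sketch, and a real CREATE call would normally carry further parameters such as a replication factor or a configset name.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards):
    """Build a Collections API CREATE URL. Nothing is sent over the network."""
    params = {"action": "CREATE", "name": name, "numShards": num_shards}
    return f"{base_url}/admin/collections?{urlencode(params)}"


def select_url(base_url, collection, query):
    """Build a query URL. The collection name sits directly under /solr/."""
    return f"{base_url}/{collection}/select?{urlencode({'q': query})}"


# Host taken from the question; the collection name is a hypothetical choice.
BASE = "http://mysolrinstance.com/solr"

print(create_collection_url(BASE, "myParent1", 2))
print(select_url(BASE, "myParent1", "solr"))
```

Note that the collection name appears directly after /solr/ in the query URL, so there is no parent folder segment in the path.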

On Fri, Mar 2, 2018 at 10:56 AM, Becky Bonner <bbonner@teleflora.com> wrote:

> We are trying to set up one Solr server for several applications, each with
> a different collection.  Is there a way to have 2 collections under
> one folder and the URL be something like this:
> http://mysolrinstance.com/solr/myParent1/collection1
> http://mysolrinstance.com/solr/myParent1/collection2
> http://mysolrinstance.com/solr/myParent2
> http://mysolrinstance.com/solr/myParent3
>
>
> We organized it like that under the solr folder but the URLs to the
> collections do not include the "myParent1".
> This makes the names of my collections more confusing because you can't
> tell what application they belong to.  It wasn’t a problem until we had 2
> collections for one of the apps.
>
>
>
>
> -----Original Message-----
> From: Webster Homer [mailto:webster.homer@sial.com]
> Sent: Friday, March 2, 2018 10:29 AM
> To: solr-user@lucene.apache.org
> Subject: Re: NRT replicas miss hits and return duplicate hits when paging
> solrcloud searches
>
> I am trying to test if enabling stats cache as suggested by Erick would
> also address this issue. I added this to my solrconfig.xml:
>
>  <statsCache class="org.apache.solr.search.stats.ExactSharedStatsCache"/>
>
> I executed queries and saw no differences. Then I re-indexed the data,
> again I saw no differences in behavior.
> Then I found this,  SOLR-10952. It seems we need to disable the
> queryResultCache for the global stats cache to work.
> I've never disabled this before. I edited the solrconfig.xml setting the
> sizes to 0. I'm not sure if this is how to disable the cache or not.
>
>     <queryResultCache class="solr.LRUCache"
>                      size="0"
>                      initialSize="0"
>                      autowarmCount="0"/>
>
> I also set this:
>    <queryResultMaxDocsCached>0</queryResultMaxDocsCached>
>
> Then uploaded the solrconfig.xml and reloaded the collection. It still made
> no difference. Do I need to restart solr for this to take effect?
> When I look in the admin console, the queryResultCache still seems to have
> the old settings.
>
> Does enabling statsCache require a solr restart too? Does enabling the
> statsCache require that the data be re-indexed? The documentation on this
> feature is skimpy.
> Is there a way to see if it's enabled in the Admin Console?
>
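
As a sanity check on the edited file itself, a small Python sketch like the one below can confirm which cache settings the uploaded solrconfig.xml actually contains. It assumes the usual layout, with statsCache directly under the config root and queryResultCache inside the query section. The function name is hypothetical, and the sketch says nothing about what the running core has loaded; it only reads the XML.

```python
import xml.etree.ElementTree as ET

# Trimmed-down solrconfig.xml containing only the settings quoted above.
SOLRCONFIG = """
<config>
  <statsCache class="org.apache.solr.search.stats.ExactSharedStatsCache"/>
  <query>
    <queryResultCache class="solr.LRUCache"
                      size="0"
                      initialSize="0"
                      autowarmCount="0"/>
    <queryResultMaxDocsCached>0</queryResultMaxDocsCached>
  </query>
</config>
"""


def summarize_cache_settings(xml_text):
    """Return the statsCache class and queryResultCache attributes found in the file."""
    root = ET.fromstring(xml_text.strip())
    stats = root.find("statsCache")
    result_cache = root.find("query/queryResultCache")
    return {
        "statsCache": stats.get("class") if stats is not None else None,
        "queryResultCache": dict(result_cache.attrib) if result_cache is not None else None,
    }


settings = summarize_cache_settings(SOLRCONFIG)
print(settings["statsCache"])
print(settings["queryResultCache"])
```
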
> On Tue, Feb 27, 2018 at 9:31 AM, Webster Homer <webster.homer@sial.com>
> wrote:
>
> > Emir,
> >
> > Using tlog replica types addresses my immediate problem.
> >
> > The secondary issue is that all of our searches show inconsistent
> > results. These are all normal paging use cases. We regularly test our
> > relevancy, and these differences create confusion among the testers.
> > Moreover, we are migrating from Endeca which has very consistent results.
> >
> > I'm hoping that using the global stats cache will make the other
> > searches more stable. I think we will eventually move to favoring tlog
> > replicas. We have a couple of collections where NRT makes sense, but
> > those collections don't need to return data in relevancy order. I
> > think NRT should be considered a niche use case for a search engine,
> > tlog and pull replicas are a much better fit for a search engine
> > (imho)
> >
> > On Tue, Feb 27, 2018 at 4:01 AM, Emir Arnautović <
> > emir.arnautovic@sematext.com> wrote:
> >
> >> Hi Webster,
> >> Since you are returning all hits, returning the last page is almost
> >> as heavy for Solr as returning all documents. Maybe you should
> >> consider just returning one large page and completely avoid this issue.
> >> I agree with you that this should be handled by Solr. ES solved this
> >> issue with “preference” search parameter where you can set session id
> >> as preference and it will stick to the same shards. I guess you could
> >> try similar thing on your own but that would require you to send list
> >> of shards as parameter for your search and balance it for different
> >> sessions.
> >>
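
A rough Python sketch of that idea: hash the session id to pick one replica per shard and pass the result in Solr's `shards` request parameter, so every page of a session is served by the same replicas. The replica URLs, the `products` collection name and the function name are made up for illustration; in practice the list would come from the cluster state.

```python
import hashlib

# Hypothetical replica core URLs per shard; real ones come from the cluster state.
REPLICAS = {
    "shard1": [
        "host1:8983/solr/products_shard1_replica_n1",
        "host2:8983/solr/products_shard1_replica_n2",
    ],
    "shard2": [
        "host1:8983/solr/products_shard2_replica_n1",
        "host2:8983/solr/products_shard2_replica_n2",
    ],
}


def shards_param_for_session(session_id, replicas_by_shard):
    """Pick one replica per shard, chosen deterministically from the session id."""
    picks = []
    for shard in sorted(replicas_by_shard):
        replicas = replicas_by_shard[shard]
        digest = hashlib.sha256(f"{session_id}:{shard}".encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(replicas)
        picks.append(replicas[index])
    # The shards parameter takes a comma-separated list, one entry per shard.
    return ",".join(picks)


# The same session always maps to the same replicas, so paging stays on one view.
print(shards_param_for_session("session-42", REPLICAS))
```

Hashing spreads sessions across replicas without keeping any state, but the sketch does not handle a replica going down in the middle of a session.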
> >> HTH,
> >> Emir
> >> --
> >> Monitoring - Log Management - Alerting - Anomaly Detection Solr &
> >> Elasticsearch Consulting Support Training - http://sematext.com/
> >>
> >>
> >>
> >> > On 26 Feb 2018, at 21:03, Webster Homer <webster.homer@sial.com> wrote:
> >> >
> >> > Erick,
> >> >
> >> > No we didn't look at that. I will add it to the list. We have  not
> >> > seen performance issues with solr. We have much slower technologies
> >> > in our stack. This project was to replace a system that was too slow.
> >> >
> >> > Thank you, I will look into it
> >> >
> >> > Webster
> >> >
> >> > On Mon, Feb 26, 2018 at 1:13 PM, Erick Erickson <erickerickson@gmail.com> wrote:
> >> >
> >> >> Did you try enabling distributed IDF (statsCache)? See:
> >> >> https://lucene.apache.org/solr/guide/6_6/distributed-requests.html
> >> >>
> >> >> It may not totally fix the issue, but it's worth trying. It does
> >> >> come with a performance penalty of course.
> >> >>
> >> >> Best,
> >> >> Erick
> >> >>
> >> >> On Mon, Feb 26, 2018 at 11:00 AM, Webster Homer <webster.homer@sial.com> wrote:
> >> >>> Thanks Shawn, I had settled on this as a solution.
> >> >>>
> >> >>> All our use cases for Solr are to return results in order of relevancy to
> >> >>> the query, so having a deterministic sort would defeat that purpose. Since
> >> >>> we wanted to be able to return all the results for a query, I originally
> >> >>> looked at using the Streaming API, but that doesn't support returning
> >> >>> results sorted by relevancy.
> >> >>>
> >> >>> I disagree with you about NRT replicas though. They may function as
> >> >>> designed, but since they cannot guarantee consistent results their design
> >> >>> is buggy, at least it is for a search engine.
> >> >>>
> >> >>>
> >> >>> On Mon, Feb 26, 2018 at 12:20 PM, Shawn Heisey <apache@elyograg.org> wrote:
> >> >>>
> >> >>>> On 2/26/2018 10:26 AM, Webster Homer wrote:
> >> >>>>> We need the results by relevancy so the application sorts the results by
> >> >>>>> score desc, and the unique id ascending as the tie breaker
> >> >>>>
> >> >>>> This is the reason for the discrepancy, and why the different
> >> >>>> replica types don't have the same issue.
> >> >>>>
> >> >>>> Each NRT replica can have different deleted documents than the others,
> >> >>>> just due to the way that NRT replicas work.  Deleted documents affect
> >> >>>> relevancy scoring.  When one replica has say 5000 deleted documents and
> >> >>>> another has 200, or has 5000 but they're different docs, a relevancy
> >> >>>> sort can end up different.  So when Solr goes to one replica for page 1
> >> >>>> and another for page 2 (which is expected due to SolrCloud's internal
> >> >>>> load balancing), you may end up with duplicate documents or documents
> >> >>>> missing.  Because deleted documents are not counted or returned,
> >> >>>> numFound will be consistent, as long as the index doesn't change between
> >> >>>> the queries for pages.
> >> >>>>
> >> >>>> If you were using a deterministic sort rather than relevancy, this
> >> >>>> wouldn't be happening, because deleted documents have no influence on
> >> >>>> that kind of sort.
> >> >>>>
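
The effect described above can be reproduced with a toy model in Python. This is a deliberately simplified sketch, not Solr's actual scoring: it assumes a BM25-style IDF, ignores term frequency and length normalization, and uses invented statistics for two replicas that hold the same two live documents but different numbers of deleted, not yet merged, documents.

```python
import math


def bm25_idf(doc_count, doc_freq):
    """BM25-style inverse document frequency."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Live documents are identical on both replicas: id -> the one query term it contains.
LIVE_DOCS = {"doc1": "alpha", "doc2": "beta"}

# Invented statistics. Deleted but unmerged documents still count here,
# so the numbers differ between replicas even though the live docs match.
REPLICA_A = {"doc_count": 15000, "doc_freq": {"alpha": 40, "beta": 50}}
REPLICA_B = {"doc_count": 10200, "doc_freq": {"alpha": 60, "beta": 45}}


def ranked_ids(replica):
    """Sort by score descending, then id ascending as the tie breaker."""
    scored = [
        (bm25_idf(replica["doc_count"], replica["doc_freq"][term]), doc_id)
        for doc_id, term in LIVE_DOCS.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


def page(replica, start, rows):
    """Return one page of ids as a single replica would rank them."""
    return ranked_ids(replica)[start:start + rows]


page_1 = page(REPLICA_A, 0, 1)  # first page answered by replica A
page_2 = page(REPLICA_B, 1, 1)  # second page answered by replica B
print(page_1, page_2)  # ['doc1'] ['doc1']: doc1 shows up twice, doc2 never
print(sorted(LIVE_DOCS))  # a sort on id alone is the same on every replica
```
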
> >> >>>> With TLOG or PULL, the replicas are absolutely identical, so there is no
> >> >>>> difference, unless the index is changing as you page through the
> >> >>>> results.
> >> >>>>
> >> >>>> I think changing replica types is the only solution here.  NRT replicas
> >> >>>> are working as they were designed -- there's no bug, even though
> >> >>>> problems like this do sometimes turn up.
> >> >>>>
> >> >>>> Thanks,
> >> >>>> Shawn
> >> >>>>
> >> >>>>
> >> >>>
> >> >>> --
> >> >>>
> >> >>>
> >> >>> This message and any attachment are confidential and may be privileged or
> >> >>> otherwise protected from disclosure. If you are not the intended recipient,
> >> >>> you must not copy this message or attachment or disclose the contents to
> >> >>> any other person. If you have received this transmission in error, please
> >> >>> notify the sender immediately and delete the message and any attachment
> >> >>> from your system. Merck KGaA, Darmstadt, Germany and any of its
> >> >>> subsidiaries do not accept liability for any omissions or errors in this
> >> >>> message which may arise as a result of E-Mail-transmission or for damages
> >> >>> resulting from any unauthorized changes of the content of this message and
> >> >>> any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its
> >> >>> subsidiaries do not guarantee that this message is free of viruses and does
> >> >>> not accept liability for any damages caused by any virus transmitted
> >> >>> therewith.
> >> >>>
> >> >>> Click http://www.emdgroup.com/disclaimer to access the German, French,
> >> >>> Spanish and Portuguese versions of this disclaimer.
> >> >>
> >> >
> >>
> >>
> >
>
>
