lucene-solr-user mailing list archives

From Tim Potter <tim.pot...@lucidworks.com>
Subject Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.
Date Tue, 14 Oct 2014 14:32:58 GMT
Try adding shards.info=true and debug=track to your queries ... these will
give more detailed information about what's going on behind the scenes.
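As a reference, here is a minimal sketch of the suggested diagnostic query. It reuses the host and collection names quoted later in this thread (server1.mydomain.com, dyCollection1); adjust both to your own cluster.

```shell
# Build the diagnostic query: shards.info=true reports per-shard hit counts,
# and debug=track traces the stages of the distributed request.
# Host and collection names are taken from the thread below; adjust as needed.
BASE='http://server1.mydomain.com:8081/solr/dyCollection1/select'
PARAMS='q=*:*&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47)&wt=json'
url="${BASE}?${PARAMS}&shards.info=true&debug=track"
echo "$url"
# Fetch against a live cluster with:
# curl -s "$url"
```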

On Mon, Oct 13, 2014 at 11:11 PM, S.L <simpleliving016@gmail.com> wrote:

> Erick,
>
> I have upgraded to SolrCloud 4.10.1 with the same topology, 3 shards and a
> replication factor of 2, with six cores altogether.
>
> Unfortunately, I still see the issue of intermittently no results being
> returned. I am not able to figure out what's going on here; I have included
> the logging information below.
>
> *Here's the query that I run.*
>
>
> http://server1.mydomain.com:8081/solr/dyCollection1/select/?q=*:*&fq=%28id:220a8dce-3b31-4d46-8386-da8405595c47%29&wt=json&distrib=true
>
>
>
> *Scenario #1: No result returned.*
>
> *Log information for scenario #1.*
> 92860314 [http-bio-8081-exec-103] INFO
> org.apache.solr.handler.component.SpellCheckComponent  –
>
> http://server1.mydomain.com:8082/solr/dyCollection1_shard3_replica2/|http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1/
> null
> 92860315 [http-bio-8081-exec-103] INFO
> org.apache.solr.handler.component.SpellCheckComponent  –
>
> http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1/|http://server2.mydomain.com:8081/solr/dyCollection1_shard1_replica2/
> null
> 92860315 [http-bio-8081-exec-103] INFO
> org.apache.solr.handler.component.SpellCheckComponent  –
>
> http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/
> null
> 92860315 [http-bio-8081-exec-103] INFO  org.apache.solr.core.SolrCore  –
> [dyCollection1_shard2_replica1] webapp=/solr path=/select/
>
> params={q=*:*&distrib=true&wt=json&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47)}
> hits=0 status=0 QTime=5
>
> *Scenario #2: I get a result back.*
>
> *Log information for scenario #2.*
> 92881911 [http-bio-8081-exec-177] INFO
> org.apache.solr.core.SolrCore  – [dyCollection1_shard2_replica1]
> webapp=/solr path=/select
>
> params={spellcheck=true&spellcheck.maxResultsForSuggest=5&spellcheck.extendedResults=true&spellcheck.collateExtendedResults=true&spellcheck.maxCollations=5&spellcheck.maxCollationTries=10&distrib=false&wt=javabin&spellcheck.collate=true&version=2&rows=10&NOW=1413251927427&shard.url=
>
> http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/&fl=productURL,score&df=suggestAggregate&start=0&q=*:*&spellcheck.dictionary=direct&spellcheck.dictionary=wordbreak&spellcheck.count=10&isShard=true&fsv=true&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47)&spellcheck.alternativeTermCount=5
> }
> hits=1 status=0 QTime=1
> 92881913 [http-bio-8081-exec-177] INFO  org.apache.solr.core.SolrCore  –
> [dyCollection1_shard2_replica1] webapp=/solr path=/select
>
> params={spellcheck=false&spellcheck.maxResultsForSuggest=5&spellcheck.extendedResults=true&spellcheck.collateExtendedResults=true&ids=
>
> http://www.searcheddomain.com/p/ironwork-8-piece-comforter-set/-/A-15273248&spellcheck.maxCollations=5&spellcheck.maxCollationTries=10&distrib=false&wt=javabin&spellcheck.collate=true&version=2&rows=10&NOW=1413251927427&shard.url=http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/&df=suggestAggregate&q=*:*&spellcheck.dictionary=direct&spellcheck.dictionary=wordbreak&spellcheck.count=10&isShard=true&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47)&spellcheck.alternativeTermCount=5
> }
> status=0 QTime=0
> 92881914 [http-bio-8081-exec-169] INFO
> org.apache.solr.handler.component.SpellCheckComponent  –
>
> http://server1.mydomain.com:8082/solr/dyCollection1_shard3_replica2/|http://server2.mydomain.com:8082/solr/dyCollection1_shard3_replica1/
> null
> 92881914 [http-bio-8081-exec-169] INFO
> org.apache.solr.handler.component.SpellCheckComponent  –
>
> http://server3.mydomain.com:8082/solr/dyCollection1_shard1_replica1/|http://server2.mydomain.com:8081/solr/dyCollection1_shard1_replica2/
> null
> 92881914 [http-bio-8081-exec-169] INFO
> org.apache.solr.handler.component.SpellCheckComponent  –
>
> http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/
> null
> 92881914 [http-bio-8081-exec-169] INFO
> org.apache.solr.handler.component.SpellCheckComponent  –
>
> http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/|http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/
> null
> 92881915 [http-bio-8081-exec-169] INFO  org.apache.solr.core.SolrCore  –
> [dyCollection1_shard2_replica1] webapp=/solr path=/select/
>
> params={q=*:*&distrib=true&wt=json&fq=(id:220a8dce-3b31-4d46-8386-da8405595c47)}
> hits=1 status=0 QTime=7
>
>
> *Autocommit and Soft commit settings.*
>
>      <autoSoftCommit>
>        <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
>      </autoSoftCommit>
>
>      <autoCommit>
>        <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>
>        <openSearcher>true</openSearcher>
>      </autoCommit>
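A quick way to check the symptom described above is to query each replica of one shard directly. This is only a sketch: the replica URLs are the shard2 pair from the logs above (adjust to your own cluster), and it only prints the URLs unless the curl line is uncommented.

```shell
# distrib=false confines a query to the core it is sent to, so comparing
# numFound across the replicas of one shard reveals an out-of-sync replica.
# Replica URLs are the shard2 pair from the logs above; adjust as needed.
DOC_FQ='(id:220a8dce-3b31-4d46-8386-da8405595c47)'
for replica in \
  http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1 \
  http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2
do
  url="${replica}/select?q=*:*&fq=${DOC_FQ}&wt=json&distrib=false"
  echo "$url"
  # Run against a live cluster with:
  # curl -s "$url" | grep -o '"numFound":[0-9]*'
done
```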
>
>
>
> On Tue, Oct 7, 2014 at 12:22 AM, Erick Erickson <erickerickson@gmail.com>
> wrote:
>
> > No, I'm not guaranteeing that it'll actually cure the problem, just
> > that enough has changed since 4.7 that it'd be a good place to start.
> >
> > Things have been reported off and on, but they're often pesky race
> > conditions or something else that takes a long time to track down; you
> > are just lucky, perhaps ;)...
> >
> > Erick
> >
> > On Mon, Oct 6, 2014 at 8:04 PM, S.L <simpleliving016@gmail.com> wrote:
> > > Erick,
> > >
> > > Thanks for the suggestion, I am not sure if I would be able to capture
> > > what went wrong, so upgrading to 4.10 seems easier even though it means
> > > a day's work of effort :). I will go ahead and upgrade and let you know,
> > > although I am surprised that this issue never got reported for 4.7 up
> > > until now.
> > >
> > > Thanks again for your help!
> > >
> > >
> > >
> > > On Mon, Oct 6, 2014 at 10:52 PM, Erick Erickson <erickerickson@gmail.com>
> > > wrote:
> > >
> > >> I think there were some holes that would allow replicas and leaders to
> > >> be out of synch that have been patched up in the last 3 releases.
> > >>
> > >> There shouldn't be anything you need to do to keep these in synch, so
> > >> if you can capture what happened when things got out of synch we'll
> > >> fix it. But a lot has changed in the last several months, so the first
> > >> thing I'd do if possible is to upgrade to 4.10.1.
> > >>
> > >>
> > >> Best,
> > >> Erick
> > >>
> > >> On Mon, Oct 6, 2014 at 2:41 PM, S.L <simpleliving016@gmail.com> wrote:
> > >> > Hi Erick,
> > >> >
> > >> > Before I tried your suggestion of issuing a commit=true update, I
> > >> > realized that for each shard there was at least one node that had its
> > >> > index directory named like index.<timestamp>.
> > >> >
> > >> > I went ahead and deleted the index directory and restarted that core,
> > >> > and now the index directory is synced with the other node and is
> > >> > properly named 'index' without any timestamp attached to it. This is
> > >> > now giving me consistent results for distrib=true using a load
> > >> > balancer. Also, distrib=false returns expected results for a given
> > >> > shard.
> > >> >
> > >> > The underlying issue appears to be that in every shard the leader
> > >> > and the replica (follower) were out of sync.
> > >> >
> > >> > How can I avoid this from happening again?
> > >> >
> > >> > Thanks for your help!
> > >> >
> > >> > Sent from my HTC
> > >> >
> > >> > ----- Reply message -----
> > >> > From: "Erick Erickson" <erickerickson@gmail.com>
> > >> > To: <solr-user@lucene.apache.org>
> > >> > Subject: SolrCloud 4.7 not doing distributed search when querying
> > >> > from a load balancer.
> > >> > Date: Fri, Oct 3, 2014 12:56 AM
> > >> >
> > >> > Hmmmm. Assuming that you aren't re-indexing the doc you're searching
> > >> > for...
> > >> >
> > >> > Try issuing http://blah blah:8983/solr/collection/update?commit=true.
> > >> > That'll force all the docs to be searchable. Does <1> still hold for
> > >> > the document in question? Because this is exactly backwards of what
> > >> > I'd expect. I'd expect, if anything, the replica (I'm trying to call
> > >> > it the "follower" when a distinction needs to be made, since the
> > >> > leader is a "replica" too....) would be out of sync. This is still a
> > >> > Bad Thing, but the leader gets first crack at indexing things.
> > >> >
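The commit suggestion above can be sketched as follows; the host, port, and collection name here are placeholders, not a real cluster.

```shell
# Force a hard commit so any pending documents become searchable.
# Placeholder host/port/collection; substitute your own values.
BASE='http://localhost:8983/solr/collection'
url="${BASE}/update?commit=true"
echo "$url"
# Run against a live node with:
# curl -s "$url"
```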
> > >> > bq: only the replica of the shard that has this key returns the
> > >> > result, and the leader does not
> > >> >
> > >> > Just to be sure we're talking about the same thing. When you say
> > >> > "leader", you mean the shard leader, right? The filled-in circle on
> > >> > the graph view from the admin/cloud page.
> > >> >
> > >> > And let's see your soft and hard commit settings please.
> > >> >
> > >> > Best,
> > >> > Erick
> > >> >
> > >> > On Thu, Oct 2, 2014 at 9:48 PM, S.L <simpleliving016@gmail.com> wrote:
> > >> >> Erick,
> > >> >>
> > >> >> 0> Load balancer is out of the picture.
> > >> >>
> > >> >> 1> When I query with *distrib=false*, I get consistent results as
> > >> >> expected for those shards that don't have the key, i.e. I don't get
> > >> >> results back for those shards. However, I just realized that while
> > >> >> *distrib=false* is present in the query for the shard that is
> > >> >> supposed to contain the key, only the replica of the shard that has
> > >> >> this key returns the result, and the leader does not. It looks like
> > >> >> the replica and the leader do not have the same data, and the replica
> > >> >> seems to contain the key in the query for that shard.
> > >> >>
> > >> >> 2> By indexing I mean this collection is being populated by a web
> > >> >> crawler.
> > >> >>
> > >> >> So it looks like 1> above points to the leader and replica being out
> > >> >> of sync for at least one shard.
> > >> >>
> > >> >>
> > >> >>
> > >> >> On Thu, Oct 2, 2014 at 11:57 PM, Erick Erickson <erickerickson@gmail.com>
> > >> >> wrote:
> > >> >>
> > >> >>> bq: Also, the collection is being actively indexed as I query this,
> > >> >>> could that be an issue too?
> > >> >>>
> > >> >>> Not if the documents you're searching aren't being added as you
> > >> >>> search (and all your autocommit intervals have expired).
> > >> >>>
> > >> >>> I would turn off indexing for testing; it's just one more variable
> > >> >>> that can get in the way of understanding this.
> > >> >>>
> > >> >>> Do note that if the problem were endemic to Solr, there would
> > >> >>> probably be a _lot_ more noise out there.
> > >> >>>
> > >> >>> So to recap:
> > >> >>> 0> we can take the load balancer out of the picture altogether.
> > >> >>>
> > >> >>> 1> when you query each shard individually with &distrib=false, every
> > >> >>> replica in a particular shard returns the same count.
> > >> >>>
> > >> >>> 2> when you query without &distrib=false you get varying counts.
> > >> >>>
> > >> >>> This is very strange and not at all expected. Let's try it again
> > >> >>> without indexing going on....
> > >> >>>
> > >> >>> And what do you mean by "indexing" anyway? How are documents being
> > >> >>> fed to your system?
> > >> >>>
> > >> >>> Best,
> > >> >>> Erick@PuzzledAsWell
> > >> >>>
> > >> >>> On Thu, Oct 2, 2014 at 7:32 PM, S.L <simpleliving016@gmail.com> wrote:
> > >> >>> > Erick,
> > >> >>> >
> > >> >>> > I would like to add that the interesting behavior, i.e. point #2
> > >> >>> > that I mentioned in my earlier reply, happens in all the shards.
> > >> >>> > If this were a distributed search issue it should not have
> > >> >>> > manifested itself in the shard that contains the key that I am
> > >> >>> > searching for; it looks like the search is just failing as a whole
> > >> >>> > intermittently.
> > >> >>> >
> > >> >>> > Also, the collection is being actively indexed as I query this;
> > >> >>> > could that be an issue too?
> > >> >>> >
> > >> >>> > Thanks.
> > >> >>> >
> > >> >>> > On Thu, Oct 2, 2014 at 10:24 PM, S.L <simpleliving016@gmail.com>
> > >> >>> > wrote:
> > >> >>> >
> > >> >>> >> Erick,
> > >> >>> >>
> > >> >>> >> Thanks for your reply, I tried your suggestions.
> > >> >>> >>
> > >> >>> >> 1. When not using the load balancer, if *I have distrib=false* I
> > >> >>> >> get consistent results across the replicas.
> > >> >>> >>
> > >> >>> >> 2. However, here's the interesting part: while not using the load
> > >> >>> >> balancer, if I *don't have distrib=false*, then when I query a
> > >> >>> >> particular node I get the same behaviour as if I were using a
> > >> >>> >> load balancer, meaning the distributed search from a node works
> > >> >>> >> intermittently. Does this give any clue?
> > >> >>> >>
> > >> >>> >>
> > >> >>> >>
> > >> >>> >> On Thu, Oct 2, 2014 at 7:47 PM, Erick Erickson <erickerickson@gmail.com>
> > >> >>> >> wrote:
> > >> >>> >>
> > >> >>> >>> Hmmm, nothing quite makes sense here....
> > >> >>> >>>
> > >> >>> >>> Here are some experiments:
> > >> >>> >>> 1> avoid the load balancer and issue queries like
> > >> >>> >>> http://solr_server:8983/solr/collection/select?q=whatever&distrib=false
> > >> >>> >>>
> > >> >>> >>> the &distrib=false bit will keep SolrCloud from trying to send
> > >> >>> >>> the queries anywhere; they'll be served only from the node you
> > >> >>> >>> address them to. That'll help check whether the nodes are
> > >> >>> >>> consistent. You should be getting back the same results from
> > >> >>> >>> each replica in a shard (i.e. 2 of your 6 machines).
> > >> >>> >>>
> > >> >>> >>> Next, try your failing query the same way.
> > >> >>> >>>
> > >> >>> >>> Next, try your failing query from a browser, pointing it at
> > >> >>> >>> successive nodes.
> > >> >>> >>>
> > >> >>> >>> Where is the first place problems show up?
> > >> >>> >>>
> > >> >>> >>> My _guess_ is that your load balancer isn't quite doing what
> > >> >>> >>> you think, or your cluster isn't set up the way you think it
> > >> >>> >>> is, but those are guesses.
> > >> >>> >>>
> > >> >>> >>> Best,
> > >> >>> >>> Erick
> > >> >>> >>>
> > >> >>> >>> On Thu, Oct 2, 2014 at 2:51 PM, S.L <simpleliving016@gmail.com>
> > >> >>> >>> wrote:
> > >> >>> >>> > Hi All,
> > >> >>> >>> >
> > >> >>> >>> > I am trying to query a 6 node Solr 4.7 cluster with 3 shards
> > >> >>> >>> > and a replication factor of 2.
> > >> >>> >>> >
> > >> >>> >>> > I have fronted these 6 Solr nodes with a load balancer. What
> > >> >>> >>> > I notice is that every time I do a search of the form
> > >> >>> >>> > q=*:*&fq=(id:9e78c064-919f-4ef3-b236-dc66351b4acf) it gives me
> > >> >>> >>> > a result only once in every 3 tries, telling me that the load
> > >> >>> >>> > balancer is distributing the requests between the 3 shards and
> > >> >>> >>> > SolrCloud only returns a result if the request goes to the
> > >> >>> >>> > core that has that id.
> > >> >>> >>> >
> > >> >>> >>> > However, if I do a simple search like q=*:*, I consistently
> > >> >>> >>> > get the right aggregated results back for all the documents
> > >> >>> >>> > across all the shards for every request from the load
> > >> >>> >>> > balancer. Can someone please let me know what this is
> > >> >>> >>> > symptomatic of?
> > >> >>> >>> >
> > >> >>> >>> > Somehow SolrCloud seems to be doing search query distribution
> > >> >>> >>> > and aggregation for queries of type *:* only.
> > >> >>> >>> >
> > >> >>> >>> > Thanks.
> > >> >>> >>>
> > >> >>> >>
> > >> >>> >>
> > >> >>>
> > >>
> >
>
