lucene-dev mailing list archives

From "David Smiley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-9063) CloudSolrClient with _route_ shouldn't require collection param to disambig cores
Date Thu, 05 May 2016 05:37:12 GMT

    [ https://issues.apache.org/jira/browse/SOLR-9063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15271908#comment-15271908
] 

David Smiley commented on SOLR-9063:
------------------------------------

Okay, I chased this one down sufficiently to know what's going on.  Strictly speaking, there
is no bug, or perhaps a small one.  In SOLR-5380 [~markrmiller@gmail.com] fixed a bug in CloudSolrClient
involving aliases pointing to multiple collections when "collection" was not specified
as a parameter, thus relying on the default collection (an alias).  I think it's because when
the server gets the request to a specific core, it doesn't know what the original/unresolved
alias was.  So the solution implemented, as seen in the code now, is to send the request to
a collection (the alias) level URL instead of to a specific core, but to do this only when
"collection" isn't a parameter.  If "collection" *is* a parameter, then the server end will
know how to deal with it.

But I think the logic should be improved so that it goes to the collection level
URL (thus not to the core) in more constrained circumstances.  In particular, if {{collectionNames.size()
== 1}} then it can go directly to the core URL, as we just resolved the alias (if there even
was one).  I think the condition to differentiate should be exactly that, rather than being
combined with the current check for whether "collection" was specified, if only for the
sake of simpler understanding.  Would this be an optimization or simplification?  I'm not
completely sure.

Here's the snippet I propose it become:
{code:java}
            String url;
            if (collectionNames.size() > 1) {
              // If there was an alias pointing to multiple collections, we can't send directly
              // to a core.  If it were convenient to modify the params to add collection=(list)
              // we would, but it isn't.  So we send the request to the collection at the node
              // and let the server end handle dispatching (incl. alias resolution).
              url = ZkCoreNodeProps.getCoreUrl(nodeProps.getStr(ZkStateReader.BASE_URL_PROP), collection);
            } else {
              url = coreNodeProps.getCoreUrl();
            }
            ((!sendToLeaders || coreNodeProps.isLeader()) ? urlList2 : replicas).add(url);
{code}
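
To make the effect of that condition concrete, here is a minimal, self-contained sketch that can be run without Solr on the classpath.  It is only a model of the decision, not the CloudSolrClient code: the class {{RouteUrlSketch}}, the helpers {{join}} and {{chooseUrl}}, the host, the alias name, and the core names are all hypothetical, and the URL formatting (base URL, slash, name, trailing slash) is an assumption made for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the proposed URL choice; not the actual CloudSolrClient code.
final class RouteUrlSketch {

  // Joins a node base URL with a collection or core name.
  // The exact formatting here is an assumption made for this sketch.
  static String join(String baseUrl, String name) {
    String base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
    return base + name + "/";
  }

  // Proposed rule: use the collection-level URL only when the requested name
  // resolved to more than one collection; otherwise go straight to the core.
  static String chooseUrl(List<String> collectionNames, String baseUrl,
                          String collection, String coreName) {
    if (collectionNames.size() > 1) {
      // Multi-collection alias: let the server end handle dispatching.
      return join(baseUrl, collection);
    }
    // Single collection: the routed core can be addressed directly.
    return join(baseUrl, coreName);
  }

  public static void main(String[] args) {
    String base = "http://host:8983/solr"; // hypothetical node base URL
    // Alias resolving to two collections: collection-level URL.
    System.out.println(chooseUrl(Arrays.asList("c1", "c2"), base, "myAlias", "c1_shard2_replica1"));
    // Single collection: core-level URL.
    System.out.println(chooseUrl(Arrays.asList("c1"), base, "c1", "c1_shard2_replica1"));
  }
}
```

With a single resolved collection the sketch always picks the core URL, regardless of whether "collection" was passed as a parameter, which is the point of the proposal.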

How did I run into this?  I'm working with a codebase that wanted to make a request to a custom
request handler that did _not_ extend SearchHandler.  It's expected that a request be routed
to the proper shard via an explicit {{\_route\_}}, which we do, and CloudSolrClient processes
that param.  However, because we didn't set "collection" as a param (we relied on the default
collection prop), CloudSolrClient ignored its work in figuring out which shard to send it
to and instead sent the request to the collection level.  At the server end, HttpSolrCall
doesn't process {{\_route\_}}, so it arbitrarily passed the request to a shard that was the
wrong one on the node.  If the request handler were a SearchHandler (it isn't), then it'd
become a distributed search and reach the {{\_route\_}} handling logic in HttpShardHandler.  The
solution was quite simple -- add "collection" and don't use the default collection setting.
That seemed awfully weird so I spent today finding out why it didn't work.  I argue this
_should_ work and hence I propose the simple change above.
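
The scenario above can be reduced to two boolean conditions.  The sketch below is a hypothetical model that only restates the behavior described in this comment; {{DispatchConditionSketch}} and its method names are made up for illustration and do not exist in Solr.

```java
// Hypothetical model of the two conditions discussed in this comment; not actual Solr code.
final class DispatchConditionSketch {

  // Current behavior as described above: go to the collection-level URL
  // whenever "collection" was not passed as a request parameter.
  static boolean currentUsesCollectionUrl(boolean collectionParamSet) {
    return !collectionParamSet;
  }

  // Proposed behavior: go to the collection-level URL only when the name
  // resolved to more than one collection (a multi-collection alias).
  static boolean proposedUsesCollectionUrl(int resolvedCollectionCount) {
    return resolvedCollectionCount > 1;
  }

  public static void main(String[] args) {
    // Reported scenario: default collection, no "collection" param, one resolved collection.
    System.out.println("current uses collection URL:  " + currentUsesCollectionUrl(false));
    System.out.println("proposed uses collection URL: " + proposedUsesCollectionUrl(1));
  }
}
```

In the reported scenario the current condition selects the collection-level URL, so the shard chosen via {{\_route\_}} is lost, while the proposed condition keeps the core URL.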

Testing in progress...

> CloudSolrClient with _route_ shouldn't require collection param to disambig cores
> ---------------------------------------------------------------------------------
>
>                 Key: SOLR-9063
>                 URL: https://issues.apache.org/jira/browse/SOLR-9063
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud, SolrJ
>    Affects Versions: 4.10.4
>            Reporter: David Smiley
>            Assignee: David Smiley
>
> CloudSolrClient uses {{\_route\_}} to know where to send a request.  It sorta works --
it'll go to an appropriate _node_.  But it will only go to the correct core on that node if
the {{collection}} parameter is explicitly added.  In other words, it ignores the default
collection configured on CloudSolrClient.  It also seems to ignore the "collection" parameter
to the protected method sendRequest for this purpose too.  As I write this, see line 1139
on master.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

