lucene-dev mailing list archives

From "Jason Gerlowski (JIRA)" <>
Subject [jira] [Commented] (SOLR-6595) Improve error response in case distributed collection cmd fails
Date Wed, 02 Jan 2019 13:19:00 GMT


Jason Gerlowski commented on SOLR-6595:

I'm not going to have much time in the immediate future to finish this up, so I wanted to
summarize the progress so far:

- the latest patch sets the "status" property to 500 when the "failure" list is present and
- because of this, SolrJ now throws exceptions in failure cases where it previously let
the request fail silently.  This causes some tests to fail that were passing (incorrectly)
before.  I investigated a few examples of this, and most were in test setup/cleanup where the
expectations were a bit off.  There weren't many of these failures, though, and they should
be simpler to debug thanks to other recent test flakiness improvements.
- I investigated making changes to SolrJ that would attach a NamedList to SolrExceptions thrown
because of a 500, but didn't pursue that very far.  It's probably a separate JIRA anyway.
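
For readers following along, the first bullet can be illustrated with a minimal sketch. This is
not the code in SOLR-6595.patch: the class and method names are hypothetical, and plain
java.util maps stand in for Solr's NamedList so the example runs on its own. It only shows the
idea of deriving a 500 status from the presence of a "failure" entry.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not taken from the actual patch.
public class CollectionStatusSketch {

    /** Returns 500 if the response carries a "failure" entry, otherwise 0. */
    public static int statusFor(Map<String, Object> response) {
        return response.get("failure") != null ? 500 : 0;
    }

    /** Writes the derived status into the "responseHeader" map, creating it if absent. */
    @SuppressWarnings("unchecked")
    public static void applyStatus(Map<String, Object> response) {
        Map<String, Object> header = (Map<String, Object>)
            response.computeIfAbsent("responseHeader", k -> new LinkedHashMap<String, Object>());
        header.put("status", statusFor(response));
    }

    public static void main(String[] args) {
        Map<String, Object> response = new LinkedHashMap<>();
        // A stand-in failure entry, shaped loosely like the one in the issue description.
        response.put("failure", Collections.singletonList("SolrServerException: ..."));
        applyStatus(response);
        System.out.println(response.get("responseHeader"));
    }
}
```

With a status like this in the header, a client that treats non-zero status as an error would
surface the failure instead of letting it pass silently, which matches the behavior change
described in the second bullet.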

> Improve error response in case distributed collection cmd fails
> ---------------------------------------------------------------
>                 Key: SOLR-6595
>                 URL:
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 4.10
>         Environment: SolrCloud with Client SSL
>            Reporter: Sindre Fiskaa
>            Assignee: Jason Gerlowski
>            Priority: Minor
>         Attachments: SOLR-6595.patch
> Followed the description and generated a self-signed key pair. Configured a few Solr nodes and
> used the collection API to create a new collection. -I get an error message when I specify the
> nodes with the createNodeSet param. When I don't use the createNodeSet param, the collection gets
> created without error on random nodes. Could this be a bug related to the createNodeSet param?-
> *Update: It failed due to what turned out to be an invalid client certificate on the overseer,
> and returned the following
> {code:xml}
> <response>
>   <lst name="responseHeader"><int name="status">0</int><int name="QTime">185</int></lst>
>   <lst name="failure">
>     <str>org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: https://vt-searchln04:443/solr</str>
>   </lst>
> </response>
> {code}
> *Update: Three problems:*
> # Status=0 when the cmd did not succeed (only ZK was updated, but the cores were not created
> because connecting to the shard nodes to talk to the core admin API failed).
> # The error printed does not tell which action failed. It would be helpful to get either
> the msg from the original exception or at least some message saying "Failed to create core,
> see log on Overseer <>
> # The state of the collection is not clean, since it exists as far as ZK is concerned but the
> cores were not created. Thus retrying the CREATECOLLECTION cmd would fail. Should the Overseer
> detect errors in distributed cmds and roll back changes already made in ZK?
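
Until the status reflects the failure, a caller has to inspect the response body itself. The
following is a rough sketch of such a check using only the JDK's XML parser. The class and
method names are hypothetical, and it assumes the response has the shape shown in the issue
description: a top-level "lst" element whose name attribute is "failure".

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical client-side check; not part of SolrJ.
public class FailureCheckSketch {

    /** Returns true if the root element has a direct child <lst name="failure">. */
    public static boolean hasFailure(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Skip whitespace text nodes; only look at top-level <lst> elements.
                if (child instanceof Element
                        && "lst".equals(child.getNodeName())
                        && "failure".equals(((Element) child).getAttribute("name"))) {
                    return true;
                }
            }
            return false;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse response XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<response>"
            + "<lst name=\"responseHeader\"><int name=\"status\">0</int></lst>"
            + "<lst name=\"failure\"><str>example failure message</str></lst>"
            + "</response>";
        // The header reports status 0, yet the failure list is present.
        System.out.println(hasFailure(xml));
    }
}
```

The sketch deliberately ignores the status value, since the first problem listed above is that
status can be 0 even though the command did not succeed.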
