lucene-solr-user mailing list archives

From Steffen Moldenhauer <s.moldenha...@intershop.de>
Subject RE: Asynchronous Calls to Backup/Restore Collections ignoring errors
Date Wed, 06 Feb 2019 12:51:12 GMT
Hi Jason, 

thanks for pointing me to issue SOLR-6595. It looks to me as though the async handling is
similar to the handling of distributed collection commands.
I hope I can find the time to test whether your patch fixes it.
Yes, I will try your suggestion and see whether we can work around the problem by checking
the collection with a query after the restore.
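
A minimal sketch of that check, in Python, might look like the following. It polls REQUESTSTATUS for the async request and then queries the restored collection for its document count. The function names, the `expected_docs` threshold, and the polling limits are illustrative assumptions, not part of Solr; the base URL and collection name are taken from the examples in this thread. Whether a broken restore actually makes the query fail or return too few documents depends on what "unusable" means in a given case, so this is a heuristic rather than a reliable error report.

```python
import json
import time
import urllib.parse
import urllib.request

SOLR = "http://localhost:8983/solr"  # base URL as used in the examples in this thread


def http_get_json(url):
    """Fetch a URL and decode its JSON body (callers add wt=json)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def request_state(request_id, fetch=http_get_json):
    """Return the state string reported by REQUESTSTATUS for an async request."""
    params = urllib.parse.urlencode(
        {"action": "REQUESTSTATUS", "requestid": request_id, "wt": "json"})
    body = fetch(f"{SOLR}/admin/collections?{params}")
    return body["status"]["state"]


def num_found(collection, fetch=http_get_json):
    """Return the number of documents matched by a match-all query."""
    params = urllib.parse.urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    body = fetch(f"{SOLR}/{collection}/select?{params}")
    return body["response"]["numFound"]


def restore_looks_ok(request_id, collection, expected_docs,
                     fetch=http_get_json, poll_seconds=5, max_polls=120):
    """Hypothetical helper: True only if the async request reports 'completed'
    AND the collection answers a query with at least expected_docs documents."""
    for _ in range(max_polls):
        state = request_state(request_id, fetch)
        if state in ("completed", "failed", "notfound"):
            break
        time.sleep(poll_seconds)
    else:
        return False  # still submitted/running after max_polls attempts
    if state != "completed":
        return False
    try:
        # "completed" alone is not trusted; verify with a real query.
        return num_found(collection, fetch) >= expected_docs
    except (OSError, KeyError, ValueError):
        # HTTP errors, unexpected response shape, or an undecodable body.
        return False
```

For the example in this thread the call would be `restore_looks_ok("1000", "gettingstarted", expected_docs=1)`. The `fetch` parameter exists only so the HTTP call can be swapped out, for instance in a test.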

Regards
Steffen 

> -----Original Message-----
> From: Jason Gerlowski [mailto:gerlowskija@gmail.com]
> Sent: Monday, 4 February 2019 15:43
> To: solr-user@lucene.apache.org
> Subject: Re: Asynchronous Calls to Backup/Restore Collections ignoring
> errors
> 
> Hi Steffen,
> 
> There are a few "known issues" in this area.  Probably most relevant is
> SOLR-6595, which covers a few error-reporting issues for "collection-admin"
> operations.  I don't think we've gotten any reports yet of success/failure
> determination being broken for asynchronous operations, but that's not
> too surprising given my understanding of how that bit of the code works.
> So "yes", this is a known issue.
> We've made some progress towards improving the situation, but there's
> still work to be done.
> 
> As for workarounds, I can't think of any clever suggestions.  You might be
> able to issue a query to the collection to see if it returns any docs, or a
> particular number of expected docs.  But that may not be possible,
> depending on what you meant by the collection being "unusable" above.
> 
> Best,
> 
> Jason
> 
> On Thu, Jan 31, 2019 at 10:10 AM Steffen Moldenhauer
> <s.moldenhauer@intershop.de> wrote:
> >
> > Hi all,
> >
> > we are using the collection API backup and restore to transfer
> > collections from a pre-prod to a production system. We are currently
> > using Solr version 6.6.5. Sometimes that automated process fails and
> > collections are not working on the production system.
> >
> > It seems that the asynchronous API calls for backup and restore do not
> > report some errors/exceptions.
> >
> > I tried it with the solrcloud gettingstarted example:
> >
> > http://localhost:8983/solr/admin/collections?action=BACKUP&name=backup-gettingstarted&collection=gettingstarted&location=D:\solr_backup
> >
> > http://localhost:8983/solr/admin/collections?action=DELETE&name=gettingstarted
> >
> > Now I simulate an error just by deleting something from the backup in the
> > file system and try to restore the incomplete backup:
> >
> > http://localhost:8983/solr/admin/collections?action=RESTORE&name=backup-gettingstarted&collection=gettingstarted&location=D:\solr_backup&async=1000
> >
> > http://localhost:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=1000
> > <response><lst name="responseHeader"><int name="status">0</int><int name="QTime">2</int></lst><lst name="status"><str name="state">completed</str><str name="msg">found [1000] in completed tasks</str></lst></response>
> >
> > The status is completed but the collection is not usable.
> >
> > With a synchronous restore call I get:
> >
> >
> > http://localhost:8983/solr/admin/collections?action=RESTORE&name=backup-gettingstarted&collection=gettingstarted&location=D:\solr_backup
> > <response><lst name="responseHeader"><int name="status">500</int><int name="QTime">6456</int></lst><str name="Operation restore caused exception:">org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Could not restore core</str><lst name="exception"><str name="msg">Could not restore core</str><int name="rspCode">500</int></lst><lst name="error"><lst name="metadata"><str name="error-class">org.apache.solr.common.SolrException</str><str name="root-error-class">org.apache.solr.common.SolrException</str></lst><str name="msg">Could not restore core</str><str name="trace">org.apache.solr.common.SolrException: Could not restore core
> >     at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:300)
> >     at org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:237)
> >     at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:215)
> >     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
> >     at org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:748)
> >     at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:729)
> >     at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:510)
> >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
> >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
> >     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
> >     at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
> >     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> >     at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> >     at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
> >     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
> >     at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
> >     at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> >     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
> >     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> >     at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
> >     at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
> >     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
> >     at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
> >     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
> >     at org.eclipse.jetty.server.Server.handle(Server.java:534)
> >     at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
> >     at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
> >     at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
> >     at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
> >     at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
> >     at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
> >     at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
> >     at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
> >     at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
> >     at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
> >     at java.lang.Thread.run(Thread.java:748)
> > </str><int name="code">500</int></lst></response>
> >
> >
> > But we cannot use the sync call because we run into a timeout even if
> > we increase the socket timeout of the client.
> > And we cannot use the async call because it does not report errors.
> >
> > Is this a known bug? Any ideas for a workaround?
> >
> > Kind regards
> > Steffen Moldenhauer
> >