lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: update fails if one doc is wrong
Date Fri, 01 Mar 2013 01:15:42 GMT
This has been hanging around for a long time. I did some preliminary work
here: https://issues.apache.org/jira/browse/SOLR-445 but moved on to other
things before committing it. The discussion there might be useful.

FWIW,
Erick


On Wed, Feb 27, 2013 at 5:32 AM, Mikhail Khludnev <mkhludnev@griddynamics.com> wrote:

> Colleagues,
>
> Here are my considerations
>
> If the exception occurs somewhere in an update processor, we can add a
> special update processor at the head of the update processor chain,
> which will catch exceptions from the delegated processAdd call and log
> and/or swallow them.
> If that fits the purpose, we can try to figure out how to return the failed
> doc ids back to the client. I'm not sure, but I think it's possible, just
> because the response writer is quite -dumb- flexible, i.e. if an update
> processor drops something into the response, it should be blindly streamed
> back to the client.
>
> One more consideration.
> Anirudha,
> When you say "re-try them" do you mean to post a failed doc one more time?
> It seems I didn't get your point. Please clarify.
> On 27.02.2013 at 1:13, "Anirudha Jadhav" <anirudha@nyu.edu> wrote:
>
> > Ideally you would want to use SolrJ or another interface that can catch
> > exceptions/errors and retry the failed requests.
> >
> >
> > On Tue, Feb 26, 2013 at 3:45 PM, Walter Underwood <wunder@wunderwood.org> wrote:
> >
> > > I've done exactly the same thing. On error, set the batch size to one and
> > > try again.
> > >
> > > wunder
> > >
> > > On Feb 26, 2013, at 12:27 PM, Timothy Potter wrote:
> > >
> > > > Here's what I do to work around failures when processing batches of
> > > > updates:
> > > >
> > > > On client side, catch the exception that the batch failed. In the
> > > > exception handler, switch to one-by-one mode for the failed batch
> > > > only.
> > > >
> > > > This allows you to isolate the *bad* documents as well as getting the
> > > > *good* documents in the batch indexed in Solr.
> > > >
> > > > This assumes most batches work so you only pay the one-by-one penalty
> > > > for the occasional batch with a bad doc.
> > > >
> > > > Tim
> > > >
> > > > On Tue, Feb 26, 2013 at 12:08 PM, Isaac Hebsh <isaac.hebsh@gmail.com> wrote:
> > > >> Hi.
> > > >>
> > > >> I add documents to Solr by POSTing them to the UpdateHandler, as bulks of
> > > >> <add> commands (DIH is not used).
> > > >>
> > > >> If one document contains any invalid data (e.g. string data in a numeric
> > > >> field), Solr returns HTTP 400 Bad Request, and the whole bulk fails.
> > > >>
> > > >> I'm searching for a way to tell Solr to accept the rest of the documents...
> > > >> (I'll use RealTimeGet to determine which documents were added).
> > > >>
> > > >> If there is no standard way of doing it, maybe it can be implemented by
> > > >> splitting the <add> commands into separate HTTP POSTs. Because I'm using
> > > >> auto-soft-commit, can I say that it is almost equivalent? What is the
> > > >> performance penalty of 100 POST requests (of 1 document each) against 1
> > > >> request of 100 docs, if a soft commit is eventually done?
> > > >>
> > > >> Thanks in advance...
> > >
> > > --
> > > Walter Underwood
> > > wunder@wunderwood.org
> > >
> > >
> > >
> > >
> >
> >
> > --
> > Anirudha P. Jadhav
> >
>
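
The client-side workaround that Timothy Potter and Walter Underwood describe in the quoted thread (post the batch, and on failure re-post that batch one document at a time) could look roughly like the sketch below. It is only an illustration under stated assumptions: `post_batch` is a hypothetical callable standing in for whatever client call sends documents to Solr and raises on an error response, `fake_post` is a made-up stand-in used to exercise the logic, and the sketch assumes that re-posting a document is harmless.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

Doc = Dict[str, object]


def index_with_fallback(
    batches: Iterable[Sequence[Doc]],
    post_batch: Callable[[Sequence[Doc]], None],
) -> List[Tuple[Doc, str]]:
    """Post each batch; if a batch fails, retry its documents one by one.

    `post_batch` is a hypothetical function that sends documents to the
    server and raises an exception when the request is rejected.
    Returns the documents that still failed, paired with the error text.
    """
    failed: List[Tuple[Doc, str]] = []
    for batch in batches:
        try:
            post_batch(batch)  # fast path: the whole batch in one request
        except Exception:
            # Slow path, paid only for the failing batch: isolate bad docs.
            for doc in batch:
                try:
                    post_batch([doc])
                except Exception as exc:
                    failed.append((doc, str(exc)))
    return failed


# Made-up stand-in for a real client, used only to exercise the logic.
# It rejects the whole request if any document has a non-numeric "price".
indexed: List[Doc] = []


def fake_post(docs: Sequence[Doc]) -> None:
    for doc in docs:
        if not isinstance(doc.get("price"), (int, float)):
            raise ValueError(f"bad price in doc {doc['id']}")
    indexed.extend(docs)
```

Calling `index_with_fallback(batches, fake_post)` returns only the documents that failed individually, so the caller can log them or queue them for repair, while the good documents from the same batch are still indexed.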

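Mikhail Khludnev's suggestion in the quoted thread (a processor at the head of the chain that catches exceptions from the delegated processAdd call and reports the failed doc ids in the response) can be illustrated with a small pattern sketch. This is plain Python showing the idea only; it is not Solr's Java update processor API. The class names, the `process_add` method, and the `"failed"` response key are all hypothetical.

```python
from typing import Dict, List

Doc = Dict[str, object]


class IndexingProcessor:
    """Hypothetical downstream processor that rejects non-numeric prices."""

    def __init__(self) -> None:
        self.indexed: List[Doc] = []

    def process_add(self, doc: Doc) -> None:
        if not isinstance(doc.get("price"), (int, float)):
            raise ValueError("price must be numeric")
        self.indexed.append(doc)


class TolerantProcessor:
    """Hypothetical head-of-chain wrapper that swallows per-document errors.

    Failures are recorded in the shared response dict so that they can be
    streamed back to the client instead of aborting the whole request.
    """

    def __init__(self, next_processor: IndexingProcessor, response: Dict[str, list]) -> None:
        self.next_processor = next_processor
        self.response = response

    def process_add(self, doc: Doc) -> None:
        try:
            self.next_processor.process_add(doc)  # delegate down the chain
        except Exception as exc:
            # Record the failure and keep going with the next document.
            self.response.setdefault("failed", []).append(
                {"id": doc.get("id"), "error": str(exc)}
            )
```

With this shape, the documents after a bad one are still processed, and the client can read the failed ids from the response rather than re-sending the batch.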