lucene-dev mailing list archives

From Tomás Fernández Löbbe (JIRA) <j...@apache.org>
Subject [jira] [Commented] (SOLR-445) Update Handlers abort with bad documents
Date Sun, 27 Apr 2014 18:49:18 GMT

    [ https://issues.apache.org/jira/browse/SOLR-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982423#comment-13982423 ]

Tomás Fernández Löbbe commented on SOLR-445:
--------------------------------------------

bq. As a side note, this DistributedUpdateProcessor behavior makes it “tolerant”, but only in some cases?

I have confirmed this. Depending on which node receives the initial update request and on the
position of the invalid doc in the batch, the number of docs that end up indexed varies from
zero to all but the invalid doc.

> Update Handlers abort with bad documents
> ----------------------------------------
>
>                 Key: SOLR-445
>                 URL: https://issues.apache.org/jira/browse/SOLR-445
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>    Affects Versions: 1.3
>            Reporter: Will Johnson
>             Fix For: 4.9, 5.0
>
>         Attachments: SOLR-445-3_x.patch, SOLR-445-alternative.patch, SOLR-445-alternative.patch,
>                      SOLR-445-alternative.patch, SOLR-445-alternative.patch, SOLR-445.patch, SOLR-445.patch,
>                      SOLR-445.patch, SOLR-445.patch, SOLR-445_3x.patch, solr-445.xml
>
>
> Has anyone run into the problem of handling bad documents / failures mid-batch? I.e.:
> <add>
>   <doc>
>     <field name="id">1</field>
>   </doc>
>   <doc>
>     <field name="id">2</field>
>     <field name="myDateField">I_AM_A_BAD_DATE</field>
>   </doc>
>   <doc>
>     <field name="id">3</field>
>   </doc>
> </add>
> Right now Solr adds the first doc and then aborts. It would seem like it should either
> (1) fail the entire batch, or (2) log a message/return a code and then continue on to add
> doc 3. Option 1 would seem to be much harder to accomplish and possibly require more memory,
> while Option 2 would require more information to come back from the API. I'm about to dig
> into this, but I thought I'd ask to see if anyone had any suggestions, thoughts or comments.
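
To make the difference between the reported behavior and the two options in the description concrete, here is a minimal sketch in plain Python. It does not use Solr or SolrJ at all: the `BATCH` list, the `validate` helper, the date format, and the three `add_*` functions are hypothetical names invented for illustration, and "indexing" is simulated by collecting ids in a list.

```python
from datetime import datetime

# Hypothetical stand-in for the three-document batch in the report.
# This is NOT Solr's data model, just plain dicts.
BATCH = [
    {"id": "1"},
    {"id": "2", "myDateField": "I_AM_A_BAD_DATE"},
    {"id": "3"},
]


def validate(doc):
    """Raise ValueError if myDateField is present but is not a UTC timestamp.

    The format string is an assumption chosen for this sketch.
    """
    if "myDateField" in doc:
        datetime.strptime(doc["myDateField"], "%Y-%m-%dT%H:%M:%SZ")


def add_abort_on_error(batch):
    """Reported behavior: index docs until the first failure, then stop."""
    indexed = []
    for doc in batch:
        try:
            validate(doc)
        except ValueError:
            break  # docs already indexed stay indexed; the rest are dropped
        indexed.append(doc["id"])
    return indexed


def add_all_or_nothing(batch):
    """Option 1: validate the whole batch first, so it must be held in memory."""
    for doc in batch:
        validate(doc)  # raises before anything has been indexed
    return [doc["id"] for doc in batch]


def add_skip_and_report(batch):
    """Option 2: skip bad docs, keep going, and return the errors to the caller."""
    indexed, errors = [], []
    for doc in batch:
        try:
            validate(doc)
        except ValueError as exc:
            errors.append((doc["id"], str(exc)))
            continue
        indexed.append(doc["id"])
    return indexed, errors
```

With the batch above, `add_abort_on_error` indexes only doc 1, `add_all_or_nothing` raises and indexes nothing, and `add_skip_and_report` indexes docs 1 and 3 while reporting doc 2. The extra return value in the last function is the "more information to come back from the API" that Option 2 calls for.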



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

