lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: ignoring bad documents during index
Date Sat, 10 Jan 2015 17:45:33 GMT
You get significant throughput improvements when you send docs to Solr
in batches (assuming SolrJ). You can go ahead and send, say, 1,000 docs
in a batch and, if the batch fails, re-process the list to find the
bad doc.
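
A minimal sketch of that batch-then-retry idea, in plain Java. The
DocSender interface and the BatchIndexer class are hypothetical names
standing in for whatever actually sends docs to Solr (for instance a thin
wrapper around a SolrJ client); they are not part of SolrJ. The sketch
also assumes that re-sending a doc that was already accepted is harmless,
which should hold when each doc carries a unique key.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class BatchIndexer {

    /** Hypothetical stand-in for the call that sends docs to Solr. */
    public interface DocSender<T> {
        void send(List<T> docs) throws Exception;
    }

    /**
     * Sends docs in batches of batchSize. When a batch fails, its docs are
     * re-sent one at a time so the bad ones can be isolated.
     * Returns the docs that were rejected even when sent alone.
     */
    public static <T> List<T> indexAll(List<T> docs, int batchSize, DocSender<T> sender) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> rejected = new ArrayList<>();
        for (int start = 0; start < docs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, docs.size());
            List<T> batch = docs.subList(start, end);
            try {
                sender.send(batch); // fast path: the whole batch in one request
            } catch (Exception batchFailure) {
                // Slow path: retry each doc on its own to find the bad one(s).
                for (T doc : batch) {
                    try {
                        sender.send(Collections.singletonList(doc));
                    } catch (Exception docFailure) {
                        rejected.add(doc);
                    }
                }
            }
        }
        return rejected;
    }
}
```

The slow path only runs for batches that actually fail, so a mostly clean
feed keeps the batching throughput while still reporting each bad doc.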

But as Jack says, Solr could do better here.

Best,
Erick

On Sat, Jan 10, 2015 at 3:46 AM, Jack Krupansky
<jack.krupansky@gmail.com> wrote:
> Sending individual documents will give you absolute control - just make
> sure not to "commit" on each document sent since that would really slow
> down indexing.
>
> You could also send smaller batches, like 5 to 20 documents, to balance
> fine control against performance. It also depends on your document size
> - small documents should be collected into larger batches, but large
> documents should be sent in smaller batches. Sending a total of 2K to 20K
> bytes of data at a time is probably a good target. Smaller than 2K
> incurs more overhead, and more than 50K or 100K may simply overload the
> server rather than optimize performance.
>
> -- Jack Krupansky
>
> On Sat, Jan 10, 2015 at 6:02 AM, SolrUser1543 <ostap26@gmail.com> wrote:
>
>> Would it be a good solution to index a single document at a time instead
>> of in bulk? In this case I will know the status of each message.
>>
>> What is the recommendation in this case: bulk vs. single?
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/ignoring-bad-documents-during-index-tp4176947p4178546.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
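
Jack's sizing advice above (batch by total bytes rather than by a fixed
document count) could be sketched roughly as follows. SizeBatcher and
batchBySize are hypothetical names, docs are assumed to be already
serialized as strings, and the byte limit is a parameter you would tune
toward the range he suggests rather than a value taken from Solr itself.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class SizeBatcher {

    /**
     * Groups serialized docs so that each batch holds at most maxBytes of
     * UTF-8 data. A single doc larger than maxBytes gets a batch of its own.
     */
    static List<List<String>> batchBySize(List<String> docs, int maxBytes) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int currentBytes = 0;
        for (String doc : docs) {
            int size = doc.getBytes(StandardCharsets.UTF_8).length;
            // Close the current batch if adding this doc would exceed the limit.
            if (!current.isEmpty() && currentBytes + size > maxBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(doc);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

With this grouping, many small docs end up together in one request while
large docs travel in small batches or alone, which is the balance
described in the quoted message.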
