lucene-solr-user mailing list archives

From Erick Erickson <>
Subject Re: ignoring bad documents during index
Date Sat, 10 Jan 2015 17:45:33 GMT
There are significant throughput improvements when you batch up
a bunch of docs to Solr (assuming SolrJ). You can go ahead and send, say,
1,000 docs in a batch and, if the batch fails, re-process that list one doc
at a time to find the bad doc.
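Erick's "batch, then fall back" approach can be sketched as follows. This is a minimal illustration, not SolrJ code: `BatchWithFallback` and `DocSink` are hypothetical names, and `DocSink` stands in for whatever actually posts documents to Solr (for example a thin wrapper around your SolrJ client's add call) and is assumed to throw when Solr rejects the request.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchWithFallback {

    /** Hypothetical sender: assumed to throw if Solr rejects the request. */
    interface DocSink<D> {
        void send(List<D> docs) throws Exception;
    }

    /**
     * Sends docs in batches of batchSize. When a whole batch fails, its docs
     * are re-sent one at a time so that only the bad ones are skipped.
     * Returns the docs that were rejected individually.
     */
    static <D> List<D> index(List<D> docs, int batchSize, DocSink<D> sink) {
        List<D> rejected = new ArrayList<>();
        for (int start = 0; start < docs.size(); start += batchSize) {
            List<D> batch = docs.subList(start, Math.min(start + batchSize, docs.size()));
            try {
                sink.send(batch); // fast path: one request for the whole batch
            } catch (Exception batchFailure) {
                // Slow path: isolate the bad doc(s) by sending singly.
                for (D doc : batch) {
                    try {
                        sink.send(List.of(doc));
                    } catch (Exception docFailure) {
                        rejected.add(doc); // record it and keep going
                    }
                }
            }
        }
        return rejected;
    }
}
```

Re-sending documents from a batch that Solr may have partially applied is harmless as long as the schema defines a uniqueKey, because adding a document with an existing key replaces the earlier copy.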

But as Jack says, Solr could do better here.


On Sat, Jan 10, 2015 at 3:46 AM, Jack Krupansky
<> wrote:
> Sending individual documents will give you absolute control - just make
> sure not to "commit" on each document sent since that would really slow
> down indexing.
> You could also send smaller batches, like 5 to 20 documents, to balance
> fine control against performance. It also depends on your document size:
> small documents should be collected into larger batches, but large
> documents should be sent in smaller batches. Sending a total of 2K to 20K
> bytes of data at a time is probably a good target. Smaller than 2K
> incurs more overhead, and more than 50K or 100K may simply overload the
> server rather than improve performance.
> -- Jack Krupansky
> On Sat, Jan 10, 2015 at 6:02 AM, SolrUser1543 <> wrote:
>> Would it be a good solution to index single documents instead of in bulk?
>> In this case I will know the status of each message.
>> What is the recommendation in this case: bulk or single?
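Jack's sizing advice (small documents in larger batches, large documents in smaller ones, roughly 2K to 20K bytes per request) amounts to batching by bytes rather than by document count. A minimal sketch, assuming each document is already serialized to a String (for example the XML or JSON body you would post); `ByteBudgetBatcher` and `maxBytesPerBatch` are hypothetical names, and the UTF-8 length is only an estimate of the request size.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ByteBudgetBatcher {

    /**
     * Groups serialized docs so each batch stays at or under maxBytesPerBatch,
     * except that a single doc larger than the budget is sent alone.
     */
    static List<List<String>> batch(List<String> serializedDocs, int maxBytesPerBatch) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int currentBytes = 0;
        for (String doc : serializedDocs) {
            int size = doc.getBytes(StandardCharsets.UTF_8).length; // size estimate
            // Close the current batch if this doc would push it over budget.
            if (!current.isEmpty() && currentBytes + size > maxBytesPerBatch) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(doc);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            batches.add(current); // flush the last partial batch
        }
        return batches;
    }
}
```

With a 20,000-byte budget, three 8,000-byte documents become one batch of two followed by a batch of one, while a 30,000-byte document travels on its own. Combined with the fallback above, a failed byte-sized batch is small enough that re-sending its documents singly stays cheap.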
