storm-user mailing list archives

From Tech Id <>
Subject Question on SolrUpdateBolt
Date Fri, 15 Apr 2016 20:47:45 GMT

I have a question on SolrUpdateBolt.execute()

It seems that SolrUpdateBolt sends every tuple to Solr individually in
execute(), and issues a commit() only after a specified number of
documents have been sent.

Would it be better if we batched the documents in memory and then sent
them to Solr in one request?

I am drawing inspiration from EsBolt, another very popular search-engine
bolt, which keeps the tuples in memory, sends one batch request, and then
calls ack() or fail() on them based on that single batch request's outcome.

Here are some pointers into EsBolt that show how they do it:
--> RestRepository.writeToIndex()
 ---> RestRepository.doWriteToIndex()

If we did the same in SolrUpdateBolt, the number of HTTP calls would be
reduced by a factor of N, where N is the batch size of the request; that
would be a good performance boost, IMO.
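To illustrate the idea, here is a minimal sketch of the batching strategy in plain Java, deliberately independent of the Storm and Solr APIs. The class name BatchBuffer and its methods are hypothetical, not actual SolrUpdateBolt or EsBolt code; add() stands in for execute(tuple) and flush() stands in for the single batched HTTP request, where the ack()/fail() decision would be made.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: buffer documents in memory and send one
// batched request per N documents, instead of one request per document.
public class BatchBuffer {
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();
    private int requestsSent = 0; // counts simulated HTTP calls

    public BatchBuffer(int batchSize) {
        this.batchSize = batchSize;
    }

    // Stand-in for execute(tuple): buffer the document, flush when full.
    public void add(String doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // One HTTP call covers the whole batch; in a real bolt, the buffered
    // tuples would be ack()ed or fail()ed here based on this request's outcome.
    public void flush() {
        if (buffer.isEmpty()) return;
        requestsSent++; // one request instead of buffer.size() requests
        buffer.clear();
    }

    public int requestsSent() {
        return requestsSent;
    }

    public static void main(String[] args) {
        BatchBuffer b = new BatchBuffer(50);
        for (int i = 0; i < 500; i++) {
            b.add("doc-" + i);
        }
        b.flush(); // send any remainder (none here, 500 divides evenly)
        System.out.println("requests sent: " + b.requestsSent());
        // 500 documents with batch size 50 -> 10 requests, vs 500 unbatched
    }
}
```

With a batch size of 50, the 500 simulated documents go out in 10 requests rather than 500, which is the factor-of-N reduction described above. A real implementation would also need a time-based flush so a partially filled batch is not held indefinitely.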
