spark-user mailing list archives

From Ashrafuzzaman <ashrafuzzaman...@gmail.com>
Subject Re: Bulk insert strategy
Date Sun, 08 Mar 2015 19:03:40 GMT
Yes, and that brings me to another question: how do I do a batch insert from the
workers?
In prod we are planning to use a Kinesis stream with 3 shards, so the number of
partitions should be 3, right?
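
A minimal sketch of what per-partition batching could look like, assuming the
ConnectionPool from the guide's example quoted below and a hypothetical bulkInsert
method on the connection (the method name and the batch size of 500 are
illustrative, not part of any Spark API):

dstream.foreachRDD { rdd =>
  rdd.foreachPartition { partitionOfRecords =>
    // one connection per partition, taken from a static, lazily initialized pool
    val connection = ConnectionPool.getConnection()
    // group the partition's records into fixed-size batches and write each batch
    // with a single bulk call instead of one round trip per record
    partitionOfRecords.grouped(500).foreach { batch =>
      connection.bulkInsert(batch)  // hypothetical bulk-insert API
    }
    ConnectionPool.returnConnection(connection)  // return to the pool for future reuse
  }
}

This keeps both the parsing and the writes on the executors, so nothing has to be
shipped back to the driver.
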
On Mar 8, 2015 8:57 PM, "Ted Yu" <yuzhihong@gmail.com> wrote:

> What's the expected number of partitions in your use case ?
>
> Have you thought of doing batching in the workers ?
>
> Cheers
>
> On Sat, Mar 7, 2015 at 10:54 PM, A.K.M. Ashrafuzzaman <
> ashrafuzzaman.g2@gmail.com> wrote:
>
>> For processing a DStream, the Spark Programming Guide suggests the following
>> usage of connections:
>>
>> dstream.foreachRDD(rdd => {
>>       rdd.foreachPartition(partitionOfRecords => {
>>           // ConnectionPool is a static, lazily initialized pool of connections
>>           val connection = ConnectionPool.getConnection()
>>           partitionOfRecords.foreach(record => connection.send(record))
>>           ConnectionPool.returnConnection(connection)  // return to the pool for future reuse
>>       })
>>   })
>>
>>
>> In this case both the processing and the insertion are done in the workers, and
>> we don't use a batch insert into the db. How about this use case: we process the
>> records (parse the JSON strings into objects) in the workers, send those objects
>> back to the master, and then issue a single bulk insert request from there. Is
>> there any benefit to sending records individually through the connection pool vs
>> using a bulk operation from the master? (A sketch contrasting the two approaches
>> follows at the end of this message.)
>>
>> A.K.M. Ashrafuzzaman
>> Lead Software Engineer
>> NewsCred <http://www.newscred.com/>
>>
>> (M) 880-175-5592433
>> Twitter <https://twitter.com/ashrafuzzaman> | Blog
>> <http://jitu-blog.blogspot.com/> | Facebook
>> <https://www.facebook.com/ashrafuzzaman.jitu>
>>
>> Check out The Academy <http://newscred.com/theacademy>, your #1 source
>> for free content marketing resources
>>
>>
>
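
Regarding the bulk-vs-individual question quoted above, a driver-side bulk insert
would look roughly like the sketch below; parse() and bulkInsert() are illustrative
placeholders, not Spark or driver APIs:

dstream.foreachRDD { rdd =>
  // parse on the workers, then pull the parsed objects back to the driver
  val objects = rdd.map(json => parse(json)).collect()
  // one bulk insert issued from the driver for the whole micro-batch
  val connection = ConnectionPool.getConnection()
  connection.bulkInsert(objects)
  ConnectionPool.returnConnection(connection)
}

The trade-off: collect() ships every record of the micro-batch to the driver, so it
only works while a batch fits in driver memory and it makes the driver a write
bottleneck, whereas the per-partition batching sketch earlier keeps the data on the
executors and still gets the benefit of bulk writes.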
