spark-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Bulk insert strategy
Date Sun, 08 Mar 2015 14:57:13 GMT
What's the expected number of partitions in your use case ?

Have you thought of doing batching in the workers ?
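Batching in the workers can be sketched by extending the pattern from the programming guide quoted below. This is a minimal illustration only: it assumes the same hypothetical `ConnectionPool` from that snippet, plus a `sendBatch` method that it may not actually have, and the batch size of 500 is an arbitrary placeholder to tune against your sink.

```scala
// Sketch of worker-side batching (assumed ConnectionPool / sendBatch API).
dstream.foreachRDD(rdd => {
  rdd.foreachPartition(partitionOfRecords => {
    val connection = ConnectionPool.getConnection()
    // grouped() yields fixed-size chunks from the partition iterator,
    // so each round trip to the DB carries up to 500 records instead of one.
    partitionOfRecords.grouped(500).foreach { batch =>
      connection.sendBatch(batch.toSeq)
    }
    ConnectionPool.returnConnection(connection)  // return to the pool for future reuse
  })
})
```

This keeps all I/O in the workers, so nothing is shipped back to the driver and the work stays parallel across partitions.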

Cheers

On Sat, Mar 7, 2015 at 10:54 PM, A.K.M. Ashrafuzzaman <
ashrafuzzaman.g2@gmail.com> wrote:

> While processing DStream in the Spark Programming Guide, the suggested
> usage of connection is the following,
>
> dstream.foreachRDD(rdd => {
>       rdd.foreachPartition(partitionOfRecords => {
>           // ConnectionPool is a static, lazily initialized pool of connections
>           val connection = ConnectionPool.getConnection()
>           partitionOfRecords.foreach(record => connection.send(record))
>           ConnectionPool.returnConnection(connection)  // return to the pool for future reuse
>       })
>   })
>
>
> In this case, both the processing and the insertion are done in the workers,
> but we don’t use batch inserts into the DB. How about this use case: we
> process in the workers (parse the JSON strings into objects), send those
> objects back to the master, and then issue a single bulk insert request.
> Is there any benefit to sending records individually via a connection pool
> versus using a bulk operation on the master?
>
> A.K.M. Ashrafuzzaman
> Lead Software Engineer
> NewsCred <http://www.newscred.com/>
>
> (M) 880-175-5592433
> Twitter <https://twitter.com/ashrafuzzaman> | Blog
> <http://jitu-blog.blogspot.com/> | Facebook
> <https://www.facebook.com/ashrafuzzaman.jitu>
>
> Check out The Academy <http://newscred.com/theacademy>, your #1 source
> for free content marketing resources
>
>
