spark-user mailing list archives

From Guangle Fan <fanguan...@gmail.com>
Subject Re: Spark Streaming
Date Fri, 18 Jul 2014 07:04:45 GMT
Thanks Tathagata! I tried it, and it worked perfectly.


On Thu, Jul 17, 2014 at 6:34 PM, Tathagata Das <tathagata.das1565@gmail.com>
wrote:

> Step 1: use rdd.mapPartitions(). That's equivalent to the mapper of
> MapReduce. You can open a connection, get all the data and buffer it, close
> the connection, and return an iterator to the buffer.
> Step 2: Make step 1 better by making it reuse connections. You can use
> singletons / static vars to lazily initialize and reuse a pool of
> connections. You will have to take care of concurrency, as multiple tasks
> may be using the database in parallel in the same worker JVM.
>
> TD
>
>
> On Thu, Jul 17, 2014 at 4:41 PM, Guangle Fan <fanguangle@gmail.com> wrote:
>
>> Hi, All
>>
>> When I run spark streaming, in one of the flatMap stage, I want to access
>> database.
>>
>> Code looks like :
>>
>> stream.flatMap(
>>   new FlatMapFunction<String, String>() {
>>     public Iterable<String> call(String record) {
>>       // access database cluster
>>     }
>>   }
>> )
>>
>> Since I don't want to create a database connection every time call() is
>> called, where is the best place to create the connection so I can reuse it
>> on a per-host basis (like one database connection per Mapper/Reducer)?
>>
>> Regards,
>>
>> Guangle
>>
>>
>
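TD's two steps above can be sketched in plain Java. This is a minimal sketch with no Spark dependency: `Connection` and `ConnectionPool` are hypothetical stand-ins for a real database client, and `lookupPartition` is the kind of function you would pass to `rdd.mapPartitions(...)` in a real job.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class MapPartitionsSketch {

    // Hypothetical stand-in for a real database connection.
    static class Connection {
        String query(String key) { return "value-for-" + key; }
    }

    // Step 2: a lazily initialized static pool shared by all tasks in the
    // same worker JVM. The BlockingQueue handles the concurrency TD mentions:
    // parallel tasks each borrow their own connection from the pool.
    static class ConnectionPool {
        private static BlockingQueue<Connection> pool;

        static synchronized void init(int size) {
            if (pool == null) {
                pool = new ArrayBlockingQueue<>(size);
                for (int i = 0; i < size; i++) pool.add(new Connection());
            }
        }

        static Connection borrow() throws InterruptedException {
            return pool.take();
        }

        static void giveBack(Connection c) {
            pool.add(c);
        }
    }

    // Step 1: the function you would hand to rdd.mapPartitions(). It takes
    // one connection per partition, buffers all the results, returns the
    // connection to the pool, and hands back an iterator over the buffer.
    static Iterator<String> lookupPartition(Iterator<String> keys)
            throws InterruptedException {
        ConnectionPool.init(4);
        Connection conn = ConnectionPool.borrow();
        List<String> buffer = new ArrayList<>();
        try {
            while (keys.hasNext()) {
                buffer.add(conn.query(keys.next()));
            }
        } finally {
            ConnectionPool.giveBack(conn); // reuse instead of closing
        }
        return buffer.iterator();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> partition = List.of("a", "b");
        Iterator<String> out = lookupPartition(partition.iterator());
        while (out.hasNext()) {
            System.out.println(out.next());
        }
    }
}
```

Because the pool is static, every partition processed in the same worker JVM reuses the same small set of connections, so call() (or here, each element lookup) never pays the connection-setup cost.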
