spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Takeshi Yamamuro <>
Subject Re: Spark Streaming: question on sticky session across batches ?
Date Tue, 15 Nov 2016 09:07:17 GMT
- dev


AFAIK, if you use RDDs only, you can control the partition mapping to some
by using a partition key RDD[(key, data)].
A defined partitioner distributes data into partitions depending on the key.
As a good example to control partitions, you can see the GraphX code;

GraphX holds `PartitionId` in edge RDDs to control the partition where edge
data are.

// maropu

On Tue, Nov 15, 2016 at 5:19 AM, Manish Malhotra <> wrote:

> sending again.
> any help is appreciated !
> thanks in advance.
> On Thu, Nov 10, 2016 at 8:42 AM, Manish Malhotra <
>> wrote:
>> Hello Spark Devs/Users,
>> Im trying to solve the use case with Spark Streaming 1.6.2 where for
>> every batch ( say 2 mins) data needs to go to the same reducer node after
>> grouping by key.
>> The underlying storage is Cassandra and not HDFS.
>> This is a map-reduce job, where also trying to use the partitions of the
>> Cassandra table to batch the data for the same partition.
>> The requirement of sticky session/partition across batches is because the
>> operations which we need to do, needs to read data for every key and then
>> merge this with the current batch aggregate values. So, currently when
>> there is no stickyness across batches, we have to read for every key, merge
>> and then write back. and reads are very expensive. So, if we have sticky
>> session, we can avoid read in every batch and have a cache of till last
>> batch aggregates across batches.
>> So, there are few options, can think of:
>> 1. to change the TaskSchedulerImpl, as its using Random to identify the
>> node for mapper/reducer before starting the batch/phase.
>> Not sure if there is a custom scheduler way of achieving it?
>> 2. Can custom RDD can help to find the node for the key-->node.
>> there is a getPreferredLocation() method.
>> But not sure, whether this will be persistent or can vary for some edge
>> cases?
>> Thanks in advance for you help and time !
>> Regards,
>> Manish

Takeshi Yamamuro

View raw message