spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From qihong <>
Subject Re: how to setup steady state stream partitions
Date Wed, 10 Sep 2014 04:13:13 GMT
Thanks for your response. I do have something like:

val inputDStream = ...
val keyedDStream =  // use sensorId as key
val partitionedDStream = keyedDstream.transform(rdd => rdd.partitionBy(new
val stateDStream = partitionedDStream.updateStateByKey[...](udpateFunction)

The partitionedDStream does have steady partitions, but stateDStream does
have steady partitions, i.e., in the partition 0 of partitionedDStream,
there's only
data for sensors 0 to 999, but the partition 0 of stateDStream contains data
for some sensors from 0 to 999 range, and lot of sensor from other
partitions of

I wish the partition 0 of stateDStream only contains the data from the
partition 0
of partitionedDStream, partiton 1 of stateDStream only from partition 1 of 
partitionedDStream, and so on. Anyone knows how to implement that?


View this message in context:
Sent from the Apache Spark User List mailing list archive at

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message