spark-user mailing list archives

From Amit Sela <amitsel...@gmail.com>
Subject Does MapWithState follow with a shuffle ?
Date Tue, 29 Nov 2016 21:16:52 GMT
Hi all,

I've been digging into the MapWithState code (branch 1.6), and I came across
the compute implementation in *InternalMapWithStateDStream*:
<https://github.com/apache/spark/blob/branch-1.6/streaming/src/main/scala/org/apache/spark/streaming/dstream/MapWithStateDStream.scala#L159>

Looking at the partitioner defined there
<https://github.com/apache/spark/blob/branch-1.6/streaming/src/main/scala/org/apache/spark/streaming/dstream/MapWithStateDStream.scala#L112>
it looks like it could differ from the parent RDD's partitioner (for
instance, if defaultParallelism() changed, or if the input partitioning was
smaller to begin with), which will eventually create a ShuffleRDD:
<https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala#L537>
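For reference, the decision in the linked PairRDDFunctions code can be
sketched as a toy model (these are not Spark's actual classes; the case
class below just mimics HashPartitioner's numPartitions-based equality):

```scala
// Toy model of Spark's shuffle-insertion check: a shuffle is needed
// whenever the parent RDD's partitioner does not equal the target one.
// Spark's HashPartitioner compares equal iff numPartitions matches,
// which this case class reproduces via structural equality.
case class HashPartitioner(numPartitions: Int)

// The parent may have no partitioner at all (None), which always shuffles.
def needsShuffle(parentPartitioner: Option[HashPartitioner],
                 targetPartitioner: HashPartitioner): Boolean =
  !parentPartitioner.contains(targetPartitioner)

// E.g. state stream sized by defaultParallelism = 8 over a parent with
// 4 partitions: the partitioners differ, so a ShuffleRDD would be built.
println(needsShuffle(Some(HashPartitioner(4)), HashPartitioner(8)))
println(needsShuffle(Some(HashPartitioner(8)), HashPartitioner(8)))
```

So as long as the state stream's partitioner matches the parent's, no
shuffle should be inserted; any mismatch (or an unpartitioned parent)
would trigger one.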

Am I reading this right?

Thanks,
Amit
