Hi all,

I'm using Spark Streaming mapWithState operation to do a stateful operation on my Kafka stream (though I think similar arguments would apply for any source).

Trying to understand a way to control mapWithState's partitioning schema.

My transformations are simple: 

1) create KafkaDStream
2) mapPartitions to get a key-value stream where `key` corresponds to Kafka message key
3) apply mapWithState operation on key-value stream, the state stream shares keys with the original stream, the resulting streams doesn't change keys either

The problem is that, as I understand, mapWithState stream has a different partitioning schema and thus I see shuffles in Spark Web UI.

From the mapWithState implementation I see that:
mapwithState uses Partitioner if specified, otherwise partitions data with HashPartitioner(<default-parallelism-conf>). The thing is that original KafkaDStream has a specific partitioning schema: Kafka partitions correspond Spark RDD partitions.

Question: is there a way for mapWithState stream to inherit partitioning schema from the original stream (i.e. correspond to Kafka partitions).

Thanks,
Andrii