spark-user mailing list archives

From "Mendelson, Assaf"
Subject RE: mapPartitioningWithIndex in Dataframe
Date Sun, 06 Aug 2017 05:31:42 GMT
First, I believe you mean the Dataset API rather than the DataFrame API.
You can add the partition index as a new column to your DataFrame using spark_partition_id().
A normal mapPartitions should then work fine (i.e., create an appropriate case class that
includes the partition id, convert to a Dataset of that class, and then call mapPartitions).
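
To make the idea concrete, here is a rough sketch in plain Python with no Spark involved. It only models the body of the per-partition function: each row already carries its partition id (in Spark that column would come from spark_partition_id()), and the function reads the id from the first row to derive a partition-specific seed as the base seed plus the partition index. Row, assign_split, and weights are hypothetical names made up for this illustration, not Spark APIs.

```python
import bisect
import itertools
import random
from typing import Iterable, Iterator, NamedTuple, Sequence, Tuple


class Row(NamedTuple):
    """Hypothetical record: a payload plus the partition id carried as a column."""
    value: int
    pid: int


def assign_split(rows: Iterable[Row], seed: int,
                 weights: Sequence[float]) -> Iterator[Tuple[Row, int]]:
    """Shaped like the body of a mapPartitions call (assumption: all rows
    passed in belong to one partition, so they share the same pid)."""
    it = iter(rows)
    first = next(it, None)
    if first is None:
        return  # empty partition: nothing to emit
    # Same idea as in the question: base seed plus the partition's index.
    rng = random.Random(seed + first.pid)
    total = float(sum(weights))
    # Cumulative, normalized weights act as upper bounds for each split.
    bounds = list(itertools.accumulate(w / total for w in weights))
    for row in itertools.chain([first], it):
        draw = rng.random()
        # Index of the first bound above the draw, clamped against rounding.
        split = min(bisect.bisect_right(bounds, draw), len(bounds) - 1)
        yield row, split


if __name__ == "__main__":
    partition = [Row(value=v, pid=3) for v in range(5)]
    for row, split in assign_split(partition, seed=42, weights=[0.8, 0.2]):
        print(row, split)
```

Because the seed depends only on the base seed and the partition id, re-running the function on the same partition gives the same assignment, while different partitions draw from different random streams.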


From: Lalwani, Jayesh
Sent: Thursday, August 03, 2017 5:20 PM
Subject: mapPartitioningWithIndex in Dataframe

Are there any plans to add mapPartitionsWithIndex to the DataFrame API? Or is there any
way to implement my own mapPartitionsWithIndex for a DataFrame?

I am implementing something that is logically similar to the randomSplit function. In 2.1,
randomSplit internally does df.mapPartitionsWithIndex and assigns a different seed to every
partition by adding the partition’s index to the seed. I want to get a partition-specific
seed too.

The problem is that rdd.mapPartitionsWithIndex doesn’t work in streaming. df.mapPartitions works,
but I don’t get the index.

Is there a way to extend Spark to add mapPartitionsWithIndex at the DataFrame level?
I was digging into the 2.2 code a bit, and it looks like in 2.2 all the DataFrame APIs have
been changed to be based around SparkStrategy. I couldn’t figure out how to add my own
custom strategy. Is there any documentation on this? If it makes sense to add this to
Spark, I would be excited to make a contribution.

