spark-user mailing list archives

From Gourav Sengupta <>
Subject Re: Pyspark Partitioning
Date Mon, 01 Oct 2018 16:22:19 GMT

The simplest option is to create UDFs for these different functions and
then use a CASE statement (or similar) in SQL to choose between them. But
this is low tech; if your conditions depend on record values at an even
finer granularity, why not use a single UDF and let the conditions inside
it handle that?
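
A minimal sketch of both variants, assuming Group_Id is an integer column
and Points is a double column. The functions double_points and
negate_points and the GROUP_FUNCTIONS mapping are hypothetical stand-ins
for your own functions, not anything from the original question.

```python
def double_points(points):
    # Hypothetical custom function for group 1.
    return points * 2.0


def negate_points(points):
    # Hypothetical custom function for group 2.
    return -points


# Assumed mapping from Group_Id to the function that handles that group.
GROUP_FUNCTIONS = {1: double_points, 2: negate_points}


def apply_group_function(group_id, points):
    """Single entry point: the condition on Group_Id lives inside the UDF."""
    if points is None:
        return None
    func = GROUP_FUNCTIONS.get(group_id)
    # Groups without a registered function pass through unchanged.
    return func(points) if func is not None else points


def transform_with_single_udf(df):
    """Variant 1: one UDF that dispatches on Group_Id internally."""
    from pyspark.sql import functions as F
    from pyspark.sql.types import DoubleType

    group_udf = F.udf(apply_group_function, DoubleType())
    return df.withColumn("Result", group_udf(F.col("Group_Id"), F.col("Points")))


def transform_with_case(df):
    """Variant 2: one UDF per function, selected with a CASE-like expression."""
    from pyspark.sql import functions as F
    from pyspark.sql.types import DoubleType

    double_udf = F.udf(double_points, DoubleType())
    negate_udf = F.udf(negate_points, DoubleType())
    return df.withColumn(
        "Result",
        F.when(F.col("Group_Id") == 1, double_udf(F.col("Points")))
        .when(F.col("Group_Id") == 2, negate_udf(F.col("Points")))
        .otherwise(F.col("Points")),
    )
```

Both helpers take an existing DataFrame and return it with an extra Result
column. Note that a UDF of this kind sees one row at a time, so it cannot
compute anything that needs the whole group.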

But I think UDFs do not perform that well unless you write them in Scala.

It will be interesting to see if there are other scalable options (which
are not RDD based) from the group.
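
One candidate that stays in the DataFrame API is a grouped-map pandas
function. The sketch below assumes Spark 3.0 or later (for applyInPandas)
with pandas and PyArrow installed, and assumes the column types given in
OUTPUT_SCHEMA. The function center_points is a hypothetical example of a
computation that needs the whole group at once.

```python
def center_points(values):
    """Hypothetical per-group logic: subtract the group mean from each value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def process_group(pdf):
    """Receives all rows of one Group_Id as a pandas DataFrame."""
    return pdf.assign(Result=center_points(pdf["Points"].tolist()))


# Assumed column types; the schema must describe what process_group returns.
OUTPUT_SCHEMA = "Group_Id long, Id string, Points double, Result double"


def apply_per_group(df):
    """Run process_group once per distinct Group_Id."""
    return df.groupBy("Group_Id").applyInPandas(process_group, schema=OUTPUT_SCHEMA)
```

Each group has to fit in the memory of a single executor, so this is only
a fit when no single Group_Id is very large.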

Gourav Sengupta

On Sun, Sep 30, 2018 at 7:31 PM dimitris plakas <> wrote:

> Hello everyone,
> I am trying to split a dataframe into partitions and I want to apply a
> custom function on every partition. More precisely, I have a dataframe like
> the one below:
> Group_Id | Id  | Points
> 1        | id1 | Point1
> 2        | id2 | Point2
> I want to have a partition for every Group_Id and apply on every partition
> a function defined by me.
> I have tried with partitionBy('Group_Id').mapPartitions() but I receive
> an error.
> Could you please advise me how to do it?
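
On the error in the quoted message: a DataFrame has no partitionBy method
(it exists on RDDs of key-value pairs, where it takes a number of
partitions, and on DataFrameWriter and Window). If the RDD route is
acceptable, one possible sketch is to repartition by the column and then
call mapPartitions on the underlying RDD. It assumes Points is numeric,
and the count and sum below are only a placeholder for the custom
function.

```python
from collections import defaultdict


def process_partition(rows):
    """Group the rows of one partition by Group_Id and emit one result per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["Group_Id"]].append(row["Points"])
    for group_id, points in groups.items():
        # Placeholder for the custom function: row count and sum of Points.
        yield (group_id, len(points), sum(points))


def apply_per_partition(df):
    """Co-locate each Group_Id in one partition, then process partition by partition."""
    # Hash partitioning keeps all rows of a group together, but one partition
    # may still hold several groups, hence the grouping in process_partition.
    return df.repartition("Group_Id").rdd.mapPartitions(process_partition)
```

The result is an RDD of (Group_Id, count, sum) tuples, one per group.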
