spark-user mailing list archives

From Rabin Banerjee <dev.rabin.baner...@gmail.com>
Subject Re: Confusion SparkSQL DataFrame OrderBy followed by GroupBY
Date Thu, 03 Nov 2016 13:59:44 GMT
Hi Koert & Robin,

Thanks! But if you go through the blog https://bzhangusc.wordpress.com/2015/05/28/groupby-on-dataframe-is-not-the-groupby-on-rdd/ and check the comments under it, the approach does actually appear to work, although I am not sure how. And yes, I agree that a custom aggregate (UDAF) is a good option.

Can anyone share the best way to implement this in Spark?
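Since the original mail below also asks whether a window function would be more efficient, here is a minimal sketch of that approach (Spark 1.6+). This is an illustration under assumptions, not a tested implementation: the column names are taken from the mail, and the variable `df` (plus the surrounding SQLContext setup) is assumed to exist.

```scala
// Sketch: pick the newest row per mobileno with a ranking window function,
// instead of relying on orderBy surviving a later groupBy.
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, desc, row_number}

// Rank rows inside each mobileno partition, newest transaction first.
val w = Window.partitionBy("mobileno").orderBy(desc("transaction_date"))

val latestPerMobile = df
  .withColumn("rn", row_number().over(w)) // rn == 1 marks the latest row per group
  .where(col("rn") === 1)                 // keep only that row
  .select("customername", "service_type", "mobileno", "cust_addr")
```

Unlike orderBy followed by groupBy, the ordering here is part of the window specification itself, so it is guaranteed by the API rather than by an implementation detail of the shuffle.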

Regards,
Rabin Banerjee

On Thu, Nov 3, 2016 at 6:59 PM, Koert Kuipers <koert@tresata.com> wrote:

> Just realized you only want to keep first element. You can do this without
> sorting by doing something similar to min or max operation using a custom
> aggregator/udaf or reduceGroups on Dataset. This is also more efficient.
>
> On Nov 3, 2016 7:53 AM, "Rabin Banerjee" <dev.rabin.banerjee@gmail.com>
> wrote:
>
>> Hi All ,
>>
>>   I want to do a dataframe operation to find the rows having the latest
>> timestamp in each group using the below operation
>>
>> df.orderBy(desc("transaction_date"))
>>   .groupBy("mobileno")
>>   .agg(first("customername").as("customername"),
>>        first("service_type").as("service_type"),
>>        first("cust_addr").as("cust_addr"))
>>   .select("customername", "service_type", "mobileno", "cust_addr")
>>
>>
>> *Spark Version :: 1.6.x*
>>
>> My question is: *will Spark 1.6.x guarantee the order during the groupBy if the DF was previously ordered using orderBy?*
>>
>>
>> *I referred to a blog here: **https://bzhangusc.wordpress.com/2015/05/28/groupby-on-dataframe-is-not-the-groupby-on-rdd/*
>>
>> *Which claims it will work except in Spark 1.5.1 and 1.5.2 .*
>>
>>
>> *I need a bit of elaboration on how Spark handles this internally. Also, is it more efficient than using a window function?*
>>
>>
>> *Thanks in Advance ,*
>>
>> *Rabin Banerjee*
>>
>>
>>
>>
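Koert's suggestion above (replace sort + first with a min/max-style aggregation) can be sketched as a max-by reduction on a Dataset. A hedged sketch, not a definitive implementation: `reduceGroups` is the Spark 2.0 Dataset API (on 1.6, where Datasets are experimental, the closest equivalent is `ds.groupBy(_.mobileno).reduce(...)`), and the `Txn` case class plus the `ds` value are assumptions matching the column names in the mail.

```scala
// Sketch: reduce each group to its newest row, no sort required.
case class Txn(mobileno: String,
               customername: String,
               service_type: String,
               cust_addr: String,
               transaction_date: java.sql.Timestamp)

val latest = ds                        // ds: Dataset[Txn], assumed to exist
  .groupByKey(_.mobileno)
  .reduceGroups((a, b) =>
    if (a.transaction_date.after(b.transaction_date)) a else b)
  .map(_._2)                           // drop the grouping key, keep the winning row
```

Because this reduce is associative and commutative, Spark can apply it partially within each partition before the shuffle, which is why it tends to be cheaper than a full orderBy.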
