spark-user mailing list archives

From Raghavendra Pandey <raghavendra.pan...@gmail.com>
Subject Re: Filter on Grouped Data
Date Fri, 03 Jul 2015 15:20:48 GMT
Why don't you apply the filter first, and then group the data and run the
aggregations?
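
To illustrate the order of operations suggested above, here is a minimal sketch in plain Python rather than Spark. The sample rows and the helper name top_sender_to are made up for illustration; only the From/To column names and the addresses come from the quoted message. In Spark the same three steps would presumably map to a filter on the "To" column, followed by groupBy("From") and a count, but the exact calls depend on your Spark version, so treat this only as an outline of the logic.

```python
from collections import Counter

# Hypothetical sample rows; only the column names and the addresses
# are taken from the quoted message, the rows themselves are invented.
rows = [
    {"From": "karen.den@xyz.com", "To": ["vance.meyer@enron.com", "mary.clark@enron.com"]},
    {"From": "elyn.hughes@xyz.com", "To": ["dennis.vegas@enron.com", "gina.taylor@enron.com"]},
    {"From": "karen.den@xyz.com", "To": ["mary.clark@enron.com"]},
    {"From": "elyn.hughes@xyz.com", "To": ["mary.clark@enron.com", "kelly.kimberly@enron.com"]},
    {"From": "karen.den@xyz.com", "To": ["mary.clark@enron.com", "sarah.palmer@enron.com"]},
]


def top_sender_to(rows, receiver):
    """Return (sender, mail_count) for the sender with the most mails
    addressed to `receiver`, or None if no row matches."""
    # Step 1: filter first, keeping only mails whose To list has the receiver.
    matching = [row for row in rows if receiver in row["To"]]
    # Step 2: group the remaining rows by sender and count them.
    counts = Counter(row["From"] for row in matching)
    # Step 3: aggregate, picking the sender with the highest count.
    return counts.most_common(1)[0] if counts else None


print(top_sender_to(rows, "mary.clark@enron.com"))
```

Filtering before grouping keeps the aggregation small, since rows that do not mention the receiver never reach the group-and-count step.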
On Jul 3, 2015 1:29 PM, "Megha Sridhar- Cynepia" <megha.sridhara@cynepia.com>
wrote:

> Hi,
>
>
> I have a Spark DataFrame object which, when trimmed, looks like this:
>
>
>
> From                 To                               Subject            Message-ID
> karen.den@xyz.com    ['vance.meyer@enron.com',        SEC Inquiry        <19952575.1075858>
>                       'jeannie.mandelker@enron.com',
>                       'mary.clark@enron.com',
>                       'sarah.palmer@enron.com']
>
> elyn.hughes@xyz.com  ['dennis.vegas@enron.com',       Revised documents  <33499184.1075858>
>                       'gina.taylor@enron.com',
>                       'kelly.kimberly@enron.com']
> .
> .
> .
>
>
> I ran groupBy("From") on the above DataFrame and obtained a GroupedData
> object as a result. I need to apply a filter to the grouped data (for
> instance, to find the sender who sent the largest number of mails
> addressed to a particular receiver in the "To" list).
> Is there a way to accomplish this by applying a filter to grouped data?
>
>
> Thanks,
> Megha
>
>
