spark-user mailing list archives

From Xiangrui Meng <men...@gmail.com>
Subject Re: Top rows per group
Date Mon, 16 Mar 2015 18:13:08 GMT
https://issues.apache.org/jira/browse/SPARK-5954 tracks this issue, and
Shuo is working on it. We will first implement topByKey for RDDs, and
then we could add it to DataFrames. -Xiangrui
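
For illustration only, here is a minimal sketch of the underlying idea:
keep a bounded list of the best rows per key, fold rows into it within
each partition, then merge the partial lists. It is plain Python, not
Spark code, and it is not the SPARK-5954 implementation. The names
add_row, merge_tops, and top_rows_per_group are hypothetical, and the
nested lists stand in for RDD partitions. Functions shaped like add_row
and merge_tops could plausibly serve as the seqOp and combOp of
aggregateByKey, but that wiring is an assumption and is not shown here.

```python
import heapq
from collections import defaultdict


def add_row(top, row, n, score):
    """Fold one row into a per-group list that holds at most n rows."""
    return heapq.nlargest(n, top + [row], key=score)


def merge_tops(a, b, n, score):
    """Merge two partial top-n lists into a single top-n list."""
    return heapq.nlargest(n, a + b, key=score)


def top_rows_per_group(partitions, n, group, score):
    """Return {group key: up to n rows with the largest score}."""
    # Step 1: build a partial top-n per key inside each partition.
    partials = []
    for part in partitions:
        acc = defaultdict(list)
        for row in part:
            key = group(row)
            acc[key] = add_row(acc[key], row, n, score)
        partials.append(acc)

    # Step 2: merge the per-partition partial results key by key.
    merged = defaultdict(list)
    for acc in partials:
        for key, rows in acc.items():
            merged[key] = merge_tops(merged[key], rows, n, score)
    return dict(merged)


# Rows are (F1, F2) pairs: group by F1, keep the 2 largest F2 per group.
partitions = [
    [("a", 3), ("b", 7), ("a", 9)],
    [("a", 5), ("b", 1), ("b", 4), ("a", 1)],
]
result = top_rows_per_group(
    partitions, n=2, group=lambda r: r[0], score=lambda r: r[1]
)
```

Because each partial list never grows beyond n rows, the amount of data
kept per group stays small no matter how many rows share a key, which is
the main reason to prefer this shape over collecting every row of a
group before sorting.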

On Mon, Mar 9, 2015 at 9:43 PM, Moss <rhouda27@gmail.com> wrote:
> I have a SchemaRDD that I want to group by a given field F1, but I want
> the result to be multiple rows per group rather than a single row: only
> the rows with the top N values of field F2 should be kept.
> The issue is that the groupBy operation aggregates multiple rows
> into a single one.
> Any suggestion or hint will be appreciated.
>
> Best,
>
>
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Top-rows-per-group-tp21983.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>