spark-user mailing list archives

From Andrés Ivaldi <iaiva...@gmail.com>
Subject Aggregations with scala pairs
Date Wed, 17 Aug 2016 14:01:24 GMT
Hello, I'd like to report incorrect behavior in the Dataset API, but I don't
know how to do it: my JIRA account doesn't allow me to create an issue.

I'm using Apache Spark 2.0.0, but the problem has been present since at least
version 1.4 (judging by the docs, since 1.3).

The problem is simple to reproduce, and so is the workaround. If we call agg
on a grouped Dataset with several Scala pairs over the same column, only one
aggregation over that column is actually applied. This happens because toMap
reduces pairs with the same key to a single entry, so each later value
overwrites the earlier one.

The relevant class is RelationalGroupedDataset:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala


  def agg(aggExpr: (String, String), aggExprs: (String, String)*): DataFrame = {
    agg((aggExpr +: aggExprs).toMap)
  }
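A minimal plain-Scala sketch of the collapse, runnable without Spark. The pairs below ("value"/"max", "value"/"min", "other"/"sum") are hypothetical, chosen only to have the same shape as the aggExpr +: aggExprs sequence in the method above. Three pairs go in, but toMap keeps one entry per key, so the later "min" overwrites the earlier "max" for column value.

```scala
object ToMapCollapse {
  // Hypothetical (column, aggregate function) pairs, in the same shape
  // that agg(aggExpr, aggExprs*) receives them.
  val aggExpr: (String, String) = ("value", "max")
  val aggExprs: Seq[(String, String)] = Seq(("value", "min"), ("other", "sum"))

  // Same construction as in the agg overload above.
  val pairs: Seq[(String, String)] = aggExpr +: aggExprs

  // toMap keeps one entry per key; a later pair overwrites an earlier one,
  // so the "max" aggregation on "value" is silently lost.
  val asMap: Map[String, String] = pairs.toMap

  def main(args: Array[String]): Unit = {
    println(s"pairs: ${pairs.size}, map entries: ${asMap.size}")
    println(s"aggregate kept for 'value': ${asMap("value")}")
  }
}
```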


Rewriting it as something like this should work:
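  def agg(aggExpr: (String, String), aggExprs: (String, String)*): DataFrame = {
    // Build one expression per (column, function) pair instead of going through
    // a Map, so several aggregations over the same column are all kept, in order.
    toDF((aggExpr +: aggExprs).map { case (colName, expr) =>
      strToExpr(expr)(df(colName).expr)
    })
  }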
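The difference between the current and the proposed approach can be modelled without Spark. In this sketch, toExprString is a hypothetical stand-in for strToExpr(expr)(df(colName).expr) that just renders a string such as max(value); it is not part of Spark's API, and viaMap/viaSeq are illustrative names only.

```scala
object PairwiseAgg {
  // Hypothetical stand-in for Spark's strToExpr(expr)(df(colName).expr):
  // renders the aggregate as a plain string such as "max(value)".
  def toExprString(colName: String, func: String): String = s"$func($colName)"

  // Current behaviour: pairs go through toMap, so duplicate columns collapse.
  def viaMap(pairs: Seq[(String, String)]): Seq[String] =
    pairs.toMap.map { case (c, f) => toExprString(c, f) }.toSeq

  // Proposed behaviour: map each pair directly, keeping every aggregate in order.
  def viaSeq(pairs: Seq[(String, String)]): Seq[String] =
    pairs.map { case (c, f) => toExprString(c, f) }

  // Two aggregations over the same (hypothetical) column.
  val pairs: Seq[(String, String)] = Seq(("value", "max"), ("value", "min"))

  def main(args: Array[String]): Unit = {
    println(viaMap(pairs))
    println(viaSeq(pairs))
  }
}
```

Going through the Map yields only min(value); mapping the pairs directly yields both max(value) and min(value).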


Regards,
-- 
Ing. Ivaldi Andres
