spark-user mailing list archives

From Olivier Girardot <o.girar...@lateral-thoughts.com>
Subject Re: Aggregations with scala pairs
Date Thu, 18 Aug 2016 06:32:10 GMT
CC'ing the dev list. You should open a JIRA and a PR to discuss it; cf.
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingCodeChanges





On Wed, Aug 17, 2016 4:01 PM, Andrés Ivaldi iaivaldi@gmail.com wrote:
Hello, I'd like to report wrong behavior in the Dataset API, but I don't know how
to do it: my JIRA account doesn't allow me to create an issue.
I'm using Apache Spark 2.0.0, but the problem has been there since at least version 1.4
(judging by the docs, since 1.3).
The problem is simple to reproduce, and so is the workaround: if we apply agg over
a Dataset with Scala pairs on the same column, only one aggregation over that column
is actually used. This is because toMap reduces the pairs with the
same key to one, overwriting the earlier value.
In class RelationalGroupedDataset:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala


def agg(aggExpr: (String, String), aggExprs: (String, String)*): DataFrame = {
  agg((aggExpr +: aggExprs).toMap)
}
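The toMap collapse described above can be reproduced in plain Scala, without Spark. This is a minimal sketch: the object name PairAggDemo and the string rendering s"$fn($colName)" are hypothetical stand-ins (the real code builds Catalyst expressions via strToExpr); only the `(aggExpr +: aggExprs)` shape is taken from the method above.

```scala
object PairAggDemo {
  // Same argument shape agg("value" -> "min", "value" -> "max") receives:
  // the first pair plus the varargs, combined with +: as in the method above.
  val aggExpr: (String, String) = "value" -> "min"
  val aggExprs: Seq[(String, String)] = Seq("value" -> "max")

  // What the current code does: toMap keeps only the last value for a repeated key,
  // so the "min" aggregation on "value" is silently dropped.
  val collapsed: Map[String, String] = (aggExpr +: aggExprs).toMap

  // Mapping over the pairs directly keeps every (column, function) pair.
  // Hypothetical rendering as a string instead of building a real expression.
  val preserved: Seq[String] =
    (aggExpr +: aggExprs).map { case (colName, fn) => s"$fn($colName)" }

  def main(args: Array[String]): Unit = {
    println(collapsed) // Map(value -> max)
    println(preserved) // List(min(value), max(value))
  }
}
```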
Rewritten as something like this, it should work:

def agg(aggExpr: (String, String), aggExprs: (String, String)*): DataFrame = {
  toDF((aggExpr +: aggExprs).map { case (colName, expr) =>
    strToExpr(expr)(df(colName).expr)
  })
}
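The message does not spell out the workaround, so the following is only one possible option, not necessarily the one meant above: use the Column-based agg(Column, Column*) overload with functions.min/functions.max, which does not go through toMap. It assumes spark-sql 2.x on the classpath; the object name AggWorkaround, the method minAndMax and the sample data are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{max, min}

object AggWorkaround {
  // Hypothetical helper: groups by "key" and computes both min and max of "value"
  // through the Column-based overload, so both aggregates survive.
  def minAndMax(df: DataFrame): DataFrame =
    df.groupBy("key").agg(min("value"), max("value"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("agg-workaround")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: (key, value)
    val df = Seq(("a", 1), ("a", 5), ("b", 3)).toDF("key", "value")

    // Pair-based form from the report: on affected versions only one
    // aggregate on "value" appears in the result.
    df.groupBy("key").agg("value" -> "min", "value" -> "max").show()

    // Column-based form: both min(value) and max(value) appear.
    minAndMax(df).show()

    spark.stop()
  }
}
```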

Regards,
Ing. Ivaldi Andres


Olivier Girardot | Associé
o.girardot@lateral-thoughts.com
+33 6 24 09 17 94