spark-user mailing list archives

From Andrés Ivaldi <iaiva...@gmail.com>
Subject DataFrame group and agg
Date Mon, 25 Apr 2016 18:34:20 GMT
Hello,
Does anyone know whether this is intentional or a bug?

In the RelationalGroupedDataset class
https://github.com/apache/spark/blob/2f1d0320c97f064556fa1cf98d4e30d2ab2fe661/sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala
the agg method has several overloads. Here are two of them:
Line 136:
  def agg(aggExpr: (String, String), aggExprs: (String, String)*): DataFrame = {
    agg((aggExpr +: aggExprs).toMap)
  }

Line 155:
  def agg(exprs: Map[String, String]): DataFrame = {
    toDF(exprs.map { case (colName, expr) =>
      strToExpr(expr)(df(colName).expr)
    }.toSeq)
  }

So this allows me to do something like .agg("col1" -> "sum", "col2" -> "max").

But if I want to apply two different aggregate functions to the same
column, it does not work: the overload at line 136 builds a Map, so
something like "col" -> "sum", "col" -> "max" ends up as just
"col" -> "max".
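
The collapse can be reproduced without Spark. This is a minimal sketch in plain Scala; the object and method names (AggMapCollapse, asPassedToMapOverload) are made up for illustration and only mirror the `(aggExpr +: aggExprs).toMap` expression quoted above.

```scala
object AggMapCollapse {
  // Mirrors `(aggExpr +: aggExprs).toMap` from the overload at line 136.
  // A Map holds one value per key, so a repeated column name keeps only
  // the last function that was given for it.
  def asPassedToMapOverload(aggExpr: (String, String),
                            aggExprs: (String, String)*): Map[String, String] =
    (aggExpr +: aggExprs).toMap

  def main(args: Array[String]): Unit = {
    val collapsed = asPassedToMapOverload("col" -> "sum", "col" -> "max")
    println(collapsed)      // Map(col -> max)
    println(collapsed.size) // 1
  }
}
```

Distinct column names are not affected, which is why "col1" -> "sum", "col2" -> "max" behaves as expected.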


I think a signature like this would work:

  def agg(exprs: Seq[(String, String)]): DataFrame = {
    toDF(exprs.map { case (colName, expr) =>
      strToExpr(expr)(df(colName).expr)
    }.toSeq)
  }
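
In the meantime, one possible workaround is to keep the pairs in a Seq and turn them into expression strings that can be handed to the Column-based agg overload. This is only a sketch: AggPairs and toSqlExprs are hypothetical helpers, not part of Spark, and the Spark usage in the comments assumes a DataFrame `df` with a grouping column named "key" and plain column names that need no quoting.

```scala
object AggPairs {
  // Hypothetical helper (not part of Spark): turns (column, function) pairs
  // into SQL expression strings. Because the input is a Seq, order and
  // repeated column names are preserved.
  def toSqlExprs(pairs: Seq[(String, String)]): Seq[String] =
    pairs.map { case (colName, fn) => s"$fn($colName)" }

  def main(args: Array[String]): Unit = {
    val exprs = toSqlExprs(Seq("col" -> "sum", "col" -> "max"))
    println(exprs) // List(sum(col), max(col))

    // With Spark on the classpath (assumed `df` and "key", see above):
    //   import org.apache.spark.sql.functions.expr
    //   val cols = exprs.map(expr)
    //   df.groupBy("key").agg(cols.head, cols.tail: _*)
  }
}
```

Unlike strToExpr, this does not special-case any function names; it simply relies on the SQL expression parser to resolve them.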

Regards.
