spark-user mailing list archives

From Jean-Baptiste Onofré ...@nanthrax.net>
Subject Re: Aggregations with scala pairs
Date Thu, 18 Aug 2016 06:35:47 GMT
Agreed.

Regards
JB



On Aug 18, 2016, at 07:32, Olivier Girardot <o.girardot@lateral-thoughts.com> wrote:
>CC'ing the dev list. You should open a Jira and a related PR to discuss
>this, cf.
>https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingCodeChanges
>
>
>
>
>
>On Wed, Aug 17, 2016 4:01 PM, Andrés Ivaldi iaivaldi@gmail.com wrote:
>Hello, I'd like to report incorrect behavior in the Dataset API, but I don't
>know how to do that: my Jira account doesn't allow me to create an issue.
>I'm using Apache Spark 2.0.0, but the problem has been present since at least
>version 1.4 (and, going by the docs, possibly since 1.3).
>The problem is simple to reproduce, and so is the workaround. If we call agg
>on a Dataset with Scala pairs that refer to the same column, only one
>aggregation on that column is actually used. This is because of the toMap
>call, which reduces the pairs that share the same key to a single entry,
>overwriting the earlier value.
>The class is:
>https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala
>
>def agg(aggExpr: (String, String), aggExprs: (String, String)*): DataFrame = {
>  agg((aggExpr +: aggExprs).toMap)
>}
>
>Rewritten as something like this, it should work:
>
>def agg(aggExpr: (String, String), aggExprs: (String, String)*): DataFrame = {
>  toDF((aggExpr +: aggExprs).map { pairExpr =>
>    strToExpr(pairExpr._2)(df(pairExpr._1).expr)
>  }.toSeq)
>}
>
>regards --
>Ing. Ivaldi Andres
>
>
>Olivier Girardot | Associé
>o.girardot@lateral-thoughts.com
>+33 6 24 09 17 94
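
For readers of the archive, the collapse described in the quoted message can be reproduced without Spark. The sketch below is only an illustration in plain Scala: the column names ("age", "salary") and the object name AggPairsDemo are made up for the example, and the rendered strings merely stand in for the expressions Spark would build.

```scala
// Plain Scala, no Spark required: shows why pairs on the same column collapse.
object AggPairsDemo {
  // Hypothetical request: two aggregations over the same column "age".
  val pairs: Seq[(String, String)] =
    Seq("age" -> "min", "age" -> "max", "salary" -> "avg")

  // Converting the pairs to a Map keys them by column name, so a later
  // pair for "age" replaces the earlier one.
  val asMap: Map[String, String] = pairs.toMap

  // Mapping over the Seq directly keeps every pair, in order.
  val rendered: Seq[String] = pairs.map { case (col, fn) => s"$fn($col)" }

  def main(args: Array[String]): Unit = {
    println(asMap)    // only one entry survives for "age"
    println(rendered) // all three aggregations are still present
  }
}
```

This is the same difference as between the two versions of agg in the quoted message: the first goes through toMap, while the proposed one maps over the sequence of pairs.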
