spark-user mailing list archives

From Andrés Ivaldi <iaiva...@gmail.com>
Subject Re: Aggregations with scala pairs
Date Thu, 18 Aug 2016 15:35:50 GMT
Thanks!!!

On Thu, Aug 18, 2016 at 3:35 AM, Jean-Baptiste Onofré <jb@nanthrax.net>
wrote:

> Agreed.
>
> Regards
> JB
> On Aug 18, 2016, at 07:32, Olivier Girardot <o.girardot@lateral-thoughts.com> wrote:
>>
>> CC'ing the dev list.
>> You should open a Jira and a related PR to discuss it, cf.
>> https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingCodeChanges
>>
>>
>>
>> On Wed, Aug 17, 2016 4:01 PM, Andrés Ivaldi iaivaldi@gmail.com wrote:
>>
>>> Hello, I'd like to report incorrect behavior in the Dataset API, but I
>>> don't know how to do that. My Jira account doesn't allow me to add an issue.
>>>
>>> I'm using Apache Spark 2.0.0, but the problem has existed since at least
>>> version 1.4 (given the docs, since 1.3).
>>>
>>> The problem is simple to reproduce, and so is the workaround. If we apply
>>> agg over a Dataset with Scala pairs that name the same column more than
>>> once, only one aggregation on that column is actually used. This is because
>>> of the toMap call, which reduces the pairs that share the same key to a
>>> single entry, overwriting the earlier value.
>>>
>>> Class:
>>> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala
>>>
>>>
>>>   def agg(aggExpr: (String, String), aggExprs: (String, String)*): DataFrame = {
>>>     agg((aggExpr +: aggExprs).toMap)
>>>   }
>>>
>>>
>>> Rewritten as something like this, it should work:
>>>   def agg(aggExpr: (String, String), aggExprs: (String, String)*): DataFrame = {
>>>     toDF((aggExpr +: aggExprs).map { pairExpr =>
>>>       strToExpr(pairExpr._2)(df(pairExpr._1).expr)
>>>     }.toSeq)
>>>   }
>>>
>>>
>>> regards
>>> --
>>> Ing. Ivaldi Andres
>>>
>>
>>
>> *Olivier Girardot*   | Associé
>> o.girardot@lateral-thoughts.com
>> +33 6 24 09 17 94
>>
>
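
For anyone reading this in the archive, here is a minimal sketch of the collapse described in the original report. It is plain Scala with no Spark dependency. The object name AggPairsDemo and the column names "value" and "other" are made up for illustration and do not come from the Spark source. It only shows what toMap does to pairs that share a key, not how Spark builds the aggregation itself.

```scala
object AggPairsDemo {
  // Hypothetical (column, function) pairs, in the shape accepted by
  // agg(aggExpr: (String, String), aggExprs: (String, String)*).
  val aggExpr: (String, String) = "value" -> "min"
  val aggExprs: Seq[(String, String)] = Seq("value" -> "max", "other" -> "sum")

  // Converting to a Map keeps one entry per key; for a duplicated key
  // the later pair replaces the earlier one, so "value" -> "min" is lost.
  val asMap: Map[String, String] = (aggExpr +: aggExprs).toMap

  // Keeping the pairs as a Seq preserves every requested aggregation,
  // including both aggregations on "value".
  val asSeq: Seq[(String, String)] = aggExpr +: aggExprs

  def main(args: Array[String]): Unit = {
    println(s"as Map: ${asMap.size} entries, value -> ${asMap("value")}")
    println(s"as Seq: ${asSeq.size} entries")
  }
}
```

Until something like the proposed change lands, one way to avoid the Map entirely should be the Column-based overload, agg(expr: Column, exprs: Column*), for example agg(min("value"), max("value")) with the functions from org.apache.spark.sql.functions.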


-- 
Ing. Ivaldi Andres
