spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Takeshi Yamamuro <linguin....@gmail.com>
Subject Re: Dataset Select Function after Aggregate Error
Date Sat, 18 Jun 2016 02:53:09 GMT
Hi,

In 2.0, you can say;
val ds = Seq[Tuple2[Int, Int]]((1, 0), (2, 0)).toDS
ds.groupBy($"_1").count.select($"_1", $"count").show


// maropu


On Sat, Jun 18, 2016 at 7:53 AM, Xinh Huynh <xinh.huynh@gmail.com> wrote:

> Hi Pedro,
>
> In 1.6.1, you can do:
> >> ds.groupBy(_.uid).count().map(_._1)
> or
> >> ds.groupBy(_.uid).count().select($"value".as[String])
>
> It doesn't have the exact same syntax as for DataFrame.
> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Dataset
>
> It might be different in 2.0.
>
> Xinh
>
> On Fri, Jun 17, 2016 at 3:33 PM, Pedro Rodriguez <ski.rodriguez@gmail.com>
> wrote:
>
>> Hi All,
>>
>> I am working on using Datasets in 1.6.1 and eventually 2.0 when its
>> released.
>>
>> I am running the aggregate code below where I have a dataset where the
>> row has a field uid:
>>
>> ds.groupBy(_.uid).count()
>> // res0: org.apache.spark.sql.Dataset[(String, Long)] = [_1: string, _2:
>> bigint]
>>
>> This works as expected, however, attempts to run select statements after
>> fails:
>> ds.groupBy(_.uid).count().select(_._1)
>> // error: missing parameter type for expanded function ((x$2) => x$2._1)
>> ds.groupBy(_.uid).count().select(_._1)
>>
>> I have tried several variants, but nothing seems to work. Below is the
>> equivalent Dataframe code which works as expected:
>> df.groupBy("uid").count().select("uid")
>>
>> Thanks!
>> --
>> Pedro Rodriguez
>> PhD Student in Distributed Machine Learning | CU Boulder
>> UC Berkeley AMPLab Alumni
>>
>> ski.rodriguez@gmail.com | pedrorodriguez.io | 909-353-4423
>> Github: github.com/EntilZha | LinkedIn:
>> https://www.linkedin.com/in/pedrorodriguezscience
>>
>>
>


-- 
---
Takeshi Yamamuro

Mime
View raw message