spark-user mailing list archives

From Xinh Huynh <xinh.hu...@gmail.com>
Subject Re: Dataset Select Function after Aggregate Error
Date Fri, 17 Jun 2016 22:53:08 GMT
Hi Pedro,

In 1.6.1, you can do:
>> ds.groupBy(_.uid).count().map(_._1)
or
>> ds.groupBy(_.uid).count().select($"value".as[String])

It doesn't have exactly the same syntax as the DataFrame API.
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Dataset
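To see why the two variants behave differently: in 1.6, Dataset.select takes TypedColumn arguments rather than a Scala function, so a lambda like _._1 has no expected type and won't compile there, while map does take a function. A minimal sketch, assuming Spark 1.6.1 with a SQLContext named sqlContext and a hypothetical case class Record(uid: String):

```scala
// Hypothetical setup: `sqlContext` and `Record` are assumed names, not from the thread.
case class Record(uid: String)
import sqlContext.implicits._

val ds = sqlContext.createDataset(Seq(Record("a"), Record("a"), Record("b")))

// groupBy(func).count() yields a Dataset[(String, Long)]; in 1.6 the
// grouping-key column is named "value".
val counts = ds.groupBy(_.uid).count()

// map takes a plain Scala function, so _._1 infers its type from the tuple.
val uids1: Dataset[String] = counts.map(_._1)

// select takes TypedColumn arguments, hence the $"value".as[String] form;
// this is why select(_._1) fails with a missing-parameter-type error.
val uids2: Dataset[String] = counts.select($"value".as[String])
```

Both uids1 and uids2 should produce the same distinct uid values; map keeps everything in typed Scala, while select stays in the column expression world.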

It might be different in 2.0.

Xinh

On Fri, Jun 17, 2016 at 3:33 PM, Pedro Rodriguez <ski.rodriguez@gmail.com>
wrote:

> Hi All,
>
> I am working with Datasets in 1.6.1 and eventually 2.0 when it's
> released.
>
> I am running the aggregate code below where I have a dataset where the row
> has a field uid:
>
> ds.groupBy(_.uid).count()
> // res0: org.apache.spark.sql.Dataset[(String, Long)] = [_1: string, _2:
> bigint]
>
> This works as expected; however, attempts to run select statements afterward
> fail:
> ds.groupBy(_.uid).count().select(_._1)
> // error: missing parameter type for expanded function ((x$2) => x$2._1)
>
> I have tried several variants, but nothing seems to work. Below is the
> equivalent Dataframe code which works as expected:
> df.groupBy("uid").count().select("uid")
>
> Thanks!
> --
> Pedro Rodriguez
> PhD Student in Distributed Machine Learning | CU Boulder
> UC Berkeley AMPLab Alumni
>
> ski.rodriguez@gmail.com | pedrorodriguez.io | 909-353-4423
> Github: github.com/EntilZha | LinkedIn:
> https://www.linkedin.com/in/pedrorodriguezscience
>
>
