spark-dev mailing list archives

From Pedro Rodriguez <ski.rodrig...@gmail.com>
Subject Re: Spark 2.0 Dataset Documentation
Date Sat, 18 Jun 2016 05:28:49 GMT
The updates look great!

Looks like many places are updated to the new APIs, but there still isn't a
section for working with Datasets (most of the docs work with DataFrames).
Are you planning on adding more? I am thinking of something that would address
common questions like the one I posted on the user mailing list earlier today.

Should I take the discussion to your PR?

Pedro

On Fri, Jun 17, 2016 at 11:12 PM, Cheng Lian <lian.cs.zju@gmail.com> wrote:

> Hey Pedro,
>
> The SQL programming guide is being updated. Here's the PR, which is not
> merged yet:
> https://github.com/apache/spark/pull/13592
>
> Cheng
> On 6/17/16 9:13 PM, Pedro Rodriguez wrote:
>
> Hi All,
>
> At my workplace we are starting to use Datasets in 1.6.1, and even more
> with Spark 2.0, in place of DataFrames. I looked at the 1.6.1 documentation
> and then the 2.0 documentation, and it looks like not much time has been
> spent writing a Dataset guide/tutorial.
>
> Preview Docs:
> https://home.apache.org/~pwendell/spark-releases/spark-2.0.0-preview-docs/sql-programming-guide.html#creating-datasets
> Spark master docs:
> https://github.com/apache/spark/blob/master/docs/sql-programming-guide.md
>
> I would like to spend the time to contribute an improvement to those docs
> with more in-depth examples of creating and using Datasets (e.g. using $ to
> select columns). Is this of value, and if so, what should my next step be to
> get this going (create a JIRA, etc.)?
>
> --
> Pedro Rodriguez
> PhD Student in Distributed Machine Learning | CU Boulder
> R&D Data Science Intern at Oracle Data Cloud
> UC Berkeley AMPLab Alumni
>
> ski.rodriguez@gmail.com | pedrorodriguez.io | 909-353-4423
> Github: github.com/EntilZha | LinkedIn:
> https://www.linkedin.com/in/pedrorodriguezscience
>
>
>


-- 
Pedro Rodriguez
PhD Student in Distributed Machine Learning | CU Boulder
UC Berkeley AMPLab Alumni

ski.rodriguez@gmail.com | pedrorodriguez.io | 909-353-4423
Github: github.com/EntilZha | LinkedIn:
https://www.linkedin.com/in/pedrorodriguezscience
