spark-user mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Re: Difference between Data set and Data Frame in Spark 2
Date Thu, 01 Sep 2016 16:15:36 GMT
On Thu, Sep 1, 2016 at 4:56 PM, Mich Talebzadeh
<mich.talebzadeh@gmail.com> wrote:
> Data Frame is built on top of RDD to create a tabular format that we all love
> to make the original build easily usable (say SQL like queries, column
> headings etc). The drawback is it restricts you with what you can do with
> Data Frame (now that you have done RDD.toDF)

DataFrame is a Dataset[Row], literally: in the Scala API it is a type
alias for Dataset[Row], rather than something based on an RDD.
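
To make that concrete, here is a minimal sketch, assuming the Spark 2.x
Scala API with spark-sql on the classpath. The object name, the helper
asDatasetOfRow and the sample columns are made up for illustration. It
only shows that a DataFrame can be used wherever a Dataset[Row] is
expected, and the other way round, with no conversion call.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

// Hypothetical example object; assumes spark-sql 2.x is on the classpath.
object DataFrameIsDatasetOfRow {

  // The body is just `df`: no conversion is needed, because in the
  // Scala API DataFrame is a type alias for Dataset[Row].
  def asDatasetOfRow(df: DataFrame): Dataset[Row] = df

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dataframe-is-dataset-row")
      .master("local[*]") // local mode, for illustration only
      .getOrCreate()
    import spark.implicits._

    // Illustrative data; the column names are arbitrary.
    val df: DataFrame = Seq(("a", 1), ("b", 2)).toDF("key", "value")

    val ds: Dataset[Row] = asDatasetOfRow(df)
    val back: DataFrame = ds // and back again, still the same type

    back.show()
    spark.stop()
  }
}
```

In the Java API there is no separate DataFrame type, so the same thing
is written as Dataset<Row> directly.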

> DataSet is the new RDD with improvements on RDD. As I understand from
> Sean's explanation they add some optimisation on top of the common RDD.

At the moment I don't think there's any particular reason to use RDDs
except to interoperate with code that uses RDDs -- which is entirely
valid. I believe new code would generally touch only Dataset and
DataFrame otherwise. So I don't think there are really 3 elemental
concepts in play as of Spark 2.x.
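
As a rough sketch of that interoperation, assuming the Spark 2.x Scala
API: the Person case class and the legacyAverageAge function below are
hypothetical stand-ins for existing RDD-based code. New code stays on
Dataset and crosses over only at the boundary, via .rdd in one
direction and toDS() in the other.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type; defined at top level so an encoder can be derived.
case class Person(name: String, age: Int)

object RddInterop {

  // Stand-in for existing code that was written against RDDs.
  def legacyAverageAge(people: RDD[Person]): Double =
    people.map(_.age.toDouble).mean()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-interop")
      .master("local[*]") // local mode, for illustration only
      .getOrCreate()
    import spark.implicits._

    // New code works with a typed Dataset.
    val people: Dataset[Person] = Seq(Person("Ann", 30), Person("Bob", 40)).toDS()

    // Dataset -> RDD, only to call the RDD-based function.
    val avg: Double = legacyAverageAge(people.rdd)
    println(s"average age: $avg")

    // RDD -> Dataset, for data that still arrives as an RDD.
    val legacyRdd: RDD[Person] = spark.sparkContext.parallelize(Seq(Person("Cy", 50)))
    val fromRdd: Dataset[Person] = legacyRdd.toDS()
    fromRdd.show()

    spark.stop()
  }
}
```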


