spark-dev mailing list archives

From Jacek Laskowski <ja...@japila.pl>
Subject Which one preferred -- Dataset.ofRows vs SparkSession.baseRelationToDataFrame?
Date Mon, 15 May 2017 19:27:14 GMT
Hi,

While reviewing how BaseRelation and its "relatives" are used in Spark SQL,
I've noticed that some code paths use Dataset.ofRows [1] while others
prefer SparkSession.baseRelationToDataFrame [2].

[1] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L65-L69
[2] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L414-L416
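For context, here is a minimal Scala sketch of the two paths, assuming Spark 2.x on the classpath. In the Spark 2.x sources, SparkSession.baseRelationToDataFrame is a one-line wrapper that calls Dataset.ofRows(self, LogicalRelation(baseRelation)), so both calls should produce equivalent DataFrames. RangeRelation, OfRowsVsBaseRelation and the org.apache.spark.sql.example package are hypothetical names. The package only exists so that the private[sql] Dataset companion object (and therefore Dataset.ofRows) is accessible.

```scala
// Hypothetical package: it has to live under org.apache.spark.sql because
// the Dataset companion object (and thus Dataset.ofRows) is private[sql].
package org.apache.spark.sql.example

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Dataset, Row, SQLContext, SparkSession}
import org.apache.spark.sql.execution.datasources.LogicalRelation
import org.apache.spark.sql.sources.{BaseRelation, TableScan}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Hypothetical relation exposing a single "id" column with values 0 until n.
class RangeRelation(val sqlContext: SQLContext, n: Long)
    extends BaseRelation with TableScan {
  override def schema: StructType =
    StructType(StructField("id", LongType, nullable = false) :: Nil)

  override def buildScan(): RDD[Row] =
    sqlContext.sparkContext.range(0L, n).map(Row(_))
}

object OfRowsVsBaseRelation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("ofRows-vs-baseRelationToDataFrame")
      .getOrCreate()
    val relation = new RangeRelation(spark.sqlContext, 5L)

    // Public entry point.
    val viaSession: DataFrame = spark.baseRelationToDataFrame(relation)

    // Internal entry point: wrap the relation in a LogicalRelation plan and
    // let Dataset.ofRows analyze it and attach a Row encoder.
    val viaOfRows: DataFrame = Dataset.ofRows(spark, LogicalRelation(relation))

    // Both paths should yield the same schema and the same rows.
    assert(viaSession.schema == viaOfRows.schema)
    assert(viaSession.collect().sameElements(viaOfRows.collect()))

    spark.stop()
  }
}
```

The exact LogicalRelation constructor has changed between Spark releases, so the LogicalRelation(relation) call may need adjusting for a particular version.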

It looks like SparkSession.baseRelationToDataFrame is the preferred
way. Should the code paths that call Dataset.ofRows directly be changed to use it?

P.S. There's also the private Dataset.withPlan, which looks very similar
to the others [3].

[3] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L2940-L2942

Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


