spark-user mailing list archives

From jggg777 <jonrgr...@gmail.com>
Subject Is there a processing speed difference between DataFrames and Datasets?
Date Tue, 22 Nov 2016 14:50:11 GMT
I've seen a number of visuals showing the processing-time benefits of using
Datasets and DataFrames over RDDs, but I'd assume there are also performance
benefits to using a Dataset of a defined case class instead of a generic
Dataset[Row].  The "tale of three Spark APIs" post mentions "If you want higher
degree of type-safety at compile time, want typed JVM objects, *take advantage
of Catalyst optimization, and benefit from Tungsten’s efficient code
generation, use Dataset.*"

Are there any comparisons showing the performance differences between
Datasets and DataFrames?  Or more information about how Catalyst/Tungsten
handle them differently?
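
To make the comparison concrete, here is a minimal sketch of the two variants in question, assuming Spark 2.x running with a local master. The Person case class, the sample rows, and the object name are hypothetical and only there for illustration. It prints the query plans for the same filter written against a Dataset[Row] and against a typed Dataset so they can be compared side by side; it is not a benchmark.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical record type, used only for illustration.
case class Person(name: String, age: Long)

object DatasetVsDataFrame {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough to look at the plans.
    val spark = SparkSession.builder()
      .appName("dataset-vs-dataframe")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Typed Dataset backed by the case class.
    val ds: Dataset[Person] = Seq(Person("a", 30L), Person("b", 15L)).toDS()

    // DataFrame is a type alias for Dataset[Row].
    val df: DataFrame = ds.toDF()

    // Untyped filter expressed as a Column expression.
    df.filter($"age" > 21).explain(true)

    // Typed filter expressed as a Scala lambda over Person.
    ds.filter(_.age > 21).explain(true)

    spark.stop()
  }
}
```

Comparing the two printed plans shows where the typed variant adds or changes steps relative to the column-based one.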



