spark-user mailing list archives

From jggg777 <>
Subject Is there a processing speed difference between DataFrames and Datasets?
Date Tue, 22 Nov 2016 14:50:11 GMT
I've seen a number of visuals showing the processing time benefits of using
Datasets+DataFrames over RDDs, but I'd assume that there are performance
benefits to using a defined case class instead of a generic Dataset[Row].  The
tale of three Spark APIs post mentions "If you want higher degree of
type-safety at compile time, want typed JVM objects, *take advantage of
Catalyst optimization, and benefit from Tungsten’s efficient code
generation, use Dataset.*"

Are there any comparisons showing the performance differences between
Datasets and DataFrames?  Or more information about how Catalyst/Tungsten
handle them differently?
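
To make the comparison concrete, here is a minimal sketch of the two styles I
have in mind. It assumes Spark 2.x running in local mode; the Person case
class, the sample rows, and the object name are hypothetical and only there
for illustration. It does not measure anything; it only prints the plans for
the same filter written both ways so they can be compared side by side.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical record type, used only for illustration.
case class Person(name: String, age: Long)

object TypedVsUntyped {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode, Spark 2.x on the classpath.
    val spark = SparkSession.builder()
      .appName("typed-vs-untyped")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Typed Dataset backed by the case class, and its untyped view
    // (DataFrame is an alias for Dataset[Row]).
    val ds: Dataset[Person] = Seq(Person("Ann", 31), Person("Bob", 17)).toDS()
    val df: DataFrame = ds.toDF()

    // Untyped style: the predicate is a Column expression.
    val adultsDf = df.filter($"age" >= 18)

    // Typed style: the predicate is a Scala function over Person objects.
    val adultsDs = ds.filter(_.age >= 18)

    // Print the parsed, analyzed, optimized, and physical plans for each.
    adultsDf.explain(true)
    adultsDs.explain(true)

    spark.stop()
  }
}
```

Any difference between the two printed plans would be a starting point for
understanding how Catalyst/Tungsten treat the two cases.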
