spark-user mailing list archives

From Akhilanand <akhilanand...@gmail.com>
Subject Difference between dataset and dataframe
Date Tue, 19 Feb 2019 02:01:47 GMT

Hello, 

I have recently been exploring Datasets and DataFrames. I would really appreciate it if
someone could answer these questions:

1) Is there any performance difference between using Datasets and DataFrames? Is it
significant enough to choose one over the other? I realise there would be some overhead due
to case classes, but how significant is that? Are there any other implications?

2) Is Tungsten code generation done only for Datasets, or is there an internal process
that generates bytecode for DataFrames as well? Since it's related to the JVM, I think it's
just for Datasets, but I couldn't find anything that says so specifically. If it's just for
Datasets, does that mean we miss out on the Project Tungsten optimisations for DataFrames?
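
To make the comparison concrete, here is a minimal sketch of the two styles I mean. The
Person case class, the sample rows, and the helper names adultsUntyped/adultsTyped are
made up for illustration only. As far as I understand, explain() prints the physical plan,
and operators that take part in whole-stage code generation are marked with an asterisk
there, so comparing the two plans might be one way to check this.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical record type, used only for this illustration.
case class Person(name: String, age: Int)

object DatasetVsDataFrame {
  // Untyped DataFrame style: the predicate is a Column expression.
  def adultsUntyped(df: DataFrame): DataFrame =
    df.filter(col("age") > 21)

  // Typed Dataset style: the predicate is a Scala lambda over Person objects.
  def adultsTyped(ds: Dataset[Person]): Dataset[Person] =
    ds.filter(_.age > 21)

  def main(args: Array[String]): Unit = {
    // Local session, assumed here only so the sketch can run on one machine.
    val spark = SparkSession.builder()
      .appName("dataset-vs-dataframe")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data.
    val people: Dataset[Person] = Seq(Person("Ann", 30), Person("Bob", 15)).toDS()

    // Print the physical plan of each variant for a side-by-side comparison.
    adultsUntyped(people.toDF()).explain()
    adultsTyped(people).explain()

    spark.stop()
  }
}
```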



Regards,
Akhilanand BV
