spark-dev mailing list archives

From Maciej Bryński <mac...@brynski.pl>
Subject Cache'ing performance
Date Sat, 27 Aug 2016 20:39:16 GMT
Hi,
I ran a quick benchmark of the cache function today.

*RDD*
sc.parallelize(0 until Int.MaxValue).cache().count()

*Datasets*
spark.range(Int.MaxValue).cache().count()

For me, Datasets were about 2 times slower.

Results (3 nodes, each with 20 cores and 48 GB RAM):
*RDD - 6 s*
*Datasets - 13.5 s*
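
For anyone wanting to reproduce the comparison, here is a minimal sketch of how the two wall-clock times could be taken in spark-shell. It assumes a Spark 2.x shell where `sc` and `spark` are predefined; `timed` is a hypothetical helper, not part of Spark, and the message does not say how the original numbers were actually measured.

```scala
// Sketch for spark-shell (Spark 2.x), where `sc` and `spark` already exist.
// `timed` is a hypothetical helper introduced for illustration only.
def timed[A](label: String)(body: => A): A = {
  val start = System.nanoTime()
  val result = body                              // evaluate the by-name block once
  val seconds = (System.nanoTime() - start) / 1e9
  println(f"$label: $seconds%.1f s")
  result
}

// The two statements from the message, each timed as a whole
// (materializing the cache and counting happen in the same action).
timed("RDD")(sc.parallelize(0 until Int.MaxValue).cache().count())
timed("Datasets")(spark.range(Int.MaxValue).cache().count())
```

Each call measures the first action only, so the time includes building the cache; a second `count()` on the same cached object would measure reads from the cache instead.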

Is that expected behavior for Datasets and Encoders?

Regards,
-- 
Maciek Bryński
