spark-user mailing list archives

From wxhsdp <>
Subject Re: storage.MemoryStore estimated size 7 times larger than real
Date Tue, 15 Apr 2014 03:48:25 GMT
Thanks for your help, Davidson!
I modified the code to
val a: RDD[Int] = sc.parallelize(array).cache()
so that "val a" stays an RDD of Int, but the result is the same.

Another question:
As I understand it, JVM memory and Spark memory are located in different
parts of system memory. Spark code is executed in JVM memory, and an
allocation like val e = new Array[Int](2*size) /*8MB*/ uses JVM memory. If
RDDs are not cached, the generated RDDs are written back to disk; if they
are cached, the RDDs are copied to Spark memory for further use. Is that
right?

val RDD_1 = RDD_0.groupByKey{...}
A shuffle separates stages. Can anyone tell me the memory/disk usage of the
shuffle input RDD and the shuffle output RDD, depending on whether RDD_0 and
RDD_1 are cached or not?
