spark-user mailing list archives

From huangjay <ja...@live.cn>
Subject Re: Bagel caching issues
Date Thu, 05 Dec 2013 13:31:35 GMT
Hi,

Maybe you need to check those nodes. The tasks there are very slow, with GC taking up most of the run time:


Task ID  Status   Locality Level  Executor                       Launch Time          Duration  GC Time  Shuffle Read
3487     SUCCESS  PROCESS_LOCAL   ip-10-60-150-111.ec2.internal  2013/12/01 02:11:38  17.7 m    16.3 m   23.3 MB
3447     SUCCESS  PROCESS_LOCAL   ip-10-12-54-63.ec2.internal    2013/12/01 02:11:26  20.1 m    13.9 m   50.9 MB
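When GC accounts for nearly the whole task duration like this, the heap is probably dominated by cached blocks. One thing to try is shrinking the fraction of the heap Spark reserves for the RDD cache. A minimal sketch, assuming Spark 0.8-style configuration via Java system properties (the master URL and app name are placeholders; the property must be set before the SparkContext is created):

    import org.apache.spark.SparkContext

    object TuneCacheFraction {
      def main(args: Array[String]): Unit = {
        // Spark 0.8 reads tuning knobs from Java system properties.
        // Reserving a smaller share of the heap for the block-manager cache
        // leaves more headroom for task-time objects and eases GC pressure.
        System.setProperty("spark.storage.memoryFraction", "0.3")

        val sc = new SparkContext("spark://master:7077", "BagelJob")
        // ... build and run the Bagel job here ...
        sc.stop()
      }
    }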


> On Dec 1, 2013, at 10:59 AM, "Mayuresh Kunjir" <mayuresh.kunjir@gmail.com> wrote:
> 
> I tried passing the DISK_ONLY storage level to Bagel's run method; a sketch of the call
> follows the parameter list below. It's running without any error (so far) but is too slow.
> I am attaching details for a stage corresponding to the second iteration of my algorithm
> (foreach at Bagel.scala:237). It's been running for more than 35 minutes, and I am noticing
> very high GC time for some tasks. Listing the setup parameters below.

> 
> #nodes = 16
> SPARK_WORKER_MEMORY = 13G
> SPARK_MEM = 13G
> RDD storage fraction = 0.5
> degree of parallelism = 192 (16 nodes * 4 cores each * 3)
> Serializer = Kryo
> Vertex data size after serialization = ~12G (probably too high, but it's the bare minimum
> required for the algorithm)
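> 
> For reference, the DISK_ONLY call looks roughly like this. A sketch only, assuming the
> Spark 0.8 Bagel API (the overload taking a StorageLevel after numPartitions, with the
> default combiner); PRVertex, PRMessage, the input path, and the rank update are
> hypothetical stand-ins for my actual algorithm:
> 
>     import org.apache.spark.SparkContext
>     import org.apache.spark.storage.StorageLevel
>     import org.apache.spark.bagel.{Bagel, Message, Vertex}
> 
>     // Hypothetical vertex/message types standing in for the real data.
>     class PRVertex(val value: Double, val outEdges: Array[String],
>                    val active: Boolean) extends Vertex with Serializable
>     class PRMessage(val targetId: String, val value: Double)
>       extends Message[String] with Serializable
> 
>     object DiskOnlyBagel {
>       def main(args: Array[String]): Unit = {
>         val sc = new SparkContext("spark://master:7077", "DiskOnlyBagel")
>         val vertices = sc.objectFile[(String, PRVertex)]("hdfs:///path/to/vertices")
>         val messages = sc.parallelize(Seq.empty[(String, PRMessage)])
> 
>         // DISK_ONLY makes Bagel persist each superstep's intermediate RDDs
>         // to disk instead of the default MEMORY_AND_DISK.
>         val result = Bagel.run(sc, vertices, messages, 192, StorageLevel.DISK_ONLY) {
>           (v: PRVertex, msgs: Option[Array[PRMessage]], superstep: Int) =>
>             val rank = 0.15 + 0.85 * msgs.map(_.map(_.value).sum).getOrElse(0.0)
>             val out = v.outEdges.map(d => new PRMessage(d, rank / v.outEdges.length))
>             (new PRVertex(rank, v.outEdges, superstep < 10), out)
>         }
>         result.count()
>         sc.stop()
>       }
>     }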
> 
> I would be grateful if you could suggest further optimizations, or point out why Bagel
> might not be suitable for this data size. I need to scale my cluster further, and these
> numbers do not make me feel confident at all.
> 
> Thanks and regards,
> ~Mayuresh
> 
> 
>> On Sat, Nov 30, 2013 at 3:07 PM, Mayuresh Kunjir <mayuresh.kunjir@gmail.com> wrote:
>> Hi Spark users,
>> 
>> I am running a PageRank-style algorithm on Bagel and bumping into "out of memory"
>> issues with it.
>> 
>> Referring to the following table, rdd_120 is the RDD of vertices, serialized and
>> compressed in memory. On each iteration, Bagel deserializes the compressed RDD; e.g.,
>> rdd_126 is the uncompressed version of rdd_120, persisted in memory and disk. As the
>> iterations pile up, the cached partitions start getting evicted. The moment an rdd_120
>> partition gets evicted, it necessitates a recomputation and performance goes for a toss.
>> Although we don't need the uncompressed RDDs from previous iterations, they are the last
>> ones to get evicted thanks to the LRU policy.
>> 
>> Should I make Bagel use DISK_ONLY persistence? How much of a performance hit would
>> that be? Or maybe there is a better solution here.
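>> 
>> For context on the LRU point above, this is what explicit eviction would look like
>> outside of Bagel: in a hand-rolled iterative loop, one can unpersist the previous
>> iteration's RDD as soon as the new one is materialized, rather than waiting for LRU.
>> A sketch only; the update function is a placeholder:
>> 
>>     import org.apache.spark.SparkContext._
>>     import org.apache.spark.rdd.RDD
>>     import org.apache.spark.storage.StorageLevel
>> 
>>     object ExplicitUnpersist {
>>       def iterate(initial: RDD[(String, Double)], iterations: Int): RDD[(String, Double)] = {
>>         var ranks = initial.persist(StorageLevel.MEMORY_AND_DISK_SER)
>>         ranks.count() // materialize before the loop
>>         for (i <- 1 to iterations) {
>>           val next = ranks.mapValues(r => 0.15 + 0.85 * r) // placeholder update
>>             .persist(StorageLevel.MEMORY_AND_DISK_SER)
>>           next.count()      // materialize the new iteration first...
>>           ranks.unpersist() // ...then drop the old one so it never competes in the cache
>>           ranks = next
>>         }
>>         ranks
>>       }
>>     }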
>> 
>> Storage
>> RDD Name  Storage Level                           Cached Partitions  Fraction Cached  Size in Memory  Size on Disk
>> rdd_83    Memory Serialized 1x Replicated         23                 12%              83.7 MB         0.0 B
>> rdd_95    Memory Serialized 1x Replicated         23                 12%              2.5 MB          0.0 B
>> rdd_120   Memory Serialized 1x Replicated         25                 13%              761.1 MB        0.0 B
>> rdd_126   Disk Memory Deserialized 1x Replicated  192                100%             77.9 GB         1016.5 MB
>> rdd_134   Disk Memory Deserialized 1x Replicated  185                96%              60.8 GB         475.4 MB
>> Thanks and regards,
>> ~Mayuresh
> 
> <BigFrame - Details for Stage 23.htm>