spark-user mailing list archives

From Mayuresh Kunjir <mayuresh.kun...@gmail.com>
Subject Re: Bagel caching issues
Date Sun, 01 Dec 2013 02:58:24 GMT
I tried passing the DISK_ONLY storage level to Bagel's run method. It has run
without any error so far, but it is too slow. I am attaching details for the
stage corresponding to the second iteration of my algorithm (foreach at
Bagel.scala:237 <http://ec2-54-234-176-171.compute-1.amazonaws.com:4040/stages/stage?id=23>).
It has been running for more than 35 minutes, and I am noticing very high GC
time for some tasks. The setup parameters are listed below.

#nodes = 16
SPARK_WORKER_MEMORY = 13G
SPARK_MEM = 13G
RDD storage fraction = 0.5
degree of parallelism = 192 (16 nodes * 4 cores each * 3)
Serializer = Kryo
Vertex data size after serialization = ~12G (probably too high, but it's
the bare minimum required for the algorithm.)
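
As a rough sanity check on the parameters above, the following Python sketch works out the partition count, the cluster-wide cache size, and the heap left to each running task. It is only a back-of-the-envelope estimate: it assumes one executor per node, four tasks running concurrently per node (one per core), and that the cache gets exactly heap times storage fraction. None of these assumptions is stated in the message.

```python
# Back-of-the-envelope sizing from the parameters listed above.
# Assumptions (not stated in the message): one executor per node, one running
# task per core, and cache size = heap * storage fraction.

NODES = 16
CORES_PER_NODE = 4
TASKS_PER_CORE = 3
HEAP_GB = 13.0                 # SPARK_MEM per node
STORAGE_FRACTION = 0.5         # RDD storage fraction
SERIALIZED_VERTICES_GB = 12.0  # "~12G" after Kryo serialization

# Degree of parallelism: nodes * cores per node * 3.
partitions = NODES * CORES_PER_NODE * TASKS_PER_CORE

# Heap reserved for cached RDD blocks across the whole cluster.
cluster_cache_gb = NODES * HEAP_GB * STORAGE_FRACTION

# Heap outside the cache, split evenly across concurrently running tasks.
non_cache_heap_per_task_gb = HEAP_GB * (1 - STORAGE_FRACTION) / CORES_PER_NODE

# Average serialized vertex data handled by one partition.
serialized_mb_per_partition = SERIALIZED_VERTICES_GB * 1024 / partitions

print(f"partitions:                   {partitions}")                           # 192
print(f"cluster cache (GB):           {cluster_cache_gb}")                     # 104.0
print(f"non-cache heap per task (GB): {non_cache_heap_per_task_gb}")           # 1.625
print(f"serialized MB per partition:  {serialized_mb_per_partition}")          # 64.0
```

Under these assumptions each task has roughly 1.6 GB of non-cache heap in which to deserialize its partition and build messages, which is one place to look when GC time is high.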

I would be grateful if you could suggest some further optimizations or
point out why, if at all, Bagel is not suitable for this data size. I need
to scale my cluster further and am not feeling at all confident looking at
this.

Thanks and regards,
~Mayuresh


On Sat, Nov 30, 2013 at 3:07 PM, Mayuresh Kunjir
<mayuresh.kunjir@gmail.com> wrote:

> Hi Spark users,
>
> I am running a pagerank-style algorithm on Bagel and bumping into "out of
> memory" issues with that.
>
> Referring to the following table, rdd_120 is the RDD of vertices,
> serialized and compressed in memory. On each iteration, Bagel deserializes
> the compressed RDD; e.g. rdd_126 is the uncompressed version of rdd_120,
> persisted in memory and on disk. As iterations pile up, the cached
> partitions start getting evicted. The moment an rdd_120 partition gets
> evicted, it necessitates a recomputation and performance goes for a
> toss. Although we don't need the uncompressed RDDs from previous
> iterations, they are the last ones to get evicted, thanks to the LRU policy.
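
The eviction pattern described above can be illustrated with a toy LRU cache. This is a simplified model and not Spark's block manager: the `LruCache` class, the `simulate` helper, the block names, and the uniform block size of 1 are all hypothetical, chosen only to show why the original input blocks are evicted before stale per-iteration output.

```python
from collections import OrderedDict


class LruCache:
    """Toy size-bounded LRU cache; not Spark's actual block manager."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block id -> size, least recently used first
        self.used = 0
        self.evicted = []            # block ids in eviction order

    def get(self, block_id):
        """Return True on a hit and mark the block as most recently used."""
        if block_id not in self.blocks:
            return False
        self.blocks.move_to_end(block_id)
        return True

    def put(self, block_id, size):
        """Insert a block, evicting least recently used blocks to make room."""
        while self.blocks and self.used + size > self.capacity:
            victim, victim_size = self.blocks.popitem(last=False)
            self.used -= victim_size
            self.evicted.append(victim)
        self.blocks[block_id] = size
        self.used += size


def simulate(partitions=4, capacity=8):
    """Cache the input, then run two iterations that each read the previous
    RDD partition by partition and cache a new output partition."""
    cache = LruCache(capacity)
    for p in range(partitions):
        cache.put(("input", p), 1)
    for iteration in (1, 2):
        source = "input" if iteration == 1 else f"iter{iteration - 1}"
        for p in range(partitions):
            cache.get((source, p))                 # read the parent partition
            cache.put((f"iter{iteration}", p), 1)  # cache the new partition
    return cache


cache = simulate()
print(cache.evicted)
# [('input', 0), ('input', 1), ('input', 2), ('input', 3)]
```

In this model the input blocks were last touched during iteration 1, so they are the least recently used and are evicted first, while the iteration 1 output stays cached even though only the iteration 2 output is needed from then on.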
>
> Should I make Bagel use DISK_ONLY persistence? How much of a performance
> hit would that be? Or maybe there is a better solution here.
>
> Storage
>
> RDD Name  Storage Level                           Cached Partitions  Fraction Cached  Size in Memory  Size on Disk
> rdd_83    Memory Serialized 1x Replicated         23                 12%              83.7 MB         0.0 B
> rdd_95    Memory Serialized 1x Replicated         23                 12%              2.5 MB          0.0 B
> rdd_120   Memory Serialized 1x Replicated         25                 13%              761.1 MB        0.0 B
> rdd_126   Disk Memory Deserialized 1x Replicated  192                100%             77.9 GB         1016.5 MB
> rdd_134   Disk Memory Deserialized 1x Replicated  185                96%              60.8 GB         475.4 MB
>
> rdd_83 <http://ec2-54-234-176-171.compute-1.amazonaws.com:4040/storage/rdd?id=83>
> rdd_95 <http://ec2-54-234-176-171.compute-1.amazonaws.com:4040/storage/rdd?id=95>
> rdd_120 <http://ec2-54-234-176-171.compute-1.amazonaws.com:4040/storage/rdd?id=120>
> rdd_126 <http://ec2-54-234-176-171.compute-1.amazonaws.com:4040/storage/rdd?id=126>
> rdd_134 <http://ec2-54-234-176-171.compute-1.amazonaws.com:4040/storage/rdd?id=134>
>
> Thanks and regards,
> ~Mayuresh
>
