spark-dev mailing list archives

From Reynold Xin <r...@databricks.com>
Subject Re: Spilling when not expected
Date Fri, 13 Mar 2015 07:05:06 GMT
How did you run the Spark command? Maybe the memory setting didn't actually
apply? How much memory does the web ui say is available?

BTW - I don't think any JVM can actually handle 700G heap ... (maybe Zing).
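
One quick way to check whether the memory setting was applied is to ask the JVM itself for its maximum heap. The snippet below is only a sketch: the object name HeapCheck and the helper maxHeapGiB are made up for illustration, and it reports the heap of whichever JVM it runs in (the driver if pasted into spark-shell), not necessarily the executors.

```scala
// Hypothetical helper, not part of Spark: prints the max heap of the current JVM.
object HeapCheck {
  // Runtime.maxMemory returns the heap ceiling in bytes; convert to GiB.
  def maxHeapGiB(): Double =
    Runtime.getRuntime.maxMemory.toDouble / (1L << 30)

  def main(args: Array[String]): Unit =
    println(f"JVM max heap: ${maxHeapGiB()}%.1f GiB")
}
```

If this prints a value far below 900 GiB, the setting most likely never reached the JVM. Inside spark-shell, sc.getConf.getOption("spark.executor.memory") should also show what Spark thinks the setting is.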

On Thu, Mar 12, 2015 at 4:09 PM, Tom Hubregtsen <thubregtsen@gmail.com>
wrote:

> Hi all,
>
> I'm running the teraSort benchmark with a relatively small input set: 5GB.
> During profiling, I can see I am using a total of 68GB. I've got a terabyte
> of memory in my system, and set
> spark.executor.memory 900g
> spark.driver.memory 900g
> I use the default for
> spark.shuffle.memoryFraction
> spark.storage.memoryFraction
> I believe that I now have 0.2*900=180GB for shuffle and 0.6*900=540GB for
> storage.
>
> I noticed a lot of variation in runtime (under the same load), and tracked
> this down to this function in
> core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala
>   private def spillToPartitionFiles(collection:
> SizeTrackingPairCollection[(Int, K), C]): Unit = {
>     spillToPartitionFiles(collection.iterator)
>   }
> In a slow run, it would loop through this function 12000 times, in a fast
> run only 700 times, even though the settings in both runs are the same and
> there are no other users on the system. When I look at the function calling
> this (insertAll, also in ExternalSorter), I see that spillToPartitionFiles
> is only called 700 times in both fast and slow runs, meaning that the
> function recursively calls itself very often. Because of the function name,
> I assume the system is spilling to disk. As I have sufficient memory, I
> assume that I forgot to set a certain memory setting. Does anybody have
> any idea which other setting I have to set in order to not spill data in
> this scenario?
>
> Thanks,
>
> Tom

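A note on the arithmetic in the quoted message: as far as I know, the Spark 1.x memory managers also multiply each fraction by a safety fraction (spark.shuffle.safetyFraction, default 0.8, and spark.storage.safetyFraction, default 0.9), so the usable budgets would be smaller than 0.2*900 and 0.6*900. The sketch below just redoes that calculation; the object and function names are hypothetical, and the safety defaults are an assumption that should be checked against the Spark version in use.

```scala
// Hypothetical calculator, not Spark code: budget = heap * memoryFraction * safetyFraction.
object MemoryBudget {
  def budget(heapGB: Double, memoryFraction: Double, safetyFraction: Double): Double =
    heapGB * memoryFraction * safetyFraction

  def main(args: Array[String]): Unit = {
    val heapGB = 900.0 // value of spark.executor.memory from the thread
    // Assumed Spark 1.x defaults: shuffle 0.2 with safety 0.8, storage 0.6 with safety 0.9.
    val shuffleGB = budget(heapGB, 0.2, 0.8) // roughly 144 GB rather than 180 GB
    val storageGB = budget(heapGB, 0.6, 0.9) // roughly 486 GB rather than 540 GB
    println(f"shuffle budget: $shuffleGB%.0f GB, storage budget: $storageGB%.0f GB")
  }
}
```

Either budget is still far larger than the 68GB seen during profiling, so this adjustment alone would not explain the spilling; it only matters if the 900g setting actually took effect.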