spark-user mailing list archives

From Yonathan Perez <yonathan0...@gmail.com>
Subject OutOfMemoryError when loading input file
Date Sun, 02 Mar 2014 06:29:51 GMT
Hello,

I'm trying to run a simple test program that loads a large file (~12.4GB)
into the memory of a single many-core machine.
The machine I'm using has more than enough memory (1TB RAM) and 64 cores
(of which I want to use 16 for worker threads).
Even though I set the executor memory (spark.executor.memory) to 200GB
in SparkContext and set the JVM heap size to 200GB (-Xmx200g) in spark-env.sh,
I keep getting errors when trying to load input:
"java.lang.OutOfMemoryError: GC overhead limit exceeded".
I believe that the memory configuration parameters I pass do not stick, as
I get the following message when running:
"14/03/01 22:09:31 INFO storage.MemoryStore: MemoryStore started with
capacity 883.2 MB."
Obviously I'm missing something when configuring Spark, but I can't figure
out what, and I'd appreciate your help.
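
One way to confirm whether the -Xmx setting reaches the JVM that runs the
program is to print the heap limit from inside that JVM. The following is
only a minimal sketch: the object name HeapCheck and the helper toMB are
hypothetical and not part of my test program, and it uses nothing but the
standard java.lang.Runtime API.

```scala
object HeapCheck {
  // Convert a byte count to mebibytes for readable output.
  def toMB(bytes: Long): Double = bytes / (1024.0 * 1024.0)

  def main(args: Array[String]): Unit = {
    val rt = Runtime.getRuntime
    // maxMemory: the most heap this JVM will attempt to use (roughly -Xmx).
    println(f"JVM max heap:     ${toMB(rt.maxMemory)}%.1f MB")
    // totalMemory: the heap currently reserved by the JVM (starts near -Xms).
    println(f"JVM current heap: ${toMB(rt.totalMemory)}%.1f MB")
  }
}
```

If the reported max heap is far below 200GB when this is launched the same
way as the benchmark, the options in spark-env.sh are presumably not being
applied to that JVM.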

The test program I'm running (not through the shell, but as a standalone
Scala app):

import org.apache.spark._
import org.apache.spark.rdd.RDD
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object LoadBenchmark {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setMaster("local[16]")
      .setAppName("Load Benchmark")
      .set("spark.executor.memory", "200g")
    val sc = new SparkContext(conf)
    println("LOADING INPUT FILE")
    val edges = sc.textFile("/lfs/madmax/0/yonathan/half_twitter_rv.txt").cache()
    val cnt = edges.count()
    println("edge count: "+ cnt)
  }
}

The contents of the spark-env.sh file:

#     Examples of app-wide options : -Dspark.serializer
SPARK_JAVA_OPTS+="-Xms200g -Xmx200g -XX:-UseGCOverheadLimit"
export SPARK_JAVA_OPTS
# If using the standalone deploy mode, you can also set variables for it here:
# - SPARK_MASTER_IP, to bind the master to a different IP address or hostname
# - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports
# - SPARK_WORKER_CORES, to set the number of cores to use on this machine
SPARK_WORKER_CORES=16
export SPARK_WORKER_CORES
# - SPARK_WORKER_MEMORY, to set how much memory to use (e.g. 1000m, 2g)
SPARK_WORKER_MEMORY=200g
export SPARK_WORKER_MEMORY
# - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT
# - SPARK_WORKER_INSTANCES, to set the number of worker processes per node
# - SPARK_WORKER_DIR, to set the working directory of worker processes

Thank you!
