spark-user mailing list archives

From Deepak Vohra <>
Subject Fw: Re: An interesting and serious problem I encountered
Date Sat, 14 Feb 2015 17:27:36 GMT

    ----- Forwarded Message -----
  From: Deepak Vohra <>
 To: fangyixianghku <>; "" <>

 Sent: Saturday, February 14, 2015 9:15 AM
 Subject: Re: Re: An interesting and serious problem I encountered
One alternative is to submit the application on YARN and set spark.yarn.driver.memoryOverhead.
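A minimal sketch of that suggestion, using the same class and jar paths as the submit script quoted below (the overhead values here are placeholders, given in megabytes as Spark 1.x expects):

```
# Hypothetical example: the same application submitted on YARN instead of
# standalone, with extra off-heap headroom for the driver and executors.
$HOME/spark/spark-1.2.0-bin-hadoop2.4/bin/spark-submit \
  --class pj.Test \
  --master yarn-cluster \
  --driver-memory 30G \
  --executor-memory 64G \
  --conf "spark.yarn.driver.memoryOverhead=2048" \
  --conf "spark.yarn.executor.memoryOverhead=4096" \
  /home/fangyixiang/Desktop/spark/data/fang/pj.jar
```

Note that the memoryOverhead settings only take effect when running on YARN; they have no equivalent in standalone mode.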

   From: fangyixianghku <>
 To: sowen <> 
Cc: "" <>; Ye Xianjin <>

 Sent: Friday, February 13, 2015 8:30 PM
 Subject: Re: Re: An interesting and serious problem I encountered
Dear Sean Owen,

Let me first sincerely thank you for your helpful comments! We have been working on this issue for more than a month and we really need some help. SizeOf.jar computes the memory cost incorrectly. I would like to show you more details.

[Error Messages]
Two common errors that I encountered are shown in the screenshots. Note that if I run the program on a small dataset, there are no problems. In case you cannot see the pictures, you may find them in the attached files.

1. The first picture is as follows. It seems that the error was prompted from the driver, but I have set 30GB for the driver; I think that should be enough.

2. The second picture is as follows. This message appears very frequently. It looks like a network error, but it actually is not, because while this message is being prompted I can still log in to SL6 over ssh without any problems. Our network quality should be very good: all 8 machines are connected in a local network with 10-Gigabit bandwidth.

[Configuration]
1. I have 8 machines (SL1~SL8) in total. In the slaves file (../spark/spark-1.2.0-bin-hadoop2.4/conf/slaves), I configured:
SL1
SL2
SL3
SL4
SL5
SL6
SL7
SL8

2. The submit script of my application is as follows:
$HOME/spark/spark-1.2.0-bin-hadoop2.4/bin/spark-submit \
  --class pj.Test \
  --master spark://SL1:7077 \
  --driver-memory 30G \
  --executor-memory 310G \
  --total-executor-cores 256 \
  --conf "spark.akka.timeout=3600" \
  --conf "spark.scheduler.maxRegisteredResourcesWaitingTime=3600000" \
  --conf "spark.akka.frameSize=1000" \
  --conf "spark.worker.timeout=3600" \
  --conf "spark.default.parallelism=2560" \
  --conf "" \
  /home/fangyixiang/Desktop/spark/data/fang/pj.jar \
  # other parameters are used only by the program

3. My application runs in standalone mode.

[Executor]
As
you suggested, there are probably some configuration problems. My program runs in standalone mode. In my view, it would be better for each slave machine to have only one executor, since this may reduce the overhead of resource management and network communication. Do you agree? As you may see from my configuration, I didn't set the number of executors explicitly, so the number of executors should be 8. I am wondering whether it is necessary to use YARN or some other resource-management platform. What do you think? We are looking forward to your further suggestions. Thank you in advance!

Best regards,
Yixiang Fang

From: Sean Owen
Date: 2015-02-13 18:10
To: Landmark
CC: user@spark.apache.org
Subject: Re: An interesting and serious problem I encountered

A number of comments:

310GB is probably too large for an executor. You probably want many smaller executors per machine. But this is not your problem.

You didn't say where the OutOfMemoryError occurred. Executor or driver?

Tuple2 is a Scala type, and a general type. It is appropriate for general pairs. You're asking about optimizing for a primitive array, yes, but of course Spark handles other types.

I don't quite understand your test result. An array doesn't change size because it's referred to in a Tuple2. You are still dealing with a primitive array.

There is no general answer to your question. Usually you have to consider the overhead of Java references, which does matter significantly, but there is no constant multiplier of course. It's up to you if it matters to implement more efficient data structures. Here, however, you're using just about the most efficient representation of an array of integers.

I think you have plenty of memory in general, so the question is what was throwing the memory error? I'd also confirm that the configuration your executors actually used is what you expect, to rule out config problems.

On Fri, Feb 13, 2015 at 6:26 AM, Landmark <> wrote:
> Hi folks,
>
> My Spark cluster has 8 machines, each of which has 377GB physical memory,
> and thus the total maximum memory that can be used for Spark is more than
> 2400GB. In my program, I have to deal with 1 billion (key, value) pairs,
> where the key is an integer and the value is an integer array with 43
> elements. Therefore, the memory cost of this raw dataset is [(1+43) *
> 1000000000 * 4] / (1024 * 1024 * 1024) = 164GB.
>
> Since I have to use this dataset repeatedly, I have to cache it in memory.
> Some key parameter settings are:
>
> spark.driver.memory=30GB
> spark.executor.memory=310GB
>
> But it failed on running a simple countByKey() and the error message is
> "java.lang.OutOfMemoryError: Java heap space...".
> Does this mean a Spark cluster with 2400+GB of memory cannot keep 164GB of
> raw data in memory?
>
> The code of my program is as follows:
>
> def main(args: Array[String]): Unit = {
>     val sc = new SparkContext(new SparkConf());
>
>     val rdd = sc.parallelize(0 until 1000000000, 25600)
>       .map(i => (i, new Array[Int](43))).cache();
>     println("The number of keys is " + rdd.countByKey());
>
>     // some other operations follow here ...
> }
>
> To figure out the issue, I evaluated the memory cost of the key-value
> pairs using SizeOf.jar. The code is as follows:
>
> val arr = new Array[Int](43);
> println(SizeOf.humanReadable(SizeOf.deepSizeOf(arr)));
>
> val tuple = (1, arr.clone);
> println(SizeOf.humanReadable(SizeOf.deepSizeOf(tuple)));
>
> The output is:
> 192.0b
> 992.0b
>
> *Hard to believe, but it is true!! This result means that, to store a
> key-value pair, Tuple2 needs more than 5 times the memory of the simplest
> method with an array. Even at 5+ times the memory, its size is less than
> 1000GB, which is still much less than the total memory size of my cluster,
> i.e., 2400+GB. I really do not understand why this happened.*
>
> BTW, if the number of pairs is 1 million, it works well. If the arr
> contains only 1 integer, Tuple2 needs around 10 times the memory to store
> a pair.
>
> So I have some questions:
> 1. Why does Spark choose such a poor data structure, Tuple2, for key-value
> pairs? Is there any better data structure for storing (key, value) pairs
> with less memory cost?
> 2. Given a dataset of size M, how many times that memory does Spark need
> in general to handle it?
>
> Best,
> Landmark
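The size accounting debated in the thread can be sketched outside Spark. This is a rough estimate under stated assumptions (64-bit HotSpot with compressed oops, 8-byte alignment, 16-byte array headers, 12-byte object headers); it is not a substitute for a real measurement, but it reproduces the 192-byte array figure and shows why the 992.0b that SizeOf.jar reports for the Tuple2 looks too high:

```scala
// Back-of-the-envelope JVM sizing for the objects discussed above.
// Assumptions (not measured): 64-bit HotSpot, compressed oops,
// 8-byte object alignment, 16-byte array header, 12-byte object header.
object SizeSketch {
  def pad8(n: Long): Long = (n + 7) / 8 * 8

  // Array[Int](43): 16-byte header + 43 * 4 data bytes = 188, padded to 192.
  val arrayBytes: Long = pad8(16L + 43L * 4L) // 192, matching SizeOf.jar

  // Tuple2(1, arr): the Tuple2 instance (12-byte header + two 4-byte refs)
  // plus a boxed java.lang.Integer (12-byte header + 4-byte int)
  // plus the array itself -- roughly 232 bytes, not 992.
  val tupleBytes: Long = pad8(12L + 2L * 4L) + pad8(12L + 4L) + arrayBytes

  // The raw-data estimate from the original message: 44 ints per pair,
  // 4 bytes each, one billion pairs -- about 164 GiB.
  val rawGiB: Double = 44.0 * 4.0 * 1e9 / (1024.0 * 1024.0 * 1024.0)

  def main(args: Array[String]): Unit =
    println(s"Array[Int](43) ~ $arrayBytes B, Tuple2 ~ $tupleBytes B, raw ~ $rawGiB GiB")
}
```

Per-object header and padding overhead of this size is real but nowhere near a 5x multiplier for a 43-element array, which supports the suspicion voiced above that SizeOf.jar's deep-size result is wrong rather than that Tuple2 is inherently wasteful here.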

To unsubscribe, e-mail:
For additional commands, e-mail:

