spark-user mailing list archives

From Wei Tan <w...@us.ibm.com>
Subject Re: long GC pause during file.cache()
Date Mon, 16 Jun 2014 14:55:36 GMT
BTW: nowadays a single machine with huge RAM (200 GB to 1 TB) is really 
common, and with virtualization you lose some performance. It would be 
ideal to see some "best practices" on how to use Spark on these 
state-of-the-art machines...

Best regards,
Wei

---------------------------------
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan



From:   Wei Tan/Watson/IBM@IBMUS
To:     user@spark.apache.org, 
Date:   06/16/2014 10:47 AM
Subject:        Re: long GC pause during file.cache()



Thank you all for the advice, including (1) using the CMS GC, (2) using 
multiple worker instances, and (3) using Tachyon. 

I will try (1) and (2) first and report back what I find. 

I will also try JDK 7 with G1 GC. 

Best regards, 
Wei 

--------------------------------- 
Wei Tan, PhD 
Research Staff Member 
IBM T. J. Watson Research Center 
http://researcher.ibm.com/person/us-wtan 



From:        Aaron Davidson <ilikerps@gmail.com> 
To:        user@spark.apache.org, 
Date:        06/15/2014 09:06 PM 
Subject:        Re: long GC pause during file.cache() 



Note also that the JVM does not handle very large heaps well, due to this 
exact issue. There are two commonly used workarounds: 

1) Spawn multiple (smaller) executors on the same machine. This can be 
done by creating multiple Workers (via SPARK_WORKER_INSTANCES in 
standalone mode[1]). 
2) Use Tachyon for off-heap caching of RDDs, allowing Spark executors to 
be smaller and to avoid GC pauses. 

[1] See standalone documentation here: 
http://spark.apache.org/docs/latest/spark-standalone.html#cluster-launch-scripts 
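As a sketch of workaround (1), assuming standalone mode on a single large 
box (the instance count and sizes below are illustrative, not a tuned 
recommendation -- divide your total memory and cores across the instances):

```shell
# conf/spark-env.sh -- run several smaller workers instead of one huge one
export SPARK_WORKER_INSTANCES=6   # six workers on this machine
export SPARK_WORKER_MEMORY=30g    # per-worker memory: 6 x 30g ~= 180g total
export SPARK_WORKER_CORES=8       # cores available to each worker
```

Each worker then launches its own executor with a 30 GB heap, which the 
collector can handle far more gracefully than a single 180 GB heap.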



On Sun, Jun 15, 2014 at 3:50 PM, Nan Zhu <zhunanmcgill@gmail.com> wrote: 
Yes, I think it is listed in the comments in spark-env.sh.template 
(didn't check...) 

Best, 

--  
Nan Zhu 
On Sunday, June 15, 2014 at 5:21 PM, Surendranauth Hiraman wrote: 
Is SPARK_DAEMON_JAVA_OPTS valid in 1.0.0? 



On Sun, Jun 15, 2014 at 4:59 PM, Nan Zhu <zhunanmcgill@gmail.com> wrote: 
SPARK_JAVA_OPTS is deprecated in 1.0, though it works fine if you don't 
mind the WARNING in the logs. 

You can set spark.executor.extraJavaOptions in your SparkConf object 
instead. 
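As a sketch, the same setting can also go in conf/spark-defaults.conf 
(picked up by spark-submit in 1.0; the property name is 
spark.executor.extraJavaOptions, and the CMS flags here are examples only, 
not a tuned set):

```shell
# conf/spark-defaults.conf -- executor JVM flags, example values only
spark.executor.extraJavaOptions  -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails
```

Note that heap size itself is still controlled separately (e.g. via 
spark.executor.memory), not through these flags.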

Best, 

-- 
Nan Zhu 
On Sunday, June 15, 2014 at 12:13 PM, Hao Wang wrote: 
Hi Wei, 

You may try setting JVM opts in spark-env.sh as follows to prevent or 
mitigate GC pauses: 

export SPARK_JAVA_OPTS="-XX:-UseGCOverheadLimit -XX:+UseConcMarkSweepGC 
-Xmx2g -XX:MaxPermSize=256m" 

There are more options you could add; please just Google :) 


Regards, 
Wang Hao(王灏) 

CloudTeam | School of Software Engineering 
Shanghai Jiao Tong University 
Address: 800 Dongchuan Road, Minhang District, Shanghai, 200240 
Email: wh.sjtu@gmail.com 


On Sun, Jun 15, 2014 at 10:24 AM, Wei Tan <wtan@us.ibm.com> wrote: 
Hi, 

  I have a single-node (192 GB RAM) standalone Spark, with memory 
configured like this in spark-env.sh: 

SPARK_WORKER_MEMORY=180g 
SPARK_MEM=180g 


 In spark-shell I have a program like this: 

val file = sc.textFile("/localpath") //file size is 40G 
file.cache() 


val output = file.map(line => extract something from line) 

output.saveAsTextFile (...) 


When I run this program again and again, or keep cycling file.unpersist() 
--> file.cache() --> output.saveAsTextFile(), the run time varies a lot, 
from 1 min to 3 min to 50+ min. Whenever the run time exceeds 1 min, I 
observe long GC pauses (some 10+ min) in the stage monitoring GUI. When 
the run time is "normal", say ~1 min, no significant GC is observed. The 
behavior seems somewhat random. 

Is there any JVM tuning I should do to prevent these long GC pauses? 
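As a first diagnostic step (a sketch, assuming the standalone setup 
described above), GC logging can be enabled so the pauses can be attributed 
to a particular collector phase; these are standard HotSpot flags on 
OpenJDK 6/7:

```shell
# conf/spark-env.sh -- log what the collector is doing during each pause
export SPARK_JAVA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
```

With a 180 GB heap, long entries for full collections in this log would 
point toward the multiple-smaller-workers or off-heap-caching workarounds 
discussed earlier in the thread.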



I use java-1.6.0-openjdk.x86_64, and my spark-shell process looks like 
this: 

root     10994  1.7  0.6 196378000 1361496 pts/51 Sl+ 22:06   0:12 
/usr/lib/jvm/java-1.6.0-openjdk.x86_64/bin/java -cp 
::/home/wtan/scala/spark-1.0.0-bin-hadoop1/conf:/home/wtan/scala/spark-1.0.0-bin-hadoop1/lib/spark-assembly-1.0.0-hadoop1.0.4.jar:/home/wtan/scala/spark-1.0.0-bin-hadoop1/lib/datanucleus-core-3.2.2.jar:/home/wtan/scala/spark-1.0.0-bin-hadoop1/lib/datanucleus-rdbms-3.2.1.jar:/home/wtan/scala/spark-1.0.0-bin-hadoop1/lib/datanucleus-api-jdo-3.2.1.jar

-XX:MaxPermSize=128m -Djava.library.path= -Xms180g -Xmx180g 
org.apache.spark.deploy.SparkSubmit spark-shell --class 
org.apache.spark.repl.Main 

Best regards, 
Wei 

--------------------------------- 
Wei Tan, PhD 
Research Staff Member 
IBM T. J. Watson Research Center 
http://researcher.ibm.com/person/us-wtan 





-- 
SUREN HIRAMAN, VP TECHNOLOGY
Velos 
Accelerating Machine Learning

440 NINTH AVENUE, 11TH FLOOR
NEW YORK, NY 10001
O: (917) 525-2466 ext. 105 
F: 646.349.4063
E: suren.hiraman@velos.io
W: www.velos.io



