Yes, I think in the spark-env.sh.template, it is listed in the comments (didn’t check...).

Best,
--
Nan Zhu

On Sunday, June 15, 2014 at 5:21 PM, Surendranauth Hiraman wrote:

Is SPARK_DAEMON_JAVA_OPTS valid in 1.0.0?

On Sun, Jun 15, 2014 at 4:59 PM, Nan Zhu <firstname.lastname@example.org> wrote:

SPARK_JAVA_OPTS is deprecated in 1.0, though it works fine if you don’t mind the WARNING in the logs.

You can set spark.executor.extraJavaOptions in your SparkConf object instead.

Best,
--
Nan Zhu

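A minimal sketch of the SparkConf approach, written for a standalone application rather than spark-shell (where sc already exists). The object name, the app name, and the choice of flags are illustrative assumptions; the flags are borrowed from the SPARK_JAVA_OPTS example elsewhere in this thread. Heap size goes through spark.executor.memory, since Spark does not accept -Xmx inside spark.executor.extraJavaOptions. The sketch also assumes the master URL is supplied at launch by spark-submit.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical application object, for illustration only.
object GcTuningSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("gc-tuning-sketch") // hypothetical name
      // GC-related flags for the executor JVMs (no heap-size flags here).
      .set("spark.executor.extraJavaOptions",
        "-XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit -XX:MaxPermSize=256m")
      // Executor heap is configured separately; "2g" mirrors the -Xmx2g
      // from the SPARK_JAVA_OPTS example and is only a placeholder.
      .set("spark.executor.memory", "2g")

    // Master is assumed to come from spark-submit (--master ...).
    val sc = new SparkContext(conf)
    try {
      // Print the effective setting to confirm it was picked up.
      println(sc.getConf.get("spark.executor.extraJavaOptions"))
    } finally {
      sc.stop()
    }
  }
}
```

These settings only affect executor JVMs. In a spark-shell session the driver JVM is already running by the time a SparkConf is built, so driver-side options have to be supplied at launch.
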
On Sunday, June 15, 2014 at 12:13 PM, Hao Wang wrote:

Hi, Wei

You may try to set JVM opts in spark-env.sh as follows to prevent or mitigate GC pauses:

export SPARK_JAVA_OPTS="-XX:-UseGCOverheadLimit -XX:+UseConcMarkSweepGC -Xmx2g -XX:MaxPermSize=256m"

There are more options you could add, please just Google :)

Regards,
Wang Hao(王灏)
CloudTeam | School of Software Engineering
Shanghai Jiao Tong University
Address: 800 Dongchuan Road, Minhang District, Shanghai, 200240
Email: email@example.com

On Sun, Jun 15, 2014 at 10:24 AM, Wei Tan <firstname.lastname@example.org> wrote:

Hi,
I have a single node (192G RAM) stand-alone spark, with memory configuration like this in spark-env.sh
In spark-shell I have a program like this:
val file = sc.textFile("/localpath") //file size is 40G
val output = file.map(line => extract something from line)
When I run this program again and again, or keep trying file.unpersist() --> file.cache() --> output.saveAsTextFile(), the run time varies a lot, from 1 min to 3 min to 50+ min. Whenever the run time is more than 1 min, I observe big GC pauses in the stage monitoring GUI (some can be 10+ min). Of course when the run time is "normal", say ~1 min, no significant GC is observed. The behavior seems somewhat random.
Is there any JVM tuning I should do to prevent these long GC pauses from happening?
I used java-1.6.0-openjdk.x86_64, and my spark-shell process is something like this:
root 10994 1.7 0.6 196378000 1361496 pts/51 Sl+ 22:06 0:12 /usr/lib/jvm/java-1.6.0-openjdk.x86_64/bin/java -cp ::/home/wtan/scala/spark-1.0.0-bin-hadoop1/conf:/home/wtan/scala/spark-1.0.0-bin-hadoop1/lib/spark-assembly-1.0.0-hadoop1.0.4.jar:/home/wtan/scala/spark-1.0.0-bin-hadoop1/lib/datanucleus-core-3.2.2.jar:/home/wtan/scala/spark-1.0.0-bin-hadoop1/lib/datanucleus-rdbms-3.2.1.jar:/home/wtan/scala/spark-1.0.0-bin-hadoop1/lib/datanucleus-api-jdo-3.2.1.jar -XX:MaxPermSize=128m -Djava.library.path= -Xms180g -Xmx180g org.apache.spark.deploy.SparkSubmit spark-shell --class org.apache.spark.repl.Main
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center