spark-user mailing list archives

From Nan Zhu <zhunanmcg...@gmail.com>
Subject Re: long GC pause during file.cache()
Date Sun, 15 Jun 2014 22:50:35 GMT
Yes, I think it is listed in the comments in spark-env.sh.template (didn’t check…)
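For the SparkConf route mentioned further down in the thread, a minimal sketch of what that might look like (assumes Spark 1.0's SparkConf/SparkContext API; the app name and GC flag values are illustrative, and the config key in 1.0 is spark.executor.extraJavaOptions):

```scala
// Sketch only: set executor JVM options programmatically before creating
// the context (Spark 1.0 API; app name and flag values are illustrative).
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("gc-tuning-example") // hypothetical name
  .set("spark.executor.extraJavaOptions",
       "-XX:+UseConcMarkSweepGC -XX:+PrintGCDetails")
val sc = new SparkContext(conf)
```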
 

Best,  

--  
Nan Zhu


On Sunday, June 15, 2014 at 5:21 PM, Surendranauth Hiraman wrote:

> Is SPARK_DAEMON_JAVA_OPTS valid in 1.0.0?
>  
>  
>  
> On Sun, Jun 15, 2014 at 4:59 PM, Nan Zhu <zhunanmcgill@gmail.com> wrote:
> > SPARK_JAVA_OPTS is deprecated in 1.0, though it works fine if you don’t mind the WARNING in the logs
> >  
> > you can set spark.executor.extraJavaOptions in your SparkConf obj
> >  
> > Best,
> >  
> > --  
> > Nan Zhu
> >  
> >  
> > On Sunday, June 15, 2014 at 12:13 PM, Hao Wang wrote:
> >  
> > > Hi, Wei
> > >  
> > > You may try to set JVM opts in spark-env.sh as follows to prevent or mitigate GC pauses:
> > >  
> > > export SPARK_JAVA_OPTS="-XX:-UseGCOverheadLimit -XX:+UseConcMarkSweepGC -Xmx2g -XX:MaxPermSize=256m"
> > >  
> > > There are more options you could add; please just Google :)
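For example, to see what the collector is actually doing during those pauses, GC logging can be appended to the same variable (a sketch using standard HotSpot flags; the log path below is a hypothetical example):

```shell
# Sketch: append GC logging flags so long pauses show up in a log file.
# -XX:+PrintGCDetails and -XX:+PrintGCTimeStamps are standard HotSpot flags;
# the -Xloggc path is a hypothetical example.
export SPARK_JAVA_OPTS="-XX:-UseGCOverheadLimit -XX:+UseConcMarkSweepGC -Xmx2g -XX:MaxPermSize=256m \
  -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/tmp/spark-gc.log"
```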
> > >  
> > >  
> > > Regards,
> > > Wang Hao(王灏)
> > >  
> > > CloudTeam | School of Software Engineering
> > > Shanghai Jiao Tong University
> > > Address: 800 Dongchuan Road, Minhang District, Shanghai, 200240
> > > Email: wh.sjtu@gmail.com
> > >  
> > >  
> > > On Sun, Jun 15, 2014 at 10:24 AM, Wei Tan <wtan@us.ibm.com> wrote:
> > > > Hi,  
> > > >  
> > > > I have a single node (192G RAM) standalone Spark, with memory configuration like this in spark-env.sh:
> > > >  
> > > > SPARK_WORKER_MEMORY=180g  
> > > > SPARK_MEM=180g  
> > > >  
> > > >  
> > > >  In spark-shell I have a program like this:  
> > > >  
> > > > val file = sc.textFile("/localpath") //file size is 40G  
> > > > file.cache()  
> > > >  
> > > >  
> > > > val output = file.map(line => extract something from line)  
> > > >  
> > > > output.saveAsTextFile(...)
> > > >  
> > > >  
> > > > When I run this program again and again, or keep cycling file.unpersist() --> file.cache() --> output.saveAsTextFile(), the run time varies a lot, from 1 min to 3 min to 50+ min. Whenever the run time is more than 1 min, I observe a big GC pause in the stage monitoring GUI (some can be 10+ min). Of course, when the run time is "normal", say ~1 min, no significant GC is observed. The behavior seems somewhat random.
> > > >  
> > > > Is there any JVM tuning I should do to prevent this long GC pause from happening?
> > > >  
> > > >  
> > > >  
> > > > I used java-1.6.0-openjdk.x86_64, and my spark-shell process is something like this:
> > > >  
> > > > root     10994  1.7  0.6 196378000 1361496 pts/51 Sl+ 22:06   0:12 /usr/lib/jvm/java-1.6.0-openjdk.x86_64/bin/java -cp ::/home/wtan/scala/spark-1.0.0-bin-hadoop1/conf:/home/wtan/scala/spark-1.0.0-bin-hadoop1/lib/spark-assembly-1.0.0-hadoop1.0.4.jar:/home/wtan/scala/spark-1.0.0-bin-hadoop1/lib/datanucleus-core-3.2.2.jar:/home/wtan/scala/spark-1.0.0-bin-hadoop1/lib/datanucleus-rdbms-3.2.1.jar:/home/wtan/scala/spark-1.0.0-bin-hadoop1/lib/datanucleus-api-jdo-3.2.1.jar -XX:MaxPermSize=128m -Djava.library.path= -Xms180g -Xmx180g org.apache.spark.deploy.SparkSubmit spark-shell --class org.apache.spark.repl.Main
> > > >  
> > > > Best regards,  
> > > > Wei  
> > > >  
> > > > ---------------------------------  
> > > > Wei Tan, PhD  
> > > > Research Staff Member  
> > > > IBM T. J. Watson Research Center  
> > > > http://researcher.ibm.com/person/us-wtan
> >  
>  
>  
>  
> --  
> SUREN HIRAMAN, VP TECHNOLOGY
> Velos
> Accelerating Machine Learning
>  
> 440 NINTH AVENUE, 11TH FLOOR
> NEW YORK, NY 10001
> O: (917) 525-2466 ext. 105
> F: 646.349.4063
> E: suren.hiraman@velos.io
> W: www.velos.io
>  

