spark-user mailing list archives

From "Wang, Ningjun (LNG-NPV)" <ningjun.w...@lexisnexis.com>
Subject RE: How to force parallel processing of RDD using multiple thread
Date Thu, 15 Jan 2015 22:53:40 GMT
Spark Standalone cluster.

My program is running very slowly, and I suspect it is not processing the RDD in parallel. How
can I force it to run in parallel? Is there any way to check whether it is being processed in parallel?

Regards,

Ningjun Wang
Consulting Software Engineer
LexisNexis
121 Chanlon Road
New Providence, NJ 07974-1541


-----Original Message-----
From: Sean Owen [mailto:sowen@cloudera.com] 
Sent: Thursday, January 15, 2015 4:29 PM
To: Wang, Ningjun (LNG-NPV)
Cc: user@spark.apache.org
Subject: Re: How to force parallel processing of RDD using multiple thread

What is your cluster manager? For example, on YARN you would specify --executor-cores. Read:
http://spark.apache.org/docs/latest/running-on-yarn.html
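For YARN, the cores are requested at submission time. A minimal sketch of such an invocation (the class and jar names are placeholders, not from this thread, and the resource numbers are only examples):

```shell
# Submit to YARN, requesting 4 cores per executor across 2 executors
# (class/jar names below are hypothetical placeholders)
spark-submit \
  --master yarn \
  --executor-cores 4 \
  --num-executors 2 \
  --class com.example.MyApp \
  myapp.jar
```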

On Thu, Jan 15, 2015 at 8:54 PM, Wang, Ningjun (LNG-NPV) <ningjun.wang@lexisnexis.com>
wrote:
> I have a standalone Spark cluster with only one node with 4 CPU cores.
> How can I force Spark to do parallel processing of my RDD using
> multiple threads? For example, I can do the following:
>
> spark-submit --master local[4]
>
> However, I really want to use the cluster, as follows:
>
> spark-submit --master spark://10.125.21.15:7070
>
> In that case, how can I make sure the RDD is processed with multiple
> threads/cores?
>
> Thanks
>
> Ningjun
>
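For a standalone cluster like the one in this thread, a hedged sketch: cap the total cores with --total-executor-cores (or the spark.cores.max property), and verify the RDD's partition count, since Spark runs one task per partition and a single-partition RDD cannot use multiple cores. The class and jar names are placeholders:

```shell
# Standalone master address from the thread; request up to 4 cores in total
# (class/jar names below are hypothetical placeholders)
spark-submit \
  --master spark://10.125.21.15:7070 \
  --total-executor-cores 4 \
  --class com.example.MyApp \
  myapp.jar

# To check that work is actually parallel: open the standalone master web UI
# (port 8080 by default) and the running application's UI (port 4040), or
# inspect the partition count in spark-shell:
#   rdd.partitions.length   // one task per partition; a count of 1 means no parallelism
```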