spark-user mailing list archives

From Chitturi Padma <learnings.chitt...@gmail.com>
Subject Re: Spark work distribution among execs
Date Tue, 15 Mar 2016 14:27:32 GMT
By default, Spark on YARN uses 2 executors with one core each. Have you
allocated more executors using command-line arguments such as
--num-executors 25 --executor-cores x?
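
The arithmetic behind this question can be sketched as follows. This is a minimal illustration in Python, not part of Spark. The helper names are hypothetical, and the value 4 used for "x" is an assumed example. Under the default spark.task.cpus=1, the number of tasks that can run at once is executors times cores per executor. That means the defaults give only 2 concurrent tasks, no matter how large the cluster is.

```python
# Hypothetical helpers (not a Spark API): build the spark-submit resource
# flags and count how many tasks can run concurrently.
from typing import List

# Spark on YARN defaults discussed above: 2 executors, 1 core each.
DEFAULT_NUM_EXECUTORS = 2
DEFAULT_EXECUTOR_CORES = 1


def resource_flags(num_executors: int, executor_cores: int) -> List[str]:
    """Return the spark-submit arguments that request executors and cores."""
    return [
        "--num-executors", str(num_executors),
        "--executor-cores", str(executor_cores),
    ]


def task_slots(num_executors: int, executor_cores: int) -> int:
    """Concurrent task slots, assuming spark.task.cpus is left at 1."""
    return num_executors * executor_cores


if __name__ == "__main__":
    print(task_slots(DEFAULT_NUM_EXECUTORS, DEFAULT_EXECUTOR_CORES))  # 2
    # 25 executors as in the question; 4 cores each is an assumed value for "x".
    print(" ".join(resource_flags(25, 4)))
    print(task_slots(25, 4))  # 100
```

If the job still shows only a couple of busy executors after these flags are passed, the resource request is probably not the bottleneck.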

What do you mean by "the difference between the nodes is huge"?
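
One way to make "huge" concrete is to reduce the per-executor "Input" values from the Executors tab to a couple of ratios. This is a minimal sketch that assumes those values have been copied out of the UI by hand. The function name and executor IDs are hypothetical. The 20 GB / 120 GB figures come from the original question.

```python
# Hypothetical helper: summarise how unevenly input was spread across
# executors, given per-executor "Input" sizes (in GB) read off the Spark UI.
from typing import Dict


def input_skew(input_gb: Dict[str, float]) -> Dict[str, float]:
    """Return max/min and max/mean ratios of per-executor input size."""
    if not input_gb:
        raise ValueError("need at least one executor")
    sizes = list(input_gb.values())
    mean = sum(sizes) / len(sizes)
    smallest = min(sizes)
    return {
        # How many times more data the busiest executor read than the idlest.
        "max_over_min": max(sizes) / smallest if smallest > 0 else float("inf"),
        # How far the busiest executor sits above an even split.
        "max_over_mean": max(sizes) / mean,
    }


if __name__ == "__main__":
    # Example figures from the question; executor IDs are made up.
    print(input_skew({"exec-1": 20.0, "exec-2": 120.0}))
```

A max/min ratio near 1 would mean an even spread. The quoted figures give 6.0.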

Regards,
Padma Ch

On Tue, Mar 15, 2016 at 6:57 PM, bkapukaranov [via Apache Spark User List] <
ml-node+s1001560n26502h7@n3.nabble.com> wrote:

> Hi,
>
> I'm running Spark 1.6.0 on YARN on a Hadoop 2.6.0 cluster.
> I am observing a very strange issue.
> I run a simple job that reads about 1 TB of JSON logs from a remote HDFS
> cluster, converts them to Parquet, and saves them to the local HDFS of
> the Hadoop cluster.
>
> I run it with 25 executors with sufficient resources. However, the strange
> thing is that the job uses only 2 executors for most of the read work.
>
> For example, when I go to the Executors tab in the Spark UI and look at
> the "Input" column, the difference between the nodes is huge, sometimes
> 20 GB vs 120 GB.
>
> 1. What is causing this behaviour?
> 2. Any ideas on how to achieve a more balanced distribution of work?
>
> Thanks,
> Borislav
>

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-work-distribution-among-execs-tp26502p26503.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.