spark-user mailing list archives

From "Mail.com" <pradeep.mi...@mail.com>
Subject Re: Num of executors and cores
Date Tue, 26 Jul 2016 13:56:32 GMT
Hi,

In spark-submit, I specify --master yarn-client.
Under Executors in the UI I do see all 12 executors assigned. But when I drill down into
the stage's Tasks, I saw only 8 tasks, with indices 0-7.

I ran again with the number of executors increased to 15, and I now see 12 tasks for the stage.

I'd still like to understand why, even though 12 executors were available, there were only
8 tasks for the stage.
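As far as I understand it, this matches how wholeTextFiles plans its input splits: the numPartitions argument is only a minimum hint. The underlying input format (Hadoop's CombineFileInputFormat) derives a maximum split size of roughly total bytes / minPartitions and then packs whole files into splits, and since a file is never cut in two, uneven file sizes can produce fewer tasks than requested. A rough back-of-the-envelope sketch (the file sizes are made up, and Spark's real code also weighs data locality):

```python
import math

def plan_splits(file_sizes, min_partitions):
    """Rough sketch of whole-file split planning: max split size is
    total_bytes / min_partitions, files are packed greedily, and a
    file is never divided, so uneven sizes can yield fewer (or more)
    splits than min_partitions."""
    total = sum(file_sizes)
    max_split = math.ceil(total / min_partitions)
    splits, current, current_size = [], [], 0
    for size in file_sizes:
        # start a new split when the next file would overflow this one
        if current and current_size + size > max_split:
            splits.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        splits.append(current)
    return splits

# 12 files with hypothetical, uneven sizes (in MB): the 8 small files
# get packed two per split, so only 8 tasks come out of min_partitions = 12.
sizes_mb = [50, 50, 50, 50, 10, 10, 10, 10, 10, 10, 10, 10]
print(len(plan_splits(sizes_mb, 12)))  # 8
```

If one file per task is a hard requirement, calling repartition(12) on the RDD right after wholeTextFiles should force 12 partitions, at the cost of a shuffle.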

Thanks,
Pradeep



> On Jul 26, 2016, at 8:46 AM, Jacek Laskowski <jacek@japila.pl> wrote:
> 
> Hi,
> 
> Where's this yarn-client mode specified? When you said "However, when
> I run the job I see that the stage which reads the directory has only
> 8 tasks." -- how do you see 8 tasks for a stage? It appears you're in
> local[*] mode on an 8-core machine (like me), which is why I'm asking
> such basic questions.
> 
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
> 
> 
>> On Tue, Jul 26, 2016 at 2:39 PM, Mail.com <pradeep.misra@mail.com> wrote:
>> Mostly jars, files, and the app name. It runs in yarn-client mode.
>> 
>> Thanks,
>> Pradeep
>> 
>>> On Jul 26, 2016, at 7:10 AM, Jacek Laskowski <jacek@japila.pl> wrote:
>>> 
>>> Hi,
>>> 
>>> What's "<all other stuff>"? What master URL do you use?
>>> 
>>> Pozdrawiam,
>>> Jacek Laskowski
>>> ----
>>> https://medium.com/@jaceklaskowski/
>>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>> Follow me at https://twitter.com/jaceklaskowski
>>> 
>>> 
>>>> On Tue, Jul 26, 2016 at 2:18 AM, Mail.com <pradeep.misra@mail.com> wrote:
>>>> Hi All,
>>>> 
>>>> I have a directory which has 12 files. I want each file read whole, so I am reading them with wholeTextFiles(dirpath, numPartitions).
>>>> 
>>>> I run spark-submit as <all other stuff> --num-executors 12 --executor-cores 1, with numPartitions 12.
>>>> 
>>>> However, when I run the job I see that the stage which reads the directory has only 8 tasks, so some tasks read more than one file and take twice the time.
>>>> 
>>>> What can I do so that the files are read by 12 tasks, i.e. one file per task?
>>>> 
>>>> Thanks,
>>>> Pradeep
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org



