spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kartik Mathur <kar...@bluedata.com>
Subject Re: Problem understanding spark word count execution
Date Thu, 01 Oct 2015 19:25:40 GMT
Thanks Nicolae ,
So In my case all executers are sending results back to the driver and and "
*shuffle* *is just sending out the textFile to distribute the
partitions", *could
you please elaborate on this  ? what exactly is in this file ?

On Wed, Sep 30, 2015 at 9:57 PM, Nicolae Marasoiu <
nicolae.marasoiu@adswizz.com> wrote:

>
> Hi,
>
> 2- the end results are sent back to the driver; the shuffles are
> transmission of intermediate results between nodes such as the -> which are
> all intermediate transformations.
>
> More precisely, since flatMap and map are narrow dependencies, meaning
> they can usually happen on the local node, I bet shuffle is just sending
> out the textFile to a few nodes to distribute the partitions.
>
>
> ------------------------------
> *From:* Kartik Mathur <kartik@bluedata.com>
> *Sent:* Thursday, October 1, 2015 12:42 AM
> *To:* user
> *Subject:* Problem understanding spark word count execution
>
> Hi All,
>
> I tried running spark word count and I have couple of questions -
>
> I am analyzing stage 0 , i.e
>  *sc.textFile -> flatMap -> Map (Word count example)*
>
> 1) In the *Stage logs* under Application UI details for every task I am
> seeing Shuffle write as 2.7 KB, *question - how can I know where all did
> this task write ? like how many bytes to which executer ?*
>
> 2) In the executer's log when I look for same task it says 2000 bytes of
> result is sent to driver , my question is , *if the results were directly
> sent to driver what is this shuffle write ? *
>
> Thanks,
> Kartik
>

Mime
View raw message