spark-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Spark Job Execution halts during shuffle...
Date Fri, 27 May 2016 03:17:04 GMT
Priya:
Have you checked the executor logs on hostname1 and hostname2 ?

Cheers
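
For reference, a sketch of how to pull the executor logs in YARN mode; the application ID and log path below are placeholders, not values from this thread:

```shell
# Fetch the aggregated container logs for the Spark application (YARN mode).
# Replace the application ID with the one shown on the console / Spark UI.
yarn logs -applicationId application_1464000000000_0001 > app_logs.txt

# If log aggregation is disabled, the executor stderr typically lives on
# each node under a path like (exact location depends on your YARN config):
#   <yarn.nodemanager.log-dirs>/<application_id>/<container_id>/stderr
```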

On Thu, May 26, 2016 at 8:00 PM, Takeshi Yamamuro <linguin.m.s@gmail.com>
wrote:

> Hi,
>
> If your job gets stuck and fails, one of the best practices is to increase
> the number of partitions.
> Also, you'd be better off using DataFrame instead of RDD in terms of join
> optimization.
>
> // maropu
>
>
> On Thu, May 26, 2016 at 11:40 PM, Priya Ch <learnings.chitturi@gmail.com>
> wrote:
>
>> Hello Team,
>>
>>
>>  I am trying to join 2 RDDs, where one is 800 MB in size and the
>> other is 190 MB. During the join step, my job halts and I don't see any
>> progress in the execution.
>>
>> This is the message I see on console -
>>
>> INFO spark.MapOutputTrackerMasterEndPoint: Asked to send map output
>> locations for shuffle 0 to <hostname1>:40000
>> INFO spark.MapOutputTrackerMasterEndPoint: Asked to send map output
>> locations for shuffle 1 to <hostname2>:40000
>>
>> After these messages, I don't see any progress. I am using Spark 1.6.0
>> with the YARN scheduler (running in YARN client mode). My cluster
>> configuration is a 3-node cluster (1 master and 2 slaves). Each slave has
>> 1 TB of hard disk space, 300 GB of memory, and 32 cores.
>>
>> HDFS block size is 128 MB.
>>
>> Thanks,
>> Padma Ch
>>
>
>
>
> --
> ---
> Takeshi Yamamuro
>
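
The two suggestions above (raise the number of shuffle partitions, and join via DataFrames rather than raw RDDs) can be sketched roughly as follows against the Spark 1.6 API. The column name `key` and the input paths are placeholders for illustration only:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.broadcast

// Minimal sketch, assuming Spark 1.6; "key" and the paths are hypothetical.
val sc = new SparkContext(new SparkConf().setAppName("join-example"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// Increase shuffle parallelism before the join (the default is 200).
sqlContext.setConf("spark.sql.shuffle.partitions", "400")

// Load both datasets as DataFrames so the Catalyst optimizer can plan the join.
val large = sc.textFile("hdfs:///data/large.csv")
  .map(_.split(",")).map(a => (a(0), a(1))).toDF("key", "v1")
val small = sc.textFile("hdfs:///data/small.csv")
  .map(_.split(",")).map(a => (a(0), a(1))).toDF("key", "v2")

// Equi-join on the shared column.
val joined = large.join(small, Seq("key"))

// Alternatively, if the smaller table fits in executor memory, a broadcast
// hint avoids shuffling the large side entirely:
// val joined = large.join(broadcast(small), Seq("key"))

joined.show()
```

Whether the broadcast variant is appropriate depends on executor memory; at 190 MB the smaller table may or may not be a good broadcast candidate on a given cluster.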
