spark-user mailing list archives

From Gourav Sengupta <gourav.sengu...@gmail.com>
Subject Re: Spark on YARN not utilizing all the YARN containers available
Date Tue, 09 Oct 2018 21:54:06 GMT
Hi Dillon,

I think there is a setting that keeps containers allocated once YARN has set
them up, instead of deallocating them. I used it previously with Hive, and it
saves the processing time otherwise spent allocating containers.
That said, I am still trying to understand how we determine that one YARN
container = one executor in Spark.

Regards,
Gourav
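
One way to make the container-to-executor relationship concrete: on YARN, Spark requests one container per executor, plus one container for the ApplicationMaster, and sizes each executor container as executor memory plus an overhead. The sketch below is a rough Python illustration, not Spark's actual code. The function names are hypothetical, and the overhead rule (the larger of 384 MiB and 10% of executor memory) is the documented default for spark.executor.memoryOverhead, assumed unchanged here.

```python
# Hypothetical helpers; the names are illustrative and not part of any Spark API.

MIN_OVERHEAD_MIB = 384      # documented minimum executor memory overhead
OVERHEAD_FRACTION = 0.10    # documented default overhead fraction


def executor_container_mib(executor_memory_mib: int) -> int:
    """Memory requested from YARN for one executor container.

    This is the size before YARN rounds the request up to its own
    allocation increment, which depends on the cluster's scheduler settings.
    """
    overhead = max(MIN_OVERHEAD_MIB, int(executor_memory_mib * OVERHEAD_FRACTION))
    return executor_memory_mib + overhead


def expected_yarn_containers(num_executors: int) -> int:
    """One container per executor, plus one for the ApplicationMaster."""
    return num_executors + 1
```

Under these assumptions, the 6 executors from the original post would show up as 7 containers in the YARN ResourceManager UI: 6 executors and 1 ApplicationMaster.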

On Tue, Oct 9, 2018 at 9:04 PM Dillon Dukek <dillon.dukek@placed.com.invalid>
wrote:

> I'm still not sure exactly what you mean when you say that you have 6
> YARN containers. YARN should just be aware of the total available resources
> in your cluster and then launch containers based on the
> executor requirements you set when you submit your job. If you can, I think
> it would be helpful to send me the command you're using to launch your
> Spark process. You should also be able to use the logs and/or the Spark UI
> to determine how many executors are running.
>
> On Tue, Oct 9, 2018 at 12:57 PM Gourav Sengupta <gourav.sengupta@gmail.com>
> wrote:
>
>> Hi,
>>
>> Maybe I am not quite clear in my head on this one, but how do we know
>> that 1 YARN container = 1 executor?
>>
>> Regards,
>> Gourav Sengupta
>>
>> On Tue, Oct 9, 2018 at 8:53 PM Dillon Dukek
>> <dillon.dukek@placed.com.invalid> wrote:
>>
>>> Can you send how you are launching your streaming process? Also, what
>>> environment is this cluster running in (EMR, GCP, self-managed, etc.)?
>>>
>>> On Tue, Oct 9, 2018 at 10:21 AM kant kodali <kanth909@gmail.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I am using Spark 2.3.1 with YARN as the cluster manager.
>>>>
>>>> I currently have:
>>>>
>>>> 1) 6 YARN containers (executors=6) with 4 executor cores for each
>>>> container.
>>>> 2) 6 Kafka partitions from one topic.
>>>> 3) You can assume every other configuration is set to its default
>>>> value.
>>>>
>>>> I spawned a simple streaming query and I see all the tasks get scheduled
>>>> on one YARN container. Am I missing any config?
>>>>
>>>> Thanks!
>>>>
>>>
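
For reference, the executor requirements discussed in this thread are usually passed on the spark-submit command line. Below is a minimal Python sketch that assembles such a command using the numbers from the original post (6 executors, 4 cores each). The helper name and the application file streaming_job.py are hypothetical, and the deploy mode is an assumption, since the thread does not state it.

```python
import shlex
from typing import List


def build_spark_submit(app: str, num_executors: int, executor_cores: int,
                       deploy_mode: str = "cluster") -> List[str]:
    """Return a spark-submit argv list for a YARN job with explicit executor sizing."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,                 # assumption: not given in the thread
        "--num-executors", str(num_executors),        # executors requested from YARN
        "--executor-cores", str(executor_cores),      # cores per executor
        app,
    ]


if __name__ == "__main__":
    # streaming_job.py is a placeholder application name.
    cmd = build_spark_submit("streaming_job.py", num_executors=6, executor_cores=4)
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Once a job submitted this way is running, the Executors tab of the Spark UI is one place to check whether all of the requested executors actually registered.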
