spark-user mailing list archives

From Dennis Suhari <>
Subject Re: Spark and Oozie
Date Mon, 05 Aug 2019 16:03:05 GMT
Hi William,

Because it is the only job running, I don't think it is resource contention. We have
configured the Capacity Scheduler, so jobs go through YARN queues. As it is the only job, I can't
see why it would be waiting in the queue.
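
To narrow down where the extra time goes, one could compare the timestamps of the two YARN applications involved. As far as I understand, Oozie first starts a launcher application in YARN, and that launcher then submits the Spark application. The Python sketch below splits the elapsed time into phases. The class, the function and the field names are made up for illustration, and the timestamps would have to be copied by hand from the YARN ResourceManager UI or the job logs; nothing here calls a real Oozie or YARN API.

```python
from dataclasses import dataclass


@dataclass
class YarnApp:
    """Timestamps (epoch milliseconds) for one YARN application (hypothetical type)."""
    name: str
    submitted_ms: int  # when YARN accepted the application
    started_ms: int    # when the application actually began running
    finished_ms: int   # when the application completed


def breakdown(launcher: YarnApp, spark: YarnApp) -> dict:
    """Split the launcher-to-finish time into phases, in seconds.

    Assumes the launcher is submitted first and finishes last, and that
    the Spark application is submitted while the launcher is running.
    """
    return {
        # time the launcher waited for its first container
        "launcher_wait_s": (launcher.started_ms - launcher.submitted_ms) / 1000,
        # time the launcher needed before it submitted the Spark application
        "launcher_to_spark_submit_s": (spark.submitted_ms - launcher.started_ms) / 1000,
        # time the Spark application waited for resources
        "spark_wait_s": (spark.started_ms - spark.submitted_ms) / 1000,
        # time the Spark application itself was running
        "spark_run_s": (spark.finished_ms - spark.started_ms) / 1000,
        # everything from launcher submission to launcher completion
        "total_s": (launcher.finished_ms - launcher.submitted_ms) / 1000,
    }
```

If most of the time ends up in launcher_wait_s or spark_wait_s, the scheduler configuration is probably the place to look; if it ends up in launcher_to_spark_submit_s, the overhead is more likely in the launcher itself.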



Sent from my iPhone

> On 20.07.2019 at 01:48, William Shen <> wrote:
> Dennis, do you know what's taking the additional time? Is it the Spark job, or Oozie
> waiting for allocation from YARN? Do you have a resource contention issue in YARN?
>> On Fri, Jul 19, 2019 at 12:24 AM Bartek Dobija <> wrote:
>> Hi Dennis, 
>> Oozie jobs shouldn't take that long in a well-configured cluster. Oozie allocates
>> its own resources in YARN, which may require fine-tuning. Check whether YARN gives resources to
>> the Oozie job immediately; a delay there may be one of the reasons. If so, change the job
>> priorities in the YARN scheduling configuration.
>> Alternatively, check the Apache Airflow project, which is a good alternative to Oozie.

>> Regards,
>> Bartek 
>>> On Fri, Jul 19, 2019, 09:09 Dennis Suhari <> wrote:
>>> Dear experts,
>>> I am using Spark for processing data from HDFS (Hadoop). These Spark applications
>>> are data pipelines, data wrangling and machine learning applications. Spark submits its
>>> jobs using YARN.
>>> This works well. For scheduling I am now trying to use Apache Oozie, but
>>> I am seeing a performance impact. A Spark job which took 44 seconds when submitted via
>>> the CLI now takes nearly 3 minutes.
>>> Have you had similar experiences using Oozie for scheduling Spark application
>>> jobs? What alternative workflow tools are you using for scheduling Spark jobs on Hadoop?
>>> Br,
>>> Dennis
>>> Sent from my iPhone