spark-user mailing list archives

From Bartek Dobija
Subject Re: Spark and Oozie
Date Fri, 19 Jul 2019 07:23:31 GMT
Hi Dennis,

Oozie jobs shouldn't take that long in a well-configured cluster. Oozie
allocates its own resources in YARN, which may require fine-tuning. Check
whether YARN grants resources to the Oozie job immediately; a delay there
may be one of the reasons. If so, adjust the job priorities in the YARN
scheduler configuration.
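
One way to check this is to look for applications that sit in the ACCEPTED
state, meaning YARN has accepted them but has not yet started a container
for them. The sketch below assumes the ResourceManager REST API
(/ws/v1/cluster/apps) is reachable. The host name, the function names and
the 10 second threshold are placeholders for illustration and do not come
from this thread.

```python
# Sketch: list YARN applications that are still waiting for resources.
# Assumes the ResourceManager REST API is enabled; RM_URL is a placeholder.
import json
from urllib.request import urlopen

RM_URL = "http://resourcemanager.example.com:8088"  # hypothetical address


def waiting_apps(apps_json, now_ms, min_wait_s=10):
    """Return (name, queue, seconds_waiting) for apps still in ACCEPTED.

    apps_json is the parsed response of GET /ws/v1/cluster/apps.
    now_ms is the current time in milliseconds since the epoch.
    """
    # The API returns {"apps": null} when no application matches.
    apps = (apps_json.get("apps") or {}).get("app") or []
    result = []
    for app in apps:
        if app.get("state") != "ACCEPTED":
            continue
        # startedTime is reported in milliseconds since the epoch.
        waited_s = (now_ms - app["startedTime"]) / 1000.0
        if waited_s >= min_wait_s:
            result.append((app["name"], app["queue"], waited_s))
    # Longest-waiting applications first.
    return sorted(result, key=lambda row: row[2], reverse=True)


def fetch_accepted_apps(rm_url=RM_URL):
    """Fetch applications in ACCEPTED state from the ResourceManager."""
    with urlopen(rm_url + "/ws/v1/cluster/apps?states=ACCEPTED") as resp:
        return json.load(resp)
```

Calling waiting_apps(fetch_accepted_apps(), int(time.time() * 1000)) while
the Oozie workflow starts shows whether the Oozie launcher or the Spark
application itself is queuing. If either waits for a long time, the queue
capacity or priority settings are the first thing to look at.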

Alternatively, take a look at the Apache Airflow project, which is a good
alternative to Oozie.


On Fri, Jul 19, 2019, 09:09 Dennis Suhari wrote:

> Dear experts,
> I am using Spark for processing data from HDFS (Hadoop). These Spark
> applications are data pipelines, data wrangling and machine learning
> applications. Spark submits its jobs using YARN.
> This works well. For scheduling I am now trying to use Apache Oozie,
> but I am facing performance impacts. A Spark job which took 44 seconds
> when submitted via the CLI now takes nearly 3 minutes.
> Have you had similar experiences using Oozie for scheduling Spark
> application jobs? What alternative workflow tools are you using for
> scheduling Spark jobs on Hadoop?
> Br,
> Dennis
> Sent from my iPhone
