spark-user mailing list archives

From Vikram Kone <>
Subject Re: Spark job workflow engine recommendations
Date Fri, 07 Aug 2015 16:01:31 GMT
Thanks for the suggestion, Hien. I'm curious why you didn't suggest Azkaban from LinkedIn.
From what I read online, Oozie is very cumbersome to set up and use compared
to Azkaban. Since you are at LinkedIn, I wanted to get some perspective on
what Azkaban lacks compared to Oozie. Ease of use matters more to me than a
full feature set.

On Friday, August 7, 2015, Hien Luu <> wrote:

> Looks like Oozie can satisfy most of your requirements.
> On Fri, Aug 7, 2015 at 8:43 AM, Vikram Kone <> wrote:
>> Hi,
>> I'm looking for open source workflow tools/engines that allow us to
>> schedule Spark jobs on a DataStax Cassandra cluster. Since there are tonnes
>> of alternatives out there like Oozie, Azkaban, Luigi, Chronos etc., I
>> wanted to check with people here to see what they are using today.
>> Some of the requirements for the workflow engine I'm looking for are:
>> 1. First-class support for submitting Spark jobs on Cassandra, not some
>> wrapper Java code to submit tasks.
>> 2. An active open source community, and well tested at production
>> scale.
>> 3. It should be dead easy to write job dependencies using XML or a web
>> interface, e.g. job A depends on jobs B and C, so run job A after B and
>> C are finished. I don't want to write full-blown Java applications to
>> specify job parameters and dependencies. It should be very simple to use.
>> 4. Time-based recurrent scheduling: run the Spark jobs at a given time
>> every hour, day, week or month.
>> 5. Job monitoring, alerting on failures, and daily email notifications.
>> I have looked at Ooyala's Spark Job Server, which seems to be geared
>> towards making Spark jobs run faster by sharing contexts between jobs,
>> but it isn't a full-blown workflow engine per se. A combination of Spark
>> Job Server and a workflow engine would be ideal.
>> Thanks for your input.
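Requirement 3 in the quoted message (run job A only after jobs B and C have finished) amounts to ordering jobs in a dependency graph. Below is a minimal sketch of that idea using Python's standard-library `graphlib` module (Python 3.9+). It is not how Oozie, Azkaban, Luigi or Chronos implement dependencies internally, and the job names and the `run_batches` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical job graph: each key lists the jobs it depends on.
# "job_a" may only start after "job_b" and "job_c" have finished.
DEPENDENCIES = {
    "job_a": {"job_b", "job_c"},
    "job_b": set(),
    "job_c": set(),
}


def run_batches(deps):
    """Group jobs into batches. Every job in a batch has all of its
    dependencies satisfied by earlier batches, so a batch could run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        # A real engine would submit each ready job (e.g. via spark-submit)
        # and call done() only once it succeeds; here we mark them done at once.
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for i, batch in enumerate(run_batches(DEPENDENCIES), start=1):
        print(f"batch {i}: {', '.join(batch)}")
```

Running it prints `batch 1: job_b, job_c` followed by `batch 2: job_a`: B and C have no prerequisites and can run side by side, while A becomes ready only after both are marked done.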
