spark-user mailing list archives

From Michal Klos <michal.klo...@gmail.com>
Subject Re: Tools to manage workflows on Spark
Date Mon, 02 Mar 2015 14:27:35 GMT
Piggy-backing on the thread a little --

Does anyone out there use Luigi to manage Spark workflows?

I see they recently added Spark support.


On Sun, Mar 1, 2015 at 10:20 PM, Qiang Cao <caoqiang.cs@gmail.com> wrote:

> Thanks, Himanish and Felix!
>
> On Sun, Mar 1, 2015 at 7:50 PM, Himanish Kushary <himanish@gmail.com>
> wrote:
>
>> We are running our Spark jobs on Amazon AWS and are using AWS
>> Data Pipeline to orchestrate the different Spark jobs. AWS
>> Data Pipeline provides automatic EMR cluster provisioning, retry on
>> failure, SNS notifications, etc. out of the box and works well for us.
>>
>> On Sun, Mar 1, 2015 at 7:02 PM, Felix C <felixcheung_m@hotmail.com>
>> wrote:
>>
>>> We use Oozie as well, and it has worked well.
>>> The catch is that each action in Oozie is separate, so one cannot retain
>>> a SparkContext or RDD, or leverage caching or temp tables, when moving to
>>> another Oozie action. You can either save output to a file or put all
>>> Spark processing into one Oozie action.
>>>
>>> --- Original Message ---
>>>
>>> From: "Mayur Rustagi" <mayur.rustagi@gmail.com>
>>> Sent: February 28, 2015 7:07 PM
>>> To: "Qiang Cao" <caoqiang.cs@gmail.com>
>>> Cc: "Ted Yu" <yuzhihong@gmail.com>, "Ashish Nigam" <
>>> ashnigamtech@gmail.com>, "user" <user@spark.apache.org>
>>> Subject: Re: Tools to manage workflows on Spark
>>>
>>> Sorry, not really. Spork is a way to migrate your existing Pig scripts
>>> to Spark, or to write new Pig jobs that can execute on Spark.
>>> For orchestration you are better off using Oozie, especially if you are
>>> using other execution engines/systems besides Spark.
>>>
>>>
>>>     Regards,
>>> Mayur Rustagi
>>> Ph: +1 (760) 203 3257
>>> http://www.sigmoid.com <http://www.sigmoidanalytics.com/>
>>> @mayur_rustagi <http://www.twitter.com/mayur_rustagi>
>>>
>>> On Sat, Feb 28, 2015 at 6:59 PM, Qiang Cao <caoqiang.cs@gmail.com>
>>> wrote:
>>>
>>> Thanks Mayur! I'm looking for something that would allow me to easily
>>> describe and manage a workflow on Spark. A workflow in my context is a
>>> composition of Spark applications that may depend on one another based on
>>> hdfs inputs/outputs. Is Spork a good fit? The orchestration I want is on
>>> app level.
>>>
>>>
>>>
>>> On Sat, Feb 28, 2015 at 9:38 PM, Mayur Rustagi <mayur.rustagi@gmail.com>
>>> wrote:
>>>
>>> We do maintain it, but in the Apache repo itself. However, Pig cannot do
>>> orchestration for you. I am not sure what you are looking for from Pig in
>>> this context.
>>>
>>>     Regards,
>>> Mayur Rustagi
>>> Ph: +1 (760) 203 3257
>>> http://www.sigmoid.com <http://www.sigmoidanalytics.com/>
>>>  @mayur_rustagi <http://www.twitter.com/mayur_rustagi>
>>>
>>> On Sat, Feb 28, 2015 at 6:36 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>
>>> Here was the latest modification in the Spork repo:
>>> Mon Dec 1 10:08:19 2014
>>>
>>> Not sure if it is being actively maintained.
>>>
>>> On Sat, Feb 28, 2015 at 6:26 PM, Qiang Cao <caoqiang.cs@gmail.com>
>>> wrote:
>>>
>>> Thanks for the pointer, Ashish! I was also looking at Spork
>>> (https://github.com/sigmoidanalytics/spork, Pig-on-Spark), but wasn't
>>> sure if that's the right direction.
>>>
>>> On Sat, Feb 28, 2015 at 6:36 PM, Ashish Nigam <ashnigamtech@gmail.com>
>>> wrote:
>>>
>>> You have to call spark-submit from Oozie.
>>> I used this link to get the idea for my implementation:
>>>
>>>
>>> http://mail-archives.apache.org/mod_mbox/oozie-user/201404.mbox/%3CCAHCsPn-0Grq1rSXrAZu35yy_i4T=FvoVDOX2uGpCUHkWMjPQNQ@mail.gmail.com%3E
>>>
>>>
>>>
>>>  On Feb 28, 2015, at 3:25 PM, Qiang Cao <caoqiang.cs@gmail.com> wrote:
>>>
>>> Thanks, Ashish! Is Oozie integrated with Spark? I know it can
>>> accommodate some Hadoop jobs.
>>>
>>>
>>> On Sat, Feb 28, 2015 at 6:07 PM, Ashish Nigam <ashnigamtech@gmail.com>
>>> wrote:
>>>
>>> Qiang,
>>> Did you look at Oozie?
>>> We use Oozie to run Spark jobs in production.
>>>
>>>
>>>  On Feb 28, 2015, at 2:45 PM, Qiang Cao <caoqiang.cs@gmail.com> wrote:
>>>
>>>  Hi Everyone,
>>>
>>> We need to deal with workflows on Spark. In our scenario, each
>>> workflow consists of multiple processing steps, and there can be
>>> dependencies between steps. I'm wondering if there are tools
>>> available that can help us schedule and manage workflows on Spark. I'm
>>> looking for something like Pig on Hadoop, but it should fully function on
>>> Spark.
>>>
>>> Any suggestions?
>>>
>>>  Thanks in advance!
>>>
>>>  Qiang
>>>
>>
>>
>> --
>> Thanks & Regards
>> Himanish
>>
>
>
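
The workflow Qiang describes in the quoted thread (Spark applications that depend on one another through HDFS inputs and outputs) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Oozie, Luigi, or AWS Data Pipeline work internally: the step names, HDFS paths, jar name, and `example.*` class names are all hypothetical, and the commands are built but never executed. It derives a run order from the declared inputs and outputs and prepares one spark-submit invocation per step, in line with Ashish's note that the orchestrator calls spark-submit.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each step is one Spark application that reads
# and writes HDFS paths. Names and paths are made up for illustration.
STEPS = {
    "ingest": {"inputs": ["/raw/events"], "outputs": ["/stage/events"]},
    "sessions": {"inputs": ["/stage/events"], "outputs": ["/stage/sessions"]},
    "users": {"inputs": ["/stage/events"], "outputs": ["/stage/users"]},
    "report": {
        "inputs": ["/stage/sessions", "/stage/users"],
        "outputs": ["/out/report"],
    },
}


def run_order(steps):
    """Return step names ordered so every producer runs before its consumers."""
    # Map each output path to the step that produces it.
    producer = {
        path: name for name, step in steps.items() for path in step["outputs"]
    }
    # A step depends on whichever steps produce its inputs; inputs that no
    # step produces (such as /raw/events) are treated as already present.
    graph = {
        name: {producer[p] for p in step["inputs"] if p in producer}
        for name, step in steps.items()
    }
    return list(TopologicalSorter(graph).static_order())


def spark_submit_command(name, step, jar="workflow.jar", master="local[*]"):
    """Build (but do not run) a spark-submit argv list for one step.

    The jar, master, and main class naming are assumptions for this sketch.
    """
    main_class = f"example.{name.capitalize()}"
    return [
        "spark-submit",
        "--master", master,
        "--class", main_class,
        jar,
        *step["inputs"],
        *step["outputs"],
    ]


if __name__ == "__main__":
    for step_name in run_order(STEPS):
        print(" ".join(spark_submit_command(step_name, STEPS[step_name])))
```

Because each step is a separate spark-submit call, nothing held in memory carries over between steps, which is the limitation Felix points out for separate Oozie actions: intermediate results have to be written out (here, to the `/stage/...` paths) for the next step to read.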
