spark-user mailing list archives
From Qiang Cao <caoqiang...@gmail.com>
Subject Re: Tools to manage workflows on Spark
Date Mon, 02 Mar 2015 03:20:25 GMT
Thanks, Himanish and Felix!

On Sun, Mar 1, 2015 at 7:50 PM, Himanish Kushary <himanish@gmail.com> wrote:

> We are running our Spark jobs on Amazon AWS and are using AWS Data Pipeline
> for orchestration of the different Spark jobs. AWS Data Pipeline provides
> automatic EMR cluster provisioning, retry on failure, SNS notifications, etc.
> out of the box and works well for us.
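The setup Himanish describes might look roughly like the following Data Pipeline definition. This is an illustrative sketch only; the object names and values (cluster version, jar location, SNS topic ARN) are hypothetical placeholders, not a tested configuration — check the AWS Data Pipeline object reference for the exact fields:

```json
{
  "objects": [
    {
      "id": "EmrClusterForSpark",
      "type": "EmrCluster",
      "releaseLabel": "emr-4.1.0",
      "applications": ["spark"],
      "terminateAfter": "2 Hours"
    },
    {
      "id": "RunSparkJob",
      "type": "EmrActivity",
      "runsOn": {"ref": "EmrClusterForSpark"},
      "step": "command-runner.jar,spark-submit,--class,com.example.Job,s3://my-bucket/pipeline.jar",
      "maximumRetries": "3",
      "onFail": {"ref": "FailureSns"}
    },
    {
      "id": "FailureSns",
      "type": "SnsAlarm",
      "topicArn": "arn:aws:sns:us-east-1:123456789012:pipeline-alerts",
      "subject": "Spark job failed",
      "message": "See the pipeline execution logs for details."
    }
  ]
}
```

The three out-of-the-box features mentioned above map to `EmrCluster` (provisioning), `maximumRetries` (retry on failure), and `SnsAlarm` via `onFail` (notification).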
>
>
>
>
>
> On Sun, Mar 1, 2015 at 7:02 PM, Felix C <felixcheung_m@hotmail.com> wrote:
>
>>  We use Oozie as well, and it has worked well.
>> The catch is that each Oozie action runs separately, so you cannot retain a
>> SparkContext or RDD, or leverage caching or a temp table, across Oozie
>> actions. You could either save output to files or put all Spark processing
>> into one Oozie action.
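The "save output to files" pattern Felix describes can be sketched as an Oozie workflow in which each step is a separate shell action invoking spark-submit, with steps handing data to each other through HDFS paths. A sketch only, assuming a shell-action setup; the class names, jar name, and HDFS paths are hypothetical:

```xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="spark-pipeline">
  <start to="extract"/>

  <!-- Each action runs in its own JVM, so steps exchange data via HDFS paths. -->
  <action name="extract">
    <shell xmlns="uri:oozie:shell-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <exec>spark-submit</exec>
      <argument>--class</argument>
      <argument>com.example.Extract</argument>
      <argument>pipeline.jar</argument>
      <argument>hdfs:///data/raw</argument>
      <argument>hdfs:///data/staged</argument>
      <file>lib/pipeline.jar#pipeline.jar</file>
    </shell>
    <ok to="aggregate"/>
    <error to="fail"/>
  </action>

  <!-- The second step reads what the first wrote: a fresh SparkContext,
       with no shared cache or temp tables, exactly as described above. -->
  <action name="aggregate">
    <shell xmlns="uri:oozie:shell-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <exec>spark-submit</exec>
      <argument>--class</argument>
      <argument>com.example.Aggregate</argument>
      <argument>pipeline.jar</argument>
      <argument>hdfs:///data/staged</argument>
      <argument>hdfs:///data/output</argument>
      <file>lib/pipeline.jar#pipeline.jar</file>
    </shell>
    <ok to="end"/>
    <error to="fail"/>
  </action>

  <kill name="fail">
    <message>Pipeline failed at [${wf:lastErrorNode()}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
```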
>>
>> --- Original Message ---
>>
>> From: "Mayur Rustagi" <mayur.rustagi@gmail.com>
>> Sent: February 28, 2015 7:07 PM
>> To: "Qiang Cao" <caoqiang.cs@gmail.com>
>> Cc: "Ted Yu" <yuzhihong@gmail.com>, "Ashish Nigam" <
>> ashnigamtech@gmail.com>, "user" <user@spark.apache.org>
>> Subject: Re: Tools to manage workflows on Spark
>>
>>  Sorry, not really. Spork is a way to migrate your existing Pig scripts
>> to Spark, or to write new Pig jobs that can execute on Spark.
>> For orchestration you are better off using Oozie, especially if you are
>> using other execution engines/systems besides Spark.
>>
>>
>>     Regards,
>> Mayur Rustagi
>> Ph: +1 (760) 203 3257
>> http://www.sigmoid.com <http://www.sigmoidanalytics.com/>
>> @mayur_rustagi <http://www.twitter.com/mayur_rustagi>
>>
>> On Sat, Feb 28, 2015 at 6:59 PM, Qiang Cao <caoqiang.cs@gmail.com> wrote:
>>
>> Thanks Mayur! I'm looking for something that would allow me to easily
>> describe and manage a workflow on Spark. A workflow in my context is a
>> composition of Spark applications that may depend on one another based on
>> HDFS inputs/outputs. Is Spork a good fit? The orchestration I want is at
>> the application level.
>>
>>
>>
>> On Sat, Feb 28, 2015 at 9:38 PM, Mayur Rustagi <mayur.rustagi@gmail.com>
>> wrote:
>>
>> We do maintain it, but in the Apache repo itself. However, Pig cannot do
>> orchestration for you. I am not sure what you are looking for from Pig in
>> this context.
>>
>>     Regards,
>> Mayur Rustagi
>> Ph: +1 (760) 203 3257
>> http://www.sigmoid.com <http://www.sigmoidanalytics.com/>
>>  @mayur_rustagi <http://www.twitter.com/mayur_rustagi>
>>
>> On Sat, Feb 28, 2015 at 6:36 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>
>> Here is the latest modification in the Spork repo:
>> Mon Dec 1 10:08:19 2014
>>
>>  Not sure if it is being actively maintained.
>>
>> On Sat, Feb 28, 2015 at 6:26 PM, Qiang Cao <caoqiang.cs@gmail.com> wrote:
>>
>> Thanks for the pointer, Ashish! I was also looking at Spork
>> (https://github.com/sigmoidanalytics/spork, Pig-on-Spark), but wasn't sure
>> if that's the right direction.
>>
>> On Sat, Feb 28, 2015 at 6:36 PM, Ashish Nigam <ashnigamtech@gmail.com>
>> wrote:
>>
>> You have to call spark-submit from Oozie.
>> I used this link to get the idea for my implementation:
>>
>>
>> http://mail-archives.apache.org/mod_mbox/oozie-user/201404.mbox/%3CCAHCsPn-0Grq1rSXrAZu35yy_i4T=FvoVDOX2uGpCUHkWMjPQNQ@mail.gmail.com%3E
>>
>>
>>
>>  On Feb 28, 2015, at 3:25 PM, Qiang Cao <caoqiang.cs@gmail.com> wrote:
>>
>>  Thanks, Ashish! Is Oozie integrated with Spark? I know it can
>> accommodate some Hadoop jobs.
>>
>>
>> On Sat, Feb 28, 2015 at 6:07 PM, Ashish Nigam <ashnigamtech@gmail.com>
>> wrote:
>>
>> Qiang,
>> Did you look at Oozie?
>> We use Oozie to run Spark jobs in production.
>>
>>
>>  On Feb 28, 2015, at 2:45 PM, Qiang Cao <caoqiang.cs@gmail.com> wrote:
>>
>>  Hi Everyone,
>>
>>  We need to deal with workflows on Spark. In our scenario, each workflow
>> consists of multiple processing steps, and there may be dependencies among
>> the steps. I'm wondering if there are tools available that can help us
>> schedule and manage workflows on Spark. I'm looking for something like Pig
>> on Hadoop, but it should work fully on Spark.
>>
>>  Any suggestion?
>>
>>  Thanks in advance!
>>
>>  Qiang
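Since the question above boils down to running Spark applications in dependency order based on their HDFS inputs/outputs, the scheduling core that any such tool (Oozie, Data Pipeline, etc.) provides is a topological sort of the step DAG. A minimal sketch in plain Python, with hypothetical step names:

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical workflow: each step maps to the set of steps it depends on,
# i.e. the steps whose HDFS outputs it consumes.
deps = {
    "ingest": set(),
    "clean": {"ingest"},
    "features": {"clean"},
    "train": {"features"},
    "report": {"clean", "train"},
}

# static_order() yields steps so that every step appears after all of its
# dependencies; a scheduler would launch each Spark app in this order.
order = list(TopologicalSorter(deps).static_order())
print(order)  # ['ingest', 'clean', 'features', 'train', 'report']
```

An orchestrator additionally handles retries, cluster provisioning, and parallel launch of steps whose dependencies are all satisfied, but the dependency resolution itself is exactly this.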
>>
>
>
> --
> Thanks & Regards
> Himanish
>
