Date: Wed, 17 Jun 2015 08:33:20 +0000 (UTC)
From: Spark Enthusiast
To: Enno Shioji, Sabarish Sasidharan
Cc: Sateesh Kavuri, "asoni.learn@gmail.com", Will Briggs, ayan guha, user
Subject: Re: Spark or Storm

When you say Storm, did you mean Storm with Trident, or plain Storm?

My use case is not a simple transformation. There are complex events that need to be generated by joining the incoming event stream.

Also, what do you mean by "no back pressure"?
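Roughly, back pressure means that the rate at which a streaming job pulls data adapts to how fast it can actually process that data, so a slowdown leaves the excess waiting upstream (for example in Kafka) instead of piling up inside the job. The toy Python model below is not Spark code: the function `run`, the one-step feedback rule, and the numbers are assumptions chosen only to contrast a fixed intake cap (similar in spirit to a receiver rate limit) with a feedback-driven one.

```python
from typing import List, Tuple


def run(source_rate: int, capacities: List[int], static_cap: int,
        back_pressure: bool) -> Tuple[int, int]:
    """Toy micro-batch model; all quantities are records per batch interval.

    Returns (records still waiting upstream, records buffered inside the job).
    """
    upstream = 0  # produced by the source but not yet pulled by the job
    buffered = 0  # pulled into the job but not yet processed
    intake_limit = static_cap
    for capacity in capacities:
        upstream += source_rate
        intake = min(upstream, intake_limit)
        upstream -= intake
        buffered += intake
        buffered -= min(buffered, capacity)
        if back_pressure:
            # Crude feedback rule (an assumption): next interval, pull no more
            # than the job just showed it can process in one interval.
            intake_limit = capacity
    return upstream, buffered


# Processing capacity halves after three intervals.
caps = [100] * 3 + [50] * 5
print(run(100, caps, static_cap=100, back_pressure=False))  # (0, 250)
print(run(100, caps, static_cap=100, back_pressure=True))   # (200, 50)
```

In both runs the total lag is the same 250 records; what differs is where it sits. With the fixed cap it accumulates inside the job, while with the feedback rule the job's buffer stays bounded and the rest waits at the source.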

On Wednesday, 17 June 2015 11:57 AM, Enno Shioji <eshioji@gmail.com> wrote:


We've evaluated Spark Streaming vs. Storm and ended up sticking with Storm.

Some of the important drawbacks are:
Spark has no back pressure (a receiver rate limit can alleviate this to a certain point, but it's far from ideal).
There are also no exactly-once semantics (updateStateByKey can achieve them, but it is not practical if you have any significant amount of state, because it does so by dumping the entire state on every checkpoint).
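updateStateByKey keeps a value per key across batches. The pure-Python sketch below is not Spark's implementation (every name other than updateStateByKey is hypothetical); it models the two properties the point above depends on: the update function runs for every key that already has state, even when the batch has no new values for it, and a checkpoint of that state grows with the total number of keys rather than with how much the latest batch changed.

```python
import pickle
from typing import Callable, Dict, List, Optional, Tuple

UpdateFn = Callable[[List[int], Optional[int]], Optional[int]]


def update_state_by_key(state: Dict[str, int], batch: List[Tuple[str, int]],
                        update: UpdateFn) -> Dict[str, int]:
    """One batch of a toy per-key state update (models the semantics only)."""
    new_values: Dict[str, List[int]] = {}
    for key, value in batch:
        new_values.setdefault(key, []).append(value)
    result: Dict[str, int] = {}
    # Every key with existing state is visited, even if this batch has no
    # new values for it; returning None drops the key.
    for key in set(state) | set(new_values):
        updated = update(new_values.get(key, []), state.get(key))
        if updated is not None:
            result[key] = updated
    return result


def running_count(values: List[int], current: Optional[int]) -> Optional[int]:
    return (current or 0) + sum(values)


def checkpoint_bytes(state: Dict[str, int]) -> int:
    # In this model a checkpoint serializes the whole state, not just the
    # keys touched by the latest batch.
    return len(pickle.dumps(state))


state = {f"user{i}": 1 for i in range(10_000)}
state = update_state_by_key(state, [("user0", 5)], running_count)
print(state["user0"], len(state))  # 6 10000
print(checkpoint_bytes(state) > 100 * checkpoint_bytes({"user0": 6}))  # True
```

A batch that touches one key out of ten thousand still produces a checkpoint covering all ten thousand, which is why this approach gets expensive as state grows.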

There are also some minor drawbacks that I'm sure will be fixed quickly, like no task timeout, not being able to read from Kafka using multiple nodes, and a data-loss hazard with Kafka.

It's also not possible to attain very low latency in Spark, if that's what you need.
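One reason for the latency floor is micro-batching: a record is not processed until the batch interval it arrived in closes. A back-of-the-envelope sketch, assuming uniform arrivals, processing that starts as soon as the interval closes, processing time shorter than the interval, and no scheduling overhead (all simplifying assumptions; the function name is hypothetical):

```python
from typing import Tuple


def microbatch_latency(batch_interval_s: float,
                       processing_s: float) -> Tuple[float, float, float]:
    """Return (best, average, worst) end-to-end latency for one record."""
    if processing_s > batch_interval_s:
        raise ValueError("batches take longer than the interval and will queue up")
    best = processing_s                            # arrived just before the interval closed
    average = batch_interval_s / 2 + processing_s  # waits half an interval on average
    worst = batch_interval_s + processing_s        # arrived just after an interval opened
    return best, average, worst


print(microbatch_latency(2.0, 0.5))  # (0.5, 1.5, 2.5)
```

In this model the only lever is the batch interval; in practice each batch also carries a fixed scheduling cost, which limits how small the interval can usefully get.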

The pro for Spark is its concise and, IMO, more intuitive syntax, especially compared with Storm's Java API.

I admit I might be a bit biased towards Storm, though, as I'm more familiar with it.

Also, you can do some processing with Kinesis. If all you need is a straightforward transformation and you are reading from Kinesis to begin with, it might be easier to just do the transformation in Kinesis.

On Wed, Jun 17, 2015 at 7:15 AM, Sabarish Sasidharan <sabarish.sasidharan@manthan.com> wrote:
Whatever you write in bolts would be the logic you want to apply to your events. In Spark, that logic would be coded in map() or similar transformations and/or actions. Spark doesn't enforce a structure for capturing your processing logic the way Storm does.
Regards
Sab
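To make the point above concrete: the per-event logic is just a function, and what changes between the two frameworks is how it is invoked. In this plain-Python sketch, `enrich`, `EnrichBolt` and `process_batch` are hypothetical names; a real Storm bolt would implement `execute(Tuple)` and emit through an `OutputCollector`, and in Spark Streaming the function would be passed to a DStream transformation such as `map`.

```python
from typing import Callable, Dict, List

Event = Dict[str, int]


def enrich(event: Event) -> Event:
    """Hypothetical per-event logic: what you would otherwise write in a bolt."""
    return {**event, "is_large": int(event["amount"] > 100)}


class EnrichBolt:
    """Storm-like shape: invoked once per incoming tuple, emits downstream."""

    def __init__(self, emit: Callable[[Event], None]) -> None:
        self.emit = emit

    def execute(self, event: Event) -> None:
        self.emit(enrich(event))


def process_batch(batch: List[Event]) -> List[Event]:
    """Spark-Streaming-like shape: the same function mapped over a micro-batch."""
    return [enrich(e) for e in batch]


events = [{"id": 1, "amount": 50}, {"id": 2, "amount": 500}]
emitted: List[Event] = []
bolt = EnrichBolt(emitted.append)
for e in events:
    bolt.execute(e)
print(emitted == process_batch(events))  # True
```

Keeping the logic in a plain function like `enrich` also makes it easier to move between frameworks while they are still being evaluated.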
Probably overloading the question a bit.

In Storm, Bolts are triggered by incoming events. Is that kind of functionality possible with Spark Streaming? During each phase of the data processing, the transformed data is stored to the database, and this transformed data should then be sent to a new pipeline for further processing.

How can this be achieved using Spark?
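Spark Streaming does not trigger code per event; each batch interval produces a batch, and chained DStream transformations run over it. A stage whose output must also be written out can do so with an output operation such as `foreachRDD`, while the next stage is derived from the same DStream. The plain-Python sketch below models only that flow; `run_pipeline`, the stage functions, and the in-memory `sink` standing in for the database are hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, int]
Stage = Callable[[List[Record]], List[Record]]


def run_pipeline(batch: List[Record], stages: List[Stage],
                 sink: Dict[str, List[Record]]) -> List[Record]:
    """Run one micro-batch through the stages, persisting each stage's output."""
    current = batch
    for stage in stages:
        current = stage(current)
        # Stand-in for the database write done after each phase;
        # the same records are then fed to the next stage.
        sink.setdefault(stage.__name__, []).extend(current)
    return current


def add_fee(records: List[Record]) -> List[Record]:
    return [{**r, "total": r["amount"] + 1} for r in records]


def keep_large(records: List[Record]) -> List[Record]:
    return [r for r in records if r["total"] > 10]


db: Dict[str, List[Record]] = {}
out = run_pipeline([{"amount": 5}, {"amount": 20}], [add_fee, keep_large], db)
print(out)  # [{'amount': 20, 'total': 21}]
print(len(db["add_fee"]), len(db["keep_large"]))  # 2 1
```

In a real job, a DStream that feeds both a write and a further stage is typically cached (for example with `cache()`) so it is not computed twice.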


On Wed, Jun 17, 2015 at 10:10 AM, Spark Enthusiast <sparkenthusiast@yahoo.in> wrote:
I have a use case where a stream of incoming events has to be aggregated and joined to create complex events. The aggregation will have to happen at an interval of 1 minute (or less).

The pipeline is:
Upstream services --(send events)--> Kafka --> Event Stream Processor --(enrich event)--> Complex Event Processor --> Elasticsearch
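The aggregate-then-join step can be prototyped independently of the framework. In Spark Streaming it would typically map to window operations such as `reduceByKeyAndWindow` plus `join` on pair DStreams; note that DStream windows are defined by batch (arrival) time, whereas the pure-Python sketch below assumes each event carries its own timestamp. The stream names, fields, and data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[int, str, int]  # (timestamp in seconds, key, value)
WindowKey = Tuple[int, str]   # (window start in seconds, key)


def tumbling_sums(events: List[Event], window_s: int = 60) -> Dict[WindowKey, int]:
    """Sum values per key over fixed, non-overlapping windows."""
    sums: Dict[WindowKey, int] = defaultdict(int)
    for ts, key, value in events:
        window_start = (ts // window_s) * window_s
        sums[(window_start, key)] += value
    return dict(sums)


def join_windows(left: Dict[WindowKey, int],
                 right: Dict[WindowKey, int]) -> Dict[WindowKey, Tuple[int, int]]:
    """Inner join: emit a 'complex event' only where both streams have data."""
    return {k: (left[k], right[k]) for k in left.keys() & right.keys()}


orders = [(5, "u1", 2), (30, "u1", 3), (70, "u1", 1), (10, "u2", 4)]
payments = [(20, "u1", 50), (65, "u2", 10)]
print(join_windows(tumbling_sums(orders), tumbling_sums(payments)))
# {(0, 'u1'): (5, 50)}
```

Only `u1` in the first minute has data on both sides, so it is the only joined result; `u2`'s order and payment fall in different windows.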

From what I understand, Storm will make a very good ESP (event stream processor) and Spark Streaming will make a good CEP (complex event processor).

But we are also evaluating Storm with Trident.

How does Spark Streaming compare with Storm with Trident?

Sridhar Chellappa

On Wednesday, 17 June 2015 10:02 AM, ayan guha <guha.ayan@gmail.com> wrote:


I have a similar scenario where we need to bring data from Kinesis to HBase. Data velocity is 20k per 10 minutes. Only minimal manipulation of the data will be required, but that's regardless of the tool, so we will be writing that piece as a Java POJO.
The whole environment is on AWS. HBase is on a long-running EMR cluster and Kinesis on a separate cluster.
TIA.
Best
Ayan
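The volume above is easier to compare against framework limits once converted to a per-second rate. Assuming "20k" means 20,000 records and that arrivals are roughly even (both assumptions), that is about 33 records per second; the 10-second batch interval below is likewise only an example, and the function names are hypothetical.

```python
def per_second(records: float, minutes: float) -> float:
    """Average arrival rate in records per second."""
    return records / (minutes * 60)


def per_batch(records: float, minutes: float, batch_interval_s: float) -> float:
    """Average number of records landing in one batch of the given interval."""
    return per_second(records, minutes) * batch_interval_s


print(round(per_second(20_000, 10), 1))     # 33.3
print(round(per_batch(20_000, 10, 10), 1))  # 333.3
```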
On 17 Jun 2015 12:13, "Will Briggs" <wrbriggs@gmail.com> wrote:
The programming models for the two frameworks are conceptually rather different; I haven't worked with Storm for quite some time, but based on my old experience with it, I would equate Spark Streaming more with Storm's Trident API, rather than with the raw Bolt API. Even then, there are significant differences, but it's a bit closer.

If you can share your use case, we might be able to provide better guidance.

Regards,
Will
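The comparison above makes sense partly because Trident, like Spark Streaming, works on small batches. Each Trident batch carries a transaction id, and Trident's transactional state stores that id alongside each value so that a replayed batch is not applied twice, which is how it offers exactly-once state updates. The class below is a toy Python model of that idea, not Storm's API; it relies on what Trident's transactional spouts guarantee, namely that a replayed batch contains exactly the same tuples and that batches are applied in order.

```python
from typing import Dict, Optional, Tuple


class TransactionalCounts:
    """Toy model of transactional state: keep the batch's txid next to each value."""

    def __init__(self) -> None:
        # key -> (txid of the last batch applied to this key, count)
        self._store: Dict[str, Tuple[Optional[int], int]] = {}

    def apply_batch(self, txid: int, counts: Dict[str, int]) -> None:
        for key, delta in counts.items():
            last_txid, current = self._store.get(key, (None, 0))
            if last_txid == txid:
                continue  # this batch was already applied to this key: skip the replay
            self._store[key] = (txid, current + delta)

    def get(self, key: str) -> int:
        return self._store.get(key, (None, 0))[1]


store = TransactionalCounts()
store.apply_batch(1, {"a": 2})
store.apply_batch(2, {"a": 3, "b": 1})
store.apply_batch(2, {"a": 3, "b": 1})  # replay of batch 2 after a failure
print(store.get("a"), store.get("b"))  # 5 1
```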

On June 16, 2015, at 9:46 PM, asoni.learn@gmail.com wrote:

Hi All,

I am evaluating Spark vs. Storm (Spark Streaming) and I am not able to see what the equivalent of a Storm Bolt is in Spark.

Any help on this will be appreciated.

Thanks,
Ashish
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org