Great discussion!!

One question about this comment: "Also, you can do some processing with Kinesis. If all you need to do is straightforward transformation and you are reading from Kinesis to begin with, it might be an easier option to just do the transformation in Kinesis."

- Do you mean a KCL application? Or some kind of processing within Kinesis?

Can you kindly share a link? I would definitely pursue this route as our transformations are really simple.

Best

On Wed, Jun 17, 2015 at 10:26 PM, Ashish Soni <asoni.learn@gmail.com> wrote:
My use case is below.

We are going to receive a lot of events as a stream (basically a Kafka stream), and then we need to process and compute.

Consider that you have a phone contract with ATT, and every call / SMS / data usage is an event. Your bill then needs to be calculated in real time, so that when you log in to your account you can see variables such as how much you have used, how much is left, and what your bill is to date. There are also different rules to consider when calculating the total bill. One simple rule would be: 0-500 min is free, but above that it is $1 a min.

How do I maintain a shared state (total amount, total min, total data, etc.) so that I know how much I have accumulated at any given point, given that events for the same phone can go to any node / executor?

Can someone please tell me how I can achieve this in Spark? In Storm I can have a bolt which can do this.
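
A minimal, framework-independent sketch of the per-phone running state described above, covering minutes only. The names `Usage` and `update_usage` are hypothetical. The update function takes the new values for a key plus the previous state and returns the new state, which is the general shape a keyed state update such as `updateStateByKey` expects, but nothing here is tied to a real Spark job:

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Values taken from the rule in the message: 0-500 min free, $1/min above.
FREE_MINUTES = 500
RATE_PER_MINUTE = 1.0


@dataclass(frozen=True)
class Usage:
    """Hypothetical running state kept per phone number."""
    total_minutes: int = 0

    @property
    def bill(self) -> float:
        # Only the minutes above the free allowance are charged.
        billable = max(0, self.total_minutes - FREE_MINUTES)
        return billable * RATE_PER_MINUTE


def update_usage(new_minutes: Iterable[int], state: Optional[Usage]) -> Usage:
    """Fold the minutes seen in the current batch into the previous state.

    `state` is None the first time a phone number is seen.
    """
    previous = state if state is not None else Usage()
    return Usage(previous.total_minutes + sum(new_minutes))
```

Because the state is grouped by key (the phone number), it does not matter which node or executor an individual event first lands on, as long as the framework routes all values for one key to the same update call.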

Thanks,


On Wed, Jun 17, 2015 at 4:52 AM, Enno Shioji wrote:
I guess both. In terms of syntax, I was comparing it with Trident.

If you are joining, Spark Streaming actually does offer windowed join out of the box. We couldn't use this though as our event stream can grow "out-of-sync", so we had to implement something on top of Storm. If your event streams don't become out of sync, you may find the built-in join in Spark Streaming useful. Storm also has a join keyword but its semantics are different.

> Also, what do you mean by "No Back Pressure"?

So when a topology is overloaded, Storm is designed so that it will stop reading from the source. Spark, on the other hand, will keep reading from the source and spilling it internally. This may be fine, in fairness, but it does mean you have to worry about the persistent store usage in the processing cluster, whereas with Storm you don't have to worry because the messages just remain in the data store.

Spark came up with the idea of rate limiting, but I don't feel this is as nice as back pressure, because it's very difficult to tune it so that you don't cap the cluster's processing power while still preventing the persistent storage from being used up.

On Wed, Jun 17, 2015 at 9:33 AM, Spark Enthusiast <sparkenthusiast@yahoo.in> wrote:
When you say Storm, did you mean Storm with Trident or Storm?

My use case does not have simple transformations. There are complex events that need to be generated by joining the incoming event stream.

Also, what do you mean by "No Back Pressure"?

On Wednesday, 17 June 2015 11:57 AM, Enno Shioji <eshioji@gmail.com> wrote:

We've evaluated Spark Streaming vs. Storm and ended up sticking with Storm.

Some of the important drawbacks are:
- Spark has no back pressure (the receiver rate limit can alleviate this to a certain point, but it's far from ideal).
- There are also no exactly-once semantics. (updateStateByKey can achieve these semantics, but is not practical if you have any significant amount of state because it does so by dumping the entire state on every checkpoint.)
- There are also some minor drawbacks that I'm sure will be fixed quickly, like no task timeout, not being able to read from Kafka using multiple nodes, and a data loss hazard with Kafka.

It's also not possible to attain very low latency in Spark, if that's what you need.

The upside of Spark is the concise and IMO more intuitive syntax, especially if you compare it with Storm's Java API.

I admit I might be a bit biased towards Storm tho as I'm more familiar with it.

Also, you can do some processing with Kinesis. If all you need to do is straightforward transformation and you are reading from Kinesis to begin with, it might be an easier option to just do the transformation in Kinesis.

On Wed, Jun 17, 2015 at 7:15 AM, Sabarish Sasidharan wrote:
Whatever you write in bolts would be the logic you want to apply to your events. In Spark, that logic would be coded in map() or similar transformations and/or actions. Spark doesn't enforce a structure for capturing your processing logic like Storm does.
Regards
Sab

In Storm, Bolts have the functionality of getting triggered on events. Is that kind of functionality possible with Spark Streaming? During each phase of the data processing, the transformed data is stored to the database, and this transformed data should then be sent to a new pipeline for further processing.

How can this be achieved using Spark?

On Wed, Jun 17, 2015 at 10:10 AM, Spark Enthusiast <sparkenthusiast@yahoo.in> wrote:
I have a use case where a stream of incoming events has to be aggregated and joined to create complex events. The aggregation will have to happen at an interval of 1 minute (or less).

The pipeline is:
                      send events                                              enrich event
Upstream services -------------------> KAFKA ---------> Event Stream Processor ------------> Complex Event Processor ------------> Elastic Search

From what I understand, Storm will make a very good ESP and Spark Streaming will make a good CEP.

But we are also evaluating Storm with Trident.

How does Spark Streaming compare with Storm with Trident?
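
For the one-minute aggregation step mentioned above, here is a framework-independent sketch of what a tumbling-window aggregation computes. The event shape `(epoch_seconds, key, value)` and the function name `aggregate_per_window` are assumptions made for illustration. In Spark Streaming or Trident the windowing would be handled by the framework rather than written by hand:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

WINDOW_SECONDS = 60  # the 1-minute aggregation interval from the message

# Hypothetical event shape: (epoch_seconds, key, value)
Event = Tuple[int, str, float]


def aggregate_per_window(events: Iterable[Event]) -> Dict[Tuple[int, str], float]:
    """Sum event values per (window start, key) using tumbling windows."""
    totals: Dict[Tuple[int, str], float] = defaultdict(float)
    for timestamp, key, value in events:
        # Align each timestamp to the start of its 60-second window.
        window_start = timestamp - (timestamp % WINDOW_SECONDS)
        totals[(window_start, key)] += value
    return dict(totals)
```

Each aggregated `(window_start, key)` record is the kind of intermediate result that a later stage could then join to build the complex events.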

Sridhar Chellappa


On Wednesday, 17 June 2015 10:02 AM, ayan guha <guha.ayan@gmail.com> wrote:

I have a similar scenario where we need to bring data from Kinesis to HBase. Data velocity is 20k per 10 mins. A little manipulation of the data will be required, but that's regardless of the tool, so we will be writing that piece in a Java POJO.
The whole environment is on AWS. HBase is on a long-running EMR cluster and Kinesis is on a separate cluster.
TIA.
Best
Ayan
On 17 Jun 2015 12:13, "Will Briggs" <wrbriggs@gmail.com> wrote:
The programming models for the two frameworks are conceptually rather different; I haven't worked with Storm for quite some time, but based on my old experience with it, I would equate Spark Streaming more with Storm's Trident API, rather than with the raw Bolt API. Even then, there are significant differences, but it's a bit closer.

If you can share your use case, we might be able to provide better guidance.

Regards,
Will

On June 16, 2015, at 9:46 PM, asoni.learn@gmail.com wrote:

Hi All,

I am evaluating Spark vs. Storm (Spark Streaming) and I am not able to see what the equivalent of a Storm Bolt is in Spark.

Any help on this would be appreciated.

Thanks,
Ashish
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org



--
Best Regards,
Ayan Guha