spark-user mailing list archives

From Enno Shioji <eshi...@gmail.com>
Subject Re: Spark or Storm
Date Wed, 17 Jun 2015 14:33:28 GMT
AFAIK KCL is *supposed* to provide fault tolerance and load balancing (plus
additionally, elastic scaling unlike Storm), Kinesis providing the
coordination. My understanding is that it's like a naked Storm worker
process that can consequently only do map.

I haven't really used it tho, so can't really comment how it compares to
Spark/Storm. Maybe somebody else will be able to comment.



On Wed, Jun 17, 2015 at 3:13 PM, ayan guha <guha.ayan@gmail.com> wrote:

> Thanks for this. It's a KCL-based Kinesis application. But because it's just
> a Java application, we are thinking of using Spark on EMR or Storm for fault
> tolerance and load balancing. Is that a correct approach?
> On 17 Jun 2015 23:07, "Enno Shioji" <eshioji@gmail.com> wrote:
>
>> Hi Ayan,
>>
>> Admittedly I haven't done much with Kinesis, but if I'm not mistaken you
>> should be able to use their "processor" interface for that. In this
>> example, it's incrementing a counter:
>> https://github.com/awslabs/amazon-kinesis-data-visualization-sample/blob/master/src/main/java/com/amazonaws/services/kinesis/samples/datavis/kcl/CountingRecordProcessor.java
>>
>> Instead of incrementing a counter, you could do your transformation and
>> send it to HBase.
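
A rough sketch of that idea, in Python for brevity. This is not the KCL interface: the class name, `process_records`, `transform`, and `sink` are all hypothetical stand-ins, and a real sink would be an HBase client call rather than a list append.

```python
class TransformingProcessor:
    """Hypothetical stand-in for a record processor (not the KCL API)."""

    def __init__(self, transform, sink):
        self.transform = transform  # assumed: a pure function, record -> row
        self.sink = sink            # assumed: a callable that writes one row

    def process_records(self, records):
        """Transform each record and hand it to the sink; return the count."""
        written = 0
        for record in records:
            self.sink(self.transform(record))
            written += 1
        return written


# Usage with an in-memory list standing in for the HBase write.
rows = []
processor = TransformingProcessor(transform=str.upper, sink=rows.append)
count = processor.process_records(["call", "sms"])
```

The point is only that the counter increment in the linked example is the place where the transformation and the write would go.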
>>
>>
>>
>>
>>
>>
>> On Wed, Jun 17, 2015 at 1:40 PM, ayan guha <guha.ayan@gmail.com> wrote:
>>
>>> Great discussion!!
>>>
>>> One question about this comment: "Also, you can do some processing with
>>> Kinesis. If all you need to do is straightforward transformation and you
>>> are reading from Kinesis to begin with, it might be an easier option to
>>> just do the transformation in Kinesis."
>>>
>>> - Do you mean a KCL application? Or some kind of processing within Kinesis?
>>>
>>> Can you kindly share a link? I would definitely pursue this route as our
>>> transformations are really simple.
>>>
>>> Best
>>>
>>> On Wed, Jun 17, 2015 at 10:26 PM, Ashish Soni <asoni.learn@gmail.com>
>>> wrote:
>>>
>>>> My use case is below.
>>>>
>>>> We are going to receive a lot of events as a stream (basically a Kafka
>>>> stream) and then we need to process them and run computations.
>>>>
>>>> Consider that you have a phone contract with ATT, and every call / SMS /
>>>> data usage is an event. The bill needs to be calculated in real time, so
>>>> that when you log in to your account you can see how much you have used,
>>>> how much is left, and what your bill is to date. There are also different
>>>> rules which need to be considered when calculating the total bill. One
>>>> simple rule would be: 0-500 min is free, but above that it is $1 a min.
>>>>
>>>> How do I maintain shared state (total amount, total minutes, total data,
>>>> etc.) so that I know how much has accumulated at any given point, given
>>>> that events for the same phone can go to any node / executor?
>>>>
>>>> Can someone please tell me how I can achieve this in Spark? In Storm I
>>>> can have a bolt which does this.
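
A minimal sketch of per-key running state, in plain Python rather than Spark. `update_minutes` has the same (new values, previous state) shape as the function you would pass to updateStateByKey; the batch layout, the phone numbers, and `process_batch` are assumptions made up for illustration, and the in-memory dict stands in for state the framework would keep per key.

```python
from collections import defaultdict

FREE_MINUTES = 500      # from the rule above: 0-500 min is free
RATE_PER_MINUTE = 1.0   # from the rule above: $1 a min beyond that


def update_minutes(new_minutes, running_total):
    """Fold one batch of call durations for a phone into its running total."""
    return (running_total or 0) + sum(new_minutes)


def bill_for(total_minutes):
    """Apply the simple rule: only minutes above the free allowance are billed."""
    return max(0, total_minutes - FREE_MINUTES) * RATE_PER_MINUTE


def process_batch(state, batch):
    """Group a batch of (phone, minutes) events by phone and update the state."""
    grouped = defaultdict(list)
    for phone, minutes in batch:
        grouped[phone].append(minutes)
    for phone, values in grouped.items():
        state[phone] = update_minutes(values, state.get(phone))
    return state


state = {}
process_batch(state, [("555-0100", 300), ("555-0101", 20), ("555-0100", 150)])
process_batch(state, [("555-0100", 100)])
```

After the two batches, "555-0100" has 550 minutes and `bill_for(550)` gives 50.0. Grouping by phone first is what makes it irrelevant which node an individual event lands on. Note the caveat raised elsewhere in this thread about updateStateByKey when the state is large.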
>>>>
>>>> Thanks,
>>>>
>>>>
>>>>
>>>> On Wed, Jun 17, 2015 at 4:52 AM, Enno Shioji <eshioji@gmail.com> wrote:
>>>>
>>>>> I guess both. In terms of syntax, I was comparing it with Trident.
>>>>>
>>>>> If you are joining, Spark Streaming actually does offer windowed join
>>>>> out of the box. We couldn't use this though as our event stream can grow
>>>>> "out-of-sync", so we had to implement something on top of Storm. If your
>>>>> event streams don't become out of sync, you may find the built-in join in
>>>>> Spark Streaming useful. Storm also has a join keyword but its semantics
>>>>> are different.
>>>>>
>>>>>
>>>>> > Also, what do you mean by "No Back Pressure" ?
>>>>>
>>>>> So when a topology is overloaded, Storm is designed so that it will
>>>>> stop reading from the source. Spark, on the other hand, will keep reading
>>>>> from the source and spilling it internally. This may be fine, in fairness,
>>>>> but it does mean you have to worry about the persistent store usage in
>>>>> the processing cluster, whereas with Storm you don't have to worry because
>>>>> the messages just remain in the data store.
>>>>>
>>>>> Spark came up with the idea of rate limiting, but I don't feel this is
>>>>> as nice as back pressure because it's very difficult to tune it such
>>>>> that you don't cap the cluster's processing power and yet still prevent
>>>>> the persistent storage from being used up.
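
To make the difference concrete, here is a toy simulation in Python. It models neither Storm nor Spark internals; the function, the policy names, and all the numbers are invented for illustration. Each tick, events arrive at the source, the reader pulls some of them into the cluster, and the cluster processes up to its capacity.

```python
def simulate(arrivals, capacity, ticks, policy, rate_limit=None):
    """Toy model: returns where events end up after `ticks` steps."""
    in_source = 0   # events still sitting in the source
    backlog = 0     # events buffered inside the processing cluster
    processed = 0
    for _ in range(ticks):
        in_source += arrivals
        if policy == "back_pressure":
            take = min(in_source, capacity)    # read only what can be handled
        elif policy == "rate_limit":
            take = min(in_source, rate_limit)  # fixed cap, set by hand
        else:                                  # "keep_reading"
            take = in_source                   # pull everything, buffer inside
        in_source -= take
        backlog += take
        done = min(backlog, capacity)
        backlog -= done
        processed += done
    return {"in_source": in_source, "backlog": backlog, "processed": processed}


# Overloaded cluster: 100 events arrive per tick, 60 can be processed.
bp = simulate(100, 60, 10, "back_pressure")
kr = simulate(100, 60, 10, "keep_reading")
high = simulate(100, 60, 10, "rate_limit", rate_limit=80)
low = simulate(100, 60, 10, "rate_limit", rate_limit=40)
```

With back pressure the surplus stays in the source (400 events) and nothing piles up internally. Reading everything moves the same 400 events into the cluster's own buffer. A limit set too high (80) still accumulates an internal backlog of 200, and one set too low (40) leaves the backlog empty but processes only 400 events instead of 600, which is the tuning problem described above.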
>>>>>
>>>>>
>>>>> On Wed, Jun 17, 2015 at 9:33 AM, Spark Enthusiast <
>>>>> sparkenthusiast@yahoo.in> wrote:
>>>>>
>>>>>> When you say Storm, did you mean Storm with Trident or Storm?
>>>>>>
>>>>>> My use case does not have simple transformation. There are complex
>>>>>> events that need to be generated by joining the incoming event stream.
>>>>>>
>>>>>> Also, what do you mean by "No Back Pressure"?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>   On Wednesday, 17 June 2015 11:57 AM, Enno Shioji <eshioji@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> We've evaluated Spark Streaming vs. Storm and ended up sticking with
>>>>>> Storm.
>>>>>>
>>>>>> Some of the important drawbacks are:
>>>>>> - Spark has no back pressure (the receiver rate limit can alleviate this
>>>>>>   to a certain point, but it's far from ideal).
>>>>>> - There are also no exactly-once semantics (updateStateByKey can
>>>>>>   achieve these semantics, but it is not practical if you have any
>>>>>>   significant amount of state, because it does so by dumping the entire
>>>>>>   state on every checkpoint).
>>>>>>
>>>>>> There are also some minor drawbacks that I'm sure will be fixed
>>>>>> quickly, like no task timeout, not being able to read from Kafka using
>>>>>> multiple nodes, and a data loss hazard with Kafka.
>>>>>>
>>>>>> It's also not possible to attain very low latency in Spark, if that's
>>>>>> what you need.
>>>>>>
>>>>>> The plus for Spark is its concise and IMO more intuitive syntax,
>>>>>> especially if you compare it with Storm's Java API.
>>>>>>
>>>>>> I admit I might be a bit biased towards Storm tho as I'm more
>>>>>> familiar with it.
>>>>>>
>>>>>> Also, you can do some processing with Kinesis. If all you need to do
>>>>>> is straightforward transformation and you are reading from Kinesis to
>>>>>> begin with, it might be an easier option to just do the transformation
>>>>>> in Kinesis.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 17, 2015 at 7:15 AM, Sabarish Sasidharan <
>>>>>> sabarish.sasidharan@manthan.com> wrote:
>>>>>>
>>>>>> Whatever you write in bolts would be the logic you want to apply on
>>>>>> your events. In Spark, that logic would be coded in map() or similar
>>>>>> transformations and/or actions. Spark doesn't enforce a structure for
>>>>>> capturing your processing logic like Storm does.
>>>>>> Regards
>>>>>> Sab
>>>>>> Probably overloading the question a bit.
>>>>>>
>>>>>> In Storm, Bolts have the functionality of getting triggered on
>>>>>> events. Is that kind of functionality possible with Spark Streaming?
>>>>>> During each phase of the data processing, the transformed data is stored
>>>>>> to the database and this transformed data should then be sent to a new
>>>>>> pipeline for further processing.
>>>>>>
>>>>>> How can this be achieved using Spark?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 17, 2015 at 10:10 AM, Spark Enthusiast <
>>>>>> sparkenthusiast@yahoo.in> wrote:
>>>>>>
>>>>>> I have a use case where a stream of incoming events has to be
>>>>>> aggregated and joined to create complex events. The aggregation will
>>>>>> have to happen at an interval of 1 minute (or less).
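
As a plain-Python illustration of aggregating on a 1-minute interval (not Spark or Storm code): the sketch below buckets events into fixed 60-second windows and counts them per key. The `(epoch_seconds, key)` event shape and the event names are assumptions for the example, as is bucketing by a timestamp carried in the event rather than by arrival time.

```python
from collections import Counter

WINDOW_SECONDS = 60  # the 1-minute interval mentioned above


def aggregate_by_minute(events):
    """Count events per key within fixed, non-overlapping 60-second windows.

    `events` is an iterable of (epoch_seconds, key) pairs. The result maps
    each window's start time to a Counter of keys seen in that window.
    """
    windows = {}
    for timestamp, key in events:
        window_start = timestamp - (timestamp % WINDOW_SECONDS)
        windows.setdefault(window_start, Counter())[key] += 1
    return windows


events = [(0, "login"), (59, "login"), (60, "click"), (61, "login"), (125, "click")]
result = aggregate_by_minute(events)
```

Here the two logins at 0 s and 59 s fall into the window starting at 0, the events at 60 s and 61 s into the window starting at 60, and the click at 125 s into the window starting at 120. Joining the aggregated streams would be a further step on top of this.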
>>>>>>
>>>>>> The pipeline is:
>>>>>>
>>>>>>   Upstream services --(send events)--> KAFKA --> Event Stream Processor
>>>>>>     --(enrich event)--> Complex Event Processor --> Elastic Search.
>>>>>>
>>>>>> From what I understand, Storm will make a very good ESP and Spark
>>>>>> Streaming will make a good CEP.
>>>>>>
>>>>>> But, we are also evaluating Storm with Trident.
>>>>>>
>>>>>> How does Spark Streaming compare with Storm with Trident?
>>>>>>
>>>>>> Sridhar Chellappa
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>   On Wednesday, 17 June 2015 10:02 AM, ayan guha <guha.ayan@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> I have a similar scenario where we need to bring data from Kinesis to
>>>>>> HBase. Data velocity is 20k per 10 mins. A little manipulation of the
>>>>>> data will be required, but that's regardless of the tool, so we will be
>>>>>> writing that piece as a Java POJO.
>>>>>> All env is on AWS. HBase is on a long-running EMR and Kinesis on a
>>>>>> separate cluster.
>>>>>> TIA.
>>>>>> Best
>>>>>> Ayan
>>>>>> On 17 Jun 2015 12:13, "Will Briggs" <wrbriggs@gmail.com> wrote:
>>>>>>
>>>>>> The programming models for the two frameworks are conceptually rather
>>>>>> different; I haven't worked with Storm for quite some time, but based on
>>>>>> my old experience with it, I would equate Spark Streaming more with
>>>>>> Storm's Trident API, rather than with the raw Bolt API. Even then, there
>>>>>> are significant differences, but it's a bit closer.
>>>>>>
>>>>>> If you can share your use case, we might be able to provide better
>>>>>> guidance.
>>>>>>
>>>>>> Regards,
>>>>>> Will
>>>>>>
>>>>>> On June 16, 2015, at 9:46 PM, asoni.learn@gmail.com wrote:
>>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I am evaluating Spark vs. Storm (Spark Streaming) and I am not able
>>>>>> to see what the equivalent of a Storm Bolt is in Spark.
>>>>>>
>>>>>> Any help on this will be appreciated.
>>>>>>
>>>>>> Thanks ,
>>>>>> Ashish
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>>>>> For additional commands, e-mail: user-help@spark.apache.org
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Best Regards,
>>> Ayan Guha
>>>
>>
>>
