spark-user mailing list archives

From Jordan Pilat <jrpi...@gmail.com>
Subject Re: Spark or Storm
Date Wed, 17 Jun 2015 22:26:55 GMT
>not being able to read from Kafka using multiple nodes

Kafka is plenty capable of doing this by clustering together multiple
consumer instances into a consumer group.
If your topic is sufficiently partitioned, the consumer group can consume
the topic in a parallelized fashion.
If it isn't, you still have the fault tolerance associated with clustering
the consumers.
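
To make the partition point concrete, here is a toy Python sketch of how a
consumer group might split a topic's partitions among its members. It is not
Kafka's actual assignor, and `assign_partitions` is a made-up helper used
only for illustration:

```python
from typing import Dict, List


def assign_partitions(num_partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Toy range-style assignment of partition ids to group members.

    Hypothetical helper: a simplified model of what a consumer group does,
    not the algorithm Kafka itself runs.
    """
    members = sorted(consumers)
    assignment: Dict[str, List[int]] = {c: [] for c in members}
    if not members:
        return assignment
    # Give every member `base` partitions; the first `extra` members get one more.
    base, extra = divmod(num_partitions, len(members))
    start = 0
    for i, consumer in enumerate(members):
        count = base + (1 if i < extra else 0)
        assignment[consumer] = list(range(start, start + count))
        start += count
    return assignment


if __name__ == "__main__":
    # Well-partitioned topic: every member reads in parallel.
    print(assign_partitions(6, ["c1", "c2", "c3"]))
    # Single partition: one member reads, the other is a standby.
    print(assign_partitions(1, ["c1", "c2"]))
    # If c1 leaves the group, rerunning the assignment hands partition 0 to c2.
    print(assign_partitions(1, ["c2"]))
```

In this model, 6 partitions over 3 consumers gives each member 2 partitions,
while a single-partition topic is read by one member and picked up by the
other if the first one drops out.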

OK
JRP
On Jun 17, 2015 1:27 AM, "Enno Shioji" <eshioji@gmail.com> wrote:

> We've evaluated Spark Streaming vs. Storm and ended up sticking with Storm.
>
> Some of the important drawbacks are:
> Spark has no back pressure (the receiver rate limit can alleviate this up
> to a certain point, but it's far from ideal).
> There are also no exactly-once semantics. (updateStateByKey can achieve
> these semantics, but it is not practical if you have any significant amount
> of state, because it does so by dumping the entire state on every checkpoint.)
>
> There are also some minor drawbacks that I'm sure will be fixed quickly,
> like no task timeout, not being able to read from Kafka using multiple
> nodes, data loss hazard with Kafka.
>
> It's also not possible to attain very low latency in Spark, if that's what
> you need.
>
> The plus for Spark is the concise and IMO more intuitive syntax, especially
> if you compare it with Storm's Java API.
>
> I admit I might be a bit biased towards Storm, though, as I'm more familiar
> with it.
>
> Also, you can do some processing with Kinesis. If all you need to do is a
> straightforward transformation and you are reading from Kinesis to begin
> with, it might be an easier option to just do the transformation in Kinesis.
>
>
>
>
>
> On Wed, Jun 17, 2015 at 7:15 AM, Sabarish Sasidharan <
> sabarish.sasidharan@manthan.com> wrote:
>
>> Whatever you write in bolts would be the logic you want to apply on your
>> events. In Spark, that logic would be coded in map() or similar such
>> transformations and/or actions. Spark doesn't enforce a structure for
>> capturing your processing logic like Storm does.
>>
>> Regards
>> Sab
>> Probably overloading the question a bit.
>>
>> In Storm, Bolts can be triggered on events. Is that kind of functionality
>> possible with Spark Streaming? During each phase of the data processing,
>> the transformed data is stored to the database, and this transformed data
>> should then be sent to a new pipeline for further processing.
>>
>> How can this be achieved using Spark?
>>
>>
>>
>> On Wed, Jun 17, 2015 at 10:10 AM, Spark Enthusiast <
>> sparkenthusiast@yahoo.in> wrote:
>>
>>> I have a use case where a stream of incoming events has to be
>>> aggregated and joined to create complex events. The aggregation will have
>>> to happen at an interval of 1 minute (or less).
>>>
>>> The pipeline is:
>>>
>>> Upstream services
>>>   --(send events)--> Kafka
>>>   --> Event Stream Processor
>>>   --(enrich event)--> Complex Event Processor
>>>   --> Elasticsearch
>>>
>>> From what I understand, Storm will make a very good ESP and Spark
>>> Streaming will make a good CEP.
>>>
>>> But, we are also evaluating Storm with Trident.
>>>
>>> How does Spark Streaming compare with Storm with Trident?
>>>
>>> Sridhar Chellappa
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>   On Wednesday, 17 June 2015 10:02 AM, ayan guha <guha.ayan@gmail.com>
>>> wrote:
>>>
>>>
>>> I have a similar scenario where we need to bring data from Kinesis to
>>> HBase. Data velocity is 20k per 10 mins. A little manipulation of the data
>>> will be required, but that's regardless of the tool, so we will be writing
>>> that piece as a Java POJO.
>>> The whole environment is on AWS. HBase is on a long-running EMR cluster
>>> and Kinesis is on a separate cluster.
>>> TIA.
>>> Best
>>> Ayan
>>> On 17 Jun 2015 12:13, "Will Briggs" <wrbriggs@gmail.com> wrote:
>>>
>>> The programming models for the two frameworks are conceptually rather
>>> different; I haven't worked with Storm for quite some time, but based on my
>>> old experience with it, I would equate Spark Streaming more with Storm's
>>> Trident API, rather than with the raw Bolt API. Even then, there are
>>> significant differences, but it's a bit closer.
>>>
>>> If you can share your use case, we might be able to provide better
>>> guidance.
>>>
>>> Regards,
>>> Will
>>>
>>> On June 16, 2015, at 9:46 PM, asoni.learn@gmail.com wrote:
>>>
>>> Hi All,
>>>
>>> I am evaluating Spark vs. Storm (Spark Streaming) and I am not able to
>>> see what the equivalent of a Storm Bolt is in Spark.
>>>
>>> Any help on this will be appreciated.
>>>
>>> Thanks ,
>>> Ashish
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>> For additional commands, e-mail: user-help@spark.apache.org
>>>
>>>
>>>
>>>
>>>
>>
>
