spark-user mailing list archives

From Deepak Sharma <>
Subject Re: Is Spark right for my use case?
Date Mon, 08 Aug 2016 07:36:09 GMT
Hi Danellis
For point 1, Spark Streaming is worth looking at.
For point 2, you can create a DAO from Cassandra on each round of stream
processing. This may be a costly operation, but to process data in real
time you have to live with it.
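To make point 2 more concrete, here is a rough sketch in plain Python (not Spark API) of what one round of processing might do: read the current rows for the products in the batch once, then keep only the updates that differ. The names `filter_changes`, `fetch_current` and the in-memory `store` are made up for illustration; `fetch_current` stands in for whatever Cassandra read you would actually use.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Product = Dict[str, int]                           # e.g. {"price": 10, "stock": 3}
Change = Tuple[str, Optional[Product], Product]    # (product_id, old, new)


def filter_changes(
    batch: Iterable[Tuple[str, Product]],
    fetch_current: Callable[[List[str]], Dict[str, Product]],
) -> List[Change]:
    """Keep only the updates in one batch that differ from the stored state."""
    updates = dict(batch)  # the last update per product id wins within a batch
    current = fetch_current(list(updates))  # one lookup per batch, not per record
    return [
        (pid, current.get(pid), new)
        for pid, new in updates.items()
        if current.get(pid) != new
    ]


# Hypothetical in-memory stand-in for the Cassandra product table.
store: Dict[str, Product] = {
    "a": {"price": 10, "stock": 3},
    "b": {"price": 5, "stock": 0},
}


def fake_fetch(ids: List[str]) -> Dict[str, Product]:
    return {i: store[i] for i in ids if i in store}


batch = [("a", {"price": 10, "stock": 3}), ("b", {"price": 5, "stock": 2})]
changes = filter_changes(batch, fake_fetch)
```

Doing the read once per batch rather than once per record is what keeps the cost mentioned above tolerable. The surviving changes carry both old and new values, so they can be logged as they are.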
Point 3 is covered in point 2 above.
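One way to read that for the in-stock/out-of-stock metric: build the starting counts from the products already in Cassandra, then adjust them as each change arrives. The sketch below is plain Python rather than Spark code, and `seed_counts`, `apply_change` and the `stock` field are assumed names, not anything from a real connector.

```python
from typing import Dict, Iterable, Optional


def bucket(stock: int) -> str:
    """Map a stock level to the metric it counts towards."""
    return "in_stock" if stock > 0 else "out_of_stock"


def seed_counts(products: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Build the starting counts from the existing product rows."""
    counts = {"in_stock": 0, "out_of_stock": 0}
    for product in products:
        counts[bucket(product["stock"])] += 1
    return counts


def apply_change(
    counts: Dict[str, int], old_stock: Optional[int], new_stock: int
) -> None:
    """Adjust the counts in place for one change; old_stock is None for a new product."""
    if old_stock is not None:
        counts[bucket(old_stock)] -= 1
    counts[bucket(new_stock)] += 1


# Existing data first, then the streamed changes.
counts = seed_counts([{"stock": 3}, {"stock": 0}, {"stock": 5}])
apply_change(counts, 0, 4)     # a product comes back into stock
apply_change(counts, None, 0)  # a new product arrives out of stock
```

With this shape there is no need to replay the logged changes through RabbitMQ first, as long as the snapshot read and the start of the stream line up.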
Since you are starting fresh, I would suggest going with 2.0, as it has
many features over previous releases, such as Datasets and structured
querying of streams.

On Mon, Aug 8, 2016 at 11:52 AM, danellis <> wrote:

> Spark n00b here.
> Working with online retailers, I start with a list of their products in
> Cassandra (with prices, stock levels, descriptions, etc) and then receive an
> HTTP request every time one of them changes. For each change, I update the
> product in Cassandra and store the change with the old and new values.
> What I'd like to do is provide a dashboard with various metrics. Some of
> them are trivial, such as "last n changes". Others, like number of
> in-stock/out-of-stock products would be more complex to retrieve from
> Cassandra, because they're an aggregate of the whole product set.
> I'm thinking about streaming the changes into Spark (via RabbitMQ) to
> generate the data needed for the aggregate metrics, and either storing the
> results in Cassandra or publishing them back to RabbitMQ (depending on
> whether I have the dashboard poll or use a WebSocket).
> I have a few questions:
> 1) Does this seem like a good use case for Spark?
> 2) How much work is it appropriate for a transformation to do? For example,
> my API service currently checks the update against the current data and only
> publishes a change if they differ. That sounds to me like it could be a
> filter operation on a stream of all the updates, but it would require
> accessing data from Cassandra inside the filter transformation. Is that
> okay, or something to be avoided? The changes that make it through the
> filter would also have to be logged in Cassandra. Is that crossing concerns
> too much?
> 3) If I'm starting out with existing data, how do I take that into account
> when starting to do stream processing? Would I write something to take my
> logged changes from Cassandra and publish them to RabbitMQ before I start my
> real streaming? Seems like the switch-over might be tricky. (Note: I don't
> necessarily need to do this, depending on how things go.)
> 4) Is it a good idea to start with 2.0 now? I see there's an AMQP module
> with 2.0 support and the Cassandra one supports 2.0 with a little work.
> Thanks for any feedback.

