spark-user mailing list archives

From Chris Miller
Subject Re: Correct way to use spark streaming with apache zeppelin
Date Sat, 12 Mar 2016 11:21:15 GMT
I'm pretty new to all of this stuff, so bear with me.

Zeppelin isn't really intended for realtime dashboards, as far as I know.
Its reporting features (tables, graphs, etc.) are more for displaying the
output of something you've already run. I'm not aware of anything that can
"watch" a dataset and have updates pushed to the Zeppelin UI.

As for Spark, unless you're doing a lot of processing that you didn't
mention here, I don't think it's a good fit just for this.

If it were me (just off the top of my head), I'd just build a simple web
service that uses websockets to push updates to the client which could then
be used to update graphs, tables, etc. The data itself -- that is, the
accumulated totals -- you could store in something like Redis. When an
order comes in, just add that quantity and price to the existing value and
trigger your code to push out an updated value to any clients via the
websocket. You could use something like a Redis pub/sub channel to trigger
the web app to notify clients of an update.
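
A minimal sketch of that flow in Python, just to illustrate the idea and
not a definitive implementation. The hincrbyfloat and publish calls match
the method names in the redis-py client; everything else is an assumption
made up for the example: the "stats:<branch>:<product>" key layout, the
"stats-updates" channel name, the InMemoryRedis stand-in (so it runs
without a Redis server), and treating Price as a per-unit price.

```python
import json
from collections import defaultdict


class InMemoryRedis:
    """Stand-in for a Redis client so the sketch runs without a server.

    It mimics only the two redis-py calls used below (hincrbyfloat, publish).
    """

    def __init__(self):
        self.hashes = defaultdict(dict)       # hash name -> {field: value}
        self.subscribers = defaultdict(list)  # channel -> callbacks

    def hincrbyfloat(self, name, key, amount):
        new_value = float(self.hashes[name].get(key, 0.0)) + float(amount)
        self.hashes[name][key] = new_value
        return new_value

    def publish(self, channel, message):
        for callback in self.subscribers[channel]:
            callback(message)
        return len(self.subscribers[channel])

    def subscribe(self, channel, callback):
        # Hypothetical helper for this sketch only; real redis-py
        # subscribes through client.pubsub().subscribe(...).
        self.subscribers[channel].append(callback)


def record_order(client, branch_id, product_id, qty, price):
    """Add one order to the running totals and announce the new totals."""
    key = f"stats:{branch_id}:{product_id}"  # assumed key layout
    total_qty = client.hincrbyfloat(key, "totalQty", qty)
    # Assumption: price is per unit, so the order is worth qty * price.
    total_dollar = client.hincrbyfloat(key, "totalDollar", qty * price)
    update = {
        "branch": branch_id,
        "product": product_id,
        "totalQty": total_qty,
        "totalDollar": total_dollar,
    }
    client.publish("stats-updates", json.dumps(update))  # assumed channel
    return update


if __name__ == "__main__":
    client = InMemoryRedis()
    pushed = []  # the web app would forward these to websocket clients
    client.subscribe("stats-updates", pushed.append)
    record_order(client, "B1", "P1", 2, 9.5)
    record_order(client, "B1", "P1", 1, 9.5)
    print(pushed[-1])
```

With a real Redis server you would pass a redis-py client instead of the
stand-in, and the web app would listen on the channel and forward each
message over its websocket connections.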

There are about 5 million other ways you could design this, but I would
just keep it as simple as possible. I just threw one idea out...

Good luck.

Chris Miller

On Sat, Mar 12, 2016 at 6:58 PM, trung kien wrote:

> Thanks Chris and Mich for replying.
> Sorry for not explaining my problem clearly. Yes, I am talking about a
> flexible dashboard when I mention Zeppelin.
> Here is the problem I am having:
> I am running a commercial website where we sell many products, and we have
> many branches in many places. We have a lot of realtime transactions and
> want to analyze them in realtime.
> We don't want to aggregate every single transaction each time we run
> analytics (each transaction has BranchID, ProductID, Qty, Price). So
> we maintain intermediate data which contains: BranchID, ProductID,
> totalQty, totalDollar
> Ideally, we have 2 tables:
>    Transaction (BranchID, ProductID, Qty, Price, Timestamp)
> And the intermediate table Stats is just the sum of every transaction,
> grouped by BranchID and ProductID (I am using Spark Streaming to calculate
> this table in realtime)
> My thinking is that doing statistics (realtime dashboard) on the Stats
> table is much easier, and this table is also not hard to maintain.
> I'm just wondering, what's the best way to store the Stats table (a
> database or a parquet file)?
> What exactly are you trying to do? Zeppelin is for interactive analysis of
> a dataset. What do you mean by "realtime analytics" -- do you mean build a
> report or dashboard that automatically updates as new data comes in?
> --
> Chris Miller
> On Sat, Mar 12, 2016 at 3:13 PM, trung kien wrote:
>> Hi all,
>> I've just watched some Zeppelin videos. The integration between
>> Zeppelin and Spark is really amazing and I want to use it for my
>> application.
>> In my app, I will have a Spark Streaming app to do some basic realtime
>> aggregation (intermediate data). Then I want to use Zeppelin to do some
>> realtime analytics on the intermediate data.
>> My question is: what's the most efficient storage engine for the realtime
>> intermediate data? Would a parquet file stored somewhere be suitable?
