spark-user mailing list archives

From Chris Miller <cmiller11...@gmail.com>
Subject Re: Correct way to use spark streaming with apache zeppelin
Date Sun, 13 Mar 2016 08:25:42 GMT
Cool! Thanks for sharing.


--
Chris Miller

On Sun, Mar 13, 2016 at 12:53 AM, Todd Nist <tsindotg@gmail.com> wrote:

> Below is a link to an example which Silvio Fiorito put together
> demonstrating how to link Zeppelin with Spark Streaming for real-time charts.
> I think the original thread was back in early November 2015, subject: Real
> time chart in Zeppelin, if you care to try to find it.
>
> https://gist.github.com/granturing/a09aed4a302a7367be92
>
> HTH.
>
> -Todd
>
> On Sat, Mar 12, 2016 at 6:21 AM, Chris Miller <cmiller11101@gmail.com>
> wrote:
>
>> I'm pretty new to all of this stuff, so bear with me.
>>
>> Zeppelin isn't really intended for realtime dashboards as far as I know.
>> Its reporting features (tables, graphs, etc.) are more for displaying the
>> results from the output of something. As far as I know, there isn't really
>> anything to "watch" a dataset and have updates pushed to the Zeppelin UI.
>>
>> As for Spark, unless you're doing a lot of processing that you didn't
>> mention here, I don't think it's a good fit just for this.
>>
>> If it were me (just off the top of my head), I'd just build a simple web
>> service that uses websockets to push updates to the client which could then
>> be used to update graphs, tables, etc. The data itself -- that is, the
>> accumulated totals -- you could store in something like Redis. When an
>> order comes in, just add that quantity and price to the existing value and
>> trigger your code to push out an updated value to any clients via the
>> websocket. You could use something like a Redis pub/sub channel to trigger
>> the web app to notify clients of an update.
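
A rough sketch of the flow described above, in case it helps. This is only an illustration: `TotalsStore` is a hypothetical in-memory stand-in for Redis (a dict plus a list of callbacks in place of a pub/sub channel), the `pushed` list stands in for messages sent to browsers over a websocket, and treating the dollar total as qty * price is an assumption.

```python
from collections import defaultdict


class TotalsStore:
    """Hypothetical in-memory stand-in for a key-value store with pub/sub."""

    def __init__(self):
        # key -> accumulated totals for that key
        self._totals = defaultdict(lambda: {"qty": 0, "dollars": 0.0})
        self._subscribers = []

    def subscribe(self, callback):
        """Register a callback to run whenever a total changes."""
        self._subscribers.append(callback)

    def record_order(self, key, qty, price):
        """Add one order to the running totals, then notify subscribers."""
        totals = self._totals[key]
        totals["qty"] += qty
        totals["dollars"] += qty * price  # assumption: price is a unit price
        snapshot = dict(totals)  # copy so later updates don't alter old messages
        for notify in self._subscribers:
            notify(key, snapshot)
        return snapshot


store = TotalsStore()
pushed = []  # stands in for updates pushed to clients over a websocket
store.subscribe(lambda key, totals: pushed.append((key, totals)))

store.record_order(("B1", "P1"), qty=2, price=10.0)
store.record_order(("B1", "P1"), qty=3, price=10.0)
```

In a real deployment the dict would be replaced by the store's own atomic increment operations and the callback list by a pub/sub channel, so that several web app processes can share the same totals.
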
>>
>> There are about 5 million other ways you could design this, but I would
>> just keep it as simple as possible. I just threw one idea out...
>>
>> Good luck.
>>
>>
>> --
>> Chris Miller
>>
>> On Sat, Mar 12, 2016 at 6:58 PM, trung kien <kientt86@gmail.com> wrote:
>>
>>> Thanks Chris and Mich for replying.
>>>
>>> Sorry for not explaining my problem clearly. Yes, I am talking about a
>>> flexible dashboard when I mention Zeppelin.
>>>
>>> Here is the problem I am having:
>>>
>>> I am running a commercial website where we sell many products, and we
>>> have many branches in many places. We have a lot of realtime transactions
>>> and want to analyze them in realtime.
>>>
>>> We don't want to aggregate every single transaction each time we run
>>> analytics (each transaction has BranchID, ProductID, Qty, Price). So we
>>> maintain intermediate data which contains: BranchID, ProductID, totalQty,
>>> totalDollar.
>>>
>>> Ideally, we have 2 tables:
>>>    Transaction (BranchID, ProductID, Qty, Price, Timestamp)
>>>
>>> And an intermediate table, Stats, which is just the sum of every
>>> transaction grouped by BranchID and ProductID (I am using Spark Streaming
>>> to calculate this table in realtime).
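
To make the Stats table concrete, here is a minimal sketch in plain Python (not Spark) of folding batches of transactions into running totals keyed by (BranchID, ProductID). The `Transaction` class and `update_stats` function are hypothetical names for illustration, and computing totalDollar as Qty * Price is an assumption; if Price is already a line total, it would be summed directly.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    branch_id: str
    product_id: str
    qty: int
    price: float  # assumed to be the unit price


def update_stats(stats, batch):
    """Fold a batch of transactions into the running Stats totals in place."""
    for t in batch:
        totals = stats[(t.branch_id, t.product_id)]
        totals["totalQty"] += t.qty
        totals["totalDollar"] += t.qty * t.price  # assumption: qty * unit price
    return stats


# Stats: (BranchID, ProductID) -> running totals
stats = defaultdict(lambda: {"totalQty": 0, "totalDollar": 0.0})

# Each call plays the role of one streaming micro-batch.
update_stats(stats, [
    Transaction("B1", "P1", 2, 10.0),
    Transaction("B1", "P1", 1, 10.0),
    Transaction("B2", "P1", 5, 10.0),
])
update_stats(stats, [Transaction("B1", "P1", 4, 10.0)])
```

Because each batch only adds to the existing totals, a dashboard can read Stats directly without rescanning the Transaction table.
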
>>>
>>> My thinking is that doing statistics (realtime dashboard) on the Stats
>>> table is much easier, and this table is also easy enough to maintain.
>>>
>>> I'm just wondering, what's the best way to store the Stats table (a
>>> database or a Parquet file)?
>>>
>>> What exactly are you trying to do? Zeppelin is for interactive analysis
>>> of a dataset. What do you mean by "realtime analytics" -- do you mean build a
>>> report or dashboard that automatically updates as new data comes in?
>>>
>>>
>>> --
>>> Chris Miller
>>>
>>> On Sat, Mar 12, 2016 at 3:13 PM, trung kien <kientt86@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I've just viewed some Zeppelin videos. The integration between
>>>> Zeppelin and Spark is really amazing and I want to use it for my
>>>> application.
>>>>
>>>> In my app, I will have a Spark Streaming app to do some basic realtime
>>>> aggregation (intermediate data). Then I want to use Zeppelin to do some
>>>> realtime analytics on the intermediate data.
>>>>
>>>> My question is: what's the most efficient storage engine to store
>>>> realtime intermediate data? Would a Parquet file stored somewhere be suitable?
>>>>
>>>
>>>
>>
>
