spark-user mailing list archives

From Marco Mistroni <>
Subject Re: Re: Re: Spark Streaming prediction
Date Tue, 03 Jan 2017 09:49:21 GMT
OK, then my suggestion stays. Check out ML.
You can train your ML model on past data (let's say, either yesterday or
the past x days) to have Spark find out what the relation is between the
value you have at T-zero and the value you have at T+n hours. You can try ML
outside your Streaming app first: gather data for x days, feed it to your
model and see the results.
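
As a rough illustration of that idea, the sketch below uses plain Python rather than Spark ML, so it is not the approach from this thread verbatim. It fits a straight line relating the value at minute T to the value at minute T+lag on historical data, keeps the known minutes as they are, predicts the remaining ones until there are 1440 values, and attaches a timestamp to each index. The function names, the 180-minute lag, the synthetic history and the start time are all assumptions made for the example.

```python
from datetime import datetime, timedelta

MINUTES_PER_DAY = 1440  # one value per minute for 24 hours


def fit_lag_model(history, lag):
    """Least-squares fit of history[t + lag] ~ slope * history[t] + intercept."""
    if lag <= 0 or len(history) <= lag:
        raise ValueError("history must be longer than lag, and lag positive")
    xs = history[:-lag]
    ys = history[lag:]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return 0.0, mean_y  # constant history: fall back to the mean
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def extend_to_24h(known, slope, intercept, lag):
    """Keep the known values, then predict each missing minute from the
    value lag minutes before it until there are exactly 1440 values."""
    if len(known) < lag:
        raise ValueError("need at least lag known values to start predicting")
    values = list(known[:MINUTES_PER_DAY])
    while len(values) < MINUTES_PER_DAY:
        values.append(slope * values[len(values) - lag] + intercept)
    return values


def with_timestamps(values, start):
    """Pair index i with the wall-clock time start + i minutes."""
    return [(start + timedelta(minutes=i), v) for i, v in enumerate(values)]


# Hypothetical data: two days of history that repeats every 60 minutes.
history = [float(t % 60) for t in range(2 * MINUTES_PER_DAY)]
slope, intercept = fit_lag_model(history, lag=180)

known = history[:180]  # the 180 minutes the device already reports
day = extend_to_24h(known, slope, intercept, lag=180)
stamped = with_timestamps(day, datetime(2017, 1, 2, 10, 5))
print(stamped[300])  # the entry for minute index 300
```

Because every index is just an offset in minutes from the event's start time, a dashboard only needs that start time to place each value on the clock. In a real job the same fit would more likely be done with a Spark ML regression model trained offline and loaded by the streaming app.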

On Mon, Jan 2, 2017 at 9:51 PM, Daniela S <> wrote:

> Dear Marco
> No problem, thank you very much for your help!
> Yes, that is correct. I always know the minute values for the next e.g.
> 180 minutes (this may vary between the different devices) and I want to
> predict the values for the next 24 hours (one value per minute). So as
> long as I know the values (e.g. for 180 minutes) I would of course like to
> use them, and only the missing ones, up to the full 24 hours (one value
> per minute), should be predicted.
> Thank you in advance.
> Regards,
> Daniela
> *Sent:* Monday, 02 January 2017 at 22:30
> *From:* "Marco Mistroni" <>
> *To:* "Daniela S" <>
> *Cc:* User <>
> *Subject:* Re: Re: Spark Streaming prediction
> Apologies, perhaps I misunderstood your use case.
> My assumption was that you have 2-3 hours' worth of data and you want to
> know the values for the next 24 based on the values you already have; that
> is why I suggested the ML path.
> If that is not the case, please ignore everything I said.
> So, let's take the simple case where you have only 1 device.
> Every event contains the minute values of that device for the next 180
> mins. So at any point in time you only have visibility of the next 180
> minutes, correct?
> Now, do you want to predict what the values will be for the next 24 hrs, or
> do you just want to accumulate 24 hrs' worth of data and display it in the
> dashboard?
> Or is it something else?
> For the dashboard update, I guess you either
> - poll 'a database' (where you store the computation of your Spark logic)
> periodically
> - propagate events from your Spark Streaming application to your dashboard
> somewhere (via actors / JMS or whatever mechanism)
> kr
>  marco
> On Mon, Jan 2, 2017 at 8:26 PM, Daniela S <> wrote:
>> Hi
>> Thank you very much for your answer!
>> My problem is that I know the values for the next 2-3 hours in advance
>> but I do not know the values from hour 2 or 3 to hour 24. How is it
>> possible to combine the known values with the predicted values, as both are
>> values in the future? And how can I ensure that there are always 1440
>> values?
>> And I do not know how to map the values for 1440 minutes to a specific
>> time on the dashboard (e.g. how does the dashboard know that the value for
>> minute 300 maps to time 15:05?).
>> Thank you in advance.
>> Best regards,
>> Daniela
>> *Sent:* Monday, 02 January 2017 at 21:07
>> *From:* "Marco Mistroni" <>
>> *To:* "Daniela S" <>
>> *Cc:* User <>
>> *Subject:* Re: Spark Streaming prediction
>> Hi
>> You might want to have a look at the Regression ML algorithms and
>> integrate one in your Spark Streaming application. I'm sure someone on the
>> list has a similar use case.
>> In short, you'd want to process all your events and feed them through an ML
>> model which, based on your inputs, will predict the output.
>> You say that your events contain minute values for the next 2-3 hrs...
>> Gather data for a day and train your model based on that. Then save it
>> somewhere and have your streaming app load the model and have the model
>> do the predictions based on incoming events from your streaming app.
>> Save the results somewhere and have your dashboard periodically poll your
>> data store to read the predictions.
>> I have seen people on the list doing ML over a Spark Streaming app, I'm sure
>> someone can reply back....
>> Hopefully I gave you a starting point....
>> hth
>>  marco
>> On 2 Jan 2017 4:03 pm, "Daniela S" <> wrote:
>>> Hi
>>> I am trying to solve the following problem with Spark Streaming.
>>> I receive timestamped events from Kafka. Each event refers to a device
>>> and contains values for every minute of the next 2 to 3 hours. What I would
>>> like to do is to predict the minute values for the next 24 hours. So I
>>> would like to use the known values and to predict the other values to
>>> achieve the 24 hours prediction. My thought was to use arrays with a length
>>> of 1440 (1440 minutes = 24 hours). One for the known values and one for the
>>> predicted values for each device. Then I would like to show the next 24
>>> hours on a dashboard. The dashboard should be updated automatically in
>>> real time.
>>> My questions:
>>> - Is this a possible solution?
>>> - How is it possible to combine known future values and predicted values?
>>> - How should I treat the timestamp, as an index into an array of length
>>> 1440 does not correspond to a timestamp?
>>> - How is it possible to update the dashboard automatically in real time?
>>> Thank you in advance!
>>> Best regards,
>>> Daniela