mahout-user mailing list archives

From Svetlomir Kasabov <>
Subject Re: Logistic Regression + Time Series
Date Mon, 06 Jun 2011 13:36:54 GMT
Thanks for the useful replies, I really appreciate that!

@Ted and Hector: My initial parameters (predictors) are blood pressures,
heart rates, etc.; they arrive every minute from a patient's monitor.
In my implementation, I plan to refer to this paper: on page 7 (Table 1) you
can see the parameters used. On page 17, Figure 4 shows a visualization
of the prediction using time series:

I think I will still use the logistic regression implementation (since
I have already worked my way into it), but I am confused about how to handle
time series with Mahout. Should I periodically (for example, every 15
minutes) create a new logistic regression model in order to predict the
probability of instability? Then the amount of training data depends on
the look-back window I use. For example, with only the last two hours of
data, I will have only about 60 * 2 = 120 examples for building a
temporal model (I assume that I will need one compound data vector per
minute, including blood pressures, heart rates, etc.).
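To make the arithmetic of the first option concrete: assuming exactly one compound vector per minute, a two-hour look-back window always yields 60 * 2 = 120 rows, however often the model is rebuilt. A minimal plain-Java sketch (the class and method names are hypothetical, and this is not Mahout's API) of selecting the rows that a model retrained every 15 minutes would see:

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the "rebuild a model periodically" option.
 * Assumes exactly one compound vector per minute (blood pressures,
 * heart rate, ...), stored oldest-first as the rows of a double[][].
 */
final class SlidingWindow {
    static final int SAMPLES_PER_MINUTE = 1; // assumption taken from the post

    /** Rows available in a look-back window of the given length. */
    static int rowsInWindow(int windowMinutes) {
        return windowMinutes * SAMPLES_PER_MINUTE;
    }

    /** The most recent rows covering the window (all rows if the history is shorter). */
    static double[][] lastWindow(double[][] history, int windowMinutes) {
        int n = Math.min(rowsInWindow(windowMinutes), history.length);
        return Arrays.copyOfRange(history, history.length - n, history.length);
    }

    public static void main(String[] args) {
        double[][] history = new double[600][3]; // 10 hours of dummy per-minute vectors
        // Every 15 minutes, a new model would be trained on this slice only.
        double[][] trainingRows = lastWindow(history, 2 * 60);
        System.out.println(trainingRows.length); // prints 120
    }
}
```

Each periodic model therefore sees only 120 rows from a single patient, which is small for fitting one weight per feature.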

Or should I train the model only once, on historical data from many
patients, so that the training algorithm learns what each patient's
hemodynamic stability is two hours later (since that outcome is also known
at training time)? In this case, I will potentially have many more
examples (one million or more...).
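The second option is ordinary supervised learning with a prediction horizon: each minute t of a patient's record becomes one example, whose features are the window of vitals ending at t and whose label is the stability flag recorded two hours after t. A minimal plain-Java sketch, assuming per-minute vitals and per-minute stability flags are already aligned for each patient (all names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the "train once on many patients" option.
 * For one patient, every minute t becomes an example: the features are the
 * per-minute vectors of the window ending at t, the label is the stability
 * flag recorded horizonMinutes after t.
 */
final class HorizonLabeler {

    /** One training example: flattened window of vitals plus a 0/1 label. */
    static final class Example {
        final double[] features;
        final int label; // 1 = unstable at t + horizon, 0 = stable

        Example(double[] features, int label) {
            this.features = features;
            this.label = label;
        }
    }

    /**
     * @param vitals         per-minute vectors for one patient, oldest first
     * @param unstable       per-minute stability flags, aligned with vitals
     * @param windowMinutes  look-back window length (at least 1)
     * @param horizonMinutes prediction horizon, e.g. 120 for "in two hours"
     */
    static List<Example> build(double[][] vitals, boolean[] unstable,
                               int windowMinutes, int horizonMinutes) {
        if (windowMinutes < 1 || horizonMinutes < 0 || vitals.length != unstable.length) {
            throw new IllegalArgumentException("bad window, horizon or input lengths");
        }
        List<Example> examples = new ArrayList<>();
        int dim = vitals.length == 0 ? 0 : vitals[0].length;
        // t is "now": the window covers minutes t - windowMinutes + 1 .. t inclusive,
        // and the label is read horizonMinutes later, so both must stay in range.
        for (int t = windowMinutes - 1; t + horizonMinutes < vitals.length; t++) {
            double[] x = new double[windowMinutes * dim];
            for (int i = 0; i < windowMinutes; i++) {
                System.arraycopy(vitals[t - windowMinutes + 1 + i], 0, x, i * dim, dim);
            }
            examples.add(new Example(x, unstable[t + horizonMinutes] ? 1 : 0));
        }
        return examples;
    }
}
```

Running build over each patient's record and concatenating the results is what produces the much larger example count. Because consecutive windows overlap almost entirely, examples from one patient are strongly correlated, so splitting training and test data by patient rather than by row gives a more honest accuracy estimate.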

Many thanks, best regards and sorry for the long post.


On 06.06.2011 12:12, Ted Dunning wrote:
> What Hector said.
> You will need to extract features from your time history.
> The question also comes up about how large your data set is.  If it is less
> than 100,000 training examples or so, then you will probably be better off
> using a system like R which handles that much data easily and has
> essentially every kind of classifier available for you to try.
> If you have 1 million training examples or more, then Mahout begins to
> dominate alternatives.  Even there, Mahout is currently optimized for sparse
> data, which is not what you have.  My guess is that using the
> OnlineLogisticRegression or some of Hector's recent patches is the way to
> go. The AdaptiveLogisticRegression is heavily oriented around per-term
> annealing and magic knob tuning in the context of sparse data.
> Can you post your data?
> On Sun, Jun 5, 2011 at 10:04 AM, Hector Yee <> wrote:
>> You can also try HMMs:
>> If you want to do it with a classifier, you can window your time series and
>> make a training set, e.g.:
>> label, feature
>> stable, (last X seconds of time series)
>> unstable, (last X seconds of time series)
>> On Sun, Jun 5, 2011 at 8:08 AM, Svetlomir Dimitrov Kasabov <> wrote:
>>> Hello,
>>> I plan to use Apache Mahout's Logistic Regression (LR) implementation in
>>> my master's thesis. We plan to use time series to predict whether a
>>> particular patient will soon have unstable blood flow. That's why
>>> I want to ask whether it is possible to use Mahout with time
>>> series. Do you see any potential problems / risks?
>>> Many thanks and best regards!
>>> Svetlomir Kasabov.
>>> --
>>> Svetlomir Dimitrov Kasabov
>> --
>> Yee Yang Li Hector
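Ted's advice to extract features from the time history and Hector's windowed label/feature table can be combined. The sketch below is hypothetical and deliberately avoids Mahout's API. It summarizes a window of one vital sign into four dense features (mean, min, max, last-minus-first change) and trains a logistic regression one example at a time with plain stochastic gradient descent. That is the same online-update idea behind Mahout's OnlineLogisticRegression, but with a fixed learning rate, no regularization, and none of Mahout's tuning.

```java
/**
 * Hypothetical sketch: dense window features plus online (SGD) logistic
 * regression. This is NOT Mahout's API; it is a self-contained stand-in.
 */
final class WindowLogit {
    private final double[] w;  // weights; w[0] is the intercept
    private final double rate; // fixed learning rate (assumption: no annealing)

    WindowLogit(int numFeatures, double rate) {
        this.w = new double[numFeatures + 1];
        this.rate = rate;
    }

    /** Mean, min, max and last-minus-first change of one signal over a non-empty window. */
    static double[] features(double[] window) {
        if (window.length == 0) throw new IllegalArgumentException("empty window");
        double sum = 0, min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double v : window) {
            sum += v;
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double trend = window[window.length - 1] - window[0];
        return new double[] {sum / window.length, min, max, trend};
    }

    /** Probability that the label is 1 (here: "unstable"). */
    double predict(double[] x) {
        double z = w[0];
        for (int i = 0; i < x.length; i++) z += w[i + 1] * x[i];
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** One SGD step on the log-loss for a single example with label 0 or 1. */
    void train(double[] x, int label) {
        double err = label - predict(x); // gradient of the log-likelihood w.r.t. z
        w[0] += rate * err;
        for (int i = 0; i < x.length; i++) w[i + 1] += rate * err * x[i];
    }

    public static void main(String[] args) {
        // Toy data (made up): a flat window labeled stable, a falling one labeled unstable.
        double[] stable = features(new double[] {0, 0, 0, 0});
        double[] unstable = features(new double[] {0, -1, -2, -3});
        WindowLogit model = new WindowLogit(4, 0.1);
        for (int epoch = 0; epoch < 200; epoch++) {
            model.train(stable, 0);
            model.train(unstable, 1);
        }
        System.out.printf("P(unstable | flat window)    = %.3f%n", model.predict(stable));
        System.out.printf("P(unstable | falling window) = %.3f%n", model.predict(unstable));
    }
}
```

With real vitals, features on very different scales (such as raw blood pressures and heart rates) should be standardized before training, since a fixed learning rate on unscaled inputs can diverge or converge very slowly.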
