mahout-user mailing list archives

From Rajat Banerjee <>
Subject Re: Looking for Linear Regression on Hadoop
Date Mon, 07 Dec 2009 23:29:07 GMT
Dear Ted,

Thanks for your prompt reply.

There are 16,000 rows of data and only four significant variables in
my hypothesis, so the regression shouldn't be too nasty. I've looked at
some non-distributed libraries and they seem capable, but I would love
to get it running in Hadoop since that's my end goal.
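
For a problem of this shape (four predictors, many rows), one way the work could be spread across a cluster is to have each mapper sum up the normal-equation statistics X^T X and X^T y for its chunk of rows, and have a single reducer add the partial sums and solve the resulting small system. The sketch below illustrates that idea in plain Python. It is not Mahout or Hadoop code, it assumes each row has the form (x1, x2, x3, x4, y), and every function name in it is hypothetical.

```python
# Sketch: ordinary least squares from per-chunk sufficient statistics.
# Assumption: each row is (x1, x2, x3, x4, y). All names are hypothetical
# and nothing here uses a Mahout or Hadoop API.
from functools import reduce


def map_chunk(rows):
    """Mapper: accumulate X^T X and X^T y for one chunk of rows."""
    k = len(rows[0])  # number of predictors plus one intercept column
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for *features, y in rows:
        x = [1.0] + list(features)  # prepend the intercept term
        for i in range(k):
            xty[i] += x[i] * y
            for j in range(k):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def combine(a, b):
    """Reducer: element-wise sum of two partial results."""
    xtx = [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    xty = [p + q for p, q in zip(a[1], b[1])]
    return xtx, xty


def solve(xtx, xty):
    """Solve (X^T X) beta = X^T y by Gaussian elimination with pivoting."""
    n = len(xty)
    m = [row[:] + [rhs] for row, rhs in zip(xtx, xty)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (m[i][n] - s) / m[i][i]
    return beta


def fit(chunks):
    """Run the map and reduce steps, then solve for the coefficients."""
    xtx, xty = reduce(combine, map(map_chunk, chunks))
    return solve(xtx, xty)
```

Calling `fit` on a list of row chunks returns the intercept followed by the four slopes. Because the partial sums are only 5x5 and 5x1, the reducer's work stays tiny no matter how many rows the mappers see.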

single-threaded :

Thanks. Best,

On Mon, Dec 7, 2009 at 6:21 PM, Ted Dunning <> wrote:
> We don't have these right now.  We had a summer of code student start on
> Logistic Regression, but she didn't complete the project.
> Can you say more about your problem?  Are you saying that you have 16,000
> predictor variables sampled in time and one prediction variable (presence of
> short squeeze)?  Or is it possible for short squeezes to be applied to
> individual equities so that you have 16,000 time series each annotated with
> whether a short squeeze occurred?
> If the former, then you have a much bigger problem than just doing the
> regression.  If the latter, then you might be able to use some on-line
> learning software like Vowpal Wabbit to do your job.
> Can you say more?
> On Mon, Dec 7, 2009 at 3:04 PM, Rajat Banerjee <> wrote:
>> Dear Apache Community,
>> I am looking to perform a linear regression on a rather large amount
>> of data in my Hadoop cluster. It is part of my master's thesis at
>> Harvard University.
>> After perusing the docs on the Mahout site, it seems like the
>> following algorithms haven't been implemented yet:
>> Locally-Weighted Linear Regression
>> Linear Regression
>> Logistic Regression
>> Basically, there is a stock market phenomenon which I'm trying to
>> predict. It is called a short squeeze. I have about 16,000 data points
>> - stocks and a point in time where the phenomenon has occurred. I'm
>> trying to develop a predictive model in a Hadoop cluster.
>> The accuracy of the model doesn't matter much at this point, the goal
>> and what would make my prof happy is to see the cluster grinding away,
>> doing some relevant but perhaps not totally correct mathematical
>> operations. Read: If it's a linear regression I'll be happy, but if it
>> isn't possible yet I don't mind.
>> Can anyone suggest something I can use? I've downloaded Mahout 0.2 and
>> searched through it, but nothing for performing regressions has jumped
>> out at me.
>> Thank you.
>> Best,
>> Rajat
> --
> Ted Dunning, CTO
> DeepDyve
