Mahout doesn't have any of these yet. A Summer of Code student started on
logistic regression, but she didn't complete the project.
Can you say more about your problem? Are you saying that you have 16,000
predictor variables sampled in time and a single variable to predict (presence
of a short squeeze)? Or do short squeezes happen to individual equities, so
that you have 16,000 time series, each annotated with whether a short squeeze
occurred?
If the former, then you have a much bigger problem than just doing the
regression. If the latter, then an online learning package such as
Vowpal Wabbit might be able to do the job.
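
To make "online learning" concrete, here is a minimal sketch of logistic
regression trained one example at a time by stochastic gradient descent. This
is not Vowpal Wabbit's code or anything in Mahout; the class name, the two
made-up features, the learning rate, and the synthetic labels are all
assumptions for illustration only. Real inputs would be whatever per-stock
features you have, labeled by whether a short squeeze occurred.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class OnlineLogisticRegression:
    """Hypothetical example class: logistic regression updated per example."""

    def __init__(self, n_features, learning_rate=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = learning_rate

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return sigmoid(z)

    def update(self, x, y):
        # For a single example, the gradient of the log loss with respect
        # to the weights is (p - y) * x, and (p - y) for the bias.
        err = self.predict_proba(x) - y
        for i, xi in enumerate(x):
            self.w[i] -= self.lr * err * xi
        self.b -= self.lr * err


def make_synthetic_example(rng):
    # Stand-in data, not real market data: two invented features,
    # with label 1 whenever their sum is positive.
    x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    y = 1 if x[0] + x[1] > 0 else 0
    return x, y


rng = random.Random(0)
model = OnlineLogisticRegression(n_features=2)

# Stream the examples past the model once; nothing is held in memory
# except the weights, which is what makes this style scale.
for _ in range(5000):
    x, y = make_synthetic_example(rng)
    model.update(x, y)

print(model.predict_proba([0.9, 0.9]), model.predict_proba([-0.9, -0.9]))
```

Because each update touches only one example, the data never has to fit in
memory, which is why this approach can handle large inputs on a single machine.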
Can you say more?
On Mon, Dec 7, 2009 at 3:04 PM, Rajat Banerjee <rqbanerjee@gmail.com> wrote:
> Dear Apache Community,
> I am looking to perform a linear regression on a rather large amount
> of data in my hadoop cluster. It is part of my master's thesis at
> Harvard University.
>
> After perusing the docs on the Mahout site, it seems like the
> following algorithms haven't been implemented yet:
> LocallyWeighted Linear Regression
> Linear Regression
> Logistic Regression
>
> Basically, there is a stock market phenomenon which I'm trying to
> predict. It is called a short squeeze. I have about 16,000 data points:
> stocks and a point in time where the phenomenon has occurred. I'm
> trying to develop a predictive model in a hadoop cluster.
>
> The accuracy of the model doesn't matter much at this point, the goal
> and what would make my prof happy is to see the cluster grinding away,
> doing some relevant but perhaps not totally correct mathematical
> operations. Read: If it's a linear regression I'll be happy, but if it
> isn't possible yet I don't mind.
>
> Can anyone suggest something I can use? I've downloaded Mahout 0.2 and
> searched through it, but nothing for performing regressions has jumped
> out at me.
> Thank you.
> Best,
> Rajat
>

Ted Dunning, CTO
DeepDyve
