mahout-user mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: Looking for Linear Regression on Hadoop
Date Mon, 07 Dec 2009 23:33:40 GMT
If you only have 4 variables and 16k rows, why do you need anything even
close to Hadoop?  This is a problem which could be regressed on an iPhone,
couldn't it?

  -jake
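
As a rough illustration of the point above, here is a minimal sketch of an ordinary least squares fit on 16,000 rows and 4 variables in plain Python. The data is synthetic and the names (`fit_linear_regression`, `make_synthetic`, the "true" coefficients) are made up for the example; nothing here comes from the actual short-squeeze data set or from Mahout. It solves the normal equations (X^T X) b = X^T y, which for 4 variables plus an intercept is only a 5x5 system.

```python
import random


def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    rows: list of feature lists; targets: list of floats.
    Returns [intercept, coef_1, ..., coef_k].
    """
    k = len(rows[0]) + 1  # +1 for the intercept column
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    # Accumulate X^T X and X^T y in a single pass over the data.
    for features, y in zip(rows, targets):
        x = [1.0] + list(features)
        for i in range(k):
            xty[i] += x[i] * y
            for j in range(k):
                xtx[i][j] += x[i] * x[j]
    # Solve the small k-by-k system by Gaussian elimination
    # with partial pivoting on the augmented matrix.
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coefs = [0.0] * k
    for i in reversed(range(k)):
        s = sum(aug[i][j] * coefs[j] for j in range(i + 1, k))
        coefs[i] = (aug[i][k] - s) / aug[i][i]
    return coefs


def make_synthetic(n=16000, seed=0):
    """Hypothetical data: 4 features, a known linear model, small noise."""
    rng = random.Random(seed)
    true = [0.5, 2.0, -1.0, 0.25, 3.0]  # intercept followed by 4 slopes
    rows, targets = [], []
    for _ in range(n):
        f = [rng.uniform(-1.0, 1.0) for _ in range(4)]
        y = true[0] + sum(c * v for c, v in zip(true[1:], f))
        rows.append(f)
        targets.append(y + rng.gauss(0.0, 0.01))
    return rows, targets, true


if __name__ == "__main__":
    rows, targets, true = make_synthetic()
    print(fit_linear_regression(rows, targets))
```

The only pass over the data is the accumulation of X^T X and X^T y, which are plain sums over rows. That is the part one would distribute if the row count ever did justify a cluster; the solve itself stays tiny.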

On Mon, Dec 7, 2009 at 3:29 PM, Rajat Banerjee <rqbanerjee@gmail.com> wrote:

> Dear Ted, Thanks for your prompt reply.
>
> There are 16,000 rows of data. There are only four significant
> variables in my hypothesis. The regression shouldn't be too nasty.
> I've looked at some non-distributed libraries and they seem capable,
> but would love to get it started in Hadoop since that's my end goal.
>
> single-threaded :
> http://www.ee.ucl.ac.uk/~mflanaga/java/Regression.html#sumgl
>
>
> Thanks. Best,
> Rajat
>
>
> On Mon, Dec 7, 2009 at 6:21 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > We don't have these right now.  We had a summer of code student start on
> > Logistic Regression, but she didn't complete the project.
> >
> > Can you say more about your problem?  Are you saying that you have 16,000
> > predictor variables sampled in time and one prediction variable (presence
> > of short squeeze)?  Or is it possible for short squeezes to be applied to
> > individual equities so that you have 16,000 time series each annotated
> > with whether a short squeeze occurred?
> >
> > If the former, then you have a much bigger problem than just doing the
> > regression.  If the latter, then you might be able to use some on-line
> > learning software like Vowpal Wabbit to do your job.
> >
> > Can you say more?
> >
> > On Mon, Dec 7, 2009 at 3:04 PM, Rajat Banerjee <rqbanerjee@gmail.com> wrote:
> >
> >> Dear Apache Community,
> >> I am looking to perform a linear regression on a rather large amount
> >> of data in my Hadoop cluster. It is part of my master's thesis at
> >> Harvard University.
> >>
> >> After perusing the docs on the Mahout site, it seems like the
> >> following algorithms haven't been implemented yet:
> >> Locally-Weighted Linear Regression
> >> Linear Regression
> >> Logistic Regression
> >>
> >> Basically, there is a stock market phenomenon which I'm trying to
> >> predict. It is called a short squeeze. I have about 16,000 data points
> >> - stocks and a point in time where the phenomenon has occurred. I'm
> >> trying to develop a predictive model in a Hadoop cluster.
> >>
> >> The accuracy of the model doesn't matter much at this point, the goal
> >> and what would make my prof happy is to see the cluster grinding away,
> >> doing some relevant but perhaps not totally correct mathematical
> >> operations. Read: If it's a linear regression I'll be happy, but if it
> >> isn't possible yet I don't mind.
> >>
> >> Can anyone suggest something I can use? I've downloaded Mahout 0.2 and
> >> searched through it, but nothing for performing regressions has jumped
> >> out at me.
> >> Thank you.
> >> Best,
> >> Rajat
> >>
> >
> >
> >
> > --
> > Ted Dunning, CTO
> > DeepDyve
> >
>
