mahout-user mailing list archives

From Florent Empis <florent.em...@gmail.com>
Subject Re: Beginner questions on clustering & M/R
Date Fri, 16 Jul 2010 12:01:52 GMT
Hi,

First of all, let me stress that I'm not actually trying to do quant analysis...
it's just for fun; no practical use is expected, other than learning some
new stuff.

I also thought of using a transform from time to frequency (Fourier...), but
it was only a wild guess based on my limited knowledge of electronics and
signal processing, where the usual answer to a complex signal analysis problem
is "do a Fourier transform, it will help" :)
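For what it's worth, here is a minimal sketch of the "Fourier" idea under assumptions that do not come from this thread: each stock's series of daily changes is turned into the magnitudes of its first few discrete Fourier coefficients, and those magnitudes become the clustering features. The naive O(n^2) DFT and the names `dft` and `spectrum_features` are illustrative only; a real job would use an FFT library. Keeping only the magnitudes is what makes a circularly shifted copy of the same pattern get identical features.

```python
import cmath
import math


def dft(xs):
    """Naive discrete Fourier transform, O(n^2): fine for short series."""
    n = len(xs)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(xs))
        for k in range(n)
    ]


def spectrum_features(returns, n_coeffs):
    """Magnitudes of the first n_coeffs DFT coefficients of a return series.

    Keeping only the magnitude throws away the phase, so a circularly
    shifted copy of the series gets the same features.
    """
    return [abs(c) for c in dft(returns)[:n_coeffs]]


# Hypothetical daily changes for one stock, and the same pattern shifted by 3 days.
series = [0.01, -0.02, 0.015, 0.0, -0.01, 0.02, -0.005, 0.01]
shifted = series[3:] + series[:3]

a = spectrum_features(series, 4)
b = spectrum_features(shifted, 4)
print(all(math.isclose(x, y, abs_tol=1e-12) for x, y in zip(a, b)))  # True
```

Whether that shift-insensitivity is desirable for stocks is a modelling choice: two shares tracing the same pattern a few days apart would look alike.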

What makes you think that a Gabor transform would help? Because of phase
shifting? Would I then basically be clustering my data by phase shift? Is that
right?
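As a hedged illustration of what a Gabor transform computes (Ted's message gives no parameters, so the window width `sigma`, the window centres, the frequencies and the helper name `gabor_features` are all assumptions): it is a Fourier transform taken through a Gaussian window that slides along the series, so each feature says how much of a given frequency is present around a given time. Taking magnitudes again discards the phase of each coefficient.

```python
import cmath
import math


def gabor_features(xs, centers, freqs, sigma):
    """Magnitudes of a discrete Gabor transform of the series xs.

    For each window center c and frequency f (in cycles per sample) this computes
        | sum_t xs[t] * exp(-(t - c)^2 / (2 * sigma^2)) * exp(-2j * pi * f * t) |
    i.e. a Fourier coefficient taken through a Gaussian window centred on c.
    """
    features = []
    for c in centers:
        for f in freqs:
            coeff = sum(
                x
                * math.exp(-((t - c) ** 2) / (2 * sigma**2))  # Gaussian window
                * cmath.exp(-2j * math.pi * f * t)  # complex sinusoid at frequency f
                for t, x in enumerate(xs)
            )
            features.append(abs(coeff))  # keep magnitude, drop phase
    return features


# Hypothetical series: flat for 16 days, then oscillating with a period of 4 days.
xs = [0.0] * 16 + [math.cos(2 * math.pi * 0.25 * t) for t in range(16, 32)]
early, late = gabor_features(xs, centers=[8, 24], freqs=[0.25], sigma=3.0)
print(early < late)  # True: the oscillation shows up only in the later window
```

Unlike a plain Fourier transform of the whole series, this keeps track of *when* a frequency is active, at the cost of choosing a window width.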

Thanks for your help!

Florent




2010/7/15 Ted Dunning <ted.dunning@gmail.com>

> Clustering of time series data is usually better done in an abstract
> relatively low dimensional coordinate space based on some transform like a
> locality sensitive frequency transform.  Gabor transforms might be
> appropriate.
>
> You might be able to get away with something like an SVD of your daily
> change data.
>
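Ted's second suggestion can be sketched as follows, assuming the daily-change data is a matrix with one row per stock and one column per day. Power iteration on r^T r stands in for a proper SVD routine only to keep the example dependency-free; the helper names and the numbers in `changes` are made up. Each stock ends up with k coordinates (the rows of U_k Sigma_k), and those low-dimensional points are what would be handed to a clustering algorithm.

```python
import math
import random


def transpose(m):
    return [list(col) for col in zip(*m)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def top_right_singular_vectors(r, k, iters=200, seed=0):
    """Top-k right singular vectors of r, by power iteration on r^T r.

    Each new vector is re-orthogonalised against the ones already found,
    so the j-th run converges to the j-th singular direction.
    """
    rng = random.Random(seed)
    rt = transpose(r)
    found = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(len(r[0]))]
        for _ in range(iters):
            v = matvec(rt, matvec(r, v))  # v <- r^T r v
            for u in found:  # strip components along earlier singular vectors
                d = sum(a * b for a, b in zip(v, u))
                v = [a - d * b for a, b in zip(v, u)]
            norm = math.sqrt(sum(a * a for a in v))
            v = [a / norm for a in v]
        found.append(v)
    return found


def svd_coordinates(r, k):
    """k-dimensional coordinates of each row (stock): r V_k, i.e. U_k Sigma_k."""
    vs = top_right_singular_vectors(r, k)
    return [[sum(a * b for a, b in zip(row, v)) for v in vs] for row in r]


# Hypothetical daily changes in percent: rows are stocks, columns are days.
changes = [
    [1.0, -2.0, 1.5, -1.0, 2.0, -0.5],  # "bank A"
    [1.2, -1.8, 1.4, -1.1, 1.9, -0.6],  # "bank B"
    [0.5, 0.4, -0.6, 0.7, -0.3, 0.2],  # "utility A"
    [0.4, 0.5, -0.5, 0.6, -0.4, 0.3],  # "utility B"
]
coords = svd_coordinates(changes, k=2)
for row in coords:
    print([round(c, 2) for c in row])
```

With k = 2 these coordinates can also be plotted, which is one way to get a 2-dimensional picture of data that originally has one dimension per day.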
> On Thu, Jul 15, 2010 at 7:51 AM, Florent Empis <florent.empis@gmail.com> wrote:
>
> > Hi,
> >
> > I want to learn more about clustering techniques. I have skimmed through
> > Programming Collective Intelligence and Mahout in Action in the past, but I
> > don't have them on hand at the moment... :(
> > I've seen Isabel Drost's mail about test data on http://mldata.org/about/
> > and had the idea of using http://mldata.org/repository/view/stockvalues/ for
> > a pet project.
> > My idea is as follows: can we see common behaviour in companies' stock
> > values?
> > I would expect to end up with clusters of banking-sector shares, utility
> > shares, media, etc., and maybe some more unexpected clusters, who knows?
> >
> > My idea is basically:
> > 1°) Transform the dataset from values to daily variations, as a percentage
> > drop/rise (the data is then normalized)
> > 2°) Apply clustering technique(s)
> >
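Step 1 might look like this minimal sketch; the table layout (a dict from stock name to a list of prices over the same days) and the function names are assumptions for illustration. Expressing every move as a percentage removes the price level, which is the sense in which the data becomes comparable across stocks.

```python
def daily_pct_change(prices):
    """Day-over-day change in percent: 100 * (p[t+1] - p[t]) / p[t].

    A series of n + 1 prices gives n changes; prices must be non-zero.
    """
    return [100.0 * (b - a) / a for a, b in zip(prices, prices[1:])]


def to_change_vectors(price_table):
    """Step 1: replace each stock's price series by its vector of daily changes.

    price_table maps a stock name to its prices, all over the same days.
    A 20-unit stock and a 200-unit stock that both rise 2% on a given day
    get (up to rounding) the same value for that day.
    """
    return {name: daily_pct_change(p) for name, p in price_table.items()}


# Hypothetical prices for two stocks over three days.
table = {"A": [20.0, 20.4, 20.0], "B": [200.0, 204.0, 200.0]}
print(to_change_vectors(table))
```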
> > The question may seem silly, but as I understand it, clustering happens in
> > a 2 (or more) dimensional space.
> > I know I have 2 dimensions, variation and time, but I can't wrap my head
> > around the problem...
> >
> > I *think* that the K-Means example does exactly what I intend to do in my
> > second step; is this correct?
> > However, I can't grasp what the 2-dimensional display represents exactly:
> > what are the x and y axes?
> >
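One way to look at the dimensionality question, offered as a sketch rather than as what Mahout's K-Means example does internally: treat each stock as one point whose coordinates are its n daily changes. Time is then not a separate axis; it indexes the coordinates, so the space has n dimensions and a 2-D picture needs some projection (such as the SVD Ted mentions). The plain Lloyd's k-means below, run on made-up change vectors, groups the points by Euclidean distance in that n-dimensional space.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means with Euclidean distance; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # start from k of the input points
    assignment = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # leave a centroid in place if its cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Hypothetical daily changes in percent (one row per stock, one column per day).
names = ["bank A", "bank B", "utility A", "utility B"]
points = [
    [1.0, -2.0, 1.5, -1.0, 2.0, -0.5],
    [1.2, -1.8, 1.4, -1.1, 1.9, -0.6],
    [0.5, 0.4, -0.6, 0.7, -0.3, 0.2],
    [0.4, 0.5, -0.5, 0.6, -0.4, 0.3],
]
labels = kmeans(points, k=2)
for name, label in zip(names, labels):
    print(name, "-> cluster", label)
```

Euclidean distance on raw percentage changes is only one choice; a correlation-based distance would ignore how volatile each stock is and focus on whether they move together.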
> > Additional question: I am fairly new to the M/R paradigm, but let's say I
> > would like to do step 1 (data normalization) in an M/R fashion. Would the
> > following be a good idea?
> > My data is a matrix of k stock values S over n intervals of time.
> > I denote the first stock in the file at the first and second periods as
> > S1,t and S1,t+1, and so on.
> > Map step:
> >   input:  ((S1,t ... S1,t+n), ..., (Sk,t ... Sk,t+n))
> >   output: (((S1,t; S1,t+1), ..., (S1,t+n-1; S1,t+n)), ...,
> >            ((Sk,t; Sk,t+1), ..., (Sk,t+n-1; Sk,t+n)))
> > Reduce step:
> >   output: ((%S1,t+1 ... %S1,t+n), ..., (%Sk,t+1 ... %Sk,t+n))
> >
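One alternative layout for step 1, sketched under an assumption about the input that the thread does not confirm: each record is one date's line, (t, [price of stock 0, price of stock 1, ...]). The mapper keys every price by its stock, the shuffle brings one stock's whole series to a single reducer, and the reducer sorts by t and computes the percentage changes. `map_fn`, `reduce_fn` and `run_local` are hypothetical names, and `run_local` only simulates the shuffle in memory; this is not the Hadoop API.

```python
from collections import defaultdict


def map_fn(record):
    """Map: record = (t, [price of stock 0, price of stock 1, ...]).

    Emits (stock_index, (t, price)) so all prices of one stock meet in one reducer.
    """
    t, prices = record
    for stock, price in enumerate(prices):
        yield stock, (t, price)


def reduce_fn(stock, values):
    """Reduce: all (t, price) pairs of one stock, in arbitrary order.

    Sorts by time, then emits the day-over-day percentage changes.
    """
    series = [price for _, price in sorted(values)]
    changes = [100.0 * (b - a) / a for a, b in zip(series, series[1:])]
    yield stock, changes


def run_local(records):
    """Simulate map -> shuffle (group by key) -> reduce on a single machine."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    out = {}
    for key in sorted(groups):
        for k, v in reduce_fn(key, groups[key]):
            out[k] = v
    return out


# Hypothetical input: three dates, two stocks; records may arrive out of order.
records = [
    (2, [10.5, 20.0]),
    (0, [10.0, 20.0]),
    (1, [10.0, 21.0]),
]
print(run_local(records))
```

If the file instead holds one complete stock series per line, as in the Map step input written above, each record already contains everything needed and the changes can be computed in the map step alone, with no reduce step. Keying by stock also means there are at most k reduce keys, each holding one series in memory, which stays small for daily data.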
> > I apologize for my beginner's questions, but... everyone has to start
> > somewhere :-)
> >
> > BR,
> >
> > Florent Empis
> >
>
