mahout-user mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: PCA using Java Code
Date Wed, 03 Jul 2013 15:58:05 GMT
On Jul 3, 2013 6:56 AM, "Chirag Lakhani" <clakhani@zaloni.com> wrote:
>
> So how does the column mean get calculated if the --pcaOffset option is not
> specified?

By taking the average of all row vectors. See the code for details.

> I would think you are just doing SVD at that point.

This statement is incorrect. I know because I designed this code.
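
As a rough illustration of "taking the average of all row vectors", here is a minimal plain-Java sketch. It is not the actual Mahout SSVD code; the class and method names are made up for the example, and it works on in-memory arrays rather than a DistributedRowMatrix.

```java
import java.util.Arrays;

// Illustration only: plain Java arrays, not Mahout's SSVD implementation.
// ColumnMeanSketch, columnMean and center are hypothetical names.
class ColumnMeanSketch {

    // The column mean is the element-wise average of all row vectors.
    static double[] columnMean(double[][] rows) {
        if (rows.length == 0) {
            throw new IllegalArgumentException("need at least one row");
        }
        double[] mean = new double[rows[0].length];
        for (double[] row : rows) {
            for (int j = 0; j < mean.length; j++) {
                mean[j] += row[j];
            }
        }
        for (int j = 0; j < mean.length; j++) {
            mean[j] /= rows.length;
        }
        return mean;
    }

    // Subtract the column mean from every row, returning a new matrix.
    static double[][] center(double[][] rows, double[] mean) {
        double[][] centered = new double[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            centered[i] = new double[mean.length];
            for (int j = 0; j < mean.length; j++) {
                centered[i][j] = rows[i][j] - mean[j];
            }
        }
        return centered;
    }

    public static void main(String[] args) {
        double[][] rows = {{1.0, 2.0}, {3.0, 6.0}};
        double[] mean = columnMean(rows);
        System.out.println(Arrays.toString(mean));
        System.out.println(Arrays.deepToString(center(rows, mean)));
    }
}
```

Running SVD on the centered rows rather than on the raw rows is what distinguishes PCA from plain SVD.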
>
>
> On Tue, Jul 2, 2013 at 5:52 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>
> > On Tue, Jul 2, 2013 at 1:52 PM, Chirag Lakhani <clakhani@zaloni.com>
> > wrote:
> >
> > > Hello,
> > >
> > > I am trying to use the Mahout/Java API to do PCA, but I am confused about
> > > the right order to do things.  To start, I have a list of DenseVectors that
> > > I am reading into the code and turning into a distributed matrix in the
> > > following form.
> > >
> > >  DistributedRowMatrix m = new DistributedRowMatrix(input_vec, matrix_path,
> > >      num_rows, num_cols);
> > >
> > > When I run this code, I would have thought it would output the result into
> > > the path called "matrix_path" so that I can then use something like
> > > MatrixColumnMeansJob.run to get the mean. When I run this bit of code I get
> > > no output. Is there something else I should do, or is there a better way to
> > > calculate the mean for my file?
> > >
> > >
> > > From what I understand about the SSVD CI code, you need to calculate the
> > > column mean and then output it into a directory.
> >
> >
> > No, you don't have to (although you have the _option_ to calculate and
> > supply one yourself if for some reason it is already known). By default,
> > it is calculated for you.
> >
> >
> >
> > > Is there a good way to do this if I am starting from a file which is a
> > > sequence file of DenseVectors?
> > >
> >
> > Yes. Just don't specify the --pcaOffset option.
> >
> >
> > >
> > > --
> > >
> > > *Chirag Lakhani*
> > >
> > > Data Scientist
> > >
> > > Zaloni, Inc. | www.zaloni.com
> > >
> > > 633 Davis Dr., Suite 200
> > >
> > > Durham, NC 27713
> > > e: clakhani@zaloni.com
> > > p: 919.602.4965 x7020
> > >
> >
>
