mahout-user mailing list archives

From Kris Jack <mrkrisj...@gmail.com>
Subject Re: Reading Vectors Created from a Lucene Index
Date Fri, 02 Jul 2010 09:42:54 GMT
Hi Drew,

That indeed causes the name to be emitted now.  With the change that I
suggested and your patch, I'm now getting the names of vectors, as provided
by the --idField, being output with the vectors themselves.

Thanks again,
Kris
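
For readers following the thread, the two changes discussed below (re-wrapping the normalized result so it keeps its name, and having the dumper check the vector type) can be illustrated with a small self-contained sketch. Vec, PlainVec, NamedVec and NamedVectorSketch are hypothetical stand-ins written for this note, not the Mahout classes, and "doc-42" is a made-up id; the sketch only assumes, as reported in the thread, that normalizing a named vector hands back an unnamed one.

```java
import java.util.Arrays;

// Hypothetical stand-ins for illustration only; these are NOT the Mahout classes.
interface Vec {
    double[] values();
    Vec normalize(double power);
}

class PlainVec implements Vec {
    private final double[] values;

    PlainVec(double... values) {
        this.values = values.clone();
    }

    public double[] values() {
        return values.clone();
    }

    // Divides each element by the p-norm and returns a new, unnamed vector.
    public Vec normalize(double power) {
        double sum = 0.0;
        for (double v : values) {
            sum += Math.pow(Math.abs(v), power);
        }
        double norm = Math.pow(sum, 1.0 / power);
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = values[i] / norm;
        }
        return new PlainVec(out);
    }
}

class NamedVec implements Vec {
    private final Vec delegate;
    private final String name;

    NamedVec(Vec delegate, String name) {
        this.delegate = delegate;
        this.name = name;
    }

    String name() {
        return name;
    }

    public double[] values() {
        return delegate.values();
    }

    // Assumption mirroring the behaviour reported in the thread:
    // the call is delegated, so the returned vector carries no name.
    public Vec normalize(double power) {
        return delegate.normalize(power);
    }
}

class NamedVectorSketch {
    // Dumper-style output: checks the runtime type and prefixes the name if present.
    static String dump(Vec v) {
        String body = Arrays.toString(v.values());
        if (v instanceof NamedVec) {
            return ((NamedVec) v).name() + ":" + body;
        }
        return body;
    }

    public static void main(String[] args) {
        Vec doc = new NamedVec(new PlainVec(3.0, 4.0), "doc-42");

        Vec lost = doc.normalize(2.0);                          // name dropped
        Vec kept = new NamedVec(doc.normalize(2.0), "doc-42");  // name re-attached

        System.out.println(dump(lost));
        System.out.println(dump(kept));
    }
}
```

Re-wrapping at the call site keeps the sketch close to the one-line change quoted below; a wrapper whose normalize returned a named vector itself would be another option, but that is not what the thread describes.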



2010/7/2 Drew Farris <drew.farris@gmail.com>

> Hi Kris,
>
> Could you try the code in the patch at:
> https://issues.apache.org/jira/secure/attachment/12448536/MAHOUT-402.patch
>
> This should cause VectorDumper to emit the names found in NamedVectors.
>
> Thanks,
> Drew
>
> On Thu, Jul 1, 2010 at 10:23 AM, Kris Jack <mrkrisjack@gmail.com> wrote:
>
> > Hi Grant,
> >
> > I applied the patch but still no luck.  In debugging, I found that in
> > LuceneIterable, line 129:
> >
> > <<
> >  result = result.normalize(normPower);
> > >>
> >
> > seems to turn result, which was previously a NamedVector, back into a plain
> > Vector, which causes the name to be lost.  If I change the code to allow the
> > name to be kept by replacing the line with:
> >
> > <<
> >  result = new NamedVector(result.normalize(normPower), name);
> > >>
> >
> > then the name is included and the result remains a NamedVector but the
> > VectorDumper code still just prints out Vectors and not NamedVectors.
> > Perhaps I am going about this wrong, but shouldn't there be a check in the
> > VectorDumper to find out the type of vector being dumped?
> >
> > Thanks,
> > Kris
> >
> >
> >
> > 2010/6/30 Grant Ingersoll <gsingers@apache.org>
> >
> > > Kris,
> > >
> > > Can you try the patch at
> > > https://issues.apache.org/jira/secure/attachment/12448396/MAHOUT-379-lucene.patch
> > >
> > > Thanks,
> > > Grant
> > >
> > > On Jun 30, 2010, at 8:53 AM, Grant Ingersoll wrote:
> > >
> > > >
> > > > On Jun 30, 2010, at 8:39 AM, Grant Ingersoll wrote:
> > > >
> > > >>
> > > >> On Jun 29, 2010, at 1:54 PM, Kris Jack wrote:
> > > >>
> > > >>> Hi everyone,
> > > >>>
> > > >>> I have been using mahout to generate vectors from a lucene index using:
> > > >>>
> > > >>> $MAHOUT_HOME/bin/mahout lucene.vector
> > > >>>
> > > >>> In doing so, mahout creates an output file that has new ids for my
> > > >>> documents, that are completely unlike my original --idField, that is a
> > > >>> string.  How can I relate the new ids to my original ids?  Is there a
> > > >>> method that allows me to output the vectors with the original --idField
> > > >>> values that appear in the lucene index rather than the new doc ids?
> > > >>
> > > >>
> > > >> Hmm, it seems the --idField stuff has been commented out, likely with
> > > >> the change of labels.
> > > >>
> > > >
> > > > I've brought the issue up over on dev@, as it is a bug.
> > >
> > > --------------------------
> > > Grant Ingersoll
> > > http://www.lucidimagination.com/
> > >
> > > Search the Lucene ecosystem using Solr/Lucene:
> > > http://www.lucidimagination.com/search
> > >
> > >
> >
> >
> > --
> > Dr Kris Jack,
> > http://www.mendeley.com/profiles/kris-jack/
> >
>



-- 
Dr Kris Jack,
http://www.mendeley.com/profiles/kris-jack/
