mahout-user mailing list archives

From Drew Farris <drew.far...@gmail.com>
Subject Re: Reading Vectors Created from a Lucene Index
Date Fri, 02 Jul 2010 03:05:50 GMT
Hi Kris,

Could you try the code in the patch at:
https://issues.apache.org/jira/secure/attachment/12448536/MAHOUT-402.patch

This should cause VectorDumper to emit the names found in NamedVectors.
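
As a rough illustration of the idea (this is not the contents of the patch), the sketch below shows how a name can be lost when a wrapped vector is normalized, and how a dumper can check the vector's type before printing. Vec, DenseVec, NamedVec and describe are simplified, hypothetical stand-ins written for this example; they are not Mahout's Vector, NamedVector or VectorDumper, and the name "doc-42" is made up.

```java
import java.util.Arrays;

// Hypothetical stand-ins for illustration only; NOT Mahout's real classes.
class NamedVectorSketch {

    interface Vec {
        double[] values();
        Vec normalize(double power);
    }

    // A plain vector backed by an array.
    static class DenseVec implements Vec {
        private final double[] values;

        DenseVec(double[] values) {
            this.values = values.clone();
        }

        public double[] values() {
            return values.clone();
        }

        // Divides every element by the p-norm of the vector.
        public Vec normalize(double power) {
            double sum = 0.0;
            for (double v : values) {
                sum += Math.pow(Math.abs(v), power);
            }
            double norm = Math.pow(sum, 1.0 / power);
            double[] out = new double[values.length];
            for (int i = 0; i < values.length; i++) {
                out[i] = norm == 0.0 ? 0.0 : values[i] / norm;
            }
            return new DenseVec(out);
        }
    }

    // A wrapper that attaches a name to another vector.
    static class NamedVec implements Vec {
        private final Vec delegate;
        private final String name;

        NamedVec(Vec delegate, String name) {
            this.delegate = delegate;
            this.name = name;
        }

        String getName() {
            return name;
        }

        public double[] values() {
            return delegate.values();
        }

        // Assumption: mirrors the behaviour reported in this thread, where
        // normalizing hands back the delegate's result and the name is dropped.
        public Vec normalize(double power) {
            return delegate.normalize(power);
        }
    }

    // Dumper-style formatting: print the name only when the vector has one.
    static String describe(Vec v) {
        String body = Arrays.toString(v.values());
        if (v instanceof NamedVec) {
            return ((NamedVec) v).getName() + ": " + body;
        }
        return body;
    }

    public static void main(String[] args) {
        Vec named = new NamedVec(new DenseVec(new double[] {3.0, 4.0}), "doc-42");

        // After normalizing, the result is a plain vector without a name.
        Vec normalized = named.normalize(2.0);
        System.out.println(describe(normalized));

        // Re-wrapping the normalized vector restores the name.
        Vec rewrapped = new NamedVec(normalized, "doc-42");
        System.out.println(describe(rewrapped));
    }
}
```

The type check in describe is the part that corresponds to the dumper change; the re-wrapping step corresponds to the workaround quoted below.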

Thanks,
Drew

On Thu, Jul 1, 2010 at 10:23 AM, Kris Jack <mrkrisjack@gmail.com> wrote:

> Hi Grant,
>
> I applied the patch but still no luck.  In debugging, I found that in
> LuceneIterable, line 129:
>
> <<
>  result = result.normalize(normPower);
> >>
>
> seems to turn result, which was previously a NamedVector, back into a plain
> Vector, so the name is lost.  If I change the code to keep the name by
> replacing the line with:
>
> <<
>  result = new NamedVector(result.normalize(normPower), name);
> >>
>
> then the name is included and the result remains a NamedVector, but the
> VectorDumper code still just prints out Vectors and not NamedVectors.
> Perhaps I am going about this wrong, but shouldn't there be a check in the
> VectorDumper to find out the type of vector being dumped?
>
> Thanks,
> Kris
>
>
>
> 2010/6/30 Grant Ingersoll <gsingers@apache.org>
>
> > Kris,
> >
> > Can you try the patch at
> >
> > https://issues.apache.org/jira/secure/attachment/12448396/MAHOUT-379-lucene.patch
> >
> > Thanks,
> > Grant
> >
> > On Jun 30, 2010, at 8:53 AM, Grant Ingersoll wrote:
> >
> > >
> > > On Jun 30, 2010, at 8:39 AM, Grant Ingersoll wrote:
> > >
> > >>
> > >> On Jun 29, 2010, at 1:54 PM, Kris Jack wrote:
> > >>
> > >>> Hi everyone,
> > >>>
> > >>> I have been using Mahout to generate vectors from a Lucene index using:
> > >>>
> > >>> $MAHOUT_HOME/bin/mahout lucene.vector
> > >>>
> > >>> In doing so, Mahout creates an output file that has new ids for my
> > >>> documents that are completely unlike my original --idField, which is a
> > >>> string.  How can I relate the new ids to my original ids?  Is there a
> > >>> method that allows me to output the vectors with the original --idField
> > >>> values that appear in the Lucene index rather than the new doc ids?
> > >>
> > >>
> > >> Hmm, it seems the --idField stuff has been commented out, likely with
> > >> the change of labels.
> > >>
> > >
> > > I've brought the issue up over on dev@, as it is a bug.
> >
> > --------------------------
> > Grant Ingersoll
> > http://www.lucidimagination.com/
> >
> > Search the Lucene ecosystem using Solr/Lucene:
> > http://www.lucidimagination.com/search
> >
> >
>
>
> --
> Dr Kris Jack,
> http://www.mendeley.com/profiles/kris-jack/
>
