mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: ItemSimilarityJob creates no output
Date Wed, 06 Jun 2012 17:20:17 GMT
Just make, say, a completely dense fake data set over 1000 users and items.
Something will come out.
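
A minimal sketch of such a generator, assuming "1000 users and items" means 1000 users and 1000 items, and that the job reads plain `userID,itemID,preference` lines. The class name, output file name, seed, and the 1..5 rating range are all made up for illustration:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of Mahout.
class DenseFakeData {

    // Builds one "userID,itemID,preference" line for every (user, item) pair,
    // so every pair of items is co-rated by every user (a completely dense set).
    static List<String> denseLines(int numUsers, int numItems, long seed) {
        Random random = new Random(seed);
        List<String> lines = new ArrayList<>(numUsers * numItems);
        for (int user = 1; user <= numUsers; user++) {
            for (int item = 1; item <= numItems; item++) {
                int rating = 1 + random.nextInt(5); // made-up rating in 1..5
                lines.add(user + "," + item + "," + rating);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Output path is an assumption; pass a different one as the first argument.
        Path out = Paths.get(args.length > 0 ? args[0] : "dense-fake-ratings.csv");
        List<String> lines = denseLines(1000, 1000, 42L);
        Files.write(out, lines, StandardCharsets.UTF_8);
        System.out.println(lines.size() + " lines written to " + out);
    }
}
```

If the job still writes nothing for a file like this, sparseness of the real input is probably not the cause.
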
On Jun 6, 2012 6:11 PM, "Something Something" <mailinglists19@gmail.com>
wrote:

> Hmm... that's what I am thinking... something is amiss!  A few lines from
> the files are pasted above.  The pattern is fairly similar.  Is there a
> place where I can upload part of my file for someone else to try?
>
> OR BETTER YET - Can someone provide a small file that always returns a few
> similarities?  Is a file such as this included in the source?
>
> Thanks for the help.
>
> On Wed, Jun 6, 2012 at 9:01 AM, Sean Owen <srowen@gmail.com> wrote:
>
> > That sounds like plenty of data -- doubting that's any issue. Is it
> > very sparse? Meaning many items exist just for one user? It's really
> > sparseness that might produce few or no similarities.
> >
> > I think something else is at work here but don't know off the top of
> > my head based on the info so far.
> >
> > Yes it is always the same hash function -- top 8 bytes of the MD5
> > hash. Same input means same output.
> >
> > Sean
> >
> > On Wed, Jun 6, 2012 at 4:57 PM, Something Something
> > <mailinglists19@gmail.com> wrote:
> > > The input size was about 6 Million so I was expecting to find some
> > > similarities.  Anyway, I have started a test with the real dataset that
> > > contains 700 million lines.  We shall see how that goes.  One quick
> > > question, though:
> > >
> > > I am using MemoryIDMigrator to convert UserIds from String to Long as
> > > follows:
> > >
> > >    static UpdatableIDMigrator migrator = new MemoryIDMigrator();
> > > <some code omitted here...>
> > >    migrator.toLongID(strUserID);
> > >
> > > > Question:  If I pass the same userId multiple times to this method,
> > > > I am guaranteed to get the same 'Long' number back, correct?
> >
>
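
The "same input means same output" point in the quoted reply can be illustrated with a standalone sketch. It assumes "top 8 bytes of the MD5 hash" means the first eight bytes of the digest packed big-endian into a long. The class and method names are hypothetical, and this is not Mahout's own code:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration of a deterministic String-to-long mapping.
class Md5IdSketch {

    // Packs the first 8 bytes of the MD5 digest of the ID into a long.
    static long md5ToLong(String id) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(id.getBytes(StandardCharsets.UTF_8));
            long hash = 0L;
            for (int i = 0; i < 8; i++) {
                // Mask to treat each byte as unsigned before shifting it in.
                hash = (hash << 8) | (digest[i] & 0xFFL);
            }
            return hash;
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every standard Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        String userId = "user-123"; // made-up ID
        long first = md5ToLong(userId);
        long second = md5ToLong(userId);
        System.out.println(first == second); // the mapping has no state, so repeated calls agree
    }
}
```

Because the result depends only on the input string, repeated calls with the same ID agree across calls, processes, and machines. Two different strings could in principle collide, since the value is a truncated hash.
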
