mahout-user mailing list archives

From Something Something <mailinglist...@gmail.com>
Subject Re: ItemSimilarityJob creates no output
Date Wed, 06 Jun 2012 17:10:45 GMT
Hmm... that's what I am thinking: something is amiss!  A few lines from
the files are pasted above.  The pattern is fairly similar.  Is there a
place where I can upload part of my file for someone else to try?

OR BETTER YET - Can someone provide a small file that always returns a few
similarities?  Is a file such as this included in the source?

Thanks for the help.

On Wed, Jun 6, 2012 at 9:01 AM, Sean Owen <srowen@gmail.com> wrote:

> That sounds like plenty of data -- doubting that's any issue. Is it
> very sparse? Meaning many items exist just for one user? It's really
> sparseness that might produce few or no similarities.
>
> I think something else is at work here but don't know off the top of
> my head based on the info so far.
>
> Yes it is always the same hash function -- top 8 bytes of the MD5
> hash. Same input means same output.
>
> Sean
>
> On Wed, Jun 6, 2012 at 4:57 PM, Something Something
> <mailinglists19@gmail.com> wrote:
> > The input size was about 6 Million so I was expecting to find some
> > similarities.  Anyway, I have started a test with the real dataset that
> > contains 700 million lines.  We shall see how that goes.  One quick
> > question, though:
> >
> > I am using MemoryIDMigrator to convert UserIds from String to Long as
> > follows:
> >
> >    static UpdatableIDMigrator migrator = new MemoryIDMigrator();
> > <some code omitted here...>
> >    migrator.toLongID(strUserID);
> >
> > Question:  If I pass the same userId multiple times to this method, I am
> > guaranteed to get the same 'Long' number back, correct?
>
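
For reference, the determinism described in the quoted reply (the top 8
bytes of the MD5 hash, so the same input gives the same output) can be
sketched with the JDK alone. This is only an illustration under stated
assumptions: Md5IdSketch and its toLongID method are hypothetical names, the
UTF-8 encoding and big-endian byte order are assumptions, and this is not
Mahout's own code, so the values it returns may not match what
MemoryIDMigrator produces.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, NOT a Mahout class: maps a String ID to a long by
// packing the first 8 bytes of its MD5 digest into a single value.
final class Md5IdSketch {

    static long toLongID(String id) {
        try {
            // A fresh digest per call, so no state is shared between calls.
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            // Assumption: the ID is encoded as UTF-8 before hashing.
            byte[] digest = md5.digest(id.getBytes(StandardCharsets.UTF_8));
            long hash = 0L;
            // Assumption: big-endian packing of digest bytes 0..7.
            for (int i = 0; i < 8; i++) {
                hash = (hash << 8) | (digest[i] & 0xFFL);
            }
            return hash;
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

Because the result depends only on the bytes of the input string, calling
Md5IdSketch.toLongID(strUserID) repeatedly with the same userId returns the
same long each time. Two different strings can in principle collide on the
same long, though that is very unlikely with a 64-bit value.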
