mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: ItemSimilarityJob
Date Mon, 04 Jun 2012 22:29:21 GMT
That's how it used to work, but it was restricted to integers a long time
ago purely for speed and memory. It makes a big difference. Many (most?)
use cases have some numeric ID for these guys already. Otherwise there's no
reason it needs to be an integer; it just needs to have an ordering.

You can retain the mapping however you like. All you really need are the
original ID values to recreate the mapping, as it is just based on MD5. So a
file is sufficient, for example. But to do the mapping on the fly it has to
be in memory, yes, or else it is too slow.
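To make the MD5 point concrete, here is a minimal sketch of hashing string IDs to longs and rebuilding the reverse mapping from the original IDs (e.g. lines read from a file). It assumes the long ID is taken from the first 8 bytes of the MD5 digest; the class `HashedIdMapping` and its methods are hypothetical illustrations, not Mahout's actual migrator code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Mahout: a sketch of MD5-based ID mapping.
public class HashedIdMapping {

    // Reverse lookup table: long ID -> original string ID, held in memory.
    private final Map<Long, String> longToString = new HashMap<>();

    // Forward direction is stateless: hash the string ID and keep the
    // first 8 bytes of its MD5 digest as a long (an assumption of this sketch).
    public static long toLongID(String stringID) {
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
        byte[] digest = md5.digest(stringID.getBytes(StandardCharsets.UTF_8));
        long hash = 0L;
        for (int i = 0; i < 8; i++) {
            hash = (hash << 8) | (digest[i] & 0xFFL);
        }
        return hash;
    }

    // Rebuild the reverse mapping from the original IDs (e.g. read from a file).
    public void initialize(Iterable<String> originalIDs) {
        for (String id : originalIDs) {
            longToString.put(toLongID(id), id);
        }
    }

    // Return the original string ID, or null if it was never registered.
    public String toStringID(long longID) {
        return longToString.get(longID);
    }

    public static void main(String[] args) {
        HashedIdMapping mapping = new HashedIdMapping();
        mapping.initialize(List.of("user-alice", "user-bob"));
        long aliceID = toLongID("user-alice");
        System.out.println(aliceID + " -> " + mapping.toStringID(aliceID));
    }
}
```

Note that only the reverse direction (long back to string) needs the in-memory table; hashing a string to a long needs no state at all. The sketch does not detect the (unlikely) case of two IDs hashing to the same 64-bit value.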

Best is to find a numeric ID to use in your model if you can.

Myrrix works this way too, if desired, but almost as a feature, since the
'real' IDs need never be sent into the hosted recommender in the cloud,
just a hashed numeric ID. That's nice from a security or privacy
standpoint.
 On Jun 4, 2012 11:05 PM, "Something Something" <mailinglists19@gmail.com>
wrote:

> Hmm... that's a bit weird.  Looking at the algorithm, I don't understand why
> UserID has to be Long.  It's just an identifier of a row, isn't it?  The
> algorithm really only works with item IDs, and even with item IDs I would
> argue they don't need to be numeric.  Am I missing something?
>
> We have over a billion user IDs.  So for each ID I need to create a
> corresponding 'long' value in memory?  Is that what this class is doing?
>
> On Mon, Jun 4, 2012 at 2:50 PM, Manuel Blechschmidt <
> Manuel.Blechschmidt@gmx.de> wrote:
>
> > Hi Something,
> > actually this is correct.
> >
> > You can use the MemoryIDMigrator
> >
> > https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/impl/model/MemoryIDMigrator.html
> > to create Longs from your strings.
> >
> > /Manuel
> >
> > On 04.06.2012, at 23:47, Something Something wrote:
> >
> > > Trying to use this class.  Noticed that 'UserID' must be Long.  That
> > > doesn't sound right.  Isn't there a way to tell this class that the
> > > 'UserID' is String?  Please let me know.  Thanks.
> >
> > --
> > Manuel Blechschmidt
> > M.Sc. IT Systems Engineering
> > Dortustr. 57
> > 14467 Potsdam
> > Mobil: 0173/6322621
> > Twitter: http://twitter.com/Manuel_B
> >
> >
>
