mahout-user mailing list archives

From Something Something <mailinglist...@gmail.com>
Subject Re: ItemSimilarityJob
Date Mon, 04 Jun 2012 23:00:38 GMT
Fair enough.  Just one more question:

1)  >>it just needs to have an ordering
The input data doesn't need to be in any particular sequence, correct?  Not
sure what you mean by 'needs to have an ordering'.


On Mon, Jun 4, 2012 at 3:29 PM, Sean Owen <srowen@gmail.com> wrote:

> That's how it used to work, but it was restricted to integers a long time
> ago purely for speed and memory. It makes a big difference. Many (most?)
> use cases already have some numeric ID for these guys. Otherwise there's no
> reason it needs to be an integer; it just needs to have an ordering.
>
> You can retain the mapping however you like. All you really need are the
> original ID values to recreate the mapping, as it is just based on MD5. So a
> file is sufficient, for example. But to do the mapping on the fly, yes, it
> has to be in memory, or else it is too slow.
>
> Best is to find a numeric ID to use in your model if you can.
>
> Myrrix works this way too, if desired, but almost as a feature, since the
> 'real' IDs need never be sent into the hosted recommender in the cloud,
> just a hashed numeric ID. That's nice from a security or privacy
> standpoint.
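Sean's description of the mapping (hash each original ID with MD5, keep the original IDs around, for example in a file, and rebuild the reverse lookup in memory) can be sketched in plain Java. This is a minimal sketch under assumptions: the class and method names are hypothetical, it is not Mahout's MemoryIDMigrator, and folding the first 8 bytes of the MD5 digest into a long is an assumed scheme for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not Mahout's class): MD5-based String-to-long ID mapping. */
public class Md5IdMapping {

    // Reverse lookup, held in memory so translating results back is fast.
    private final Map<Long, String> longToString = new HashMap<>();

    /** Hashes an ID to a long from the first 8 bytes of its MD5 digest (big-endian; assumed scheme). */
    public static long toLongId(String stringId) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(stringId.getBytes(StandardCharsets.UTF_8));
            long hash = 0L;
            for (int i = 0; i < 8; i++) {
                hash = (hash << 8) | (digest[i] & 0xFFL);
            }
            return hash;
        } catch (NoSuchAlgorithmException e) {
            // Should not happen on a standard JDK, which ships an MD5 provider.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Rebuilds the reverse mapping from the original IDs, e.g. lines read from a file. */
    public void initialize(Iterable<String> originalIds) {
        for (String id : originalIds) {
            longToString.put(toLongId(id), id);
        }
    }

    /** Returns the original String ID, or null if it was never registered. */
    public String toStringId(long longId) {
        return longToString.get(longId);
    }

    public static void main(String[] args) {
        Md5IdMapping mapping = new Md5IdMapping();
        // In practice these would be read from the file of original IDs.
        mapping.initialize(List.of("alice@example.com", "bob@example.com"));
        long id = toLongId("alice@example.com");
        System.out.println(id + " -> " + mapping.toStringId(id));
    }
}
```

Because the hash is deterministic, the String-to-long direction never needs stored state; only the long-to-String lookup does, which is why keeping the original IDs in a file is enough to recreate it.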
>  On Jun 4, 2012 11:05 PM, "Something Something" <mailinglists19@gmail.com>
> wrote:
>
> > Hmm... that's a bit weird.  Looking at the algorithm, I don't understand why
> > UserID has to be Long.  It's just an identifier of a row, isn't it?  The
> > algorithm really only works with item IDs, and even with item IDs I would
> > argue they don't need to be numeric.  Am I missing something?
> >
> > We have over a billion user IDs.  So for each ID I need to create a
> > corresponding 'long' value in memory?  Is that what this class is doing?
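At a billion IDs, one thing worth estimating when hashing strings into 64-bit longs is the chance that two different user IDs land on the same long. Assuming the hash behaves like a uniform random function over $2^{64}$ values (an idealization, not something stated in the thread), the standard birthday approximation for $n = 10^9$ IDs gives:

```latex
p_{\text{collision}} \approx 1 - \exp\!\left(-\frac{n^2}{2 \cdot 2^{64}}\right)
  = 1 - \exp\!\left(-\frac{10^{18}}{3.69 \times 10^{19}}\right)
  \approx 1 - e^{-0.027} \approx 0.027
```

So under that assumption there is roughly a 3% chance that at least one pair of users somewhere in the data set would be merged into a single numeric ID.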
> >
> > On Mon, Jun 4, 2012 at 2:50 PM, Manuel Blechschmidt <
> > Manuel.Blechschmidt@gmx.de> wrote:
> >
> > > Hi Something,
> > > actually this is correct.
> > >
> > > You can use the MemoryIDMigrator
> > > https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/impl/model/MemoryIDMigrator.html
> > > to create Longs from your strings.
> > >
> > > /Manuel
> > >
> > > On 04.06.2012, at 23:47, Something Something wrote:
> > >
> > > > Trying to use this class.  Noticed that 'UserID' must be Long.  That
> > > > doesn't sound right.  Isn't there a way to tell this class that the
> > > > 'UserID' is String?  Please let me know.  Thanks.
> > >
> > > --
> > > Manuel Blechschmidt
> > > M.Sc. IT Systems Engineering
> > > Dortustr. 57
> > > 14467 Potsdam
> > > Mobil: 0173/6322621
> > > Twitter: http://twitter.com/Manuel_B
> > >
> > >
> >
>
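For the original question (ItemSimilarityJob wanting a numeric UserID), one possible preprocessing step is to rewrite each input line before the job runs, replacing the String user ID with its hashed long and recording the distinct original IDs so the mapping can be rebuilt later. This sketch assumes the input is comma-separated `userID,itemID[,preference]` lines; the class and method names are hypothetical and not part of Mahout, and the MD5-prefix hash is the same assumed scheme as above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical preprocessing step (not part of Mahout): String user IDs become longs. */
public class PreferenceIdConverter {

    // Distinct original user IDs in first-seen order; save these to a file
    // so the long-to-String mapping can be rebuilt later.
    private final Set<String> seenUserIds = new LinkedHashSet<>();

    /** Hashes an ID to a long from the first 8 bytes of its MD5 digest (assumed scheme). */
    public static long hashId(String id) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(id.getBytes(StandardCharsets.UTF_8));
            long hash = 0L;
            for (int i = 0; i < 8; i++) {
                hash = (hash << 8) | (digest[i] & 0xFFL);
            }
            return hash;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Converts one line; the item ID and optional preference pass through unchanged. */
    public String convertLine(String line) {
        int comma = line.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("Expected userID,itemID[,preference]: " + line);
        }
        String userId = line.substring(0, comma).trim();
        seenUserIds.add(userId);
        return hashId(userId) + line.substring(comma);
    }

    /** Converts every line, preserving order. */
    public List<String> convertAll(List<String> lines) {
        List<String> out = new ArrayList<>(lines.size());
        for (String line : lines) {
            out.add(convertLine(line));
        }
        return out;
    }

    public Set<String> getSeenUserIds() {
        return Collections.unmodifiableSet(seenUserIds);
    }

    public static void main(String[] args) {
        PreferenceIdConverter converter = new PreferenceIdConverter();
        List<String> converted = converter.convertAll(
                List.of("alice,101,5.0", "bob,102", "alice,103,3.0"));
        converted.forEach(System.out::println);
        System.out.println("distinct users: " + converter.getSeenUserIds());
    }
}
```

Only the user column is rewritten here. Non-numeric item IDs could be hashed the same way, but since the job's results are keyed by item ID, those would then also need to be mapped back afterwards.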
