mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: UUID based user IDs
Date Wed, 01 Aug 2012 20:34:39 GMT
Yep, just hash to a long, from UUID or String or whatever. The occasional
collision does not cause a real problem. If you mix the tastes of two users
or items once in a billion times, the overall results will hardly be
different.
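
A minimal sketch of the hashing step, assuming plain Java with no Mahout dependency. The class and method names (`LongIds`, `fromString`, `fromUuid`) are made up for illustration, and taking the first 8 bytes of an MD5 digest is just one reasonable choice of hash, not something the library requires:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

/** Hypothetical helper (not a Mahout class): turns String or UUID IDs into long IDs. */
final class LongIds {
    private LongIds() {}

    /** Hashes an arbitrary string ID to a long: the first 8 bytes of its MD5 digest. */
    static long fromString(String id) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(id.getBytes(StandardCharsets.UTF_8));
            // getLong() reads the first 8 of the 16 digest bytes, big-endian.
            return ByteBuffer.wrap(digest).getLong();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** Hashes a UUID by way of its canonical string form, so all 128 bits contribute. */
    static long fromUuid(UUID uuid) {
        return fromString(uuid.toString());
    }
}
```

Hashing the whole value, rather than keeping only one half of the UUID, matters mostly for time-based (version 1) UUIDs: their least significant 64 bits hold the clock sequence and node, which can be identical for every ID generated on one machine.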

You have to maintain the reverse mapping of course. Look at the IDMigrator
class for a little help there.
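
To illustrate what maintaining the reverse mapping involves, here is a dependency-free sketch. It is not the IDMigrator API; `ReverseIdMap` and its methods are hypothetical names, the hash function is passed in by the caller, and the map lives in memory only (with millions of users you may prefer to persist it):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.ToLongFunction;

/** Hypothetical helper (not a Mahout class): remembers which original ID produced each long. */
final class ReverseIdMap {
    private final ToLongFunction<String> hasher;
    private final Map<Long, String> longToOriginal = new ConcurrentHashMap<>();

    /** @param hasher whatever String-to-long hash you settled on */
    ReverseIdMap(ToLongFunction<String> hasher) {
        this.hasher = hasher;
    }

    /** Hashes the ID and records the reverse mapping; on a collision the first ID seen wins. */
    long toLongId(String originalId) {
        long id = hasher.applyAsLong(originalId);
        longToOriginal.putIfAbsent(id, originalId);
        return id;
    }

    /** Returns the original ID for a long, or null if that long was never produced here. */
    String toOriginalId(long id) {
        return longToOriginal.get(id);
    }
}
```

Feed every user and item ID through `toLongId` when building the data model, then call `toOriginalId` on the longs that come back in recommendations.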

You can rewrite to use UUID or String, but believe me, it will be an
immense amount of change and make things much slower. It used to work this
way for recommenders in about 2006, and the Object overhead and GC pressure
were by far the bottleneck. That's why it's all long now.

On Wed, Aug 1, 2012 at 9:29 PM, Matt Mitchell <goodieboy@gmail.com> wrote:

> Question about dealing with UUIDs as Mahout user IDs. I'm considering
> ways to deal with these values:
>
> 1. use getLeastSignificantBits
> 2. re-map to a database auto-increment number (this would take a very
> long time to do?)
> 3. customize mahout so that it accepts UUIDs as user IDs
>
> Any feedback here? If I went with #3 (seems the safest), how would I do
> this, and what are the consequences?
>
> The user count is in the millions.
>
> Thanks!
>
