mahout-user mailing list archives

From Matt Mitchell <goodie...@gmail.com>
Subject Re: UUID based user IDs
Date Wed, 01 Aug 2012 20:40:21 GMT
Thanks Sean! That all makes sense. Would you mind recommending a
hashing function for this? Is there something in Mahout I could use?

- Matt
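
A minimal sketch of such a hashing function, in plain Java with no Mahout dependency. It assumes the user IDs are random (version 4) UUIDs. For those, XOR-folding the two 64-bit halves gives a well-mixed long. Taking only the low half (option 1 in the original question) gives just 62 random bits, because two bits are the fixed variant field. For arbitrary string IDs, the sketch takes the first 8 bytes of an MD5 digest. The class and method names (`LongIdHash`, `fromUuid`, `fromString`) are hypothetical and are not Mahout API. Mahout's `IDMigrator` interface, mentioned in the reply quoted below, covers the same string-to-long job through `toLongID(String)` and `toStringID(long)`.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

/** Hypothetical helper: hashes external IDs down to the long IDs Mahout expects. */
public final class LongIdHash {

  private LongIdHash() {}

  /** Folds a UUID's 128 bits into 64 by XOR-ing its high and low halves. */
  public static long fromUuid(UUID uuid) {
    return uuid.getMostSignificantBits() ^ uuid.getLeastSignificantBits();
  }

  /** Hashes an arbitrary string ID to a long: the first 8 bytes of its MD5 digest, big-endian. */
  public static long fromString(String id) {
    byte[] digest;
    try {
      digest = MessageDigest.getInstance("MD5").digest(id.getBytes(StandardCharsets.UTF_8));
    } catch (NoSuchAlgorithmException e) {
      // Standard JDKs ship MD5; fail loudly if this one does not.
      throw new IllegalStateException("MD5 not available", e);
    }
    long hash = 0L;
    for (int i = 0; i < 8; i++) {
      hash = (hash << 8) | (digest[i] & 0xFFL);
    }
    return hash;
  }

  public static void main(String[] args) {
    UUID user = UUID.randomUUID();
    System.out.println(user + " -> " + fromUuid(user));
    System.out.println("\"" + user + "\" -> " + fromString(user.toString()));
  }
}
```

Either function is deterministic, so the same user always maps to the same long ID across runs. The reverse mapping still has to be stored separately.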

On Wed, Aug 1, 2012 at 4:34 PM, Sean Owen <srowen@gmail.com> wrote:
> Yep, just hash to a long, from UUID or String or whatever. The occasional
> collision does not cause a real problem. If you mix the tastes of two users
> or items once in a billion times, the overall results will hardly be
> different.
>
> You have to maintain the reverse mapping of course. Look at the IDMigrator
> class for a little help there.
>
> You can rewrite to use UUID or String, but believe me, it will be an
> immense amount of change and make things much slower. It used to work this
> way for recommenders in about 2006 and the Object overhead and GC pressure
> was by far the bottleneck. That's why it's all long now.
>
> On Wed, Aug 1, 2012 at 9:29 PM, Matt Mitchell <goodieboy@gmail.com> wrote:
>
>> Question about dealing with UUIDs as Mahout user IDs. I'm considering
>> ways to deal with these values:
>>
>> 1. use getLeastSignificantBits
>> 2. re-map to a database auto-increment number (would this take a
>> very long time to do?)
>> 3. customize mahout so that it accepts UUIDs as user IDs
>>
>> Any feedback here? If I went with #3 (it seems the safest), how would I
>> do this, and what are the consequences?
>>
>> The user count is in the millions.
>>
>> Thanks!
>>
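
Sean's reply notes that the reverse mapping (long ID back to the original UUID) has to be maintained. The following is a minimal in-memory sketch of that bookkeeping. `UuidIdMapper` and its methods are hypothetical names, not Mahout classes. The sketch uses the XOR-fold hash and simply counts collisions, with the first UUID keeping the slot. With millions of users, a `HashMap<Long, UUID>` costs real heap. A database table, or one of Mahout's `IDMigrator` implementations, may be a better home for the mapping.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical mapper: UUID -> long for Mahout, plus the reverse lookup. */
public final class UuidIdMapper {

  private final Map<Long, UUID> longToUuid = new HashMap<>();
  private int collisions;

  /** Hashes a UUID to a long ID and records the reverse mapping. */
  public long toLongId(UUID uuid) {
    long id = uuid.getMostSignificantBits() ^ uuid.getLeastSignificantBits();
    UUID previous = longToUuid.putIfAbsent(id, uuid);
    if (previous != null && !previous.equals(uuid)) {
      // Two distinct UUIDs hashed to the same long; the first one keeps the slot.
      collisions++;
    }
    return id;
  }

  /** Translates a long ID (e.g. from a recommendation) back to its UUID, or null if unknown. */
  public UUID toUuid(long id) {
    return longToUuid.get(id);
  }

  /** Number of collisions seen so far. */
  public int collisions() {
    return collisions;
  }

  public static void main(String[] args) {
    UuidIdMapper mapper = new UuidIdMapper();
    UUID user = UUID.randomUUID();
    long id = mapper.toLongId(user);
    System.out.println(id + " -> " + mapper.toUuid(id) + " (collisions: " + mapper.collisions() + ")");
  }
}
```

On the collision worry, assume the hashed values behave like uniform random 64-bit numbers. The chance of at least one collision among n users is then roughly n²/2^65. For ten million users that is about 2.7 × 10^-6, which is in line with Sean's point that collisions barely matter.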
