mahout-user mailing list archives

From Matt Mitchell <goodie...@gmail.com>
Subject Re: UUID based user IDs
Date Thu, 02 Aug 2012 02:40:40 GMT
Thanks Manuel, that's very helpful. So you're saying I can just use
MemoryIDMigrator, even after my preferences have been created with UUID
values? Or, should I create my preferences using the MemoryIDMigrator?

- Matt


On Wed, Aug 1, 2012 at 8:49 PM, Manuel Blechschmidt
<Manuel.Blechschmidt@gmx.de> wrote:
> Hello Matt,
>
> On 01.08.2012, at 22:40, Matt Mitchell wrote:
>
>> Thanks Sean! That all makes sense. Would you mind recommending a
>> hashing function for this? Is there something in Mahout I could use?
>
> The following class uses a String-to-long mapping based on a MemoryIDMigrator:
>
> https://github.com/ManuelB/facebook-recommender-demo/blob/master/src/main/java/de/apaxo/bedcon/FacebookRecommender.java
>
> Internally, Mahout uses part of the MD5 hash, which can, for example, be expressed
> directly in SQL:
>
> cast(conv(substring(md5([column name]), 1, 16),16,10) as signed)
>
> Javadoc can be found here:
> https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/model/IDMigrator.html
>
> /Manuel
>
>>
>> - Matt
>>
>> On Wed, Aug 1, 2012 at 4:34 PM, Sean Owen <srowen@gmail.com> wrote:
>>> Yep, just hash to a long, from UUID or String or whatever. The occasional
>>> collision does not cause a real problem. If you mix the tastes of two users
>>> or items once in a billion times, the overall results will hardly be
>>> different.
>>>
>>> You have to maintain the reverse mapping of course. Look at the IDMigrator
>>> class for a little help there.
>>>
>>> You can rewrite to use UUID or String, but believe me, it will be an
>>> immense amount of change and make things much slower. It used to work this
>>> way for recommenders in about 2006 and the Object overhead and GC pressure
>>> was by far the bottleneck. That's why it's all long now.
>>>
>>> On Wed, Aug 1, 2012 at 9:29 PM, Matt Mitchell <goodieboy@gmail.com> wrote:
>>>
>>>> Question about dealing with UUIDs as Mahout user IDs. I'm considering
>>>> ways to deal with these values:
>>>>
>>>> 1. use getLeastSignificantBits
>>>> 2. re-map to a database auto-increment number (this would take a very
>>>> long time to do?)
>>>> 3. customize mahout so that it accepts UUIDs as user IDs
>>>>
>>>> Any feedback here? If I went with #3 (seems the safest), how would I do
>>>> this, and what are the consequences?
>>>>
>>>> The user count is in the millions.
>>>>
>>>> Thanks!
>>>>
>
> --
> Manuel Blechschmidt
> M.Sc. IT Systems Engineering
> Dortustr. 57
> 14467 Potsdam
> Mobil: 0173/6322621
> Twitter: http://twitter.com/Manuel_B
>
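
The approach discussed in this thread (hash each UUID string to a long and
keep the reverse mapping) can be sketched without any Mahout dependency.
This is only an illustration under stated assumptions: the class name
UuidIdMapper and its methods are hypothetical, not Mahout's API, and the
hash (first 8 bytes of the MD5 digest, read big-endian) is my understanding
of what the SQL expression above computes. Because the hash is deterministic,
the same UUID yields the same long whether it is hashed before or after the
preferences are stored.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Mahout: maps string IDs to longs and back.
class UuidIdMapper {
    // In-memory reverse mapping from the hashed long ID to the original string.
    private final Map<Long, String> longToString = new HashMap<>();

    // Hash a string to a long using the first 8 bytes of its MD5 digest
    // (assumption: this mirrors the SQL expression quoted in the thread).
    static long hash(String value) {
        try {
            byte[] md5 = MessageDigest.getInstance("MD5")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            long h = 0L;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (md5[i] & 0xFFL);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Return the long ID for a string ID and remember the reverse mapping.
    // (Design choice of this sketch: storing happens on every lookup.)
    long toLongID(String stringID) {
        long id = hash(stringID);
        longToString.put(id, stringID);
        return id;
    }

    // Return the original string ID, or null if it was never mapped.
    String toStringID(long longID) {
        return longToString.get(longID);
    }

    public static void main(String[] args) {
        UuidIdMapper mapper = new UuidIdMapper();
        String uuid = "123e4567-e89b-12d3-a456-426614174000"; // example value
        long id = mapper.toLongID(uuid);
        System.out.println(uuid + " -> " + id);
        System.out.println(id + " -> " + mapper.toStringID(id));
    }
}
```

With millions of users, the in-memory map is the part to watch. A
database-backed reverse mapping serves the same purpose if the map does not
fit in memory.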
