mahout-user mailing list archives

From Sean Owen <>
Subject Re: Cooccurrence to align different categorization systems (many to many occurrence)
Date Mon, 19 Jul 2010 08:34:29 GMT
You have it right. The easiest way to deal with community 1 vs
community 2 is to pool all of the categories together into one data
model, but simply ignore most-similar categories from the wrong
community. That is, you're computing similarity between a community 1
"user" and all community 2 "users" only.
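
A minimal sketch of that filtering step, written in plain Python rather
than against Mahout's API. It assumes each category is a "user" holding a
set of item IDs, uses the Tanimoto (Jaccard) coefficient as one reasonable
similarity for boolean data, and assumes the community can be read off the
ID range. The names most_similar_across and SYSTEM_2_OFFSET and the sample
data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Assumption: System 1 category IDs are below this offset, System 2 IDs at or above it.
SYSTEM_2_OFFSET = 1000


def community(category_id: int) -> int:
    """Return 1 or 2 depending on which ID range the category falls in."""
    return 2 if category_id >= SYSTEM_2_OFFSET else 1


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) coefficient of two item sets; 0.0 if both are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def most_similar_across(
    category_id: int, items_by_category: Dict[int, Set[int]], how_many: int = 5
) -> List[Tuple[int, float]]:
    """Rank categories from the *other* community by similarity to category_id."""
    own_items = items_by_category[category_id]
    own_community = community(category_id)
    scored = [
        (other, tanimoto(own_items, items))
        for other, items in items_by_category.items()
        if community(other) != own_community  # skip same-community candidates
    ]
    # Drop candidates that share no items, then sort by score (ties by ID).
    scored = [(c, s) for c, s in scored if s > 0.0]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:how_many]


# Hypothetical pooled data model: category ID -> IDs of items tagged with it.
pooled = {
    1: {10, 11, 12},
    2: {12, 13},
    1001: {10, 11},
    1002: {13, 14},
}
print(most_similar_across(1, pooled))
```

Filtering inside the candidate loop, rather than trimming a mixed top-N
list afterwards, avoids ending up with fewer results than requested when
same-community categories dominate the ranking.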

On Mon, Jul 19, 2010 at 4:27 AM, Chantal Ackermann
<> wrote:
> Hi Sean,
> hi Ted,
> hi Sebastian,
> thanks a lot for all those detailed answers. I'll need some time to
> digest the technical details, I'm afraid. I find Sean's suggestion on
> thinking of categories as users and using the recommendation classes for
> the task the easiest to understand, right now.
> It's not completely the same situation, though. Or only if thinking of
> two user communities, and the recommendations presented to a user of
> Community 1 should be from Community 2.
> (@Sebastian)
> Each item is categorized in each of the systems but it's allowed that
> the item can have zero categories. There are a few hundred categories in
> each system.
> The data is in lists of the following structure:
> <ITEM (ID)> [List of categories System 1] [List of categories System 2]
> The approach I'll take:
> 1. normalize all the category strings and give them unique number
> identifiers (unique across both systems, distinct ranges).
> 2. walk through the list and per item: extract one category (= user) and
> create a BooleanPreference for that user and item pair.
> 3. for each category (System 1) request similar categories (=user
> similarity) that are from System 2. I probably have to request a mixed
> list (both systems) and filter out the ones from System 1.
> I'll keep you posted. If you have more tips or things I should take
> into account - or if you think that this approach won't return any
> decent results I'm glad if you could share.
> Thanks!
> Chantal
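
Steps 1 and 2 of the quoted plan could be sketched roughly as follows, in
plain Python rather than Mahout code. The record layout follows the
<ITEM (ID)> [System 1] [System 2] structure described above; the
normalization rule (lowercase, collapsed whitespace), the offset of 1000
for System 2 IDs, and the names normalize and build_preferences are
assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Assumption: System 1 has fewer than 1000 categories, so the ranges stay distinct.
SYSTEM_2_OFFSET = 1000

# One record per item: (item ID, System 1 categories, System 2 categories).
Record = Tuple[int, List[str], List[str]]


def normalize(name: str) -> str:
    """Collapse whitespace and case so spelling variants map to one category."""
    return " ".join(name.split()).lower()


def build_preferences(records: Iterable[Record]):
    """Assign numeric category IDs and collect boolean (category, item) pairs."""
    ids: Tuple[Dict[str, int], Dict[str, int]] = ({}, {})
    offsets = (0, SYSTEM_2_OFFSET)
    pairs: Set[Tuple[int, int]] = set()
    for item_id, cats1, cats2 in records:
        for system, cats in enumerate((cats1, cats2)):
            for raw in cats:
                key = normalize(raw)
                # A new category gets the next free ID in its system's range.
                cat_id = ids[system].setdefault(
                    key, offsets[system] + len(ids[system])
                )
                pairs.add((cat_id, item_id))
    return ids, pairs


# Hypothetical sample data; an item may have zero categories in a system.
records = [
    (10, ["Science Fiction", "Drama"], ["sci-fi"]),
    (11, ["science  fiction"], ["Sci-Fi", "Space"]),
    (12, [], ["Space"]),
]
ids, pairs = build_preferences(records)
print(ids)
print(sorted(pairs))
```

Each (category, item) pair plays the role of one boolean preference with
the category as the "user". An item with no categories in a system simply
contributes no pairs for that system.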
