mahout-user mailing list archives

From Sebastian Schelter <ssc.o...@googlemail.com>
Subject Re: Cooccurrence to align different categorization systems (many to many occurrence)
Date Mon, 19 Jul 2010 08:38:51 GMT
Hi Chantal,

I think you're taking the right approach.

I missed the line in your first mail where you say you have only 10k
items. With so few items, you definitely won't need Hadoop or the RowSimilarityJob.

--sebastian

Am 19.07.2010 10:27, schrieb Chantal Ackermann:
> Hi Sean,
> hi Ted,
> hi Sebastian,
>
> Thanks a lot for all those detailed answers. I'm afraid I'll need some
> time to digest the technical details. Right now, Sean's suggestion of
> treating categories as users and using the recommendation classes for
> the task is the easiest for me to understand.
>
> It's not exactly the same situation, though. It only matches if you
> think of two user communities, where the recommendations presented to a
> user of Community 1 should come from Community 2.
>
> (@Sebastian)
> Each item is categorized in each of the systems, but an item is allowed
> to have zero categories. There are a few hundred categories in each
> system.
>
> The data is in lists of the following structure:
> <ITEM (ID)> [List of categories System 1] [List of categories System 2]
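
A minimal Java sketch of reading one line of this format. The numeric item ID, the square-bracketed comma-separated lists, and the names `CategoryLineParser` and `CategorizedItem` are assumptions for illustration only; the exact delimiters of the real data aren't shown in the mail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CategoryLineParser {
  // One parsed line: the item ID plus its categories in each system.
  record CategorizedItem(long itemId, List<String> system1, List<String> system2) {}

  // Hypothetical layout "<id> [a, b] [c]": square-bracketed, comma-separated
  // lists. The real delimiters in the data may differ.
  private static final Pattern LINE =
      Pattern.compile("\\s*(\\d+)\\s*\\[([^\\]]*)\\]\\s*\\[([^\\]]*)\\]\\s*");

  static CategorizedItem parse(String line) {
    Matcher m = LINE.matcher(line);
    if (!m.matches()) {
      throw new IllegalArgumentException("Unrecognized line: " + line);
    }
    return new CategorizedItem(
        Long.parseLong(m.group(1)), split(m.group(2)), split(m.group(3)));
  }

  // Splits one bracketed list; an empty list "[]" yields zero categories.
  private static List<String> split(String list) {
    List<String> out = new ArrayList<>();
    for (String part : list.split(",")) {
      String trimmed = part.trim();
      if (!trimmed.isEmpty()) {
        out.add(trimmed);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    CategorizedItem item = parse("42 [Sports, News] []");
    System.out.println(item);
  }
}
```

An empty list maps to an empty `List`, which covers items with zero categories in one system. Category names that themselves contain commas or square brackets would need a different delimiter.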
>
> The approach I'll take:
> 1. Normalize all the category strings and give them unique numeric
> identifiers (unique across both systems, in distinct ranges).
> 2. Walk through the list and, for each item, take each of its categories
> (= user) and create a BooleanPreference for that user-item pair.
> 3. For each category of System 1, request the similar categories (= user
> similarity) that are from System 2. I'll probably have to request a mixed
> list (both systems) and filter out the ones from System 1.
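
The three steps can be sketched in plain Java, without Mahout, to show what the computation amounts to. Everything here is hypothetical: the class `CategoryAlignment`, the ID offset `SYSTEM2_BASE`, and the choice of the Tanimoto coefficient (items shared by two categories divided by items in either one) as the user similarity. In Mahout's Taste API the same idea would roughly correspond to a `GenericBooleanPrefDataModel` combined with `TanimotoCoefficientSimilarity`, but this sketch does not use those classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class CategoryAlignment {
  // Step 1: hypothetical, non-overlapping ID ranges for the two systems.
  static final long SYSTEM1_BASE = 1L;
  static final long SYSTEM2_BASE = 1_000_000L;

  private final Map<String, Long> ids = new HashMap<>();
  // Category ID ("user") -> IDs of the items it is attached to (boolean prefs).
  private final Map<Long, Set<Long>> itemsByCategory = new HashMap<>();
  private long next1 = SYSTEM1_BASE;
  private long next2 = SYSTEM2_BASE;

  // Step 1: normalize a category string and return its unique numeric ID,
  // assigning the next free ID in that system's range on first sight.
  long idFor(String category, boolean system1) {
    String key = (system1 ? "1:" : "2:") + category.trim().toLowerCase(Locale.ROOT);
    return ids.computeIfAbsent(key, k -> system1 ? next1++ : next2++);
  }

  static boolean isSystem2(long categoryId) {
    return categoryId >= SYSTEM2_BASE;
  }

  // Step 2: one boolean preference per (category, item) pair.
  void addItem(long itemId, List<String> system1, List<String> system2) {
    for (String c : system1) {
      itemsByCategory.computeIfAbsent(idFor(c, true), k -> new HashSet<>()).add(itemId);
    }
    for (String c : system2) {
      itemsByCategory.computeIfAbsent(idFor(c, false), k -> new HashSet<>()).add(itemId);
    }
  }

  // Tanimoto coefficient |A ∩ B| / |A ∪ B| of two categories' item sets.
  static double tanimoto(Set<Long> a, Set<Long> b) {
    int common = 0;
    for (Long x : a) {
      if (b.contains(x)) common++;
    }
    int union = a.size() + b.size() - common;
    return union == 0 ? 0.0 : (double) common / union;
  }

  // Step 3: score every category, drop the System 1 ones, keep the top N.
  List<Map.Entry<Long, Double>> mostSimilarInSystem2(long system1Id, int howMany) {
    Set<Long> source = itemsByCategory.getOrDefault(system1Id, Set.of());
    List<Map.Entry<Long, Double>> scored = new ArrayList<>();
    for (Map.Entry<Long, Set<Long>> e : itemsByCategory.entrySet()) {
      if (!isSystem2(e.getKey())) continue; // filter out System 1 categories
      double s = tanimoto(source, e.getValue());
      if (s > 0) scored.add(Map.entry(e.getKey(), s));
    }
    scored.sort(Map.Entry.<Long, Double>comparingByValue().reversed());
    return new ArrayList<>(scored.subList(0, Math.min(howMany, scored.size())));
  }

  public static void main(String[] args) {
    CategoryAlignment a = new CategoryAlignment();
    a.addItem(1, List.of("Sports"), List.of("Athletics"));
    a.addItem(2, List.of("Sports"), List.of("Athletics", "Football"));
    a.addItem(3, List.of("News"), List.of("Current Affairs"));
    // prints [1000000=1.0, 1000001=0.5] (Athletics, then Football)
    System.out.println(a.mostSimilarInSystem2(a.idFor("Sports", true), 5));
  }
}
```

The distinct ID ranges from step 1 are what make the filtering in step 3 cheap: deciding whether a candidate belongs to System 2 is a single comparison against `SYSTEM2_BASE`.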
>
> I'll keep you posted. If you have more tips or things I should take
> into account, or if you think this approach won't return any decent
> results, I'd be glad if you could share them.
>
> Thanks!
> Chantal
>