mahout-user mailing list archives

From Sean Owen <>
Subject Re: Cooccurrence to align different categorization systems (many to many occurrence)
Date Fri, 16 Jul 2010 15:51:12 GMT
Let's clarify your situation. Are you making recommendations, or something else?
This shouldn't have anything to do with Lucene per se. You do not need Hadoop for
recommendations if you don't want it. ItemSimilarity is not related to Hadoop.
Yes, you can define whatever notion of similarity you like this way. It's
up to you, not the framework, really. But are you doing recommendations?

On Jul 16, 2010 2:01 PM, "Chantal Ackermann" <> wrote:
> Hi all,
> my goal is to align two slightly different categorization systems where
> each categorized item can have multiple categories in one of these
> systems.
> E.g.:
> Categorized item: "Harry Potter"
> Category system 1: Fiction, Fantasy, Children
> Category system 2: Youth, Fantasy
> The alignment would then produce a similarity between "Fantasy" (1) and
> "Fantasy" (2), and between "Children" (1) and "Youth" (2).
> I *think* ItemSimilarity is what I want but if anyone can provide me
> with the correct keywords for googling - that would be great.
> If a Lucene/SOLR index is more efficient as source than the lists I have
> I'm fine with setting that up. However, I am not sure how the schema
> would have to be structured? Would it use the categorized items as
> document entities - if not what then?
> Any pointers on where to start would be very much appreciated! It would
> also help to know whether I need a full Hadoop installation or whether
> Mahout as checked out from trunk is sufficient. It is not very much data
> altogether (<10k categorized items).
> Thanks!
> Chantal
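
For illustration only, here is a minimal sketch of the co-occurrence idea described in the question: two categories from different systems are considered similar when they tend to label the same items. It is plain Python, not the Mahout API, and the Jaccard measure, the function name cooccurrence_similarity and the second sample item are assumptions made for this example, not something taken from the thread.

```python
from collections import defaultdict
from itertools import product


def cooccurrence_similarity(items):
    """Score category pairs across two systems by item co-occurrence.

    `items` is a sequence of (categories_in_system_1, categories_in_system_2)
    pairs, one pair per categorized item. Returns a dict that maps
    (category_1, category_2) to the Jaccard similarity of the sets of items
    carrying each category. Pairs that never co-occur are left out.
    """
    items_by_cat1 = defaultdict(set)
    items_by_cat2 = defaultdict(set)
    for item_id, (cats1, cats2) in enumerate(items):
        for cat in cats1:
            items_by_cat1[cat].add(item_id)
        for cat in cats2:
            items_by_cat2[cat].add(item_id)

    similarities = {}
    for cat1, cat2 in product(items_by_cat1, items_by_cat2):
        shared = items_by_cat1[cat1] & items_by_cat2[cat2]
        if shared:
            union = items_by_cat1[cat1] | items_by_cat2[cat2]
            similarities[(cat1, cat2)] = len(shared) / len(union)
    return similarities


items = [
    # "Harry Potter", as given in the question.
    ({"Fiction", "Fantasy", "Children"}, {"Youth", "Fantasy"}),
    # Hypothetical second item, invented for this sketch.
    ({"Fiction", "Crime"}, {"Thriller"}),
]

# Print the strongest category alignments first.
for pair, score in sorted(cooccurrence_similarity(items).items(),
                          key=lambda entry: -entry[1]):
    print(pair, round(score, 2))
```

With fewer than 10k items, a pairwise pass like this fits comfortably in memory on a single machine, which is consistent with the point above that Hadoop is not required. A single item cannot separate categories that always appear together, so meaningful scores only emerge once many items are included.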
