mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Email and Collab. Filtering
Date Thu, 01 Sep 2011 13:58:33 GMT
Assuming I've done my own translation (I followed Ted's piece), how do I get this into the
rest of RecommenderJob? Right now I have one NamedVector per user (the name is the ID of the
From email address), and the cells are {0,1} for each message ID (1 if that user has interacted
with that message). Looking at RecommenderJob, it seems I could skip the first couple of
phases, but I would then need a DistributedRowMatrix as input for the next phase
(maybePruneAndTranspose). Is my understanding correct? I guess I need to convert my
SequenceFile of NamedVectors into a DistributedRowMatrix?
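
A minimal plain-Java sketch of the translation step, for illustration only. It assumes the
goal is dense int row indices (users) and int column offsets (messages), since as far as I can
tell DistributedRowMatrix rows are keyed by IntWritable rather than by name. The class
IdDictionary and its methods are hypothetical, not Mahout API, and writing the rows out as a
SequenceFile of IntWritable/VectorWritable is deliberately left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper (not Mahout API): dense int index per distinct string ID, and back. */
public class IdDictionary {
    private final Map<String, Integer> toIndex = new HashMap<>();
    private final List<String> toId = new ArrayList<>();

    /** Returns the index already assigned to id, or assigns the next free one. */
    public int index(String id) {
        Integer idx = toIndex.get(id);
        if (idx == null) {
            idx = toId.size();
            toIndex.put(id, idx);
            toId.add(id);
        }
        return idx;
    }

    /** Translates an index back to the original string ID. */
    public String id(int index) {
        return toId.get(index);
    }

    /** Number of distinct IDs seen so far (usable as the vector cardinality). */
    public int size() {
        return toId.size();
    }

    public static void main(String[] args) {
        // Made-up (from address, message ID) interactions for illustration.
        String[][] interactions = {
            {"alice@example.com", "<msg-1>"},
            {"bob@example.com", "<msg-1>"},
            {"alice@example.com", "<msg-3>"},
        };
        IdDictionary users = new IdDictionary();
        IdDictionary messages = new IdDictionary();
        // Row index -> sorted column indices whose cell is 1.
        Map<Integer, TreeSet<Integer>> rows = new TreeMap<>();
        for (String[] pair : interactions) {
            int row = users.index(pair[0]);
            int col = messages.index(pair[1]);
            rows.computeIfAbsent(row, k -> new TreeSet<>()).add(col);
        }
        for (Map.Entry<Integer, TreeSet<Integer>> e : rows.entrySet()) {
            System.out.println(e.getKey() + " (" + users.id(e.getKey()) + ") -> " + e.getValue());
        }
    }
}
```

Run as-is, this prints "0 (alice@example.com) -> [0, 1]" and "1 (bob@example.com) -> [0]".
Keeping both directions of the mapping is what lets recommendations computed on int indices
be turned back into email addresses and message IDs afterwards.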

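For contrast, a sketch of the hashing alternative discussed below (hashing long IDs to ints,
or hashing the string directly). The fold-and-mask formula here is my assumption and may
differ in detail from what RecommenderJob does internally; HashedIndex is a hypothetical name.
The point it shows is that hashing is one-way and can collide, so a reverse map still has to
be recorded somewhere.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: hashing IDs straight to non-negative int indices. */
public class HashedIndex {
    /** Folds a long ID into an int (high and low 32 bits XORed), then clears the sign bit. */
    static int longToIndex(long id) {
        return Long.hashCode(id) & 0x7FFFFFFF;
    }

    /** Same idea for a string ID such as an email address. */
    static int stringToIndex(String id) {
        return id.hashCode() & 0x7FFFFFFF;
    }

    public static void main(String[] args) {
        // Hashing cannot be inverted, so record index -> original ID while hashing.
        Map<Integer, String> reverse = new HashMap<>();
        for (String id : new String[] {"alice@example.com", "Aa", "BB"}) {
            int idx = stringToIndex(id);
            String previous = reverse.putIfAbsent(idx, id);
            if (previous != null && !previous.equals(id)) {
                // "Aa" and "BB" share a String.hashCode(), so this branch fires for "BB".
                System.out.println("collision: " + previous + " and " + id);
            }
            System.out.println(id + " -> " + idx);
        }
    }
}
```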

On Aug 31, 2011, at 11:55 AM, Sean Owen wrote:

> Yes, I'm suggesting that could at least be 80% of what you need. If you can
> generalize that bit further and refactor it, all the better.
> 
> I wouldn't bother necessarily extending to support the "user: item item
> item" syntax or else we'd get into supporting a lot of stuff. That
> conversion IMHO can be left to the caller.
> 
> On Wed, Aug 31, 2011 at 4:52 PM, Grant Ingersoll <gsingers@apache.org> wrote:
> 
>> 
>> On Aug 31, 2011, at 11:47 AM, Sean Owen wrote:
>> 
>>> No, it still wants "user,item[,rating]" input. But otherwise yes, it's
>>> translated and un-translated internally as needed.
>>> 
>>> You could change the mapper to read that input easily though.
>>> 
>>> It still wants numeric input. It's hashing longs to ints. But this could
>>> easily be changed to record a more general mapping.
>> 
>> Ah, so I would still have to do the conversion, or hash on the string.
>> 
>>> 
>>> On Wed, Aug 31, 2011 at 4:44 PM, Grant Ingersoll <gsingers@apache.org> wrote:
>>> 
>>>> 
>>>> On Aug 31, 2011, at 11:26 AM, Sean Owen wrote:
>>>> 
>>>>> Is the problem not just a matter of "translating" from the original
>>>>> identifiers to ints, so they can be used as offsets into a vector, and
>>>>> then back again?
>>>> 
>>>> Yeah, I was wondering about that when looking at the RecommenderJob.
>>>> 
>>>> If I understand you right, I could just output lines of text as:
>>>> from: msgId1, msgId3, ... msgIdn
>>>> ...
>>>> 
>>>> And the RecommenderJob would automatically do the translation?
>>>> 
>>>> 
>> 
>> 
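
Since Sean suggests leaving the "user: item item item" conversion to the caller, here is a
hedged sketch of that caller-side step: it expands lines like "from: msgId1, msgId3" into the
"user,item" lines RecommenderJob reads, assigning sequential numeric IDs because it still wants
numeric input. LineExpander is a hypothetical name; it assumes the user part contains no colon
and drops the reverse mapping, which a real version would keep (see the dictionary idea above
in spirit, not as a dependency).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical caller-side converter from "user: item, item, ..." to "user,item" lines. */
public class LineExpander {
    private final Map<String, Long> userIds = new HashMap<>();
    private final Map<String, Long> itemIds = new HashMap<>();

    /** Returns the numeric ID for key, assigning the next sequential one on first sight. */
    private static long numericId(Map<String, Long> ids, String key) {
        Long id = ids.get(key);
        if (id == null) {
            id = (long) ids.size();
            ids.put(key, id);
        }
        return id;
    }

    /** Expands one input line; lines without a colon are skipped (empty result). */
    public List<String> expand(String line) {
        List<String> out = new ArrayList<>();
        int colon = line.indexOf(':');
        if (colon < 0) {
            return out;
        }
        long user = numericId(userIds, line.substring(0, colon).trim());
        for (String item : line.substring(colon + 1).split(",")) {
            String trimmed = item.trim();
            if (!trimmed.isEmpty()) {
                out.add(user + "," + numericId(itemIds, trimmed));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        LineExpander expander = new LineExpander();
        String[] input = {"alice@example.com: <m1>, <m3>", "bob@example.com: <m3>"};
        for (String line : input) {
            expander.expand(line).forEach(System.out::println);
        }
    }
}
```

This prints "0,0", "0,1" and "1,1": one line per (user, message) interaction, with no rating
column, which matches the boolean {0,1} data described above.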

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com
Lucene Eurocon 2011: http://www.lucene-eurocon.com

