mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Email and Collab. Filtering
Date Thu, 01 Sep 2011 15:30:55 GMT

On Sep 1, 2011, at 10:04 AM, Sean Owen wrote:

> Your input needs to be CSV if you want to use it all as-is. But it quickly
> creates vectors out of things, so really you can comment out the first
> mapper that creates user vectors, and just wire it to use yours instead. It
> should do all the rest from there.
> 

I could use the --startPhase functionality to skip the first two phases, right?

> On Thu, Sep 1, 2011 at 2:58 PM, Grant Ingersoll <gsingers@apache.org> wrote:
> 
>> Assuming I've done my own translation (I followed Ted's piece), how do I
>> get this into the rest of the RecJob?  Right now, I have a NamedVector (the
>> name is the id of the from email address) and the cells are {0,1} for each
>> message id (1 if that user has interacted with that message id).  In looking
>> at the RecommenderJob, it seems like I could skip the first couple of
>> phases, but it also seems like I need a DistributedRowMatrix as input for
>> the next phase (maybePruneAndTranspose).  Is my understanding correct?  I
>> guess I need to convert my seq. file of NamedVectors to the
>> DistributedRowMatrix?
>> 
>> 
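
For illustration, the data shape described in the quoted message (one row per from-address, with a 1 in the column of every message id that user interacted with) can be sketched in plain Java with no Mahout or Hadoop dependency. The sketch assumes the later phases want rows keyed by an int rather than by name (as far as I know, DistributedRowMatrix reads a SequenceFile of IntWritable keys and VectorWritable values, but that is an assumption here), so it assigns each from-address a dense row id and each message id a column index. The class and method names (InteractionRows, assignIds, buildRows) are hypothetical, and the sample addresses are made up.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of Mahout: it only illustrates the data layout.
public class InteractionRows {

    // Assigns dense int ids (0, 1, 2, ...) in first-seen order to the values
    // found at position `field` of each {fromAddress, messageId} pair.
    public static Map<String, Integer> assignIds(List<String[]> pairs, int field) {
        Map<String, Integer> ids = new LinkedHashMap<>();
        for (String[] pair : pairs) {
            ids.putIfAbsent(pair[field], ids.size());
        }
        return ids;
    }

    // Builds rowId -> set of column indices whose cell is 1.
    // Any column not in the set is implicitly 0, like a sparse binary vector.
    public static Map<Integer, TreeSet<Integer>> buildRows(
            List<String[]> pairs,
            Map<String, Integer> userIds,
            Map<String, Integer> messageIds) {
        Map<Integer, TreeSet<Integer>> rows = new TreeMap<>();
        for (String[] pair : pairs) {
            int row = userIds.get(pair[0]);
            int col = messageIds.get(pair[1]);
            rows.computeIfAbsent(row, k -> new TreeSet<>()).add(col);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample data: {from address, message id}.
        List<String[]> pairs = Arrays.asList(
                new String[] {"a@example.org", "m1"},
                new String[] {"b@example.org", "m1"},
                new String[] {"a@example.org", "m2"});

        Map<String, Integer> userIds = assignIds(pairs, 0);
        Map<String, Integer> messageIds = assignIds(pairs, 1);
        Map<Integer, TreeSet<Integer>> rows = buildRows(pairs, userIds, messageIds);

        System.out.println("user ids:    " + userIds);
        System.out.println("message ids: " + messageIds);
        System.out.println("rows:        " + rows);
    }
}
```

In a real job the same two dictionaries would need to be kept around so that int row ids can be mapped back to email addresses once recommendations come out.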

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com
Lucene Eurocon 2011: http://www.lucene-eurocon.com

