mahout-user mailing list archives

From: Sean Owen
Subject: Re: Clustering boolean vectors
Date: Mon, 09 May 2011 21:02:14 GMT
GroupLens doesn't *require* a rating per se -- you are free to ignore
it if you want!

In Mahout, boolean data means every preference that exists has the
value 1. There are no 0 ratings. If you just mean that the non-existent
preferences are "0", OK. But having two ratings, 0 and 1, along with the
possibility of no rating at all, is three states, not two.
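
To make the two-state idea concrete, here is a minimal sketch in plain
Java, not Mahout code. It assumes each item is represented only by the
set of user IDs that have a preference for it, so absence from the set
is the only other state. The class and method names are hypothetical.
It computes a Tanimoto (Jaccard) coefficient between two such sets,
which is the usual kind of similarity for data like this.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration class; not part of Mahout.
class BooleanPrefSketch {

    // Tanimoto (Jaccard) coefficient between two sets of user IDs:
    // size of the intersection divided by size of the union.
    // Returns 0.0 when both sets are empty (an assumption of this sketch).
    static double tanimoto(Set<Long> a, Set<Long> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        // A user ID is either in the set (preference exists) or not.
        // There is no stored 0 value.
        Set<Long> usersOfItemA = Set.of(1L, 2L, 3L);
        Set<Long> usersOfItemB = Set.of(2L, 3L, 4L);
        System.out.println(tanimoto(usersOfItemA, usersOfItemB));
    }
}
```

The same function works on sets of attribute indexes instead of user
IDs, if what you have per item is a boolean attribute vector.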

You can easily have a DataModel, if you have the GroupLens data.
Convert it to CSV, or just use the GroupLensDataModel in examples/.
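
As a rough sketch of the CSV conversion, assuming your GroupLens file
uses the "userID::itemID::rating::timestamp" layout (check your copy, as
the data sets differ), something like the following plain Java would
do. The class and method names are made up for illustration. It drops
the rating and writes "userID,itemID" lines, which, as far as I know,
a file-based DataModel reads as boolean preferences.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; not part of Mahout.
class GroupLensToCsv {

    // Converts one "userID::itemID::rating::timestamp" line (assumed layout)
    // to "userID,itemID". The rating is dropped on purpose, so every output
    // line just says that a preference exists.
    static String toBooleanCsvLine(String line) {
        String[] fields = line.split("::");
        if (fields.length < 2) {
            throw new IllegalArgumentException("Unexpected line: " + line);
        }
        return fields[0] + "," + fields[1];
    }

    // Reads the input file line by line and writes the converted CSV,
    // skipping blank lines.
    static void convert(Path in, Path out) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue;
                }
                writer.write(toBooleanCsvLine(line));
                writer.newLine();
            }
        }
    }

    // Usage: java GroupLensToCsv <input ratings file> <output CSV file>
    public static void main(String[] args) throws IOException {
        convert(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

If you want to keep the ratings instead, append the third field to each
output line rather than dropping it.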

But, to really answer your question: first you should define what you
are trying to do. Then we can help decide how to do it. I don't know
if you need clustering or not so far.


On Mon, May 9, 2011 at 8:38 PM, mail2abin wrote:
> Hi,
> I was trying to run ItemBasedRecommender on the GroupLens movie sample data,
> which requires ratings (user preferences as input). But suppose I do not have
> ratings (user preferences); rather, I have an
> item boolean attribute vector [like Godfather - 0|1|0|0|0|0|1], where
> the two 1's may mean Crime and Drama.
> ItemBasedRecommender requires a DataModel, which I do not have. Instead, I
> think I should use some clustering technique based on the item boolean
> attribute vectors, as I understand it, and later get the items which belong
> to the cluster.
> Please give pointers to the right clustering API (though I have seen
> TanimotoCluster etc., I am not sure if they are good for boolean vectors).
> Abin
> Software Developer
> NY