mahout-user mailing list archives

From Sebastian Schelter <...@apache.org>
Subject Re: ItemSimilarityJob Cooccurrence Question
Date Sat, 04 Jun 2011 21:28:36 GMT
Hi Derek,

this shouldn't be happening and we have unit tests explicitly checking 
that.

Which version are you using? Please be sure to use Mahout 0.5 or the 
current trunk. Could you provide sample data where you see this happening?

--sebastian

On 04.06.2011 23:21, djn wrote:
> Regarding ItemSimilarityJob, it is my understanding that if there are two
> input lines of the form <user1, product1> and <user1, product2>,
> then that would constitute a co-occurrence between product1 and product2.
>
> I've generated a large test dataset under this assumption, and it guarantees
> that there will only be co-occurrences between pairs of product IDs that
> I've predefined. I'm not using preference values and I'm setting
> --booleanData true.
>
> While the ItemSimilarityJob's output does include these predefined
> co-occurrences, it also outputs a large number of co-occurrences (with small
> co-occurrence counts) between products that are not co-occurring in the
> input dataset. Can anyone provide some insight as to why this might be
> happening?
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/ItemSimilarityJob-Cooccurrence-Question-tp3024516p3024516.html
> Sent from the Mahout User List mailing list archive at Nabble.com.
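
One way to put together the sample data asked for above is to count the expected co-occurrences directly from the input and compare them with the pairs the job reports. The following is only a minimal sketch under the definition given in the question (two items co-occur when the same user appears with both, boolean data, no preference values). It is not Mahout's implementation, and the function names `cooccurrence_counts` and `unexpected_pairs` as well as the tuple-based input format are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_counts(pairs):
    """Count, for each unordered item pair, how many users have both items.

    `pairs` is an iterable of (user, item) tuples (hypothetical format).
    """
    items_by_user = defaultdict(set)
    for user, item in pairs:
        items_by_user[user].add(item)

    counts = Counter()
    for items in items_by_user.values():
        # Sorting gives each unordered pair a single canonical key.
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
    return counts


def unexpected_pairs(pairs, reported):
    """Return reported item pairs that never co-occur in the input."""
    expected = set(cooccurrence_counts(pairs))
    normalized = {tuple(sorted(p)) for p in reported}
    return sorted(normalized - expected)


if __name__ == "__main__":
    data = [
        ("user1", "product1"),
        ("user1", "product2"),
        ("user2", "product3"),
    ]
    print(dict(cooccurrence_counts(data)))
    # Pairs as they might be read from the job output (made-up values).
    print(unexpected_pairs(data, [("product2", "product1"),
                                  ("product1", "product3")]))
```

Any pair returned by `unexpected_pairs`, together with the input lines for the users involved, would make a small reproducible sample to send along with the Mahout version in use.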

