mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Mahout performance issues
Date Fri, 02 Dec 2011 11:10:04 GMT
On Fri, Dec 2, 2011 at 11:02 AM, Daniel Zohar <dissoman@gmail.com> wrote:

> Hi guys,
>
> @Sean, you are obviously right that reducing the cap limit would yield
> better performance. However, I believe it would also yield worse accuracy:
> the more items a user has interacted with, the smaller the percentage of
> the actual candidate items that survive the cap.
>

That's right. I'm saying there must be a middle ground that works on both
counts, since it works fine at smaller scales, where you only have hundreds
of interactions per recommendation computation. So, if you tune it to use
100, for example, I imagine you get "good" recommendations and it should be
pretty fast, right?

I don't see why this isn't the solution.
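For concreteness, here is a minimal sketch of the kind of cap I mean,
written against plain Java collections rather than our
CandidateItemsStrategy API (whose exact signatures I don't want to
misquote here); MAX_PREFS_CONSIDERED and the ItemIndex lookup are
stand-ins for illustration, not Mahout code:

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: when a user has interacted with many items, sample
// at most MAX_PREFS_CONSIDERED of them before gathering candidate items.
final class CappedCandidates {

  private static final int MAX_PREFS_CONSIDERED = 100; // the knob to tune

  interface ItemIndex {
    Set<Long> itemsCooccurringWith(long itemID); // assumed co-occurrence lookup
  }

  static Set<Long> candidateItems(List<Long> userPrefs, ItemIndex index) {
    List<Long> prefs = userPrefs;
    if (prefs.size() > MAX_PREFS_CONSIDERED) {
      prefs = new ArrayList<Long>(userPrefs);
      Collections.shuffle(prefs); // uniform sample of the user's items
      prefs = prefs.subList(0, MAX_PREFS_CONSIDERED);
    }
    Set<Long> candidates = new HashSet<Long>();
    for (long itemID : prefs) {
      candidates.addAll(index.itemsCooccurringWith(itemID));
    }
    candidates.removeAll(userPrefs); // never recommend what's already seen
    return candidates;
  }
}

The work per recommendation then grows with the cap rather than with the
user's full history, which is exactly the accuracy-for-latency trade we
are discussing.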


>
> I just ran the fix I proposed earlier and I got great results! Query time
> for the 'heavy users' dropped to about a third of what it was: before it
> was 1-5 secs, now it's 0.5-1.5. The best part is that the accuracy level
> should remain exactly the same. I also believe it should reduce memory
> consumption, as GenericBooleanPrefDataModel.preferenceForItems gets
> significantly smaller (in my case, at least).
>
> The fix is merely two lines of code added to one of
> the GenericBooleanPrefDataModel constructors. See
> http://pastebin.com/K5PB68Et; the lines I added are #11 and #22.
>

I don't think this works, though, because you've deleted the one data point
you have for those users. They can't get recommendations now.
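
If I'm reading the change right, it amounts to something like the
following (the filtering logic is my guess at the pastebin's intent; the
collection types are what the GenericBooleanPrefDataModel constructor
takes):

import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.common.FastIDSet;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;

final class SinglePrefFilter {

  // Presumed shape of the two-line fix: drop users who have only one
  // boolean preference before preferenceForItems is built. The map
  // shrinks, but those users can no longer receive recommendations.
  static void dropSinglePreferenceUsers(FastByIDMap<FastIDSet> userData) {
    FastIDSet toRemove = new FastIDSet();
    LongPrimitiveIterator userIDs = userData.keySetIterator();
    while (userIDs.hasNext()) {
      long userID = userIDs.nextLong();
      if (userData.get(userID).size() <= 1) {
        toRemove.add(userID);
      }
    }
    LongPrimitiveIterator removals = toRemove.iterator();
    while (removals.hasNext()) {
      userData.remove(removals.nextLong());
    }
  }
}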

I can't figure out how that speeds up recommendations, though; what am I
missing? These users aren't providing any more item-item interactions to
consider.
