mahout-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: Recommenders and DataModels
Date Wed, 06 Oct 2010 19:44:49 GMT
Since I have a synthetic predictor built into the DataModel, do I
need a Recommender?

On Wed, Oct 6, 2010 at 5:20 AM, Sean Owen <srowen@gmail.com> wrote:
> Interesting question. So the preferences are synthetic in some cases -- you
> have a pref for every user-item combination? (Then what do you recommend?
> But I can imagine some answers.)
>
>
> By "not work well" do you mean performance or accuracy?
>
>
> For performance, yes, having very dense input will really slow down the
> pre-computation step, which is more or less linear in the size of the input.
> The resulting diffs table is usually dense-ish, since an entry exists any
> time two items co-occur; in this case it would be completely filled. This
> would also slow down things at runtime.
>
> This is all a symptom of having such dense data. One answer would be to
> 'prune' noise from your data (or generate less synthetic data, if I guess
> that right).
>
> Another answer is to prune the diffs table. The least interesting entries
> are those with highest standard deviation. You could hack the code to trim
> based on that to get better runtime performance.
>
>
> If you mean accuracy, then one guess is that the big assumption that
> slope-one makes for the input isn't valid for your data. Slope-one assumes
> that the ratings for item X and item Y are linearly related: Y = mX + b.
> Rather than spend time regressing to determine m and b for each pair, which
> would be hugely expensive, it makes the reasonable assumption that m=1 in
> all cases. So the problem is vastly simpler: computing the best b = Y-X,
> which is just the average difference across all X / Y prefs.
>
> That's a good assumption for most "normal" scenarios. But to the extent it's
> systematically not true of your data, this will fall apart. Since I am
> guessing much data is synthetic, that's why I wonder if there is some
> systematic incompatibility with this assumption.
>
>
> On Wed, Oct 6, 2010 at 5:37 AM, Lance Norskog <goksron@gmail.com> wrote:
>
>> I'm working with a DataModel that estimates preferences for all items
>> from any user. This seems to not work well with the SlopeOne
>> recommender. Are there tips & tricks for making recommenders work well
>> with this class of model? That is, the sample DataModels all seem to
>> explicitly store items and only return those prefs. My model cheerfully
>> generates 1000 preferences if there are 1000 items.
>>
>> Thanks,
>>
>> --
>> Lance Norskog
>> goksron@gmail.com
>>
>



-- 
Lance Norskog
goksron@gmail.com
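
The slope-one reasoning quoted above (fix m=1, take b as the average of Y-X over users who rated both, and drop the pairs with the highest standard deviation) can be sketched roughly as follows. This is an illustrative sketch only, not Mahout's implementation: the function names, the nested-dict input shape, the use of population standard deviation, and the count-weighted prediction are all assumptions made for the example.

```python
from collections import defaultdict
from math import sqrt


def build_diffs(prefs):
    """Build a diffs table from {user: {item: rating}}.

    Returns {(x, y): (mean_diff, std_dev, count)} for item pairs with x < y,
    where each diff sample is rating[y] - rating[x] for one user.
    """
    samples = defaultdict(list)
    for ratings in prefs.values():
        items = sorted(ratings)
        # Every co-occurring pair gets an entry, so dense input fills the table.
        for i, x in enumerate(items):
            for y in items[i + 1:]:
                samples[(x, y)].append(ratings[y] - ratings[x])

    diffs = {}
    for pair, ds in samples.items():
        mean = sum(ds) / len(ds)
        variance = sum((d - mean) ** 2 for d in ds) / len(ds)  # population variance (assumption)
        diffs[pair] = (mean, sqrt(variance), len(ds))
    return diffs


def prune(diffs, max_std_dev):
    """Keep only pairs whose diff is consistent across users."""
    return {pair: stats for pair, stats in diffs.items() if stats[1] <= max_std_dev}


def predict(user_ratings, target, diffs):
    """Count-weighted slope-one estimate of target, or None if no pair applies."""
    numerator = 0.0
    denominator = 0
    for item, rating in user_ratings.items():
        if item == target:
            continue
        if (item, target) in diffs:
            mean, _, count = diffs[(item, target)]
            numerator += (rating + mean) * count  # target = item + b
        elif (target, item) in diffs:
            mean, _, count = diffs[(target, item)]
            numerator += (rating - mean) * count  # stored diff is item - target
        else:
            continue
        denominator += count
    return numerator / denominator if denominator else None


# Made-up ratings: A and B differ by exactly 1 for every user; C is noisy.
prefs = {
    "u1": {"A": 1.0, "B": 2.0, "C": 5.0},
    "u2": {"A": 2.0, "B": 3.0, "C": 1.0},
    "u3": {"A": 3.0, "B": 4.0},
}
diffs = build_diffs(prefs)
pruned = prune(diffs, max_std_dev=1.0)
```

With this toy data only the (A, B) pair survives pruning, so a user who rated A can still get an estimate for B, while estimates involving the noisy item C are no longer available. That is the trade-off of trimming by standard deviation: a smaller table and faster lookups at the cost of coverage.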
