mahout-user mailing list archives

From Sean Owen <>
Subject Re: Recommendations from flat data
Date Tue, 05 May 2009 11:46:32 GMT
On Tue, May 5, 2009 at 11:56 AM, Paul Loy <> wrote:
> I made a subclass of MySQLJDBCDataModel and had to copy some of the code
> from both AbstractJDBCDataModel. I'm thinking there could be a better way to
> do this but would require a bit of a refactor. I might give it a go today if

You should just have to override buildUser() to return a
BooleanPrefUser instead.

> I have time. I'd also like to have the queries injected so we could make a
> pure JDBC Model and not require subclasses for each SQL database out there.
> I'll have a look at that too.

That's what AbstractJDBCDataModel should be about. The constructor
takes a bunch of SQL queries.

> it's taking about a minute to get a recommendation. I'm guessing with an
> index on user_id column it will be even quicker (also, if I'm not importing
> millions of rows in the background).

Oh my yes, you absolutely need indexes, on the user ID and item ID
columns. The composite primary key should be both of these columns.

> The only issue I have with recommendations from the
> BooleanTanimotoCoefficientSimilarity is that there is no way to order the
> results as they all come out with a value of 1. So the least relevant item
> may come out at the top. So instead of using a recommender, what I do is get
> the items from the 20 nearest neighbours, remove from that list the items my

Yeah you are right, in this scenario a user-based recommender breaks
down somewhat since all prefs are the same, and recommendations are
based on weighted preferences, but that always comes out 1!
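
To make that concrete, here is a rough standalone sketch (not the
framework's actual code; the class and method names are made up for
illustration) of a similarity-weighted average. When every neighbour's
pref is 1.0, the similarities cancel out and the estimate is 1 no
matter how similar or dissimilar the neighbours are.

```java
// Hypothetical illustration only; not part of Mahout.
final class WeightedAverageDemo {

    // Similarity-weighted average of the neighbours' preference values:
    // sum(sim_i * pref_i) / sum(sim_i).
    static double weightedAverage(double[] similarities, double[] prefs) {
        double weighted = 0.0;
        double total = 0.0;
        for (int i = 0; i < similarities.length; i++) {
            weighted += similarities[i] * prefs[i];
            total += similarities[i];
        }
        // No similar neighbours means no estimate is possible.
        return total == 0.0 ? Double.NaN : weighted / total;
    }

    public static void main(String[] args) {
        double[] sims = {0.9, 0.4, 0.1};          // made-up similarities
        double[] booleanPrefs = {1.0, 1.0, 1.0};  // boolean data: all prefs are 1
        System.out.println(weightedAverage(sims, booleanPrefs));
    }
}
```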

Instead try BooleanUserGenericUserBasedRecommender. The 'estimated
preferences' you get out are bogus in the sense that all prefs really
*should* be 1, if anything, but, instead you get a value which is the
sum of similarity to all users who express a pref for that item. It
should give a desirable ordering. Try it and see what happens.
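
Roughly, the idea looks like the sketch below. This is a standalone
illustration under my own assumptions, not the recommender's real
implementation: the class and method names are hypothetical, users are
just sets of item IDs, and similarity is a plain Tanimoto coefficient.
Each candidate item is scored by summing the similarity of every
neighbour who has it, so items backed by many similar users sort first.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of Mahout.
final class BooleanPrefScoringSketch {

    // Tanimoto coefficient of two item sets: |A ∩ B| / |A ∪ B|.
    static double tanimoto(Set<Long> a, Set<Long> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    // Score every item the target user does not have yet by summing the
    // similarity of each neighbour who does have it.
    static Map<Long, Double> scoreItems(Set<Long> target, List<Set<Long>> neighbours) {
        Map<Long, Double> scores = new HashMap<>();
        for (Set<Long> neighbour : neighbours) {
            double similarity = tanimoto(target, neighbour);
            for (Long item : neighbour) {
                if (!target.contains(item)) {
                    scores.merge(item, similarity, Double::sum);
                }
            }
        }
        return scores;
    }

    // Item IDs ordered by descending score.
    static List<Long> rank(Map<Long, Double> scores) {
        List<Long> items = new ArrayList<>(scores.keySet());
        items.sort((x, y) -> Double.compare(scores.get(y), scores.get(x)));
        return items;
    }
}
```

The scores are not meaningful as preference values; only their relative
order is used.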

The whole 'boolean user' thing is important but I am having trouble
thinking of how to efficiently piece it into the whole framework...
it's a lot of copy-and-paste and tortuously long class names now, but,
should work at least.
