mahout-user mailing list archives

From k4200 <k4...@kazu.tv>
Subject Re: How to recommend users?
Date Fri, 15 Jun 2012 02:43:44 GMT
Hi,

Thank you for the answers. All my questions have been answered now.
Also, thank you for the good book.

Regards,
Kaz

2012/6/14 Sean Owen <srowen@gmail.com>:
> To recommend users to users, you need some kind of user-user interaction
> data. You are right, you don't have that directly. But at first you
> described this as merely finding similar users. For that, you can use your
> data. You don't need a Recommender even. You just need any implementation
> of UserSimilarity. This can help you find which users are most similar --
> just loop over all of them and keep the top N. (There are classes like TopN
> that let you do this easily.)
>
> It's the same for items. DataModel has methods to get all user and item IDs.
>
> You can't use a JDBC data source directly unless you have very little data.
> It's way too slow. You will need to load it in memory with
> ReloadFromJDBCDataModel. VM settings are irrelevant in this regard.
>
> On Thu, Jun 14, 2012 at 2:24 PM, k4200 <k4200@kazu.tv> wrote:
>
>> Hi,
>>
>> I bought Mahout in Action several days ago and am now trying Mahout
>> out. I've also read two books about collective intelligence, so I
>> think I have some basic knowledge.
>>
>> Before going to my questions, here's my use case:
>> * I'm developing a web site that has
>>  - users, items and preferences in MySQL
>>  - item pages that both logged-in and non-logged-in users can view
>> * I'd like to
>>  - recommend items for logged-in users (#1)
>>  - recommend similar users based on preferences for logged-in users (#2)
>>  - show similar items on every item page (#3)
>>
>> #1 is the typical scenario that the book and other web pages cover, so
>> it shouldn't be a problem. In fact, I wrote the code shown below, which
>> seems to work more or less. I do have a question about performance,
>> though, which I'll come to later.
>>
>> MySQLJDBCDataModel dataModel = new MySQLJDBCDataModel(dataSource, ....);
>> ItemSimilarity itemSimilarity = new PearsonCorrelationSimilarity(dataModel);
>> Recommender recommender = new GenericItemBasedRecommender(dataModel, itemSimilarity);
>> // Then, for each user, get recommendations and store them in DB
>>
>>
>> My first question is how to implement #2. Chapter 5 of the book seems
>> a bit similar, but the difference is that our users don't rate other
>> users (or profiles). I have no clue how to achieve this using Mahout,
>> so any hints/suggestions would be appreciated.
>>
>> The second question is about #3. Each item page needs to show similar
>> items, which I believe is a typical use case for many web sites. The
>> code above calculates item similarity, so I'm thinking of storing the
>> results in the DB. It seems I need to call allSimilarItemIDs for each
>> item ID, but is there any way to get all the item IDs? Of course, I
>> could execute a query via JDBC, but that would be a bit of a hassle.
>>
>> The last question is regarding performance. I set the JDBC driver
>> options according to the javadoc shown below.
>>
>> https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/impl/model/jdbc/MySQLJDBCDataModel.html
>>
>> I'm using test data with several thousand preferences, so the
>> calculations should be fast, but they took more than 10 minutes. What
>> should I do to speed them up?
>>
>> Here's the code.
>>    MysqlDataSource dataSource = new MysqlDataSource();
>>    dataSource.setServerName("hostname");
>>    dataSource.setUser("user");
>>    dataSource.setPassword("pass");
>>    dataSource.setDatabaseName("mydb");
>>
>>    dataSource.setCachePreparedStatements(true);
>>    dataSource.setCachePrepStmts(true);
>>    dataSource.setCacheResultSetMetadata(true);
>>    dataSource.setAlwaysSendSetIsolation(false);
>>    dataSource.setElideSetAutoCommits(true);
>>
>> I followed the VM settings from a page on the Mahout site (or somewhere else):
>> -server -Xms1024m -Xmx1024m -da -dsa -XX:NewRatio=9 -XX:+UseParallelGC
>> -XX:+UseParallelOldGC -XX:-DisableExplicitGC
>>
>> Thank you,
>> Kaz
>>
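
The "loop over all of them and keep the top N" approach from the reply above can be sketched roughly as follows. This is plain Java with no Mahout dependency: the Similarity interface, the Scored class, and the mostSimilar method are hypothetical names made up for illustration and are not Mahout APIs. In a real application the scoring call would presumably be backed by a UserSimilarity implementation and the ID list would come from the DataModel; that wiring is an assumption and is not shown here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class TopNSimilarUsers {

    /** Hypothetical stand-in for a similarity measure; NOT a Mahout interface. */
    public interface Similarity {
        double between(long userA, long userB);
    }

    /** A candidate user ID paired with its similarity to the target user. */
    public static final class Scored {
        public final long userId;
        public final double score;

        public Scored(long userId, double score) {
            this.userId = userId;
            this.score = score;
        }
    }

    /**
     * Scores every other user against the target and returns the n most
     * similar ones, best first. A bounded min-heap keeps memory at O(n).
     */
    public static List<Scored> mostSimilar(long target, long[] allUserIds,
                                           Similarity sim, int n) {
        Comparator<Scored> byScore = Comparator.comparingDouble((Scored s) -> s.score);
        // Min-heap: the weakest of the current top n sits at the head.
        PriorityQueue<Scored> heap = new PriorityQueue<>(byScore);
        for (long other : allUserIds) {
            if (other == target) {
                continue; // a user is not a recommendation for themselves
            }
            double score = sim.between(target, other);
            if (Double.isNaN(score)) {
                continue; // skip pairs whose similarity is undefined
            }
            heap.add(new Scored(other, score));
            if (heap.size() > n) {
                heap.poll(); // evict the current weakest candidate
            }
        }
        List<Scored> result = new ArrayList<>(heap);
        result.sort(byScore.reversed()); // highest similarity first
        return result;
    }
}
```

The same loop works for items (#3): swap the user IDs for item IDs and the similarity for an item-based one. The results can then be written to the database in a batch job, as the original question proposes.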
