mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: "Binary" Data
Date Fri, 16 May 2014 23:33:08 GMT
The easiest way to shoehorn this data into the binary framework for
recommenders is to keep two matrices, one for success, one for failure.
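
As a minimal sketch of the two-matrix idea, assuming the raw data arrives as (user, exercise, correct) triples: the sample records, the helper name split_binary_matrices, and the dense list-of-lists layout are all illustrative assumptions, not anything Mahout provides. Unattempted exercises simply stay 0 in both matrices.

```python
# Hypothetical observations: (user, exercise, answered_correctly).
attempts = [
    ("u1", "e1", True),
    ("u1", "e2", False),
    ("u2", "e1", True),
    ("u2", "e3", False),
]


def split_binary_matrices(attempts):
    """Build one binary user x item matrix for successes and one for failures."""
    users = sorted({u for u, _, _ in attempts})
    items = sorted({e for _, e, _ in attempts})
    u_idx = {u: i for i, u in enumerate(users)}
    i_idx = {e: j for j, e in enumerate(items)}

    # Both matrices start at all zeros, so "no attempt" is 0 in each.
    success = [[0] * len(items) for _ in users]
    failure = [[0] * len(items) for _ in users]

    for u, e, ok in attempts:
        # A repeated attempt with a different outcome sets a 1 in both matrices.
        target = success if ok else failure
        target[u_idx[u]][i_idx[e]] = 1
    return users, items, success, failure


users, items, success, failure = split_binary_matrices(attempts)
```

Each matrix on its own is ordinary boolean preference data, so either one could be fed to a boolean-style similarity measure separately.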

There is lots to do from there.

Most analyses of this kind of data (so-called item-response data [1]),
however, require some kind of hidden-variable analysis beyond what is
available in Mahout.  The good news is that the data available in these
kinds of problems are almost always relatively small (millions or tens of
millions of observations are pretty rare).  This means that conventional
tools like R are pretty easy to use [2,3,4].
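
To give a rough idea of what such a hidden-variable model looks like, here is a small sketch of the three-parameter logistic response function that reference [1] describes. The function and parameter names are my own choices for illustration; the hidden variable is the user's ability theta, and fitting the parameters is the job of packages like those in [3] and [4], not of this snippet.

```python
import math


def p_correct_3pl(theta, a, b, c):
    """Probability of a correct answer under the three-parameter logistic model.

    theta: latent ability of the user (the hidden variable)
    a: item discrimination
    b: item difficulty
    c: guessing parameter (lower bound on the probability)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# A user whose ability equals the item difficulty lands halfway
# between the guessing floor c and 1.
p_mid = p_correct_3pl(theta=0.0, a=1.0, b=0.0, c=0.2)
```

The probability rises with theta for any positive a, which is how the model separates "this exercise is hard" from "this user is weak".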

You could try using some of the matrix decomposition algorithms in Mahout
on these data, but I really think that a more nuanced analysis would be
better.

[1] https://en.wikipedia.org/wiki/Item_response_theory#Three_parameter_logistic_model

[2] http://cran.r-project.org/web/views/Psychometrics.html

[3] http://cran.r-project.org/web/packages/mirt/mirt.pdf

[4] http://cran.r-project.org/web/packages/ltm/ltm.pdf


On Thu, May 15, 2014 at 8:53 AM, Floris Devriendt <florisdevriendt@gmail.com> wrote:

> Hello everybody,
>
> I'm a new Mahout user and I was hoping some people could point me in the
> right direction.
>
> My data consists of exercise results from different users, and I want to
> recommend different exercises to different users using the collaborative
> filtering techniques available in Mahout. The idea is that the 'items' in
> my data are the exercises, and the relation between a user and an item
> can take one of three values:
>
>    - A user has correctly completed the exercise.
>    - A user has incorrectly completed the exercise.
>    - A user has not made an attempt at the exercise.
>
> In essence this data can be compared to like/dislike/unknown data.
>
> Now I know more or less how to build a recommender in Mahout but I'm having
> some difficulties in designing it. A lot depends on the similarity measure
> used, but most similarity measures assume rating-style preferences
> (e.g. when rating movies or music). The exceptions, if I
> interpret it correctly, are the Tanimoto coefficient and the log-likelihood
> similarity. But those similarities seem to focus on boolean data, where a
> user either has a relation with an item or has none.
>
> What are the key aspects to take into account when working with this kind
> of data (with three distinct values)? Does it all depend on the similarity
> measure used? Or are there other aspects I need to consider to
> make the recommendations worthwhile for this kind of data?
>
> I also have some more questions about some of the similarity measures
> implemented in Mahout, but I don't want to ask too much at once. If
> somebody can guide me in the right direction with the above questions,
> it would be appreciated.
>
> Kind regards,
> Floris Devriendt
>
