mahout-user mailing list archives

From Floris Devriendt <florisdevrie...@gmail.com>
Subject Re: "Binary" Data
Date Sat, 17 May 2014 13:04:43 GMT
Hello Ted Dunning,

First of all thank you for the response, I appreciate it.

Am I right in saying that you are suggesting a combination of recommender
systems and an item-response analysis of the data?
You're right that my data isn't huge, so R could work as a tool. I'm
still a little confused about the topic, though.

How exactly can I combine recommender systems with item-response
analysis?
I'm just thinking out loud here, but do you mean I could estimate each
user's ability level (using R) and then use it to search for similar
users in user-user collaborative filtering?
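
To make the question concrete, here is a rough sketch of what "estimating
an ability level" could look like. It is an assumption-laden illustration,
not what Ted proposed and not what the R packages do internally: it fits a
one-parameter logistic (Rasch) model by plain gradient ascent in Python,
with a small ridge penalty (`reg`, my own addition) so that users who
answer everything correctly still get a finite estimate. The names
`fit_rasch` and `closest_users` are hypothetical.

```python
import math


def fit_rasch(responses, n_users, n_items, epochs=500, lr=0.05, reg=0.1):
    """Fit a Rasch (1PL) model by gradient ascent.

    responses: list of (user_index, item_index, correct), correct in {0, 1}.
    Unattempted exercises are simply absent from the list.
    Returns (ability, difficulty) as two lists of floats.
    """
    ability = [0.0] * n_users
    difficulty = [0.0] * n_items
    for _ in range(epochs):
        # The ridge penalty pulls every parameter toward zero.
        grad_a = [-reg * a for a in ability]
        grad_d = [-reg * d for d in difficulty]
        for u, i, y in responses:
            # Modelled probability that user u answers item i correctly.
            p = 1.0 / (1.0 + math.exp(-(ability[u] - difficulty[i])))
            grad_a[u] += y - p
            grad_d[i] -= y - p
        ability = [a + lr * g for a, g in zip(ability, grad_a)]
        difficulty = [d + lr * g for d, g in zip(difficulty, grad_d)]
    return ability, difficulty


def closest_users(ability, user, k=1):
    """Indices of the k users whose estimated ability is nearest to `user`."""
    others = [u for u in range(len(ability)) if u != user]
    others.sort(key=lambda u: (abs(ability[u] - ability[user]), u))
    return others[:k]
```

With abilities in hand, `closest_users` would be one way to pick
neighbours by ability instead of by raw answer overlap.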

The data I have is very limited. I have users, their submitted solutions to
exercises, and whether or not each solution was correct. The exercises
are all language cloze exercises. The idea was to use a CF technique to
find similar users (based on the similarity of the users' scores, 1 =
correct; 0 = incorrect) and then suggest to each user the exercises we
think that user will fail (because similar users have also failed them).
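
The idea above can be sketched in a few lines of plain Python. This is
only an illustration under my own assumptions, not Mahout code: similarity
is the share of co-attempted exercises with the same outcome, and the
names `agreement` and `likely_failures` are hypothetical.

```python
def agreement(a, b):
    """Share of co-attempted exercises on which two users got the same result.

    a, b: dict mapping exercise id -> 1 (correct) or 0 (incorrect).
    Exercises a user has not attempted are absent from that user's dict.
    """
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    return sum(a[e] == b[e] for e in common) / len(common)


def likely_failures(results, user, k=2):
    """Rank the exercises `user` has not attempted by how often the k most
    similar users failed them, weighting each neighbour by similarity."""
    me = results[user]
    neighbours = sorted(
        ((agreement(me, r), u) for u, r in results.items() if u != user),
        reverse=True,
    )[:k]
    weight, fails = {}, {}
    for sim, u in neighbours:
        if sim <= 0:
            continue  # a neighbour with no agreement carries no signal
        for exercise, correct in results[u].items():
            if exercise in me:
                continue  # only suggest exercises the user has not tried
            weight[exercise] = weight.get(exercise, 0.0) + sim
            fails[exercise] = fails.get(exercise, 0.0) + sim * (1 - correct)
    scores = {e: fails[e] / weight[e] for e in weight}
    # Highest estimated failure rate first; ties broken by exercise id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Exercises at the top of the returned list are the ones similar users
failed most often, so they would be the first candidates to suggest.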

Your idea about splitting my matrix into two matrices is interesting,
but I'm still thinking about what I can do with that. Are you suggesting
an entirely different approach, or is item-response analysis something I
can use within the recommender system?
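
For reference, this is how I understand the two-matrix split, written as
a small Python sketch rather than Mahout code. It is an assumption on my
part that each matrix would then be treated as ordinary boolean data. The
Tanimoto coefficient here is the usual intersection-over-union definition,
and `split_results` and `tanimoto` are hypothetical names.

```python
def split_results(results):
    """Split three-valued results into two boolean user -> exercise-set maps.

    results: dict user -> dict exercise -> 1 (correct) or 0 (incorrect).
    Unattempted exercises are absent, so they land in neither set.
    """
    succeeded = {
        user: {e for e, correct in r.items() if correct == 1}
        for user, r in results.items()
    }
    failed = {
        user: {e for e, correct in r.items() if correct == 0}
        for user, r in results.items()
    }
    return succeeded, failed


def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient of two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Comparing two users' sets in `failed` with `tanimoto` gives a similarity
based only on shared failures, and the same applies to `succeeded`.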

Kind regards,
Floris Devriendt


2014-05-17 1:33 GMT+02:00 Ted Dunning <ted.dunning@gmail.com>:

> The easiest way to shoehorn this data into the binary framework for
> recommenders is to keep two matrices, one for success, one for failure.
>
> There is lots to do from there.
>
> Most analyses of this kind of data (so-called item-response data [1]),
> however, require some kind of hidden variable analysis beyond that
> available in Mahout.  The good news is that the data available in these
> kinds of problems is almost always relatively small (millions or tens of
> millions of observations is pretty rare).  This means that conventional
> tools like R are pretty easy to use [2,3,4].
>
> You could try using some of the matrix decomposition algorithms in Mahout
> on these data, but I really think that a more nuanced analysis would be
> better.
>
> [1] https://en.wikipedia.org/wiki/Item_response_theory#Three_parameter_logistic_model
>
> [2] http://cran.r-project.org/web/views/Psychometrics.html
>
> [3] http://cran.r-project.org/web/packages/mirt/mirt.pdf
>
> [4] http://cran.r-project.org/web/packages/ltm/ltm.pdf
>
>
> On Thu, May 15, 2014 at 8:53 AM, Floris Devriendt
> <florisdevriendt@gmail.com> wrote:
>
> > Hello everybody,
> >
> > I'm a new Mahout user and I was hoping some people could point me in
> > the right direction.
> >
> > My data consists of exercise results from different users, and I want
> > to recommend different exercises to different users using the
> > collaborative filtering techniques available in Mahout. The idea is
> > that the 'items' in my data are the exercises, and the relations
> > between users and items can take three values:
> >
> >    - A user has correctly completed the exercise.
> >    - A user has incorrectly completed the exercise.
> >    - A user has not made an attempt at the exercise.
> >
> > In essence, this data is comparable to like/dislike/unknown data.
> >
> > Now I know more or less how to build a recommender in Mahout, but I'm
> > having some difficulty designing it. A lot depends on the similarity
> > measure used, but most similarity measures assume rating-style
> > preferences (e.g. when rating movies or music). The exceptions, if I
> > interpret it correctly, are the Tanimoto coefficient and the
> > log-likelihood similarity. But those similarities seem to focus on
> > boolean data, where a user either has a relation with an item or does
> > not.
> >
> > What are the key aspects to take into account when working with this
> > kind of data (with three distinct values)? Does it all depend on the
> > similarity measure used? Or are there other aspects I need to consider
> > to make the recommendations worthwhile for this kind of data?
> >
> > I also have some more questions about some of the similarity measures
> > implemented in Mahout, but I don't want to ask too much at once. If
> > somebody can guide me in the right direction with the above questions,
> > this would be appreciated.
> >
> > Kind regards,
> > Floris Devriendt
> >
>
