mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: "Binary" Data
Date Sat, 17 May 2014 17:20:51 GMT
Floris,

Given the size of the data you have and the goals that you have, I am not
convinced that recommendation is the right fit for your needs.

I would recommend using multi-dimensional response analysis and then defining
distance between users in terms of the latent variables you get from that.
You should be able to cluster directly in terms of those latent variables.
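
As a rough illustration of the idea only: the sketch below is in Python
rather than R, and it swaps in a plain logistic factorization fitted by
stochastic gradient descent instead of a proper multidimensional IRT fit
(for real work, packages such as mirt or ltm are the better choice). The
function names, the toy response matrix, and all hyperparameters are
assumptions made for the example.

```python
import math
import random


def fit_latent_traits(responses, n_dims=2, n_iters=300, lr=0.05, reg=0.01, seed=0):
    """Fit P(correct) = sigmoid(theta_u . a_i + b_i) by stochastic gradient descent.

    responses: list of rows; 1 = correct, 0 = incorrect, None = not attempted.
    Only attempted cells contribute to the loss.
    Returns (theta, a, b): user traits, item loadings, item intercepts.
    """
    rng = random.Random(seed)
    n_users, n_items = len(responses), len(responses[0])
    theta = [[rng.gauss(0, 0.1) for _ in range(n_dims)] for _ in range(n_users)]
    a = [[rng.gauss(0, 0.1) for _ in range(n_dims)] for _ in range(n_items)]
    b = [0.0] * n_items
    for _ in range(n_iters):
        for u in range(n_users):
            for i in range(n_items):
                y = responses[u][i]
                if y is None:
                    continue  # unattempted exercises carry no information here
                logit = b[i] + sum(t * w for t, w in zip(theta[u], a[i]))
                err = 1.0 / (1.0 + math.exp(-logit)) - y  # d(log-loss)/d(logit)
                for d in range(n_dims):
                    t, w = theta[u][d], a[i][d]
                    theta[u][d] -= lr * (err * w + reg * t)
                    a[i][d] -= lr * (err * t + reg * w)
                b[i] -= lr * err
    return theta, a, b


def latent_distance(theta, u, v):
    """Euclidean distance between two users in latent-trait space."""
    return math.dist(theta[u], theta[v])


def kmeans(points, k, n_iters=20, seed=0):
    """Very small k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster goes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy data (made up): 6 users x 5 exercises.
responses = [
    [1, 1, 0, None, 1],
    [1, 1, 0, 0, 1],
    [1, None, 0, 0, 1],
    [0, 0, 1, 1, None],
    [0, 0, 1, 1, 0],
    [None, 0, 1, 1, 0],
]
theta, a, b = fit_latent_traits(responses)
labels = kmeans(theta, k=2)
print(labels)
print(latent_distance(theta, 0, 1))
```

The point is that distance and clustering operate on theta, the per-user
latent variables, rather than on the raw correct/incorrect rows.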

Also, for cloze exercises, I think you may be missing some important
information by only counting correct/incorrect.  The word that is filled in
(if any) could be a huge hint if you know it.
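
To make that concrete, here is a small Python sketch that keeps the
filled-in word and tallies the wrong answers per exercise. The record
layout (user, exercise, expected word, filled-in word or None) and the
sample data are hypothetical; the actual data may be shaped differently.

```python
from collections import Counter, defaultdict

# Hypothetical records: (user, exercise, expected word, word filled in or None).
attempts = [
    ("u1", "ex1", "went", "went"),
    ("u2", "ex1", "went", "goed"),
    ("u3", "ex1", "went", "gone"),
    ("u4", "ex1", "went", "goed"),
    ("u5", "ex1", "went", None),  # left blank
]


def wrong_answer_counts(attempts):
    """Count, per exercise, how often each incorrect word was filled in."""
    counts = defaultdict(Counter)
    for _user, exercise, expected, filled in attempts:
        if filled is not None and filled != expected:
            counts[exercise][filled] += 1
    return counts


counts = wrong_answer_counts(attempts)
print(counts["ex1"].most_common())
```

A frequent wrong answer such as an over-regularized verb form says more
about what a learner is missing than a bare 0 does.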

My feeling is that since you need lots of algorithmic flexibility and since
your dataset fits in R, you really will not be well served by Mahout.
The virtues of Mahout really only come out at very large scale and only
for particular problems.

Also, you have at this point pretty much exhausted my knowledge of
item-response theory.




On Sat, May 17, 2014 at 6:04 AM, Floris Devriendt <florisdevriendt@gmail.com> wrote:

> Hello Ted Dunning,
>
> First of all thank you for the response, I appreciate it.
>
> Am I right if I say you are suggesting a combination of recommendation
> systems and an item-response analysis of the data?
> You're right when saying my data isn't huge, so R could work as a tool. I'm
> just a little bit confused on the topic still.
>
> How exactly can I combine recommender systems with the item-response
> analysis?
> I'm just thinking out loud here, but do you mean I could determine the
> user's ability level (using R) and then search for similar users in the
> user-user collaborative filtering technique?
>
> The data I have is very limited. I have users and their given solutions to
> exercises and whether or not they were successful. The exercises themselves
> are all language cloze exercises. The idea was to use a CF technique to
> determine similar users (based on the similarity of the users' scores, 1 =
> correct; 0 = incorrect) and then suggest to each user the exercises we
> think the user will fail (because users similar to them have also failed
> there).
>
> Your idea about splitting up my matrix into two matrices is interesting,
> however I'm still thinking about what I can do with that. Is it true if I
> say you're suggesting a different approach, or is the item-response
> analysis something I can use within the recommender system?
>
> Kind regards,
> Floris Devriendt
>
>
> 2014-05-17 1:33 GMT+02:00 Ted Dunning <ted.dunning@gmail.com>:
>
> > The easiest way to shoehorn this data into the binary framework for
> > recommenders is to keep two matrices, one for success, one for failure.
> >
> > There is lots to do from there.
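
A possible reading of the two-matrix encoding, sketched in Python with
made-up data: each matrix is stored as a mapping from user to the set of
exercises with that outcome, and a boolean similarity such as the Tanimoto
coefficient is computed on each side separately. The helper names are
illustrative and are not Mahout APIs.

```python
def split_outcomes(results):
    """Split (user, exercise, correct) triples into success and failure sets."""
    successes, failures = {}, {}
    for user, exercise, correct in results:
        target = successes if correct else failures
        target.setdefault(user, set()).add(exercise)
    return successes, failures


def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Made-up results; exercises a user never attempted simply do not appear.
results = [
    ("u1", "ex1", True), ("u1", "ex2", False), ("u1", "ex3", False),
    ("u2", "ex2", False), ("u2", "ex3", False), ("u2", "ex4", True),
    ("u3", "ex1", True), ("u3", "ex2", True),
]
successes, failures = split_outcomes(results)

# Similarity of u1 and u2 on what they got wrong.
fail_sim = tanimoto(failures.get("u1", set()), failures.get("u2", set()))
# Similarity of u1 and u3 on what they got right.
success_sim = tanimoto(successes.get("u1", set()), successes.get("u3", set()))
print(fail_sim, success_sim)
```

Each side is then ordinary boolean data, which is the form the boolean
similarity measures expect.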
> >
> > Most analyses of this kind of data (so-called item-response data [1]),
> > however, require some kind of hidden variable analysis beyond that
> > available in Mahout.  The good news is that the data available in these
> > kinds of problems is almost always relatively small (millions or tens of
> > millions of observations is pretty rare).  This means that conventional
> > tools like R are pretty easy to use [2,3,4].
> >
> > You could try using some of the matrix decomposition algorithms in Mahout
> > on these data, but I really think that a more nuanced analysis would be
> > better.
> >
> > [1]
> > https://en.wikipedia.org/wiki/Item_response_theory#Three_parameter_logistic_model
> >
> > [2] http://cran.r-project.org/web/views/Psychometrics.html
> >
> > [3] http://cran.r-project.org/web/packages/mirt/mirt.pdf
> >
> > [4] http://cran.r-project.org/web/packages/ltm/ltm.pdf
> >
> >
> > On Thu, May 15, 2014 at 8:53 AM, Floris Devriendt <florisdevriendt@gmail.com> wrote:
> >
> > > Hello everybody,
> > >
> > > I'm a new Mahout user and I was hoping some people could point me in the
> > > right direction.
> > >
> > > My data consists of exercise results made by different users and I want to
> > > recommend different exercises to different users using the collaborative
> > > filtering techniques available in Mahout. The idea is that the 'items' in
> > > my data are the exercises, and the relation between a user and an item
> > > can take one of three values:
> > >
> > >    - A user has correctly completed the exercise.
> > >    - A user has incorrectly completed the exercise.
> > >    - A user has not made an attempt at the exercise.
> > >
> > > In essence this data can be compared to like/dislike/unknown data.
> > >
> > > Now I know more or less how to build a recommender in Mahout but I'm having
> > > some difficulties in designing it. A lot depends on the similarity measure
> > > used, but most similarity measures assume rating-style preferences
> > > (e.g. when rating movies or music). The exceptions, if I interpret it
> > > correctly, are the Tanimoto coefficient and the log-likelihood
> > > similarity. But those similarities seem to focus on boolean data, where a
> > > user either has a relation with an item or does not.
> > >
> > > What are the key aspects to take into account when working with this kind
> > > of data (with three distinct values)? Does it all depend on the similarity
> > > measure used? Or are there other aspects I need to take into account to
> > > make the recommendations worthwhile for this kind of data?
> > >
> > > I also have some more questions on some of the similarity measures
> > > implemented in Mahout, but I don't want to ask too much at once. If
> > > somebody can guide me in the right direction with the above questions,
> > > then this would be appreciated.
> > >
> > > Kind regards,
> > > Floris Devriendt
> > >
> >
>
