mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Problem with CNB implementation?
Date Sat, 05 Mar 2011 05:44:22 GMT
Robin (S),

As far as I know, your issue concerned the fact that results from the CNB in
Mahout used the convention that increasing score indicated decreasing
relevance.

The Mahout Robin (whom I will call Robin A) claimed that this was, in fact,
correct because the score from CNB really is just the relevance score for
the complementary class.  Thus, an increasing score means that the document
being scored is more like the complementary class than the class being
scored.  This position is at least internally consistent and, since Robin A
wrote that code, there is some credibility to the thought that the Mahout
code really does work this way.
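
To make that sign convention concrete, here is a minimal sketch of the
commonly described complement naive Bayes formulation. It is not Mahout's
code: the function names and toy data are hypothetical, and it assumes plain
term counts with add-one smoothing, leaving out any weight normalization or
TF-IDF transforms. Each class weight is estimated from the documents that are
NOT in that class, so the predicted class is the one with the lowest score.

```python
import math
from collections import Counter


def train_complement_weights(docs, alpha=1.0):
    """docs: list of (label, tokens). Returns {label: {term: log-weight}}.

    The weights for class c are estimated only from documents whose
    label is not c, i.e. from the complement of c.
    """
    vocab = sorted({t for _, toks in docs for t in toks})
    labels = sorted({lab for lab, _ in docs})
    weights = {}
    for c in labels:
        comp = Counter()
        for lab, toks in docs:
            if lab != c:
                comp.update(toks)
        # Smoothed denominator: complement token count plus alpha per term.
        total = sum(comp.values()) + alpha * len(vocab)
        weights[c] = {t: math.log((comp[t] + alpha) / total) for t in vocab}
    return weights


def complement_scores(weights, tokens):
    """Score per class: how well the document fits that class's complement."""
    counts = Counter(tokens)
    return {
        c: sum(n * w[t] for t, n in counts.items() if t in w)
        for c, w in weights.items()
    }


def classify(weights, tokens):
    """Pick the class whose complement fits the document worst (lowest score)."""
    scores = complement_scores(weights, tokens)
    return min(scores, key=scores.get)


# Toy data, purely illustrative.
toy_docs = [
    ("sports", ["ball", "goal", "ball"]),
    ("politics", ["vote", "law", "vote"]),
]
toy_weights = train_complement_weights(toy_docs)
toy_scores = complement_scores(toy_weights, ["ball", "goal"])
print(toy_scores, classify(toy_weights, ["ball", "goal"]))
```

Under these assumptions a higher score for a class means the document looks
more like everything outside that class, which is why the decision rule takes
the minimum rather than the maximum.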

It sounds like your professor (who is nameless and thus I will anoint him
"the professor") feels that this score should somehow be inverted, so that
less relevance means a lower score.

I am not at all clear about who is claiming what, so could you say whether
I have the right idea about what everybody is saying?

On Fri, Mar 4, 2011 at 8:56 PM, Robin M. E. Swezey <robin@toralab.org> wrote:

> Hello,
>
> My name is Robin Swezey.
>
> We have a paper accepted to an international conference, which
> mentions the use of Mahout and its Complementary Naive Bayes (CNB)
> algorithm.
>
> The deadline for submitting the final version of this paper is set to
> March 5, 23:59 PST (GMT - 8), which is today.
>
> Yet, I have reported what we believe is an issue with the CNB
> implementation:
>
> https://issues.apache.org/jira/browse/MAHOUT-605?focusedCommentId=13000838&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13000838
> (please look at this comment from 01/Mar/11 11:42, my previous ones
> aren't so clear)
>
> Basically, Mahout developers claim that weights decrease with class
> affinity (as in a real CNB), but my professor claims that this is not
> the case. So we conducted a test to prove this. The test is easy to
> make, so I suggest you conduct it as well in case you need to verify.
>
> The point is that we want to know if it is a _real_ CNB or not. We
> cannot really write false statements in a paper.
>
> Actually, we don't really need to solve the issue right now, just
> confirmation that there is one (or not).
> - If there is an issue, we can correct the paper and say that it was
> an improved NB (or whatever this is) instead of a real CNB.
> - If there is no issue, we will leave the paper as it is now.
>
> The source is quite difficult and complex to navigate, and honestly I
> don't want to write claims based on my sole understanding of it.
>
> Can I kindly ask for your help on this one?
>
> Btw, the paper relates to a nationwide governmental project for Japan,
> and could have a good impact for Mahout when published. We also intend
> to use Mahout in further papers.
>
> We really thank you for your work and efforts on the Mahout platform
> and hope to contribute to it as much as we can.
>
> Best regards,
> Robin S
>
> PS: We have a Mahout course/training in our lab, I will share the
> documentation as soon as I translate it (I originally wrote it in
> Japanese).
>
> --
> Robin M. E. Swezey <robin@toralab.org>
>
