spark-dev mailing list archives

From Maciej Szymkiewicz <mszymkiew...@gmail.com>
Subject Re: [MLLIB] RankingMetrics.precisionAt
Date Wed, 07 Dec 2016 00:02:33 GMT
This sounds much better.

A follow-up question is whether we should also provide MAP@k, which I
believe is the more widely used metric.
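
To make the distinction concrete, here is a minimal sketch in plain Scala
(no Spark) that contrasts the two quantities. The object and function names
(RankingSketch, precisionAtK, averagePrecisionAtK) are hypothetical and not
part of RankingMetrics. The AP@k definition follows one common convention
(normalizing by min(|labels|, k), as in the ml_metrics reference linked in
the thread) and assumes the predictions contain no duplicates.

```scala
object RankingSketch {
  // Precision at k for one query: hits among the first k predictions,
  // divided by k (the same denominator as the quoted precisionAt).
  def precisionAtK[T](pred: Seq[T], labels: Set[T], k: Int): Double = {
    require(k > 0, "ranking position k should be positive")
    if (labels.isEmpty) 0.0
    else pred.take(k).count(labels.contains).toDouble / k
  }

  // Average precision at k for one query (assumed convention): sum of
  // precision@i over each position i <= k that holds a relevant item,
  // divided by min(|labels|, k). Assumes predictions are distinct.
  def averagePrecisionAtK[T](pred: Seq[T], labels: Set[T], k: Int): Double = {
    require(k > 0, "ranking position k should be positive")
    if (labels.isEmpty) 0.0
    else {
      var hits = 0
      var score = 0.0
      pred.take(k).zipWithIndex.foreach { case (p, i) =>
        if (labels.contains(p)) {
          hits += 1
          score += hits.toDouble / (i + 1)
        }
      }
      score / math.min(labels.size, k)
    }
  }

  def mean(xs: Seq[Double]): Double =
    if (xs.isEmpty) 0.0 else xs.sum / xs.size

  // Two made-up queries: (ranked predictions, ground-truth set).
  val queries: Seq[(Seq[Int], Set[Int])] = Seq(
    (Seq(1, 2, 3, 4), Set(1, 3)),
    (Seq(5, 6, 7), Set(7))
  )

  // Mean precision@3: (2/3 + 1/3) / 2, approximately 0.5
  val meanPrecisionAt3: Double =
    mean(queries.map { case (p, l) => precisionAtK(p, l, 3) })

  // MAP@3: (5/6 + 1/3) / 2, approximately 0.5833
  val mapAt3: Double =
    mean(queries.map { case (p, l) => averagePrecisionAtK(p, l, 3) })
}
```

On these two queries the values differ (roughly 0.5 versus 0.583), because
AP@k rewards relevant items that appear earlier in the ranking, while
precision@k only counts them.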


On 12/06/2016 09:52 PM, Sean Owen wrote:
> As I understand, this might best be called "mean precision@k", not
> "mean average precision, up to k".
>
> On Tue, Dec 6, 2016 at 9:43 PM Maciej Szymkiewicz
> <mszymkiewicz@gmail.com> wrote:
>
>     Thank you Sean.
>
>     Maybe I am just confused about the language. When I read that it
>     returns "the average precision at the first k ranking positions" I
>     somehow expect there will be ap@k there and that the final output
>     would be MAP@k, not average precision at the k-th position.
>
>     I guess it is not enough sleep.
>
>     On 12/06/2016 02:45 AM, Sean Owen wrote:
>>     I read it again and that looks like it implements mean
>>     precision@k as I would expect. What is the issue?
>>
>>     On Tue, Dec 6, 2016, 07:30 Maciej Szymkiewicz
>>     <mszymkiewicz@gmail.com> wrote:
>>
>>         Hi,
>>
>>         Could I ask for a fresh pair of eyes on this piece of code:
>>
>>         https://github.com/apache/spark/blob/f830bb9170f6b853565d9dd30ca7418b93a54fe3/mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala#L59-L80
>>
>>           @Since("1.2.0")
>>           def precisionAt(k: Int): Double = {
>>             require(k > 0, "ranking position k should be positive")
>>             predictionAndLabels.map { case (pred, lab) =>
>>               val labSet = lab.toSet
>>
>>               if (labSet.nonEmpty) {
>>                 val n = math.min(pred.length, k)
>>                 var i = 0
>>                 var cnt = 0
>>                 while (i < n) {
>>                   if (labSet.contains(pred(i))) {
>>                     cnt += 1
>>                   }
>>                   i += 1
>>                 }
>>                 cnt.toDouble / k
>>               } else {
>>                 logWarning("Empty ground truth set, check input data")
>>                 0.0
>>               }
>>             }.mean()
>>           }
>>
>>
>>         Am I the only one who thinks this doesn't do what it claims?
>>         Just for reference:
>>
>>           * https://web.archive.org/web/20120415101144/http://sas.uwaterloo.ca/stats_navigation/techreports/04WorkingPapers/2004-09.pdf
>>           * https://github.com/benhamner/Metrics/blob/master/Python/ml_metrics/average_precision.py
>>
>>         -- 
>>         Best,
>>         Maciej
>>
>
>     -- 
>     Maciej Szymkiewicz
>

-- 
Maciej Szymkiewicz

