samoa-dev mailing list archives

From "Maciej Grzenda (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SAMOA-68) Saving true and predicted labels to file
Date Mon, 03 Jul 2017 09:28:02 GMT

    [ https://issues.apache.org/jira/browse/SAMOA-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072168#comment-16072168 ]

Maciej Grzenda commented on SAMOA-68:
-------------------------------------

Let me refer to the comment suggesting that we drop the Vote class and keep the data contained
in the objects of this class in a double[] array.

For classification tasks, the class is reported in the prediction file that we create as a class
label (not as a double) to avoid confusion. Hence, it is a String. Moreover, SAMOA internally
sometimes reports a shorter array of votes than the number of classes (when the remaining votes
are zero). Hence, keeping an explicitly named Vote object, holding the class label and the value
of the vote it refers to, seems to us more explicit and safer than relying on knowing that e.g.
index 5 of the array holds the votes for the third class. From a wider perspective, now that the
Kafka extension is prepared, I believe that, just as with the file saving the accuracy of the
methods, neither file will be created under extremely high load. Otherwise, if high throughput
is expected, the accuracy and prediction files should be produced in a streaming manner
(similarly to what e.g. Spark does), i.e. as part* files. Hence, for this code (and other
border-case problems such as this one), clarity could perhaps be preferred over performance.
To sum up, we suggest keeping the current solution based on the Vote class (rather than dropping
the Vote class, which is what we understand has been suggested).
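To illustrate the trade-off described in the comment, here is a minimal Java sketch. It assumes a hypothetical Vote class and a hypothetical toVotes helper (neither is SAMOA's actual code): a raw double[] of votes, possibly shorter than the number of classes because trailing zero votes were omitted, is mapped to one explicitly labelled vote per class, with missing entries padded as zero.

```java
import java.util.ArrayList;
import java.util.List;

public class VoteExample {

    // Hypothetical stand-in for a Vote: pairs a class label with its vote value.
    static final class Vote {
        final String classLabel;
        final double value;

        Vote(String classLabel, double value) {
            this.classLabel = classLabel;
            this.value = value;
        }

        @Override
        public String toString() {
            return classLabel + "=" + value;
        }
    }

    // Converts a raw votes array, which may be shorter than the number of classes
    // when trailing votes are zero, into one explicitly labelled Vote per class.
    static List<Vote> toVotes(double[] rawVotes, List<String> classLabels) {
        List<Vote> votes = new ArrayList<>(classLabels.size());
        for (int i = 0; i < classLabels.size(); i++) {
            // Entries missing from the end of the raw array are treated as zero votes.
            double value = i < rawVotes.length ? rawVotes[i] : 0.0;
            votes.add(new Vote(classLabels.get(i), value));
        }
        return votes;
    }

    public static void main(String[] args) {
        List<String> labels = List.of("low", "medium", "high");
        // Only two entries: the vote for "high" was zero and therefore omitted.
        double[] raw = {0.25, 0.75};
        System.out.println(toVotes(raw, labels)); // prints [low=0.25, medium=0.75, high=0.0]
    }
}
```

With labelled votes, a consumer of the prediction file never has to infer which class a position in the array refers to, which is the safety argument made in the comment; the cost is one small object per class instead of a single primitive array.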

> Saving true and predicted labels to file
> ----------------------------------------
>
>                 Key: SAMOA-68
>                 URL: https://issues.apache.org/jira/browse/SAMOA-68
>             Project: SAMOA
>          Issue Type: New Feature
>          Components: SAMOA-API
>            Reporter: Maciej Grzenda
>              Labels: features
>
> Currently the PrequentialEvaluation task supports the dumpFile option. With this option, model
> performance can be saved to a file. However, in some cases it would also be good to save the
> individual predictions made by a model. This is useful for model debugging and method development.
> This could also be used to visualize model output or to calculate custom performance indicators
> (e.g. model accuracy for instances of a certain class or sharing the same feature value).
> Such saving of model output (if enabled) should be done for every instance. Hence, a new option
> making it possible to dump predictions to a separate file seems justified. For classification,
> it should include the votes made for individual classes, if available.
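A prediction file of the kind requested in the issue could hold one line per instance. The following Java sketch assumes a hypothetical CSV layout (instance index, true label, predicted label, then one vote value per class) and hypothetical helper names; it is not the format SAMOA actually writes.

```java
import java.util.List;
import java.util.StringJoiner;

public class PredictionLineExample {

    // Builds one CSV line for a single instance: index, true label, predicted label,
    // followed by one vote value per class (hypothetical column layout).
    static String formatPredictionLine(long instanceIndex, String trueLabel,
                                       String predictedLabel, double[] votes) {
        StringJoiner line = new StringJoiner(",");
        line.add(Long.toString(instanceIndex));
        line.add(trueLabel);
        line.add(predictedLabel);
        for (double v : votes) {
            line.add(Double.toString(v));
        }
        return line.toString();
    }

    // Picks the label with the highest vote; ties go to the lowest index.
    // Assumes at least one class label; with no votes it falls back to the first label.
    static String argMaxLabel(double[] votes, List<String> classLabels) {
        int best = 0;
        for (int i = 1; i < votes.length; i++) {
            if (votes[i] > votes[best]) {
                best = i;
            }
        }
        return classLabels.get(best);
    }

    public static void main(String[] args) {
        List<String> labels = List.of("low", "medium", "high");
        double[] votes = {0.1, 0.3, 0.6};
        String predicted = argMaxLabel(votes, labels);
        System.out.println(formatPredictionLine(42, "medium", predicted, votes)); // prints 42,medium,high,0.1,0.3,0.6
    }
}
```

Writing the true and predicted labels as strings, as the comment above recommends, keeps each line readable without a separate mapping from numeric class indices to labels.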



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
