spark-issues mailing list archives

From "miroslav Balaz (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-17498) StringIndexer.setHandleInvalid should have another option 'new'
Date Mon, 12 Sep 2016 13:46:20 GMT

    [ https://issues.apache.org/jira/browse/SPARK-17498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15484129#comment-15484129 ]

miroslav Balaz edited comment on SPARK-17498 at 9/12/16 1:46 PM:
-----------------------------------------------------------------

No, I meant that it should return 3 for both "d" and "e", which corresponds to mapping all
unseen labels to a single 'unknown' class.

[~sowen] The problem, as I see it, is that you currently have to ensure the training set
contains every label that appears in the test set. The assumption is that the model will
perform poorly if the training set is missing some of those labels, but it would be good to
be able to run it easily and check. The performance might be good anyway.
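
To make the requested behaviour concrete, here is a minimal sketch in plain Python. It is not
Spark's API: the helper names fit_labels, index_label and index_to_string are hypothetical,
and the ordering (most frequent label first, ties broken alphabetically) is an assumption
made only for this illustration. Every unseen label gets the index "maximum known index + 1",
and the inverse mapping turns that index back into "<undef>".

```python
from collections import Counter

UNKNOWN = "<undef>"  # placeholder string suggested in the issue description


def fit_labels(values):
    """Return known labels, most frequent first (ties broken alphabetically)."""
    counts = Counter(values)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [label for label, _ in ordered]


def index_label(labels, value):
    """Map a label to its index; any unseen label maps to len(labels)."""
    try:
        return labels.index(value)
    except ValueError:
        # len(labels) equals the maximum known index + 1
        return len(labels)


def index_to_string(labels, idx):
    """Inverse mapping; indices outside the known range become UNKNOWN."""
    return labels[idx] if 0 <= idx < len(labels) else UNKNOWN


# Training data contains only "a", "b" and "c".
labels = fit_labels(["a", "a", "a", "b", "b", "c"])

# The test data also contains the unseen labels "d" and "e".
indexed = [index_label(labels, v) for v in ["a", "b", "c", "d", "e"]]
restored = [index_to_string(labels, i) for i in indexed]

print(indexed)
print(restored)
```

With this mapping, "d" and "e" share index 3, so a model can be evaluated on a test set that
contains labels missing from the training set instead of failing on them.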


was (Author: mirob):
No, I meant that it should return 3 for both "d" and "e", which corresponds to mapping all
unseen labels to a single 'unknown' class.

> StringIndexer.setHandleInvalid should have another option 'new'
> ---------------------------------------------------------------
>
>                 Key: SPARK-17498
>                 URL: https://issues.apache.org/jira/browse/SPARK-17498
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Miroslav Balaz
>
> That will map an unseen label to the maximum known label index + 1. IndexToString would
> map that back to "<undef>", or to NA if there is something like that in Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

