spark-issues mailing list archives

From "Seth Hendrickson (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-7159) Support multiclass logistic regression in spark.ml
Date Fri, 20 May 2016 22:18:12 GMT

    [ https://issues.apache.org/jira/browse/SPARK-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294372#comment-15294372 ]

Seth Hendrickson edited comment on SPARK-7159 at 5/20/16 10:17 PM:
-------------------------------------------------------------------

[~dbtsai][~josephkb] I'd like to take this one if it's still open. I have an implementation
that is functional except for some corner cases, and can have a PR submitted before too long.


One part of the design that still needs discussion, as far as I can tell, is how to pass
the coefficients/intercepts to the model without breaking the API. If we were not concerned
about API compatibility, I'd say the best way would be to make the intercept a {{Vector}}
and the coefficients either a flattened {{Vector}} or a {{Matrix}}. I can't think
of a way that would be both easy to use and not break the API. With that in mind, another
option may be to stick with the convention used in MLlib, where the intercept/coefficients
follow the obvious convention for binary logistic regression, but in the multinomial case
the intercept is always zero (meaningless) and the coefficients are a flattened {{Vector}}
with the intercepts baked in. This is not a user-friendly solution IMO, but it would not break
the API. Perhaps this has already been discussed?
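
To make the trade-off concrete, here is a minimal sketch (plain Python, not Spark code) of what a user would have to do to recover per-class coefficients and intercepts from a flattened vector with the intercepts baked in. The layout is an assumption for illustration only: K - 1 rows (one class treated as the pivot), each row holding the feature weights followed by that row's intercept. The function name {{unpack_flattened}} and the sample numbers are hypothetical.

```python
from typing import List, Tuple


def unpack_flattened(
    weights: List[float], num_classes: int, num_features: int
) -> Tuple[List[List[float]], List[float]]:
    """Split a flattened weight vector into (coefficient rows, intercepts).

    Assumed layout (illustrative, not taken from the Spark source):
    (num_classes - 1) rows, each of length num_features + 1,
    with the intercept stored as the last element of each row.
    """
    rows = num_classes - 1
    row_size = num_features + 1
    if len(weights) != rows * row_size:
        raise ValueError(
            f"expected {rows * row_size} weights, got {len(weights)}"
        )

    coefficients: List[List[float]] = []
    intercepts: List[float] = []
    for k in range(rows):
        row = weights[k * row_size:(k + 1) * row_size]
        coefficients.append(row[:num_features])  # feature weights for row k
        intercepts.append(row[num_features])     # intercept baked into row k
    return coefficients, intercepts


# Hypothetical 3-class, 2-feature model: 2 rows of (2 weights + 1 intercept).
flat = [0.5, -1.0, 0.1, 2.0, 0.25, -0.3]
coef, icpt = unpack_flattened(flat, num_classes=3, num_features=2)
```

With a {{Matrix}} of coefficients and a {{Vector}} of intercepts on the model, none of this index arithmetic would fall on the user, which is the usability concern raised above.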

Thanks for your input!



> Support multiclass logistic regression in spark.ml
> --------------------------------------------------
>
>                 Key: SPARK-7159
>                 URL: https://issues.apache.org/jira/browse/SPARK-7159
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: Joseph K. Bradley
>            Assignee: DB Tsai
>            Priority: Critical
>
> This should be implemented by checking the input DataFrame's label column for feature
> metadata specifying the number of classes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

