spark-user mailing list archives

From Seth Hendrickson <seth.hendrickso...@gmail.com>
Subject Re: Question about Multinomial LogisticRegression in spark mllib in spark 2.1.0
Date Wed, 01 Feb 2017 16:49:26 GMT
In spark.ml the coefficients are not "pivoted", meaning that the model does
not set one of the coefficient sets equal to zero. You can read more about it here:
https://en.wikipedia.org/wiki/Multinomial_logistic_regression#As_a_set_of_independent_binary_regressions

You can translate your set of coefficients to a pivoted version by simply
subtracting one class's coefficients (and its intercept) from those of every
class, including itself. That leaves the one you selected, the "pivot", as
all zeros. You can then pass the result into the mllib model, disregarding
the "pivot" coefficients. The coefficients should be laid out like:

[feature0_class0, feature1_class0, feature2_class0, intercept0,
feature0_class1, ..., intercept1]

So you have 9 coefficients and 3 intercepts, but you are going to get rid
of one class's coefficients, leaving you with 6 coefficients and two
intercepts - so a vector of length 8 for mllib's model.
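
As a rough illustration of the subtraction and layout described above, here
is a minimal sketch in plain Python (not Spark code). The numbers are made up,
the function names are hypothetical, and it assumes the coefficient matrix
has one row per class and that class 0 is used as the pivot. It also checks
that the pivoted vector gives the same class probabilities as the original
coefficients.

```python
import math

def pivot_and_flatten(coef, intercepts, pivot=0):
    """Subtract the pivot class's coefficients and intercept from every class,
    drop the pivot, and flatten as [features..., intercept] per remaining class.
    Assumes coef has one row per class (numClasses x numFeatures)."""
    pivot_row, pivot_b = coef[pivot], intercepts[pivot]
    flat = []
    for k, (row, b) in enumerate(zip(coef, intercepts)):
        if k == pivot:
            continue  # the pivot row is all zeros after subtraction, so skip it
        flat.extend(w - p for w, p in zip(row, pivot_row))
        flat.append(b - pivot_b)
    return flat

def softmax(margins):
    m = max(margins)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in margins]
    total = sum(exps)
    return [e / total for e in exps]

def probs_full(coef, intercepts, x):
    """Class probabilities from the non-pivoted coefficients."""
    margins = [sum(w * xi for w, xi in zip(row, x)) + b
               for row, b in zip(coef, intercepts)]
    return softmax(margins)

def probs_pivoted(flat, x, num_classes):
    """Class probabilities from the flattened vector, with class 0 as pivot."""
    n = len(x)
    margins = [0.0]  # the pivot class has margin 0 by construction
    for k in range(num_classes - 1):
        block = flat[k * (n + 1):(k + 1) * (n + 1)]
        margins.append(sum(w * xi for w, xi in zip(block[:n], x)) + block[n])
    return softmax(margins)

# Made-up example: 3 classes, 3 features.
coef = [[0.5, -1.0, 2.0],
        [1.5, 0.0, -0.5],
        [-1.0, 2.0, 1.0]]
intercepts = [0.1, -0.2, 0.3]

flat = pivot_and_flatten(coef, intercepts, pivot=0)
print(len(flat))  # 8 = 2 remaining classes * (3 features + 1 intercept)

x = [1.0, 2.0, -1.0]
p_full = probs_full(coef, intercepts, x)
p_piv = probs_pivoted(flat, x, num_classes=3)
```

The two probability lists agree because subtracting the same vector from
every class's coefficients shifts all margins by the same amount, and that
shift cancels in the softmax.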

Note: if you use regularization then it is not exactly correct to convert
from the non-pivoted version to the pivoted one, since the two formulations
give different results when trained with regularization, though it is still
possible to do it.

On Wed, Feb 1, 2017 at 3:42 AM, Aseem Bansal <asmbansal2@gmail.com> wrote:

> *What I want to do*
> I have trained a ml.classification.LogisticRegressionModel using the spark
> ml package.
>
> It has 3 features and 3 classes. So the generated model has coefficients
> in (3, 3) matrix and intercepts in Vector of length (3) as expected.
>
> Now, I want to take these coefficients and convert this
> ml.classification.LogisticRegressionModel to an instance of
> mllib.classification.LogisticRegressionModel.
>
> *Why I want to do this*
> Computational Speed as SPARK-10413 is still in progress and scheduled for
> Spark 2.2 which is not yet released.
>
> *Why I think this is possible*
> I checked
> https://spark.apache.org/docs/latest/mllib-linear-methods.html#logistic-regression
> and in that example a multinomial
> Logistic Regression is trained. So as per this the class
> mllib.classification.LogisticRegressionModel can encapsulate these
> parameters.
>
> *Problem faced*
> The only constructor in mllib.classification.LogisticRegressionModel
> takes a single vector as coefficients and single double as intercept but I
> have a Matrix of coefficients and Vector of intercepts respectively.
>
> I tried converting the matrix to a vector by just taking the values
> (guesswork) but got
>
> requirement failed: LogisticRegressionModel.load with numClasses = 3 and
> numFeatures = 3 expected weights of length 6 (without intercept) or 8 (with
> intercept), but was given weights of length 9
>
> So any ideas?
>
