spark-user mailing list archives

From Hollin Wilkins <hol...@combust.ml>
Subject Re: Question about Multinomial LogisticRegression in spark mllib in spark 2.1.0
Date Wed, 01 Feb 2017 23:56:33 GMT
Hey Aseem,

If you are looking for a full-featured library to execute Spark ML
pipelines outside of Spark, take a look at MLeap:
https://github.com/combust/mleap

Not only does it support transforming single instances of a feature vector,
but you can execute your entire ML pipeline including feature extraction.

Cheers,
Hollin

On Wed, Feb 1, 2017 at 8:49 AM, Seth Hendrickson <
seth.hendrickson16@gmail.com> wrote:

> In Spark.ML the coefficients are not "pivoted", meaning that the model does
> not set one of the coefficient sets equal to zero. You can read more about it
> here:
> https://en.wikipedia.org/wiki/Multinomial_logistic_regression#As_a_set_of_independent_binary_regressions
>
> You can translate your set of coefficients to a pivoted version by simply
> subtracting one of the sets of coefficients from all the others. That
> leaves the one you selected, the "pivot", as all zeros. You can then pass
> this into the mllib model, disregarding the "pivot" coefficients. The
> coefficients should be laid out like:
>
> [feature0_class0, feature1_class0, feature2_class0, intercept0,
> feature0_class1, ..., intercept1]
>
> So you have 9 coefficients and 3 intercepts, but you are going to get rid
> of one class's coefficients, leaving you with 6 coefficients and two
> intercepts - so a vector of length 8 for mllib's model.
>
> Note: if you use regularization then it is not exactly correct to convert
> from the non-pivoted version to the pivoted one, since the algorithms will
> give different results in those cases, though it is still possible to do it.
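
The pivoting step described above can be sketched in a few lines. This is a minimal illustration in plain Python rather than Spark code: the helper names `pivot_coefficients` and `softmax_probs` and all the numbers are made up for the example, and it assumes class 0 is the pivot (which is believed to match how the mllib model treats class 0, but is worth verifying against your Spark version). The check at the end shows that, without regularization, the pivoted parameters give the same class probabilities as the original ones.

```python
import math

def pivot_coefficients(coef_matrix, intercepts, pivot=0):
    """Subtract the pivot class's parameters from every class, then drop the pivot.

    coef_matrix: one row of per-feature coefficients per class
    intercepts:  one intercept per class
    Returns a flat list laid out per remaining class as
    [feature0, feature1, ..., intercept, feature0, ...].
    """
    pivot_row = coef_matrix[pivot]
    pivot_intercept = intercepts[pivot]
    flat = []
    for k, (row, b) in enumerate(zip(coef_matrix, intercepts)):
        if k == pivot:
            continue  # the pivot would be all zeros, so it is left out
        flat.extend(w - p for w, p in zip(row, pivot_row))
        flat.append(b - pivot_intercept)
    return flat

def softmax_probs(coef_matrix, intercepts, x):
    """Class probabilities of a multinomial logistic model for one input x."""
    margins = [sum(w * xi for w, xi in zip(row, x)) + b
               for row, b in zip(coef_matrix, intercepts)]
    top = max(margins)
    exps = [math.exp(m - top) for m in margins]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 3-class, 3-feature model (rows are classes).
coefs = [[0.5, -1.0, 2.0],
         [1.5, 0.25, -0.5],
         [-2.0, 1.0, 0.75]]
intercepts = [0.5, -0.25, 1.0]

weights = pivot_coefficients(coefs, intercepts)
print(len(weights))  # 8: two classes x (3 features + 1 intercept)

# Rebuild a full model from the pivoted vector (class 0 all zeros) and compare.
n = len(coefs[0])
pivoted_coefs = [[0.0] * n, weights[0:n], weights[n + 1:2 * n + 1]]
pivoted_intercepts = [0.0, weights[n], weights[2 * n + 1]]
x = [1.0, 2.0, -1.0]
original = softmax_probs(coefs, intercepts, x)
pivoted = softmax_probs(pivoted_coefs, pivoted_intercepts, x)
print(all(abs(a - b) < 1e-12 for a, b in zip(original, pivoted)))
```

In Spark, the resulting length-8 vector would presumably go into the mllib constructor that also takes numFeatures and numClasses, with the separate intercept argument left at 0.0 because the intercepts are already folded into the weights; treat that as an assumption to check against the API docs.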
>
> On Wed, Feb 1, 2017 at 3:42 AM, Aseem Bansal <asmbansal2@gmail.com> wrote:
>
>> *What I want to do*
>> I have trained an ml.classification.LogisticRegressionModel using the Spark
>> ml package.
>>
>> It has 3 features and 3 classes. So the generated model has coefficients
>> in a (3, 3) matrix and intercepts in a Vector of length 3, as expected.
>>
>> Now, I want to take these coefficients and convert this
>> ml.classification.LogisticRegressionModel to an instance of
>> mllib.classification.LogisticRegressionModel.
>>
>> *Why I want to do this*
>> Computational speed, as SPARK-10413 is still in progress and scheduled for
>> Spark 2.2, which is not yet released.
>>
>> *Why I think this is possible*
>> I checked
>> https://spark.apache.org/docs/latest/mllib-linear-methods.html#logistic-regression
>> and in that example a multinomial Logistic Regression is trained. So as per
>> this the class mllib.classification.LogisticRegressionModel can encapsulate
>> these parameters.
>>
>> *Problem faced*
>> The only constructor in mllib.classification.LogisticRegressionModel
>> takes a single vector as coefficients and a single double as intercept, but
>> I have a Matrix of coefficients and a Vector of intercepts respectively.
>>
>> I tried converting the matrix to a vector by just taking the values
>> (guesswork) but got
>>
>> requirement failed: LogisticRegressionModel.load with numClasses = 3 and
>> numFeatures = 3 expected weights of length 6 (without intercept) or 8 (with
>> intercept), but was given weights of length 9
>>
>> So any ideas?
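
The lengths in that error message follow from treating one class as the pivot, so only numClasses - 1 sets of parameters are stored. A small sketch of the arithmetic, using a hypothetical helper name `expected_weight_lengths` that is not part of Spark:

```python
def expected_weight_lengths(num_classes, num_features):
    """Weight-vector lengths a pivoted multinomial model would expect.

    One class acts as the pivot, so only (num_classes - 1) parameter
    sets are stored; each set optionally carries one intercept.
    """
    without_intercept = (num_classes - 1) * num_features
    with_intercept = (num_classes - 1) * (num_features + 1)
    return without_intercept, with_intercept

lengths = expected_weight_lengths(3, 3)
print(lengths)  # (6, 8)
```

A flattened (3, 3) matrix has 9 values, which matches neither length, hence the failed requirement.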
>>
>
>
