Nice to hear that your experiment is consistent with my assumption. The
current L1/L2 regularization penalizes the intercept as well, which is not
ideal. I'm working on GLMNET in Spark using OWLQN, and I can get exactly
the same solution as R, but with scalability in the number of rows and
columns. Stay tuned!
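
To illustrate why penalizing the intercept is not ideal, here is a minimal
sketch in plain Python. It is not the MLlib or GLMNET code: it uses simple
gradient descent rather than OWLQN, and the function name fit_logistic, the
toy data, and the step size are all assumptions made for the example. With
imbalanced labels, an L2 penalty on the intercept pulls it toward zero, away
from the value that matches the base rate of the labels.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, reg_param, penalize_intercept, step=0.5, iters=2000):
    """Gradient descent on mean logistic loss with an L2 penalty (illustrative only)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iters):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n + reg_param * w
        grad_b = sum(errors) / n
        if penalize_intercept:
            # This is the term that also shrinks the intercept.
            grad_b += reg_param * b
        w -= step * grad_w
        b -= step * grad_b
    return w, b


# Hypothetical toy data: 9 positive labels, 1 negative, one weak feature.
xs = [-1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0]
ys = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]

w_pen, b_pen = fit_logistic(xs, ys, reg_param=1.0, penalize_intercept=True)
w_free, b_free = fit_logistic(xs, ys, reg_param=1.0, penalize_intercept=False)

print("intercept when penalized:    ", b_pen)
print("intercept when not penalized:", b_free)
```

On this toy data the penalized intercept ends up much closer to zero than
the unpenalized one, even though 90% of the labels are positive.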
Sincerely,
DB Tsai

My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai
On Mon, Sep 29, 2014 at 11:45 AM, Yanbo Liang <yanbohappy@gmail.com> wrote:
> Thank you for all your patient responses.
>
> I can conclude that if the data is totally separable or overfitting occurs,
> the weights may be different.
> This is also consistent with my experiment.
>
> I have evaluated two different datasets, and the results are as follows:
> Loss function: LogisticGradient
> Regularizer: L2
> regParam: 1.0
> numIterations: 10000 (SGD)
>
> Dataset 1: spark1.1.0/data/mllib/sample_binary_classification_data.txt
> # of classes: 2
> # of samples: 100
> # of features: 692
> areaUnderROC of both SGD and LBFGS reaches nearly 1.0
> The loss of both optimization methods converges to nearly
> 1.7147811767900675E-5 (very, very small)
> The weights from the two optimization methods are different but look roughly
> like multiples of each other (not strictly), just as DB Tsai mentioned above.
> It might be that the dataset is totally separable.
>
> Dataset 2:
> http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html#german.numer
> # of classes: 2
> # of samples: 1000
> # of features: 24
> areaUnderROC of both SGD and LBFGS is nearly 0.8
> The loss of both optimization methods converges to nearly 0.5367041390107519
> The weights from the two optimization methods are the same.
>
>
>
> 2014-09-29 16:05 GMT+08:00 DB Tsai <dbtsai@dbtsai.com>:
>>
>> Can you check the loss of both the LBFGS and SGD implementations? One
>> reason may be that SGD doesn't converge well, and you can see that by
>> comparing the two log-likelihoods. Another potential reason may be that
>> the labels of your training data are totally separable, so you can always
>> increase the log-likelihood by multiplying the weights by a constant.
>>
>> Sincerely,
>>
>> DB Tsai
>> 
>> My Blog: https://www.dbtsai.com
>> LinkedIn: https://www.linkedin.com/in/dbtsai
>>
>>
>> On Sun, Sep 28, 2014 at 11:48 AM, Yanbo Liang <yanbohappy@gmail.com>
>> wrote:
>> > Hi
>> >
>> > We have used LogisticRegression with two different optimization methods,
>> > SGD and LBFGS, in MLlib.
>> > With the same dataset and the same training and test split, we get
>> > different weight vectors.
>> >
>> > For example, we use
>> > spark1.1.0/data/mllib/sample_binary_classification_data.txt as our
>> > training and test dataset,
>> > with LogisticRegressionWithSGD and LogisticRegressionWithLBFGS as the
>> > training methods and all other parameters the same.
>> >
>> > The precision of both methods is nearly 100%, and the AUCs are also
>> > near 1.0.
>> > As far as I know, a convex optimization problem will converge to the
>> > global minimum. (We use SGD with a mini-batch fraction of 1.0.)
>> > But I got two different weight vectors. Is this expected, and does it
>> > make sense?
>
>
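
The separability argument quoted above can be checked with a small sketch.
The toy dataset and the helper names below are assumptions for illustration,
not MLlib code. On perfectly separable data, scaling a separating weight
vector by a larger constant keeps increasing the unregularized log-likelihood
toward zero, so different optimizers can stop at different multiples of
roughly the same direction.

```python
import math


def log_sigmoid(z):
    # Numerically stable log(1 / (1 + exp(-z))).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def log_likelihood(weights, data):
    # Labels are 0 or 1; the model is P(y = 1 | x) = sigmoid(w . x).
    total = 0.0
    for features, label in data:
        margin = sum(w * x for w, x in zip(weights, features))
        total += log_sigmoid(margin if label == 1 else -margin)
    return total


# Hypothetical, perfectly separable data: negative x -> label 0, positive x -> label 1.
data = [((-2.0,), 0), ((-1.0,), 0), ((1.0,), 1), ((2.0,), 1)]
base_weights = (1.0,)  # any positive weight separates this data

scales = [1.0, 2.0, 4.0, 8.0]
values = [
    log_likelihood(tuple(c * w for w in base_weights), data) for c in scales
]

for c, v in zip(scales, values):
    print("scale", c, "log-likelihood", v)
```

Each larger scale gives a log-likelihood closer to zero, so without
regularization there is no finite weight vector at which the maximum is
attained.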

