I just used random numbers.

(My MLlib version was spark-mllib_2.10-1.2.1.)

Please see the attached log. In the middle of the log, I dumped the data set before feeding it into LogisticRegressionWithLBFGS. The first column (false/true) was the label (attribute “a”), and columns 2-5 (attributes “x”, “y”, “z”, and “i”) were the features. The 6th column was just a row ID and was not used.

I chose the relationship arbitrarily: a = (0.3 * x + 0.5 * y - 0.2 * z > 0.4), so “i” plays no part in the label.
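For anyone who wants to reproduce this, here is a minimal sketch of how a data set with that layout and rule could be generated. It uses plain Python rather than Spark. The uniform [0, 1) distribution, the seed, and the names make_rows and to_label_and_features are my assumptions for illustration, not what the original job used. The 0.0/1.0 label encoding matches what MLlib's binary classifiers expect.

```python
import random

def make_rows(n, seed=0):
    """Generate n rows laid out as (label, x, y, z, i, row_id).

    Hypothetical sketch: features are drawn uniformly from [0, 1);
    the original data's distribution is not stated.
    """
    rng = random.Random(seed)
    rows = []
    for row_id in range(n):
        x, y, z, i = (rng.random() for _ in range(4))
        # "i" is a feature, but it is not used by the labelling rule.
        a = 0.3 * x + 0.5 * y - 0.2 * z > 0.4
        rows.append((a, x, y, z, i, row_id))
    return rows

def to_label_and_features(row):
    """Column 1 is the label, columns 2-5 are the features; the row ID is dropped."""
    a, x, y, z, i, _row_id = row
    return (1.0 if a else 0.0), [x, y, z, i]

rows = make_rows(1000)
dataset = [to_label_and_features(r) for r in rows]
```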

After that, you can see LBFGS doing its job, followed by the error messages.

The model produced these coefficients:

396.57624765427323, x
662.7969020937115, y
-259.0975519038385, z
12.352037503257826, i
-538.8516249699426, @a

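The last one (@a) was the intercept. As you can see, the model seemed close enough: up to a common positive scale factor, the weights match the true relationship, and the weight on “i” is small compared with the others.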
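The scale comparison can be checked with a few lines of arithmetic. Multiplying w·x + b by a positive constant does not change its sign, so dividing every coefficient by |intercept| puts the fitted boundary and the true rule (0.3x + 0.5y - 0.2z - 0.4 > 0) on the same footing. After that, the true rule should read roughly 0.75, 1.25, -0.5 and 0 for x, y, z and i. The names below are only illustrative.

```python
# Coefficients reported by the fitted model (copied from above).
weights = {"x": 396.57624765427323, "y": 662.7969020937115,
           "z": -259.0975519038385, "i": 12.352037503257826}
intercept = -538.8516249699426

# True rule rewritten as 0.3x + 0.5y - 0.2z + 0*i - 0.4 > 0.
true_weights = {"x": 0.3, "y": 0.5, "z": -0.2, "i": 0.0}
true_intercept = -0.4

def normalised(ws, b):
    """Rescale a linear boundary by 1/|b| so its intercept becomes -1."""
    scale = abs(b)
    return {k: v / scale for k, v in ws.items()}

fitted = normalised(weights, intercept)
target = normalised(true_weights, true_intercept)
for name in ("x", "y", "z", "i"):
    print(f"{name}: fitted {fitted[name]:+.3f}  true {target[name]:+.3f}")
```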

Next, I fed the same data back into the model to check the predictions (here attribute “a” is the prediction and “aa” is the original label). I only displayed 20 rows.
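For reference, here is a plain-Python sketch of how a binary logistic-regression prediction is made from these coefficients. It computes sigmoid(w·x + b) and returns 1.0 when that exceeds a threshold of 0.5. As I understand it, 0.5 is MLlib's default threshold, but treat that as an assumption. With coefficients in the hundreds, exp(-margin) can overflow, so the sigmoid is written in a numerically stable form. The function names are hypothetical.

```python
import math

# Fitted coefficients (x, y, z, i) and intercept, copied from above.
WEIGHTS = [396.57624765427323, 662.7969020937115,
           -259.0975519038385, 12.352037503257826]
INTERCEPT = -538.8516249699426

def sigmoid(t):
    """Logistic function, arranged so that math.exp never overflows."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def predict(features, weights=WEIGHTS, intercept=INTERCEPT, threshold=0.5):
    """Return 1.0 if sigmoid(w.x + b) > threshold, else 0.0."""
    margin = sum(w * f for w, f in zip(weights, features)) + intercept
    return 1.0 if sigmoid(margin) > threshold else 0.0
```

With a 0.5 threshold this is the same as checking whether the margin w·x + b is positive. For example, x=1, y=1, z=0, i=0 gives a margin of about 520 and predicts 1.0, which agrees with the true rule (0.8 > 0.4).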

The error summary showed 2 errors out of 1000 rows:

count(INTEGER), errorRate(DOUBLE), countDiff(INTEGER)
key=[], rows=1
1000, 0.0020000000949949026, 2
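The summary row is simply count, countDiff / count, and countDiff. The odd trailing digits in 0.0020000000949949026 are consistent with the rate being stored as a 32-bit float and then printed at double precision, because 0.002 rounded to single precision is exactly that value. Below is a small plain-Python sketch of both points. The helper names are mine, and I am not claiming this is how the reporting tool computes its output.

```python
import struct

def error_summary(predictions, labels):
    """Return (count, error_rate, count_diff) for paired predictions and labels."""
    count = len(labels)
    count_diff = sum(1 for p, l in zip(predictions, labels) if p != l)
    return count, count_diff / count, count_diff

def as_float32(value):
    """Round-trip a Python float through IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

# 1000 rows with exactly two mismatches, as in the summary above.
labels = [1.0] * 1000
predictions = [0.0, 0.0] + [1.0] * 998
count, rate, diff = error_summary(predictions, labels)
print(count, rate, diff)      # double-precision rate
print(as_float32(rate))       # same rate after a trip through float32
```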

So the algorithm worked; it was just annoying that it spat out those error messages. If they don't affect the result, maybe they should be logged at WARN or INFO level instead.

C.J.