I just used random numbers.

(My ML lib was spark-mllib_2.10-1.2.1)

Please see the attached log. In the middle of the log, I dumped the data set before feeding it into LogisticRegressionWithLBFGS. The first column (false/true) is the label (attribute “a”), and columns 2-5 (attributes “x”, “y”, “z”, and “i”) are the features. The 6th column is just a row ID and was not used.

The relationship was chosen arbitrarily: a = (0.3 * x + 0.5 * y - 0.2 * z > 0.4). Note that attribute “i” does not enter the label at all.
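For anyone who wants to reproduce a similar data set, here is a minimal sketch in plain Python. It is not the code I actually ran: the uniform [0, 1) range, the seed, and the helper name make_row are assumptions made up for illustration; only the labeling rule and the column order come from this post.

```python
import random


def make_row(rng, row_id):
    """Build one row: (label a, x, y, z, i, row id)."""
    # Assumption: features are uniform in [0, 1); the post only says "random numbers".
    x, y, z, i = (rng.random() for _ in range(4))
    # Labeling rule from the post; "i" is a feature but does not affect the label.
    a = 0.3 * x + 0.5 * y - 0.2 * z > 0.4
    return (a, x, y, z, i, row_id)


rng = random.Random(42)  # hypothetical seed, only for repeatability
rows = [make_row(rng, n) for n in range(1000)]

# Dump a few rows, label first, as in the log.
for row in rows[:5]:
    print(row)
```

Since “i” is pure noise with respect to the label, a good fit should give it a weight that is small compared with the other three.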

After that, you can see LBFGS doing its job and then printing the error messages.

The model showed coefficients:

396.57624765427323, x

662.7969020937115, y

-259.0975519038385, z

12.352037503257826, i

-538.8516249699426, @a

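The last one is the intercept. As you can see, the model is close enough: up to a common scale factor, the coefficients match the relationship above, and the weight on “i” is small compared with the others.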
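To make "close enough" concrete: a logistic regression decision boundary is unchanged when all weights and the intercept are multiplied by the same positive constant, so the fitted values can be rescaled and compared with 0.3, 0.5, -0.2 and the threshold 0.4. A small sketch, where picking the x coefficient as the reference for the scale is an arbitrary choice of mine:

```python
# Fitted values copied from the model output above.
coefficients = {
    "x": 396.57624765427323,
    "y": 662.7969020937115,
    "z": -259.0975519038385,
    "i": 12.352037503257826,
}
intercept = -538.8516249699426

# Rescale so that the weight on x becomes exactly 0.3 (arbitrary reference choice).
scale = coefficients["x"] / 0.3
normalized = {name: w / scale for name, w in coefficients.items()}
normalized_intercept = intercept / scale

for name, w in normalized.items():
    print(f"{name}: {w:.4f}")
print(f"intercept: {normalized_intercept:.4f}")
```

After rescaling, y comes out at roughly 0.50, z at roughly -0.20, i at roughly 0.01, and the intercept at roughly -0.41, which lines up with 0.3 * x + 0.5 * y - 0.2 * z - 0.4 > 0.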

After that I fed the same data back into the model to see how the predictions worked. (Here attribute “a” is the prediction and “aa” is the original label.) I only displayed 20 rows.

The error rate showed 2 errors out of 1000.

count(INTEGER), errorRate(DOUBLE), countDiff(INTEGER)

key=[], rows=1

1000, 0.0020000000949949026, 2
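For reference, the three columns can be recomputed from a list of predictions and a list of original labels along these lines. This is only a sketch with stand-in data (the two flipped predictions are fabricated to mimic the count, not taken from the log), and error_stats and as_float32 are names I made up. The trailing digits in the logged error rate look consistent with 2/1000 having been stored in single precision and then widened to a double, but that is my guess about the tool, not something the log states.

```python
import struct


def error_stats(predictions, labels):
    """Return (count, errorRate, countDiff) for two equally long lists."""
    count = len(labels)
    count_diff = sum(1 for p, l in zip(predictions, labels) if p != l)
    return count, count_diff / count, count_diff


def as_float32(value):
    """Round a Python float to single precision and widen it back to double."""
    return struct.unpack("f", struct.pack("f", value))[0]


# Stand-in data: 1000 labels and predictions that disagree in exactly 2 rows.
labels = [n % 2 == 0 for n in range(1000)]
predictions = list(labels)
predictions[0] = not predictions[0]
predictions[1] = not predictions[1]

count, error_rate, count_diff = error_stats(predictions, labels)
print(count, error_rate, count_diff)
print(as_float32(error_rate))
```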

So the algorithm worked; it was just annoying that it spat out those error messages. If they do not affect the result, maybe they should be logged as warnings or info instead.

C.J.