Hi Sourav,
1. In GLMpredict.dml I can see that 'means' is the output variable. In my
understanding it is the same as the probability matrix you mentioned in your
mail (to be used to compute the prediction). Am I right?
Yes, that's correct.
2. From GLM.dml I get the 'betas' as output using
outputs.getBinaryBlockedRDD("beta_out"). The same I pass to GLMpredict.dml
as B.
Can you try this ?
// Get output from GLM
val beta = outputs.getBinaryBlockedRDD("beta_out")
val betaMC = outputs.getMatrixCharacteristics("beta_out") // This way you don't have to worry about dimensions.
// 
// Xin: the DataFrame/RDD of values (or even a text/csv file) you want to predict
val Xin = ...
// 
// Execute GLMpredict
ml.reset()
// Please read https://github.com/apache/incubator-systemml/blob/master/scripts/algorithms/GLM.dml
// dfam Int 1 Distribution family code: 1 = Power, 2 = Binomial
val cmdLineParamsPredict = Map("X" -> " ", "B" -> " ", "dfam" -> "...") // family of distribution?
ml.registerInput("X", Xin)
ml.registerInput("B_full", beta, betaMC)
ml.registerOutput("means")
val outputsPredict = ml.execute("/home/systemml0.9.0SNAPSHOT/algorithms/GLMpredict.dml", cmdLineParamsPredict)
val prob = outputsPredict.getBinaryBlockedRDD("means")
val probMC = outputsPredict.getMatrixCharacteristics("means")
// 
// Get predicted label
ml.reset()
ml.registerInput("Prob",prob, probMC)
ml.registerOutput("Prediction")
val outputsLabels = ml.executeScript("Prob = read(\"temp1\"); "
+ "Prediction = rowIndexMax(Prob); "
+ "write(Prediction, \"tempOut\", \"csv\")")
val pred = outputsLabels.getDF(sqlContext, "Prediction").withColumnRenamed("C1", "prediction")
// 
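To make the last step concrete, here is a minimal plain-Scala sketch of what rowIndexMax computes: for each row of the probability matrix, the 1-based column index of the largest entry. It does not use SystemML; the object name, the helper, and the sample 'means' values are made up for illustration, and the sample avoids ties because tie-breaking in SystemML may differ from this helper.

```scala
// Plain-Scala illustration only; this is not SystemML code.
object RowIndexMaxSketch {
  // For each row, return the 1-based index of the largest entry
  // (first occurrence wins in this helper if there is a tie).
  def rowIndexMax(prob: Array[Array[Double]]): Array[Int] =
    prob.map(row => row.indexOf(row.max) + 1)

  // Hypothetical 'means' matrix: one row per record, one column per class.
  val means: Array[Array[Double]] = Array(
    Array(0.7, 0.2, 0.1),
    Array(0.1, 0.3, 0.6),
    Array(0.2, 0.5, 0.3)
  )

  // Predicted class index per record: 1, 3, 2
  val prediction: Array[Int] = rowIndexMax(means)
}
```

So each entry of 'Prediction' is a class index, not a probability.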
3. Say I get back the prediction matrix as an output (from predictions =
rowIndexMax(means);). Can I then add it as a column to my original
data frame (the one from which I created the feature vector for the
original model)? My concern is whether adding it back preserves the right
order, so that the key for the feature vector and the predicted value remain
the same. If not, how can I achieve that?
In the above example, 'pred' is a DataFrame with a column 'ID' that provides the
row ID.
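One way to use that row ID is to join on it instead of relying on row order. The sketch below shows the idea with plain Scala collections rather than Spark; the Record class, the sample values, and the assumption that IDs are 1-based and follow the order of the rows passed in as X are all hypothetical.

```scala
// Plain-Scala illustration of joining predictions back by row ID.
object JoinByRowIdSketch {
  // Hypothetical original records, in the order used to build the feature matrix.
  case class Record(name: String, features: Vector[Double])

  val original: Vector[Record] = Vector(
    Record("a", Vector(1.0, 0.5)),
    Record("b", Vector(0.2, 0.9)),
    Record("c", Vector(0.4, 0.4))
  )

  // Assign a 1-based row ID to each record (assumed to match the matrix row order).
  val withId: Vector[(Long, Record)] =
    original.zipWithIndex.map { case (rec, i) => (i + 1L, rec) }

  // Hypothetical (ID -> prediction) pairs, deliberately listed out of order.
  val pred: Map[Long, Int] = Map(3L -> 2, 1L -> 1, 2L -> 3)

  // Join on ID, not on position, so ordering of the predictions does not matter.
  val joined: Vector[(Long, String, Int)] =
    withId.flatMap { case (id, rec) => pred.get(id).map(p => (id, rec.name, p)) }
}
```

With Spark, the same idea would be to give the original DataFrame a matching ID column and join it with 'pred' on that column.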
Thanks,
Niketan Pansare
IBM Almaden Research Center
Email: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
From: Sourav Mazumder <sourav.mazumder00@gmail.com>
To: dev@systemml.incubator.apache.org, Niketan
Pansare/Almaden/IBM@IBMUS
Date: 12/08/2015 10:53 PM
Subject: Re: Using GLMpredict
Hi Niketan,
Thanks again for the detailed inputs.
Some more follow-up Qs:
Regards,
Sourav
On Tue, Dec 8, 2015 at 2:08 PM, Niketan Pansare <npansar@us.ibm.com> wrote:
