Hi Niketan,
Thanks again for the detailed inputs.
Some more follow-up questions:
1. In GLM-predict.dml I can see that 'means' is the output variable. As I
understand it, this is the same as the probability matrix you mentioned in your
mail (to be used to compute the prediction). Am I right?
2. From GLM.dml I get the 'betas' as output using
outputs.getBinaryBlockedRDD("beta_out"). I pass the same to GLM-predict.dml
as B. To register B, I use the following statements:
val beta = outputs.getBinaryBlockedRDD("beta_out")
// I have four feature vectors, so I get 4 coefficients
ml.registerInput("B", beta, 1, 4)
However, when I execute GLM-predict.dml, I get the following error:
val outputs = ml.execute("/home/system-ml-0.9.0-SNAPSHOT/algorithms/GLM-predict.dml", cmdLineParams)
15/12/09 05:32:47 WARN Expression: Metadata file: .mtd not provided
15/12/09 05:32:47 ERROR Expression: ERROR: /home/system-ml-0.9.0-SNAPSHOT/algorithms/GLM-predict.dml -- line 117, column 8 -- Missing or incomplete dimension information in read statement: .mtd
com.ibm.bi.dml.parser.LanguageException: Invalid Parameters : ERROR: /home/system-ml-0.9.0-SNAPSHOT/algorithms/GLM-predict.dml -- line 117, column 8 -- Missing or incomplete dimension information in read statement: .mtd
Line 117 contains the following statement: X = read (fileX);
3. Say I get back the prediction matrix as an output (from predictions =
rowIndexMax(means);). Can I then add it as a column to my original
data frame (the one from which I created the feature vectors for the
original model)? My concern is whether adding it back will preserve the
order, so that the key for each feature vector and its predicted value stay
aligned. If not, how can I achieve that?
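To make the ordering concern in question 3 concrete, here is a minimal sketch in plain Python (not SystemML or Spark code). It carries an explicit row ID with each record and joins the predictions back on that ID rather than relying on position. The names `records`, `row_id`, and `attach_predictions` are hypothetical, and the sketch makes no assumption either way about whether SystemML preserves row order.

```python
def attach_predictions(records, predictions_by_id):
    """Return copies of records with a 'prediction' field joined on 'row_id'.

    records: list of dicts, each carrying a unique 'row_id' key (hypothetical).
    predictions_by_id: dict mapping row_id -> predicted label.
    """
    joined = []
    for record in records:
        row_id = record["row_id"]
        if row_id not in predictions_by_id:
            raise KeyError(f"no prediction for row_id {row_id}")
        # Copy the record so the original rows are left untouched.
        joined.append({**record, "prediction": predictions_by_id[row_id]})
    return joined


# Hypothetical data: the original rows keyed by a 1-based row ID.
records = [
    {"row_id": 1, "name": "a"},
    {"row_id": 2, "name": "b"},
    {"row_id": 3, "name": "c"},
]
# Predictions arrive in a different order but keep their row IDs.
predictions_by_id = {3: 2, 1: 1, 2: 1}

joined = attach_predictions(records, predictions_by_id)
```

Because the join is keyed, the result does not depend on the order in which predictions come back.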
Regards,
Sourav
On Tue, Dec 8, 2015 at 2:08 PM, Niketan Pansare wrote:
> Hi Sourav,
>
> For some reason, I didn't get your email of "Tue, 08 Dec 2015 12:56:38 -0800"
> (which I noticed in the archive).
>
> >> Not sure how exactly I can modify the GLM-predict.dml to get some
> prediction to start with.
> There are two options here:
> 1. Modify GLM-predict.dml as suggested by Shirish (better approach with
> respect to the SystemML optimizer) or
>
> 2. Run a new script on the output of GLM-predict. Please see:
> https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/api/ml/LogisticRegressionModel.java#L163
> If you choose to go with option 2, you might also want to read the
> documentation of the following two built-in functions:
> a. rowIndexMax (see
> http://apache.github.io/incubator-systemml/dml-language-reference.html#matrix-andor-scalar-comparison-built-in-functions)
> b. ppred
>
> >> Can you give me some idea how from here I can calculate the predicted
> value of the label using some value of probability threshold ?
> A very simple way to predict the label given the probability matrix:
> Prediction = rowIndexMax(Prob) # predicts the label with the highest
> probability. This assumes one-based labels.
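As an illustration of what the `rowIndexMax` line computes, here is a plain-Python sketch (not DML): for each row of a probability matrix it returns the one-based column index of the largest entry. The helper `row_index_max` and the sample matrix are hypothetical, and the tie-breaking rule used here (first maximum wins) is an assumption of the sketch, not a statement about the DML built-in.

```python
def row_index_max(matrix):
    """For each row, return the 1-based index of its largest value.

    Ties resolve to the first maximum; that choice is an assumption of
    this sketch and may differ from the DML built-in.
    """
    result = []
    for row in matrix:
        if not row:
            raise ValueError("rows must be non-empty")
        # max() over column indices returns the first index holding the maximum.
        best = max(range(len(row)), key=lambda j: row[j])
        result.append(best + 1)  # shift to one-based labels
    return result


# Hypothetical probability matrix: one row per observation, one column per class.
prob = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.25, 0.5, 0.25],
]
prediction = row_index_max(prob)
```

With one-based labels, the index of the winning column is itself the predicted class label.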
>
> Thanks,
>
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>
> From: Shirish Tatikonda
> To: dev@systemml.incubator.apache.org
> Date: 12/08/2015 12:49 PM
> Subject: Re: Using GLM-predict
> ------------------------------
>
>
>
> Hi Sourav,
>
> Yes, GLM-predict.dml gives out only the probabilities. You can put a
> threshold on the resulting probabilities to get the actual class labels --
> for example, treat prob > 0.5 as positive and prob <= 0.5 as negative.
>
> The exact value of threshold typically depends on the data and the
> application. Different thresholds yield different classifiers with
> different performance (precision, recall, etc.). You can find the best
> threshold for the given data set by finding a value that gives the desired
> classifier performance (for example, a threshold that gives roughly equal
> precision and recall). Such an optimization is typically done during the
> training phase using a held-out test set.
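The threshold search described above can be sketched in plain Python (not DML), assuming binary 0/1 labels and a held-out set of predicted probabilities. The functions `precision_recall` and `balanced_threshold`, the candidate grid, and the sample data are all hypothetical; the selection rule (smallest gap between precision and recall) is just the example criterion mentioned in the mail.

```python
def precision_recall(probs, labels, threshold):
    """Precision and recall when prob > threshold is predicted positive."""
    predicted = [p > threshold for p in probs]
    tp = sum(1 for pred, y in zip(predicted, labels) if pred and y == 1)
    fp = sum(1 for pred, y in zip(predicted, labels) if pred and y == 0)
    fn = sum(1 for pred, y in zip(predicted, labels) if not pred and y == 1)
    # Report 0.0 when a ratio is undefined (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def balanced_threshold(probs, labels, candidates):
    """Pick the candidate threshold whose precision and recall are closest."""
    def gap(threshold):
        precision, recall = precision_recall(probs, labels, threshold)
        return abs(precision - recall)
    return min(candidates, key=gap)


# Hypothetical held-out data: predicted probabilities and true 0/1 labels.
probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
candidates = [0.2, 0.35, 0.5, 0.7]

best = balanced_threshold(probs, labels, candidates)
```

A different application might instead maximize recall at a fixed precision, or some other criterion; only the `gap` function would change.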
>
> If you wish, you can also modify the DML script to perform this entire
> process.
>
> Shirish
>
>
> On Tue, Dec 8, 2015 at 12:23 PM, Sourav Mazumder <
> sourav.mazumder00@gmail.com> wrote:
>
> > Hi,
> >
> > I have used GLM.dml to create a model using some sample data. It returns
> > to me the matrix of Beta, B.
> >
> > Now I want to use this matrix of Beta on a new set of data points and
> > generate predicted value of the dependent variable/observation.
> >
> > When I checked GLM-predict, I could see that one can pass feature vector
> > for the new data set and also the matrix of beta.
> >
> > But I could not see any way to get the predicted value of the dependent
> > variable/observation. The output parameter only supports matrix of
> > predicted means/probabilities.
> >
> > Is there a way one can get the predicted value of the dependent
> > variable/observation from GLM-predict ?
> >
> > Regards,
> > Sourav
> >
>
>
>