There's nothing unusual about negative values from a linear regression. If, generally, your predicted values are far from your actual values, then your model hasn't fit well. You may have a bug somewhere in your pipeline or you may have data without much linear relationship. Most of this isn't a Spark problem.
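A quick way to make "far from your actual values" concrete is to compare the model's RMSE against a predict-the-mean baseline: if the two are about equal, there is little linear signal in the data (or a bug upstream). A minimal sketch in plain Scala, independent of Spark; the function names here are illustrative, not from your code:

```scala
// Root-mean-squared error between predictions and actual labels.
def rmse(preds: Seq[Double], labels: Seq[Double]): Double = {
  require(preds.length == labels.length, "prediction/label count mismatch")
  math.sqrt(preds.zip(labels).map { case (p, y) => (p - y) * (p - y) }.sum / preds.length)
}

// Baseline: always predict the mean label. A fitted model should beat this;
// if it doesn't, the features carry little linear information about the label.
def baselineRmse(labels: Seq[Double]): Double = {
  val mean = labels.sum / labels.length
  rmse(Seq.fill(labels.length)(mean), labels)
}
```

On an RDD of (prediction, label) pairs the same comparison can be done with Spark's RegressionMetrics, but the arithmetic above is all it amounts to.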

On Mon, Mar 6, 2017 at 8:05 AM Manish Maheshwari <myloginid@gmail.com> wrote:
Hi All,

We are using a LinearRegressionModel in Scala, with a StandardScaler to normalize the data before modelling. The code snippet looks like this -

Modelling -
val labeledPointsRDD = tableRecords.map(row => {
  val filtered = row.toSeq.filter { case s: String => false; case _ => true }
  val converted = filtered.map { case i: Int => i.toDouble; case l: Long => l.toDouble; case d: Double => d; case _ => 0.0 }
  val features = Vectors.dense(converted.slice(1, converted.length).toArray)
  LabeledPoint(converted(0), features)
})
val scaler1 = new StandardScaler().fit(labeledPointsRDD.map(x => x.features))
save(sc, scalarModelOutputPath, scaler1)
val normalizedData = labeledPointsRDD.map(lp => LabeledPoint(lp.label, scaler1.transform(lp.features)))
val splits = normalizedData.randomSplit(Array(0.8, 0.2))
val trainingData = splits(0)
val testingData = splits(1)
trainingData.cache()
var regression = new LinearRegressionWithSGD().setIntercept(true)
regression.optimizer.setStepSize(0.01)
val model = regression.run(trainingData)
model.save(sc, modelOutputPath)

Post that, when we score the model on the same data it was trained on using the below snippet, we see this -

Scoring -
val labeledPointsRDD = tableRecords.map(row => {
  val filtered = row.toSeq.filter { case s: String => false; case _ => true }
  val converted = filtered.map { case i: Int => i.toDouble; case l: Long => l.toDouble; case d: Double => d; case _ => 0.0 }
  val features = Vectors.dense(converted.toArray)
  (row(0), features)
})
val normalizedData = labeledPointsRDD.map(p => (p._1, scaler1.transform(p._2)))
normalizedData.cache()
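One concrete thing worth checking in the two snippets above: the modelling code builds features from converted.slice(1, converted.length) (column 0 is the label), but the scoring code uses converted.toArray, so the label column leaks in as an extra first feature and the vectors no longer match what the scaler and model were fit on. A minimal sketch of sharing one extraction path between training and scoring (names are illustrative, not from the original code):

```scala
// One conversion used identically at train and score time:
// drop strings, coerce numerics to Double, default everything else to 0.0.
def toNumeric(row: Seq[Any]): Seq[Double] =
  row.filter { case _: String => false; case _ => true }
     .map { case i: Int => i.toDouble; case l: Long => l.toDouble; case d: Double => d; case _ => 0.0 }

// Column 0 is the label; everything after it is the feature vector.
// Using this in BOTH pipelines keeps the feature width consistent.
def splitLabelFeatures(row: Seq[Any]): (Double, Array[Double]) = {
  val converted = toNumeric(row)
  (converted.head, converted.drop(1).toArray)
}
```

Scoring would then scale only the feature part, exactly as training did, rather than the whole converted row.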