spark-dev mailing list archives

From Debasish Das <debasish.da...@gmail.com>
Subject LabeledPoint dump LibSVM if SparseVector
Date Sun, 11 May 2014 16:40:10 GMT
Hi,

I need to change toString on LabeledPoint to emit LibSVM format so that I
can dump an RDD[LabeledPoint] in a format that sparse glmnet in R and other
packages can read, in order to benchmark MLlib classification accuracy.

Basically I have to change the toString of LabeledPoint and the toString of
SparseVector.

Should I add it as a PR, or is it already being added?

For now I have added these toLibSvm functions to my internal util class:

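import org.apache.spark.mllib.linalg.SparseVector
import org.apache.spark.mllib.regression.LabeledPoint

// Format a labeled point as one LibSVM line: "<label> <index>:<value> ..."
// Assumes the features are a SparseVector; the cast fails for dense vectors.
def toLibSvm(labelPoint: LabeledPoint): String =
  labelPoint.label.toString + " " +
    toLibSvm(labelPoint.features.asInstanceOf[SparseVector])

// LibSVM files conventionally use one-based indices while MLlib's
// SparseVector is zero-based, hence the + 1.
def toLibSvm(features: SparseVector): String =
  features.indices.zip(features.values)
    .map { case (i, v) => s"${i + 1}:$v" }
    .mkString(" ")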
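
A Spark-free sketch of the same line format, which may help when checking the dumped output by hand. Plain arrays stand in for SparseVector here, and `LibSvmLine` and `format` are hypothetical names, not part of MLlib. It assumes the consuming package expects one-based feature indices, as is conventional for LibSVM files, so each zero-based index is shifted by one.

```scala
// Hypothetical stand-alone helper; plain arrays stand in for MLlib's SparseVector.
object LibSvmLine {
  // Builds "<label> <index>:<value> ..." from zero-based indices.
  // Assumption: the reader expects one-based indices, so each index is shifted by 1.
  def format(label: Double, indices: Array[Int], values: Array[Double]): String = {
    require(indices.length == values.length, "indices and values must align")
    val features = indices.zip(values).map { case (i, v) => s"${i + 1}:$v" }
    (label.toString +: features).mkString(" ")
  }
}
```

For example, `LibSvmLine.format(1.0, Array(0, 3), Array(0.5, 2.0))` yields `1.0 1:0.5 4:2.0`. If the target reader turns out to want zero-based indices, dropping the `+ 1` is the only change needed.
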
Thanks.
Deb
