spark-user mailing list archives

From Debasish Das <debasish.da...@gmail.com>
Subject RE: Prediction using Classification with text attributes in Apache Spark MLLib
Date Wed, 25 Jun 2014 16:07:40 GMT
LIBSVM dataset converters are data dependent, since your input data can be
in any serialization format and not necessarily CSV...

We have flows that convert HDFS data to a LIBSVM/sparse-vector RDD, which is
then sent to MLlib....
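To make the "data dependent" point concrete, here is a minimal sketch in plain Python (not Spark code) of one such converter. It assumes a hypothetical CSV layout with the label in the first column and numeric features after it. The names csv_record_to_libsvm and to_libsvm_line are illustrative only and are not part of MLlib. A different source format (HBase rows, Solr documents, ...) would need a different parsing step, but the LIBSVM output side stays the same.

```python
def to_libsvm_line(label, features):
    """Format one labeled example as a LIBSVM line.

    features: dict mapping zero-based feature index -> value.
    LIBSVM expects one-based indices in ascending order; zero values
    are omitted, which keeps the line short for sparse data.
    """
    parts = [repr(float(label))]
    for idx in sorted(features):
        value = float(features[idx])
        if value != 0.0:
            parts.append(f"{idx + 1}:{value!r}")
    return " ".join(parts)


def csv_record_to_libsvm(line):
    """Hypothetical converter for one assumed CSV layout:
    label in column 0, numeric features in the remaining columns."""
    fields = line.strip().split(",")
    label = float(fields[0])
    features = {i: float(v) for i, v in enumerate(fields[1:])}
    return to_libsvm_line(label, features)
```

For example, csv_record_to_libsvm("1,0,2.5,0,3") returns "1.0 2:2.5 4:3.0": the zero-valued features are dropped and the indices become one-based.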

I am not sure it will be easy to standardize a LIBSVM converter for data
that can live in HDFS, HBase, Cassandra, or Solr....but of course LIBSVM,
Netflix format, and CSV are standard for algorithms, and MLlib supports all 3...
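The LIBSVM side is the part that is standardized. The sketch below is plain Python with a hypothetical function name, parse_libsvm_line. It reads one line back into a label plus a sparse vector given as parallel index and value lists. Indices in the file are one-based and ascending, and they are shifted to zero-based here. The MLlib guide describes the same shift for MLUtils.loadLibSVMFile when it builds LabeledPoints.

```python
def parse_libsvm_line(line):
    """Parse 'label idx:val idx:val ...' into (label, indices, values).

    File indices are one-based; the returned indices are zero-based,
    matching the convention of sparse vectors in MLlib.
    """
    tokens = line.split()
    label = float(tokens[0])
    indices, values = [], []
    for token in tokens[1:]:
        idx, val = token.split(":")
        indices.append(int(idx) - 1)
        values.append(float(val))
    # LIBSVM requires strictly ascending indices; reject malformed lines.
    if any(b <= a for a, b in zip(indices, indices[1:])):
        raise ValueError("feature indices must be strictly ascending")
    return label, indices, values
```

For instance, parse_libsvm_line("1 2:2.5 4:3") gives (1.0, [1, 3], [2.5, 3.0]).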
 On Jun 25, 2014 6:00 AM, "Ulanov, Alexander" <alexander.ulanov@hp.com>
wrote:

> Hi lmk,
>
> I am not aware of any classifier in MLlib that accepts nominal data.
> They accept an RDD of LabeledPoints, each of which is a label plus a
> vector of Doubles. So you'll need to convert the nominal values to doubles.
>
> Best regards, Alexander
>
> -----Original Message-----
> From: lmk [mailto:lakshmi.muralikrishnan@gmail.com]
> Sent: Wednesday, June 25, 2014 1:27 PM
> To: user@spark.incubator.apache.org
> Subject: RE: Prediction using Classification with text attributes in
> Apache Spark MLLib
>
> Hi Alexander,
> Just one more question on a related note. Should I follow the same
> procedure even if my data is nominal (categorical) but has a lot of
> combinations? (In Weka I used to keep it as nominal data.)
>
> Regards,
> -lmk
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Prediction-using-Classification-with-text-attributes-in-Apache-Spark-MLLib-tp8166p8249.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
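Here is a minimal sketch of the nominal-to-double conversion Alexander describes. It is plain Python, independent of any Spark API, and all function names are hypothetical. index_encode maps each category to a single double. That works, but it implies an ordering between categories that a linear model may misread. one_hot_encode gives every category its own position in a sparse vector instead. This is usually preferred for linear models when an attribute has many possible values, since only one entry per attribute is non-zero however many categories there are.

```python
def build_index(categories):
    """Assign each distinct category a stable integer code (sorted order)."""
    return {c: i for i, c in enumerate(sorted(set(categories)))}


def index_encode(value, index):
    """Nominal value -> single double (its category code)."""
    return float(index[value])


def one_hot_encode(row, indexes):
    """Encode a row of nominal attributes as a sparse one-hot vector.

    row: one category string per attribute.
    indexes: one dict from build_index per attribute.
    Returns (size, {position: 1.0}); each attribute occupies its own
    block of positions, so exactly len(row) entries are non-zero.
    An unseen category raises KeyError.
    """
    features = {}
    offset = 0
    for value, index in zip(row, indexes):
        features[offset + index[value]] = 1.0
        offset += len(index)
    return offset, features
```

With colors = build_index(["red", "green", "blue"]) and sizes = build_index(["S", "L"]), one_hot_encode(["red", "S"], [colors, sizes]) returns (5, {2: 1.0, 4: 1.0}). Each attribute contributes a single 1.0 in its own block.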
