spark-user mailing list archives

From Xiangrui Meng <men...@gmail.com>
Subject Re: RDD to DataFrame for using ALS under org.apache.spark.ml.recommendation.ALS
Date Tue, 17 Mar 2015 18:48:40 GMT
Please check this section in the user guide:
http://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection

You need `import sqlContext.implicits._` to use `toDF()`.
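A minimal sketch of what that import enables, assuming Spark 1.3 with spark-sql on the classpath. The object name `ImplicitsDemo`, the helper `tuplesToDF`, the local master and the sample rows are hypothetical placeholders. The implicits are members of a concrete `SQLContext` instance, so the import has to name a stable identifier such as a `val` or a method parameter:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SQLContext}

object ImplicitsDemo {
  // Hypothetical helper: RDD itself has no toDF method; it only becomes
  // available after importing the implicits of this SQLContext.
  def tuplesToDF(sqlContext: SQLContext, rdd: RDD[(Int, Int, Float)]): DataFrame = {
    import sqlContext.implicits._
    rdd.toDF("user", "item", "rating")
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ImplicitsDemo").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)
    // Placeholder rows in (user, item, rating) form.
    val rdd = sc.parallelize(Seq((1, 10, 4.0f), (2, 20, 3.5f)))
    tuplesToDF(sqlContext, rdd).printSchema()
    sc.stop()
  }
}
```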

-Xiangrui

On Mon, Mar 16, 2015 at 2:34 PM, Jay Katukuri <jkatukuri@apple.com> wrote:
> Hi Xiangrui,
> Thanks a lot for the quick reply.
>
> I am still facing an issue.
>
> I have tried the code snippet that you have suggested:
>
>  val ratings = purchase.map { line =>
>  line.split(',') match { case Array(user, item, rate) =>
>  (user.toInt, item.toInt, rate.toFloat)
> }.toDF("user", "item", "rate”)}
>
> For this, I got the error below:
>
> error: ';' expected but '.' found.
> [INFO] }.toDF("user", "item", "rate”)}
> [INFO]  ^
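The parse error arises because `.toDF(...)` ends up attached to the `match` expression inside the `map` block rather than to the RDD that `map` returns, and in Scala 2 a `match` expression cannot be followed directly by `.method(...)`. A plain-Scala sketch of the corrected brace layout, using a `List[String]` of made-up lines as a stand-in for the RDD (an assumption, so it runs without Spark):

```scala
// Made-up input standing in for sc.textFile(pfile).
val purchase = List("1,10,4.0", "2,20,3.5")

// The inner brace closes the match, the outer brace closes the map block.
// On a Spark RDD (with sqlContext.implicits._ imported), .toDF(...) would
// follow the outer closing brace.
val ratings = purchase.map { line =>
  line.split(',') match {
    case Array(user, item, rate) => (user.toInt, item.toInt, rate.toFloat)
  }
}
```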
>
> When I tried the code below:
>
>  val ratings = purchase.map ( line =>
>     line.split(',') match { case Array(user, item, rate) =>
>     (user.toInt, item.toInt, rate.toFloat)
>     }).toDF("user", "item", "rate")
>
>
> error: value toDF is not a member of org.apache.spark.rdd.RDD[(Int, Int, Float)]
> [INFO] possible cause: maybe a semicolon is missing before `value toDF'?
> [INFO]     }).toDF("user", "item", "rate")
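With the parentheses in place the code parses, and the remaining error means `toDF` was never brought into scope: in Spark 1.3 `RDD` has no `toDF` method, and `sqlContext.implicits` supplies an implicit conversion from an `RDD` of tuples or case classes to a `DataFrameHolder`, which does have one. The plain-Scala analogy below is not Spark code; `Holder` and `FakeSQLContext` are hypothetical names. It shows the same mechanism: the call compiles only after the conversion is imported from a concrete instance.

```scala
import scala.language.implicitConversions

// Hypothetical stand-in for Spark's DataFrameHolder.
final case class Holder(rows: Seq[(Int, Int, Float)]) {
  def toDF(cols: String*): String = s"${rows.size} row(s) with columns ${cols.mkString(", ")}"
}

// Hypothetical stand-in for SQLContext: the conversion lives in an object
// nested inside the instance, like sqlContext.implicits.
class FakeSQLContext {
  object implicits {
    implicit def seqToHolder(rows: Seq[(Int, Int, Float)]): Holder = Holder(rows)
  }
}

val rows: Seq[(Int, Int, Float)] = Seq((1, 10, 4.0f))
// rows.toDF("user") would not compile yet: Seq has no toDF member.
val ctx = new FakeSQLContext
import ctx.implicits._
val schema = rows.toDF("user", "item", "rating")
```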
>
>
>
> I have looked at the document that you have shared and tried the following code:
>
> case class Record(user: Int, item: Int, rate: Double)
> val ratings = purchase.map(_.split(',')).map(r => Record(r(0).toInt,
>   r(1).toInt, r(2).toDouble)).toDF("user", "item", "rate")
>
> For this, I got the error below:
>
> error: value toDF is not a member of org.apache.spark.rdd.RDD[Record]
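The case-class route needs the same import, and the case class should be defined at top level, outside the method that calls `toDF`. For a class local to a method, Scala typically cannot supply the `TypeTag` that Spark's conversion requires, and the compiler then reports the same "toDF is not a member" error. A sketch under those assumptions; `RecordsToDF` and `toRecord` are hypothetical names, and the field is called `rating` because that is the default rating column of the new ALS:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Top-level definition so a TypeTag is available to the implicit
// RDD-to-DataFrame conversion.
case class Record(user: Int, item: Int, rating: Double)

object RecordsToDF {
  // Hypothetical helper mirroring the split-and-convert step in the question.
  def toRecord(fields: Array[String]): Record =
    Record(fields(0).toInt, fields(1).toInt, fields(2).toDouble)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("RecordsToDF"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Column names come from the case class fields: user, item, rating.
    val ratings = sc.textFile(args(0)).map(_.split(',')).map(toRecord).toDF()
    ratings.printSchema()
    sc.stop()
  }
}
```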
>
>
> Appreciate your help!
>
> Thanks,
> Jay
>
>
> On Mar 16, 2015, at 11:35 AM, Xiangrui Meng <mengxr@gmail.com> wrote:
>
> Try this:
>
> val ratings = purchase.map { line =>
>  line.split(',') match { case Array(user, item, rate) =>
>  (user.toInt, item.toInt, rate.toFloat)
> }.toDF("user", "item", "rate")
>
> Doc for DataFrames:
> http://spark.apache.org/docs/latest/sql-programming-guide.html
>
> -Xiangrui
>
> On Mon, Mar 16, 2015 at 9:08 AM, jaykatukuri <jkatukuri@apple.com> wrote:
>
> Hi all,
> I am trying to use the new ALS implementation under
> org.apache.spark.ml.recommendation.ALS.
>
>
>
> The new method to invoke for training seems to be override def fit(dataset: DataFrame, paramMap: ParamMap): ALSModel.
>
> How do I create a DataFrame object from a ratings data set that is on HDFS?
>
>
> Whereas the method in the old ALS implementation under
> org.apache.spark.mllib.recommendation.ALS was:
> def train(
>      ratings: RDD[Rating],
>      rank: Int,
>      iterations: Int,
>      lambda: Double,
>      blocks: Int,
>      seed: Long
>    ): MatrixFactorizationModel
>
> My code to run the old ALS train method is as below:
>
> val sc = new SparkContext(conf)
>
> val pfile = args(0)
> val purchase = sc.textFile(pfile)
> val ratings = purchase.map(_.split(',') match { case Array(user, item, rate) =>
>   Rating(user.toInt, item.toInt, rate.toInt)
> })
>
> val model = ALS.train(ratings, rank, numIterations, 0.01)
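For comparison, a sketch of the old MLlib call using the six-argument `train` overload quoted above. The object name, hyperparameter values and seed are placeholders. `rate.toDouble` is used because MLlib's `Rating` stores a `Double` rating and `"4.5".toInt` would throw a `NumberFormatException`:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

object TrainMllibAls {
  // Parses "user,item,rating"; toDouble keeps fractional ratings.
  def parse(line: String): Rating = line.split(',') match {
    case Array(user, item, rate) => Rating(user.toInt, item.toInt, rate.toDouble)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("TrainMllibAls"))
    val ratings = sc.textFile(args(0)).map(parse)
    // Placeholder values: rank 10, 10 iterations, lambda 0.01,
    // blocks = -1 (auto-configured), seed 42.
    val model = ALS.train(ratings, 10, 10, 0.01, -1, 42L)
    println(s"users with factors: ${model.userFeatures.count()}")
    sc.stop()
  }
}
```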
>
>
> Now, for the new ALS fit method, I am trying to run the code below, but I get
> a compilation error:
>
> val als = new ALS()
>   .setRank(rank)
>   .setRegParam(regParam)
>   .setImplicitPrefs(implicitPrefs)
>   .setNumUserBlocks(numUserBlocks)
>   .setNumItemBlocks(numItemBlocks)
>
> val sc = new SparkContext(conf)
>
> val pfile = args(0)
> val purchase = sc.textFile(pfile)
> val ratings = purchase.map(_.split(',') match { case Array(user, item, rate) =>
>   Rating(user.toInt, item.toInt, rate.toInt)
> })
>
> val model = als.fit(ratings.toDF())
>
> I get an error that the method toDF() is not a member of
> org.apache.spark.rdd.RDD[org.apache.spark.ml.recommendation.ALS.Rating[Int]].
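A sketch of the whole new-API flow, assuming Spark 1.3: create a `SQLContext`, import its implicits, convert tuples with `toDF`, and name the columns `user`, `item` and `rating`, which are the new ALS defaults (a column called `rate` would need `.setRatingCol("rate")`). `TrainMlAls`, `parse` and the hyperparameter values are hypothetical stand-ins for the variables in the question. `als.fit(ratings)` relies on the `fit` overload that takes optional param pairs, so no `ParamMap` is needed:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.sql.SQLContext

object TrainMlAls {
  // Hypothetical parser for "user,item,rating" lines; the new ALS expects
  // Int user and item ids.
  def parse(line: String): (Int, Int, Float) = line.split(',') match {
    case Array(user, item, rate) => (user.toInt, item.toInt, rate.toFloat)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("TrainMlAls"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // These column names match the ALS defaults (user, item, rating).
    val ratings = sc.textFile(args(0)).map(parse).toDF("user", "item", "rating")

    // Placeholder hyperparameters in place of rank, regParam, etc.
    val als = new ALS()
      .setRank(10)
      .setRegParam(0.01)
      .setImplicitPrefs(false)
      .setNumUserBlocks(10)
      .setNumItemBlocks(10)

    val model = als.fit(ratings)
    model.transform(ratings).show()
    sc.stop()
  }
}
```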
>
> Appreciate the help!
>
> Thanks,
> Jay
>
>
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/RDD-to-DataFrame-for-using-ALS-under-org-apache-spark-ml-recommendation-ALS-tp22083.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>
>
