spark-user mailing list archives

From Jaonary Rabarisoa <jaon...@gmail.com>
Subject Re: Unable to save dataframe with UDT created with sqlContext.createDataFrame
Date Wed, 01 Apr 2015 07:57:42 GMT
Hmm, I get the same error with master. Here is another test example that
fails. This time I explicitly create a Row RDD, which matches my actual
use case:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.sql.{Row, SQLContext}

object TestDataFrame {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("TestDataFrame").setMaster("local[4]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // DataFrame built from a case-class RDD via toDF: saving works
    val data = Seq(LabeledPoint(1, Vectors.zeros(10)))
    val dataDF = sc.parallelize(data).toDF
    dataDF.printSchema()
    dataDF.save("test1.parquet") // OK

    // Same data as an RDD[Row] plus the schema of dataDF: saving fails
    val dataRow = data.map { case LabeledPoint(l: Double, f: mllib.linalg.Vector) =>
      Row(l, f)
    }
    val dataRowRDD = sc.parallelize(dataRow)
    val dataDF2 = sqlContext.createDataFrame(dataRowRDD, dataDF.schema)
    dataDF2.printSchema()
    dataDF2.saveAsParquetFile("test3.parquet") // FAIL !!!
  }
}
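
Since the toDF path above saves without error, one possible workaround (a sketch, not a confirmed fix) is to turn the Rows back into LabeledPoint case-class instances and let toDF() derive the schema, including the vector column, by reflection. It assumes the Spark 1.3 API used in this thread (SQLContext, sqlContext.implicits._, saveAsParquetFile); the object name TestDataFrameWorkaround and the output path test4.parquet are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.sql.{Row, SQLContext}

// Hypothetical workaround sketch, assuming the Spark 1.3 API.
object TestDataFrameWorkaround {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("TestDataFrameWorkaround").setMaster("local[4]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // A Row RDD shaped like the one in the failing example (label, features).
    val rowRDD = sc.parallelize(Seq(Row(1.0, Vectors.zeros(10))))

    // Rebuild LabeledPoint instances from each Row; the pattern assumes
    // column 0 is a Double label and column 1 is an mllib Vector.
    // toDF() then infers the schema from the case class via the implicits.
    val pointDF = rowRDD.map {
      case Row(label: Double, features: Vector) => LabeledPoint(label, features)
    }.toDF()

    pointDF.printSchema()
    pointDF.saveAsParquetFile("test4.parquet") // hypothetical output path
    sc.stop()
  }
}
```

The cost is an extra map over the data, and it only helps when the Rows correspond to an existing case class such as LabeledPoint.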


On Tue, Mar 31, 2015 at 11:18 PM, Xiangrui Meng <mengxr@gmail.com> wrote:

> I cannot reproduce this error on master, but I'm not aware of any
> recent bug fixes that are related. Could you build and try the current
> master? -Xiangrui
>
> On Tue, Mar 31, 2015 at 4:10 AM, Jaonary Rabarisoa <jaonary@gmail.com> wrote:
> > Hi all,
> >
> > A DataFrame with a user-defined type (here mllib.Vector) created with
> > sqlContext.createDataFrame can't be saved to a Parquet file; the save raises
> > ClassCastException: org.apache.spark.mllib.linalg.DenseVector cannot be cast
> > to org.apache.spark.sql.Row.
> >
> > Here is code that reproduces this error:
> >
> > object TestDataFrame {
> >
> >   def main(args: Array[String]): Unit = {
> >     //System.loadLibrary(Core.NATIVE_LIBRARY_NAME)
> >     val conf = new SparkConf().setAppName("RankingEval").setMaster("local[8]")
> >       .set("spark.executor.memory", "6g")
> >
> >     val sc = new SparkContext(conf)
> >     val sqlContext = new SQLContext(sc)
> >
> >     import sqlContext.implicits._
> >
> >     val data = sc.parallelize(Seq(LabeledPoint(1, Vectors.zeros(10))))
> >     val dataDF = data.toDF
> >
> >     dataDF.save("test1.parquet")
> >
> >     val dataDF2 = sqlContext.createDataFrame(dataDF.rdd, dataDF.schema)
> >
> >     dataDF2.save("test2.parquet")
> >   }
> > }
> >
> >
> > Is this related to https://issues.apache.org/jira/browse/SPARK-5532, and how
> > can it be solved?
> >
> >
> > Cheers,
> >
> > Jao
>
