spark-user mailing list archives

From Haoming Zhang <haoming.zh...@outlook.com>
Subject RE: SparkSQL with sequence file RDDs
Date Tue, 08 Jul 2014 00:46:11 GMT
Hi Gary,

Like Michael mentioned, you need to take care of the Scala case classes or Java beans, because
SparkSQL needs the schema.

Currently we are trying to insert our data into HBase with Scala 2.10.4 and Spark 1.0.

All the data are tables. We created one case class for each row, which means the number of
case class parameters has to match the number of columns. But Scala 2.10.4 has a limitation:
the maximum number of parameters for a case class is 22. That is where the problem occurs. If
the table is small and has fewer than 22 columns, everything is fine. But if we have a larger
table with more than 22 columns, an error is reported.

We know Scala 2.11 has removed the limitation on the number of parameters, but Spark 1.0 is
not compatible with it. So now we are considering using Java beans instead of Scala case classes.
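
A minimal sketch of what we mean (hypothetical column names, assuming Scala 2.10 and Spark 1.0):

    // One case class parameter per column works for narrow tables:
    case class NarrowRow(c1: String, c2: String, c3: Int)

    // But a table with more than 22 columns cannot be modelled this way --
    // scalac 2.10 rejects it with:
    //   "Implementation restriction: case classes cannot have more than 22 parameters."
    // case class WideRow(c1: String, c2: String, /* ... */ c23: String)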

Best,
Haoming



From: michael@databricks.com
Date: Mon, 7 Jul 2014 17:12:42 -0700
Subject: Re: SparkSQL with sequence file RDDs
To: user@spark.apache.org

I haven't heard any reports of this yet, but I don't see any reason why it wouldn't work.
You'll need to manually convert the objects that come out of the sequence file into something
where SparkSQL can detect the schema (i.e., Scala case classes or Java beans) before you can
register the RDD as a table.
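
For example, something along these lines should work (a rough sketch with hypothetical paths
and field names, assuming Spark 1.0's Scala API):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // Hypothetical row type -- one field per column you want in the table.
    case class Person(name: String, age: Int)

    val sc = new SparkContext(new SparkConf().setAppName("seqfile-sql"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD  // implicit conversion RDD[case class] -> SchemaRDD

    // Keys and values come back as Strings via Spark's Writable converters.
    val people = sc.sequenceFile[String, String]("hdfs:///data/people.seq")
      .map { case (_, value) =>
        val fields = value.split(",")
        Person(fields(0), fields(1).trim.toInt)
      }

    people.registerAsTable("people")
    sqlContext.sql("SELECT name FROM people WHERE age >= 21").collect().foreach(println)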


If you run into any issues please let me know.

On Mon, Jul 7, 2014 at 12:36 PM, Gary Malouf <malouf.gary@gmail.com> wrote:


Has anyone reported issues using SparkSQL with sequence files (all of our data is in this
format within HDFS)?  We are considering whether to burn the time upgrading to Spark 1.0 from
0.9 now and this is a main decision point for us.  


