spark-user mailing list archives

From Olivier Girardot <ssab...@gmail.com>
Subject Re: RDD to Multiple Tables SparkSQL
Date Tue, 21 Oct 2014 08:38:08 GMT
If you already know your keys, the best way would be to extract one RDD
per key (that does not bring the content back to the master, and you can
take advantage of caching) and then call registerTempTable once per key.
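
A minimal sketch of that per-key approach, assuming Spark 1.x with a HiveContext as in the quoted code below; the key values in `knownKeys` and the `KVS_` table-name prefix are placeholders, not anything from the original thread:

```scala
import org.apache.spark.{SparkConf, SparkContext}

case class KV(key: String, id: String, value: String)

val context = new SparkContext(new SparkConf().setAppName("kv-tables"))
val sqlContext = new org.apache.spark.sql.hive.HiveContext(context)
import sqlContext.createSchemaRDD

// Parse once and cache, so each per-key filter below
// does not re-read the input file.
val logsRDD = context.textFile("logs", 10).map { line =>
  val Array(key, id, value) = line split ' '
  KV(key, id, value)
}.cache()

// Hypothetical set of keys known in advance.
val knownKeys = Seq("keyA", "keyB")

for (k <- knownKeys) {
  // filter() runs on the executors; nothing is collected to the master.
  logsRDD.filter(_.key == k).registerTempTable(s"KVS_$k")
}
```

Each temp table can then be queried independently, e.g. `sqlContext.sql("SELECT * FROM KVS_keyA")`.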

But I'm guessing you don't know the keys in advance, and in that case I
think splitting everything into different tables becomes very confusing.
First of all, how would you query it afterwards?

Regards,

Olivier.

2014-10-20 13:02 GMT+02:00 critikaled <isasmani.git@gmail.com>:

> Hi I have a rdd which I want to register as multiple tables based on key
>
> ................
> val context = new SparkContext(conf)
> val sqlContext = new org.apache.spark.sql.hive.HiveContext(context)
> import sqlContext.createSchemaRDD
>
> case class KV(key:String,id:String,value:String)
> val logsRDD = context.textFile("logs", 10).map { line =>
>   val Array(key, id, value) = line split ' '
>   KV(key, id, value)
> }
> logsRDD.registerTempTable("KVS")
>
> I want to store the above information to multiple tables based on key
> without bringing the entire data to master
>
> Thanks in advance.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/RDD-to-Multiple-Tables-SparkSQL-tp16807.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>
>
