spark-user mailing list archives

From Cheng Lian <lian.cs....@gmail.com>
Subject Re: Save an RDD to a SQL Database
Date Thu, 07 Aug 2014 15:25:17 GMT
Maybe a little off topic, but would you mind sharing your motivation for saving the RDD into
an SQL DB?

If you're just trying to do further transformations/queries with SQL for convenience, then
you can use Spark SQL directly within your Spark application without saving the data to a
DB:

  val sqlContext = new org.apache.spark.sql.SQLContext(sparkContext)
  import sqlContext._

  // First create a case class to describe your schema
  case class Record(fieldA: T1, fieldB: T2, …)

  // Transform RDD elements to Records and register the result as a SQL table
  rdd.map(…).registerAsTable("myTable")

  // Torture them until they tell you the truth :)
  sql("SELECT fieldA FROM myTable WHERE fieldB > 10")

On Aug 6, 2014, at 11:29 AM, Vida Ha <vidaha@gmail.com> wrote:

> 
> Hi,
> 
> I would like to save an RDD to a SQL database.  It seems like this would be a common
> enough use case.  Are there any built-in libraries to do it?
> 
> Otherwise, I'm just planning on mapping my RDD, and having that call a method to write
> to the database.  Given that a lot of records are going to be written, the code would need
> to be smart and do a batch insert after enough records have accumulated.  Does that sound like
> a reasonable approach?
> 
> 
> -Vida
> 
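For reference, the batched-insert approach described in the quoted message is usually implemented with RDD.foreachPartition rather than map, so that one JDBC connection is opened per partition and inserts are flushed in batches. A minimal sketch, assuming a hypothetical table records(field_a, field_b), a reachable JDBC URL, and an RDD of (String, Int) pairs (all placeholders, not from the original thread):

```scala
import java.sql.DriverManager

// Hypothetical connection details and batch size.
val jdbcUrl = "jdbc:postgresql://localhost:5432/mydb"
val batchSize = 1000

rdd.foreachPartition { partition =>
  // One connection per partition, not per record.
  val conn = DriverManager.getConnection(jdbcUrl, "user", "password")
  conn.setAutoCommit(false)
  val stmt = conn.prepareStatement(
    "INSERT INTO records (field_a, field_b) VALUES (?, ?)")
  try {
    var count = 0
    partition.foreach { case (fieldA: String, fieldB: Int) =>
      stmt.setString(1, fieldA)
      stmt.setInt(2, fieldB)
      stmt.addBatch()
      count += 1
      if (count % batchSize == 0) stmt.executeBatch() // flush a full batch
    }
    stmt.executeBatch() // flush any remaining rows
    conn.commit()
  } finally {
    stmt.close()
    conn.close()
  }
}
```

Note that foreachPartition runs on the executors, so the connection must be created inside the closure (JDBC connections are not serializable), and failed tasks may be retried, so the target table should tolerate duplicate inserts or the statement should be made idempotent.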


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

