spark-user mailing list archives

From Ninad Shringarpure <ni...@cloudera.com>
Subject Fwd: jdbcRDD for data ingestion from RDBMS
Date Tue, 18 Oct 2016 02:24:47 GMT
Hi Team,

One of my client teams is evaluating whether they can use Spark instead of
Sqoop to ingest data from an RDBMS. The data volume is large, on the order
of billions of records.

From reading the documentation, I am not sure whether JdbcRDD is designed
to scale to this amount of data. In addition, some built-in Sqoop features
such as --direct might give better performance than plain JDBC.
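
For context on the scaling question: as I understand it, JdbcRDD takes a
lowerBound, an upperBound and numPartitions, and gives each partition its
own inclusive slice of a numeric key range, so parallelism depends on having
a reasonably evenly distributed numeric key. The sketch below only
illustrates that range arithmetic in plain Python. It does not call Spark,
and the function name, the table "orders" and the column "id" are
hypothetical.

```python
def split_key_range(lower_bound, upper_bound, num_partitions):
    """Split the inclusive range [lower_bound, upper_bound] into
    num_partitions contiguous, non-overlapping inclusive sub-ranges."""
    length = upper_bound - lower_bound + 1
    ranges = []
    for i in range(num_partitions):
        # Integer division spreads any remainder across the partitions.
        start = lower_bound + (i * length) // num_partitions
        end = lower_bound + ((i + 1) * length) // num_partitions - 1
        ranges.append((start, end))
    return ranges


if __name__ == "__main__":
    # Hypothetical table and key column, for illustration only.
    for start, end in split_key_range(1, 1_000_000_000, 4):
        print(f"SELECT * FROM orders WHERE id >= {start} AND id <= {end}")
```

Each sub-range would become one bounded query run by one task, so a skewed
or sparse key would leave some tasks with far more rows than others.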

My primary question to this group is whether it is advisable to use JdbcRDD
for data ingestion, and whether we can expect it to scale. Also, how would
its performance compare to Sqoop?

Please let me know your thoughts, and share any pointers if anyone in the
group has already implemented this.

Thanks,
Ninad
