spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: Spark JDBC reads
Date Tue, 07 Mar 2017 11:19:09 GMT
Can you provide some source code? I am not sure I understood the problem.
If you want to do preprocessing at the JDBC data source, you can write your own data
source. Alternatively, you may want to modify the SQL statement so it extracts the data in the
right format and pushes some of the preprocessing down to the database.
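The second suggestion above can often be done without writing a custom data source: Spark's JDBC reader accepts a subquery in place of a table name in the `dbtable` option, and that SQL runs on the database side, so only the reduced result set crosses the wire. A minimal sketch (the connection URL, table, and column names are placeholders, not from the original thread):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("jdbc-pushdown").getOrCreate()

// The subquery is executed by the database itself; Spark only receives
// its (already encoded and filtered) result rows.
val pushed =
  """(SELECT id,
            CASE status WHEN 'ACTIVE' THEN 1 ELSE 0 END AS status_code
     FROM big_table
     WHERE event_date >= DATE '2017-01-01') AS t"""

val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://dbhost:5432/mydb") // placeholder URL
  .option("dbtable", pushed)
  .option("user", "spark")       // placeholder credentials
  .option("password", "secret")
  .load()
```

Note that Spark's JDBC relation already pushes down column pruning and simple filter predicates on its own, but arbitrary expressions (custom encodings, aggregations) only run database-side if you write them into the subquery yourself.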

> On 7 Mar 2017, at 12:04, El-Hassan Wanas <elhassan.wanas@gmail.com> wrote:
> 
> Hello,
> 
> There is, as usual, a big table on some JDBC data source. I am doing some data
> processing on that data from Spark; however, in order to speed up my analysis, I use reduced
> encodings and minimize the overall size of the data before processing.
> 
> Spark has been doing a great job of generating the proper workflows that do that preprocessing
> for me, but it seems to generate those workflows for execution on the Spark cluster. The issue
> with that is that the large transfer cost is still incurred.
> 
> Is there any way to force Spark to run the preprocessing on the JDBC data source and
> get the prepared output DataFrame instead?
> 
> Thanks,
> 
> Wanas
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
> 

