spark-user mailing list archives

From Russell Spitzer <russell.spit...@gmail.com>
Subject Re: spark architecture question -- Pleas Read
Date Sat, 28 Jan 2017 07:22:06 GMT
You can treat Oracle as a JDBC source (
http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases)
and skip Sqoop and the Hive tables, going straight to the queries. On the way
back out you can skip Hive as well (see the same link) and write directly to
Oracle. I'll leave the performance questions for someone else.
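
A minimal sketch of that round trip, assuming Spark 2.x with the Oracle JDBC
driver on the classpath. The host, service name, table names (SRC_TABLE,
DEST_TABLE), the NAME column, the environment variables, and the normalizeName
transformation are all hypothetical placeholders, not taken from your setup:

```scala
import java.util.Properties

import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object OracleRoundTrip {
  // Hypothetical connection URL; replace host, port and service name.
  val jdbcUrl = "jdbc:oracle:thin:@//dbhost:1521/ORCL"

  // JDBC connection properties passed to both the read and the write.
  def connectionProperties(user: String, password: String): Properties = {
    val props = new Properties()
    props.setProperty("user", user)
    props.setProperty("password", password)
    props.setProperty("driver", "oracle.jdbc.OracleDriver")
    props
  }

  // Placeholder for logic that currently lives in a Hive Java UDF.
  val normalizeName: String => String =
    s => if (s == null) null else s.trim.toUpperCase

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("oracle-round-trip").getOrCreate()
    val props = connectionProperties(
      sys.env.getOrElse("ORACLE_USER", ""),
      sys.env.getOrElse("ORACLE_PASSWORD", ""))

    // Read straight from Oracle: no Sqoop import, no intermediate Hive table.
    val source = spark.read.jdbc(jdbcUrl, "SRC_TABLE", props)

    // Apply the transformation as a native Scala UDF.
    val normalizeNameUdf = udf(normalizeName)
    val result = source.withColumn("NAME", normalizeNameUdf(col("NAME")))

    // Write straight back to Oracle: no destination Hive table, no Sqoop export.
    result.write.mode(SaveMode.Append).jdbc(jdbcUrl, "DEST_TABLE", props)

    spark.stop()
  }
}
```

As written, the read uses a single JDBC connection. For large tables the
partitioned variant of read.jdbc (partition column, lower and upper bound,
number of partitions) is probably what you would want, but which column and
bounds to use depends on your data.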

On Fri, Jan 27, 2017 at 11:06 PM Sirisha Cheruvu <siri8123@gmail.com> wrote:

>
> On Sat, Jan 28, 2017 at 6:44 AM, Sirisha Cheruvu <siri8123@gmail.com>
> wrote:
>
> Hi Team,
>
> Right now our existing flow is
>
> Oracle --> Sqoop --> Hive --> Hive queries on Spark SQL (HiveContext)
> --> destination Hive table --> Sqoop export to Oracle
>
> Half of the required Hive UDFs are written as Java UDFs.
>
> So now I want to know: if I run native Scala UDFs rather than running Hive
> Java UDFs in Spark SQL, will there be any performance difference?
>
>
> Can we skip the Sqoop import and export steps and instead load data
> directly from Oracle into Spark, code Scala UDFs for the transformations,
> and export the output data back to Oracle?
>
> Right now the architecture we are using is
>
> Oracle --> Sqoop (import) --> Hive tables --> Hive queries --> Spark SQL
> --> Hive --> Oracle
>
> What would be the optimal architecture to process data from Oracle using
> Spark? Can I improve this process in any way?
>
>
>
>
> Regards,
> Sirisha
>
>
>
