spark-user mailing list archives

From Sachin Naik <sachin.u.n...@gmail.com>
Subject Re: spark architecture question -- Please Read
Date Sat, 28 Jan 2017 17:08:12 GMT
I strongly agree with Jorn and Russell. There are different solutions for data movement
depending on your needs: frequency, bi-directional drivers, workflow, and handling of
duplicate records. This space is known as "Change Data Capture" (CDC for short). I built
some products in this space that extensively used connection pooling over ODBC/JDBC.
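
For example, even a naive incremental pull (nothing like full log-based CDC, but it shows
the shape of the problem) can be pushed down through Spark's JDBC source. A minimal sketch,
assuming a SparkSession named spark; the table, timestamp column, and connection details
below are placeholders:

  import java.util.Properties

  val props = new Properties()
  props.setProperty("user", "scott")
  props.setProperty("password", "tiger")
  props.setProperty("driver", "oracle.jdbc.OracleDriver")

  // Spark's JDBC source accepts "(subquery) alias" wherever a table name is
  // expected, so the change filter runs inside Oracle, not in Spark.
  val lastCheckpoint = "2017-01-27 00:00:00"
  val incremental =
    s"(SELECT * FROM orders WHERE modified_ts > " +
    s"TO_TIMESTAMP('$lastCheckpoint', 'YYYY-MM-DD HH24:MI:SS')) t"

  val changedRows = spark.read.jdbc("jdbc:oracle:thin:@//dbhost:1521/ORCL", incremental, props)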

Happy to chat if you need more information. 

-Sachin Naik

>> Hard to tell. Can you give more insights on what you are trying to achieve and
>> what the data is about?
>> For example, depending on your use case, Sqoop can make sense or not.
Sent from my iPhone

> On Jan 27, 2017, at 11:22 PM, Russell Spitzer <russell.spitzer@gmail.com> wrote:
> 
> You can treat Oracle as a JDBC source (http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases)
> and skip Sqoop and Hive tables, going straight to queries. Then you can skip Hive on the
> way back out (see the same link) and write directly to Oracle. I'll leave the performance
> questions for someone else.
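> 
> Roughly, in Scala (a minimal sketch, assuming a SparkSession named spark; the URL,
> credentials, and table name are placeholders, not details from this thread):
> 
>   import java.util.Properties
> 
>   val props = new Properties()
>   props.setProperty("user", "scott")
>   props.setProperty("password", "tiger")
>   props.setProperty("driver", "oracle.jdbc.OracleDriver")
> 
>   // Load the Oracle table straight into a DataFrame: no Sqoop, no Hive staging.
>   val df = spark.read.jdbc("jdbc:oracle:thin:@//dbhost:1521/ORCL", "SCHEMA.SOURCE_TABLE", props)
>   df.createOrReplaceTempView("source_table")   // now queryable from Spark SQL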
> 
>> On Fri, Jan 27, 2017 at 11:06 PM Sirisha Cheruvu <siri8123@gmail.com> wrote:
>> 
>> On Sat, Jan 28, 2017 at 6:44 AM, Sirisha Cheruvu <siri8123@gmail.com> wrote:
>> Hi Team,
>> 
>> Right now our existing flow is:
>> 
>> Oracle --> Sqoop --> Hive --> Hive queries on Spark SQL (HiveContext) -->
>> destination Hive table --> Sqoop export to Oracle
>> 
>> Half of the required Hive UDFs are developed as Java UDFs.
>> 
>> So now I want to know: if I run native Scala UDFs rather than Hive Java UDFs
>> in Spark SQL, will there be any performance difference?
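>> 
>> For illustration, this is what I mean by the two kinds of UDFs (a rough sketch;
>> the function names and the Hive UDF class are placeholders):
>> 
>>   // Native Scala UDF, registered directly with Spark SQL.
>>   spark.udf.register("clean_name", (s: String) => if (s == null) null else s.trim.toUpperCase)
>> 
>>   // Existing Hive Java UDF, registered through HiveQL.
>>   spark.sql("CREATE TEMPORARY FUNCTION clean_name_hive AS 'com.example.udf.CleanName'")
>> 
>>   spark.sql("SELECT clean_name(cust_name), clean_name_hive(cust_name) FROM src")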
>> 
>> 
>> Can we skip the Sqoop import and export parts and instead directly load data
>> from Oracle into Spark, code Scala UDFs for the transformations, and export the
>> output data back to Oracle?
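>> 
>> What I have in mind for the export step is roughly the following (a sketch; the
>> DataFrame, connection details, and table name are placeholders):
>> 
>>   import java.util.Properties
>> 
>>   val props = new Properties()
>>   props.setProperty("user", "scott")
>>   props.setProperty("password", "tiger")
>>   props.setProperty("driver", "oracle.jdbc.OracleDriver")
>> 
>>   // resultDF stands for the transformed output DataFrame.
>>   resultDF.write
>>     .mode("append")   // or "overwrite", depending on the load pattern
>>     .jdbc("jdbc:oracle:thin:@//dbhost:1521/ORCL", "SCHEMA.TARGET_TABLE", props)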
>> 
>> Right now the architecture we are using is:
>> 
>> Oracle --> Sqoop (import) --> Hive tables --> Hive queries --> Spark SQL -->
>> Hive --> Oracle
>> 
>> What would be the optimal architecture for processing data from Oracle using
>> Spark? Can I improve this process in any way?
>> 
>> Regards,
>> Sirisha 
>> 
