spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: Is Spark suited for this use case?
Date Mon, 16 Oct 2017 06:38:37 GMT
Hi,

What is the motivation behind your question? Saving costs?

You seem to be happy with the functional/non-functional requirements. So the only thing it could be is cost, or a need for innovation in the future.

Best regards

> On 16. Oct 2017, at 06:32, van den Heever, Christian CC <Christian.vandenHeever@standardbank.co.za>
wrote:
> 
> Hi,
>  
> We basically have the same scenario, but worldwide; since we have bigger datasets we use OGG → local files → Sqoop into Hadoop.
> By all means you can have Spark read the Oracle tables and then make whatever changes to the data are needed that would not be done in the Sqoop query, e.g. fraud detection on transaction records.
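
For illustration, a minimal sketch of such a Spark JDBC read with an extra transformation step (the connection URL, table, and column names are hypothetical, and the Oracle JDBC driver is assumed to be on the classpath):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object OracleReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("oracle-read-sketch")
      .getOrCreate()

    // Read the source table over JDBC; URL, table name, and
    // credentials below are hypothetical placeholders.
    val txns = spark.read
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
      .option("dbtable", "TRANSACTIONS")
      .option("user", "etl_user")
      .option("password", sys.env("ORACLE_PW"))
      .load()

    // A transformation a plain Sqoop import query would not do,
    // e.g. flagging unusually large transactions (column name is
    // hypothetical).
    val flagged = txns.withColumn("suspect", col("AMOUNT") > 100000)

    flagged.write.mode("overwrite").parquet("/data/landing/transactions_flagged")
    spark.stop()
  }
}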
>  
> But sometimes the simplest way is the best. Unless you need a change or need more, I would advise against adding another hop.
> I would rather move away from files, since OGG can do both files and direct table loading, and use Sqoop for the rest.
>  
> Simpler is better.
>  
> Hope this helps.
> C.
>  
> From: Saravanan Thirumalai [mailto:saravanan.thirumalai@gmail.com] 
> Sent: Monday, 16 October 2017 4:29 AM
> To: user@spark.apache.org
> Subject: Is Spark suited for this use case?
>  
> We are an investment firm and have an MDM platform in Oracle at a vendor location, and we use Oracle GoldenGate to replicate data to our data center for reporting needs.
> Our data is not big data (total size 6 TB, including 2 TB of archive data). Moreover, our data doesn't get updated often: once nightly (around 50 MB) plus some correction transactions during the day (<10 MB). We don't have external users, so the data doesn't grow in real time the way e-commerce data does.
>  
> When we replicate data from source to target, we transfer the data through files. So, if there are DML operations (corrections) on a source table during the day, the corresponding file would have perhaps 100 lines of table data that need to be loaded into the target database. Due to the low volume of data we built this in Informatica, and it runs in 2-5 minutes. Can Spark be used in this case, or would it be an overkill of technology?
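
For scale, a minimal sketch of what that daily correction load might look like as a Spark job (the file path, connection URL, and table names are hypothetical):

import org.apache.spark.sql.SparkSession

object CorrectionLoadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("correction-load-sketch")
      .getOrCreate()

    // Read the small daily correction file (path and format are
    // hypothetical).
    val corrections = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/landing/corrections/today.csv")

    // Append into a staging table over JDBC; URL and table name are
    // hypothetical. Spark's JDBC writer only inserts, so applying true
    // updates would still need a database-side MERGE from staging.
    corrections.write
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//reporting-db:1521/RPT")
      .option("dbtable", "STG_CORRECTIONS")
      .option("user", "etl_user")
      .option("password", sys.env("ORACLE_PW"))
      .mode("append")
      .save()

    spark.stop()
  }
}

Note that because the JDBC writer can only append or overwrite, applying corrections as updates would still require a staging table plus a MERGE on the database side; for roughly 100 rows a day, that extra machinery is itself an argument that Spark may be overkill here.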
>  
>  
>  
