sqoop-user mailing list archives

From Frank Wein <mcsm...@mcsmurf.de>
Subject Use of SQL transactions on import
Date Sat, 31 Mar 2018 08:09:06 GMT
Hi all,

I'm currently evaluating whether Sqoop is the right tool for our task
(basically setting up an RDBMS-to-RDBMS ETL system with Sqoop, HDFS,
Spark and Oozie). So far it looks very promising :-) Now I wonder about
one thing: does the "sqoop import-all-tables" command use a single SQL
transaction to fetch all the tables from the database, or is this not
done for (I guess) performance reasons, since the import could then not
run in parallel? The potential problem I see is that the database
tables might change while the data is being read. If Sqoop reads the
tables one by one without a shared transaction, it might get different
"states" of the data, right? For example, for one table it gets what
transaction t1 has committed, and for the next table it gets what
transaction t2 has committed. This is what worries me a bit in our case.
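
To make the concern concrete, here is a toy Python sketch. It is not
Sqoop and does not talk to a real database: ToyDatabase, the table
names "orders" and "order_items", and both import functions are
hypothetical, and it says nothing about what Sqoop actually does. It
only shows how a commit that lands between two table reads can produce
a mismatched pair of tables, while reading from one snapshot cannot.

```python
import copy


class ToyDatabase:
    """In-memory stand-in for an RDBMS (hypothetical, for illustration)."""

    def __init__(self):
        # Both tables start with one matching row (order id 1).
        self.tables = {"orders": [1], "order_items": [1]}

    def commit(self, order_id):
        # One "transaction" adds a matching row to both tables at once.
        self.tables["orders"].append(order_id)
        self.tables["order_items"].append(order_id)

    def snapshot(self):
        # A consistent copy of all tables at a single point in time.
        return copy.deepcopy(self.tables)


TABLE_NAMES = ["orders", "order_items"]


def import_table_by_table(db, concurrent_commit):
    """Read each table from the live data; a commit lands after the first read."""
    result = {}
    for index, name in enumerate(TABLE_NAMES):
        result[name] = list(db.tables[name])
        if index == 0:
            concurrent_commit()  # another client commits mid-import
    return result


def import_from_snapshot(db, concurrent_commit):
    """Read every table from one snapshot taken before the import starts."""
    snap = db.snapshot()
    result = {}
    for index, name in enumerate(TABLE_NAMES):
        result[name] = list(snap[name])
        if index == 0:
            concurrent_commit()  # the same commit no longer affects the reads
    return result


db_a = ToyDatabase()
separate = import_table_by_table(db_a, lambda: db_a.commit(2))

db_b = ToyDatabase()
snapshotted = import_from_snapshot(db_b, lambda: db_b.commit(2))

print("table by table:", separate)
print("from snapshot: ", snapshotted)
```

In the first case "order_items" contains a row whose parent row is
missing from "orders"; in the second case both tables reflect the same
point in time.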


