sqoop-dev mailing list archives

From David Robson <david.rob...@quest.com>
Subject Re: Review Request 50155: SQOOP-2983: OraOop export has degraded performance with wide tables
Date Wed, 20 Jul 2016 05:39:52 GMT


> On July 19, 2016, 5:06 a.m., David Robson wrote:
> > src/java/org/apache/sqoop/manager/oracle/OraOopOutputFormatInsert.java, line 244
> > <https://reviews.apache.org/r/50155/diff/1/?file=1446151#file1446151line244>
> >
> >     Have you done extensive testing with all data types for this change? Originally
> >     Sqoop didn't work too well with Oracle data types, which is why there is code here
> >     to do different things with bind variables based on the data type. Also, this means
> >     there will now be a different code path for update/merge export jobs compared to
> >     insert jobs, so I think it would be best to fix it in OraOopOutputFormatBase if you
> >     want to improve the performance. Then the new code can be used for all job types.
> 
> Attila Szabo wrote:
>     Hi Dave,
>     
>     Thanks for your invaluable feedback. I've also been considering doing the fix a level
>     above to have the same execution path for insert/update/merge; I was just not confident
>     enough that this change should affect those parts as well. As you've advised that too,
>     let me provide a new version of the patch soon.
>     
>     On the types front:
>     Could you please give me a few concrete examples of which types caused problems in
>     the past? In that case I would be able to add more serious testing around those once

OraOopOutputFormatBase.configurePreparedStatementColumns calls setBindValueAtName, which has
the code related to this. It was written a long time ago, so perhaps Sqoop has since been
updated to cope better with various data types and the code might not be needed anymore (or
could be refactored to be much faster). The main ones are the timestamp-related columns: Oracle
stores dates and timezones differently to Sqoop, so we mapped these as a String, which overcomes
most of the problems. Of course, some users still wanted to map these as Timestamp, so OraOop
has an option for this. There is also some code in there for binary floats and binary doubles;
I'm not sure whether it's still needed.

I guess the fact that a fair bit of code in here is called for every row and loops through
every column is not ideal, so if the simpler way works then that would be good.
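
To illustrate the kind of refactoring mentioned above, here is a minimal sketch (not the actual OraOop code) of moving the per-type decision out of the per-row loop. The class ColumnBinders, the Binder interface, and the timestampAsString flag are hypothetical names made up for this example; only the standard JDBC PreparedStatement setters are real. The idea is to choose one binder per column when the statement is prepared, so each row only does an array walk.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.sql.Types;

// Hypothetical helper, not part of Sqoop or OraOop.
final class ColumnBinders {

    /** Binds one value to one 1-based parameter position. */
    interface Binder {
        void bind(PreparedStatement stmt, int index, Object value) throws SQLException;
    }

    /** Picks a binder for a JDBC type code. Intended to run once per column. */
    static Binder binderFor(int sqlType, boolean timestampAsString) {
        switch (sqlType) {
            case Types.TIMESTAMP:
                if (timestampAsString) {
                    // Assumption: timestamps are passed as strings, as the email describes.
                    return (s, i, v) -> s.setString(i, v == null ? null : v.toString());
                }
                return (s, i, v) -> s.setTimestamp(i, (Timestamp) v);
            case Types.DOUBLE:
                return (s, i, v) -> {
                    if (v == null) {
                        s.setNull(i, Types.DOUBLE);
                    } else {
                        s.setDouble(i, ((Number) v).doubleValue());
                    }
                };
            default:
                // Generic fallback; a real implementation may need more cases.
                return (s, i, v) -> s.setObject(i, v);
        }
    }

    /** Builds the per-column plan once, before any rows are written. */
    static Binder[] plan(int[] sqlTypes, boolean timestampAsString) {
        Binder[] binders = new Binder[sqlTypes.length];
        for (int c = 0; c < sqlTypes.length; c++) {
            binders[c] = binderFor(sqlTypes[c], timestampAsString);
        }
        return binders;
    }

    /** Per-row work: no type inspection, just one call per column. */
    static void bindRow(PreparedStatement stmt, Binder[] plan, Object[] row)
            throws SQLException {
        for (int c = 0; c < plan.length; c++) {
            plan[c].bind(stmt, c + 1, row[c]);
        }
    }
}
```

Whether this helps in practice would need measuring against the wide-table case in SQOOP-2983, and the type handling would still need testing with the Oracle-specific types mentioned above.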


- David


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/50155/#review142693
-----------------------------------------------------------


On July 18, 2016, 7:19 p.m., Attila Szabo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/50155/
> -----------------------------------------------------------
> 
> (Updated July 18, 2016, 7:19 p.m.)
> 
> 
> Review request for Sqoop, David Robson, Jarek Cecho, and Kathleen Ting.
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> Proposed changes for SQOOP-2983
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/manager/oracle/OraOopOracleQueries.java 82e4266 
>   src/java/org/apache/sqoop/manager/oracle/OraOopOutputFormatInsert.java d5eebf4 
> 
> Diff: https://reviews.apache.org/r/50155/diff/
> 
> 
> Testing
> -------
> 
> Table with 800 columns
> 100,000 lines (156 MB data)
> 1,000,000 lines (1.56 GB data)
> 3,000,000 lines (4.5 GB data)
> 
> 
> Thanks,
> 
> Attila Szabo
> 
>

