sqoop-dev mailing list archives

From Attila Szabo <asz...@cloudera.com>
Subject Review Request 47108: Proposed changes for SQOOP-2920
Date Mon, 09 May 2016 07:14:41 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/47108/
-----------------------------------------------------------

Review request for Sqoop.


Repository: sqoop-trunk


Description
-------

With the current implementation of ClassWriter, the generated table ORM classes contain a
setField method built around a long chain of if statements (one branch for every private
field). Although this approach works well for small and mid-sized tables (in terms of the
number of columns), for wide ones (well over 500 columns) it causes a significant performance
degradation, making export much slower than it should be, as seen in the JIRA task.
Attached is a proposed solution that avoids this. According to my own measurements, this
solution is 250x faster than the current one (tested with ORMs for 800-field-wide tables at
20,000, 100,000, 1 million, and 5 million rows).
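
To illustrate the difference in question, here is a minimal sketch that contrasts a
branch-per-field setField with a name-to-setter lookup. It is an assumption about one way
the if chain could be avoided, not a copy of the attached patch (see the diff for the real
change). The class name WideRecord, the column names, and both method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical stand-in for a generated ORM class; all names are illustrative only.
class WideRecord {
    private Integer col0;
    private String col1;
    private Long col2;

    // Branch-per-field style: the last column is reached only after
    // comparing against every earlier column name.
    public boolean setFieldByIfChain(String name, Object value) {
        if ("col0".equals(name)) {
            this.col0 = (Integer) value;
            return true;
        } else if ("col1".equals(name)) {
            this.col1 = (String) value;
            return true;
        } else if ("col2".equals(name)) {
            this.col2 = (Long) value;
            return true;
        }
        return false;
    }

    // Assumed alternative: setters are registered once per class and
    // found by a single hash lookup, independent of the column count.
    private static final Map<String, BiConsumer<WideRecord, Object>> SETTERS = new HashMap<>();
    static {
        SETTERS.put("col0", (r, v) -> r.col0 = (Integer) v);
        SETTERS.put("col1", (r, v) -> r.col1 = (String) v);
        SETTERS.put("col2", (r, v) -> r.col2 = (Long) v);
    }

    public boolean setFieldByLookup(String name, Object value) {
        BiConsumer<WideRecord, Object> setter = SETTERS.get(name);
        if (setter == null) {
            return false; // unknown column name
        }
        setter.accept(this, value);
        return true;
    }

    public Integer getCol0() { return col0; }
    public String getCol1() { return col1; }
    public Long getCol2() { return col2; }
}
```

With three fields the two methods behave the same; the cost of the if chain only becomes
visible when the number of branches grows to the hundreds, as in the wide tables described
above.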

Please review it and share your thoughts!


Diffs
-----

  src/java/org/apache/sqoop/orm/ClassWriter.java 23a9c41 
  src/java/org/apache/sqoop/orm/CompilationManager.java ce165e8 
  src/test/com/cloudera/sqoop/orm/TestClassWriter.java 498db73 

Diff: https://reviews.apache.org/r/47108/diff/


Testing
-------

The existing unit test case has been extended with just one test method, which simulates the
"insertion" of 20,000 rows (calling all 800 setters 20,000 times with random values). I have
also tested with 100,000, 1 million, and 5 million rows in my local environment. The results
showed that this solution is at least 250x faster.
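
For readers who want to try a similar measurement, the loop described above could look
roughly like the sketch below. It is not the test method added to TestClassWriter.java; the
class SetterLoadSketch, the method simulateRows, and the "col" + index naming are all
hypothetical, and a HashMap stands in for the generated ORM class.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.function.BiConsumer;

// Hypothetical harness; it does not use any Sqoop classes.
class SetterLoadSketch {

    // Calls the given name-based setter once per column for every simulated row
    // and returns the total number of setter calls made.
    static long simulateRows(BiConsumer<String, Object> setField,
                             int columns, int rows, long seed) {
        String[] names = new String[columns];
        for (int c = 0; c < columns; c++) {
            names[c] = "col" + c; // assumed column naming
        }
        Random random = new Random(seed);
        long calls = 0;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < columns; c++) {
                setField.accept(names[c], random.nextInt());
                calls++;
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        // Stand-in target: a real run would pass the ORM object's setField instead.
        Map<String, Object> sink = new HashMap<>();
        long start = System.nanoTime();
        long calls = simulateRows(sink::put, 800, 20000, 42L);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(calls + " setter calls in " + elapsedMs + " ms");
    }
}
```

A single timed loop like this gives only a rough figure, since JIT warm-up and garbage
collection affect the result; repeating the run several times gives a steadier comparison.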

Any additional ideas for testing are more than welcome from the community.


Thanks,

Attila Szabo

