sqoop-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [sqoop] wa-ooo edited a comment on pull request #92: SQOOP-3487-Add PUT repeatedly when importing to HBase
Date Tue, 16 Mar 2021 09:14:08 GMT

wa-ooo edited a comment on pull request #92:
URL: https://github.com/apache/sqoop/pull/92#issuecomment-800087918


   > Hi hong,
   > 
   > I've reviewed your changes (both on GitHub and issues.apache.org), but TBH in the current state I'm concerned about both the intention of the change and its correctness.
   > 
   > First of all:
   > Could you please provide a bit more detail about what performance gain you expect from this change and how you measured it? Could you also please provide an automated test case which would show the effect of this gain and would ensure we don't lose it in the future?
   > 
   > On the front of correctness:
   > SQOOP-3149 introduced the line you'd like to remove and, if I remember correctly, did so absolutely intentionally. For this reason:
   > Could you please provide automated test cases which ensure that the SQOOP-3149 changes won't be undone by your change (so we keep the current correctness around NULL column updates)?
   > 
   > Many thanks in advance,
   > Attila Szabo
   ----------
   Hi @maugly24,

   Thanks for reviewing this PR.

   Our production environment was upgraded from CDH-5.13.0 to CDH-6.3.2, and we found that the job importing data from RDM into HBase took 3 to 4 hours longer on the 6.3.2 cluster (about 50 million records). The number of output records reported in the MR log was also much higher than on 5.13.0. So I compared the changes to the HBase import job in Sqoop between the two versions and found the problem here.

   I thought this would be an easy fix for HBase developers to follow, so I did not write much of a description in the issue.

   The change itself is simple: the Put is already added to the mutationList when it is initialized, so it does not need to be added again afterwards. Otherwise the same Put is recorded repeatedly in the generated HFile.

   I looked at SQOOP-3149, and I could not find an explanation for this line there.
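
To make the description above concrete, here is a minimal, self-contained sketch of the pattern being discussed. It is not the Sqoop source: the `Put` class below is a hypothetical stand-in for HBase's `Put`, and the method names and the per-column loop are assumptions made only for illustration. It shows how adding the same object to a mutation list once at creation and then again on each column update inflates the number of emitted records. It does not model the NULL column handling from SQOOP-3149.

```java
import java.util.ArrayList;
import java.util.List;

public class DuplicatePutSketch {

    /** Hypothetical stand-in for an HBase Put: a row key plus its cells. */
    static final class Put {
        final String rowKey;
        final List<String> cells = new ArrayList<>();

        Put(String rowKey) {
            this.rowKey = rowKey;
        }

        void addColumn(String column, String value) {
            cells.add(column + "=" + value);
        }
    }

    /** Adds the Put at creation and again after every column (the suspected problem). */
    static List<Put> buildWithRepeatedAdd(String rowKey, String[][] columns) {
        List<Put> mutationList = new ArrayList<>();
        Put put = new Put(rowKey);
        mutationList.add(put);          // first and only necessary add
        for (String[] col : columns) {
            put.addColumn(col[0], col[1]);
            mutationList.add(put);      // redundant: same object is already in the list
        }
        return mutationList;
    }

    /** Adds the Put once; later column updates mutate the object already in the list. */
    static List<Put> buildWithSingleAdd(String rowKey, String[][] columns) {
        List<Put> mutationList = new ArrayList<>();
        Put put = new Put(rowKey);
        mutationList.add(put);
        for (String[] col : columns) {
            put.addColumn(col[0], col[1]);
        }
        return mutationList;
    }

    public static void main(String[] args) {
        String[][] columns = {{"cf:a", "1"}, {"cf:b", "2"}, {"cf:c", "3"}};
        System.out.println(buildWithRepeatedAdd("row1", columns).size()); // prints 4
        System.out.println(buildWithSingleAdd("row1", columns).size());   // prints 1
    }
}
```

In both variants the single `Put` ends up carrying all three cells; the only difference is how many times it appears in the list, which in this model corresponds to how many times the row would be written out.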


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


