sqoop-dev mailing list archives

From "Sri Kumaran Thirupathy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-3317) org.apache.sqoop.validation.RowCountValidator in live RDBMS system
Date Sun, 22 Apr 2018 01:49:00 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-3317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16447063#comment-16447063 ]

Sri Kumaran Thirupathy commented on SQOOP-3317:
-----------------------------------------------

[~BoglarkaEgyed] 

01. Source: Oracle OLTP DB

02. Command used:

--validate --validator org.apache.sqoop.validation.RowCountValidator \
--validation-threshold \
org.apache.sqoop.validation.AbsoluteValidationThreshold \
--validation-failurehandler \
org.apache.sqoop.validation.AbortOnFailureHandler

03. Issue: org.apache.sqoop.validation.RowCountValidator queries the source row count after the
Mapper job completes and compares it with the number of records retrieved.

i) When the Sqoop Mapper started, the source count was 8427534.

ii) After the Mapper completed, when the Sqoop validator retrieved the count from the source, the
source count was 8427566.

In an OLTP DB, the row count changes every second. The validator could instead use the Mapper
input/output record counts to validate the transfer.

Sqoop Logs:

Map-Reduce Framework
Map *input* records=8427534
Map *output* records=8427534

[main] DEBUG org.apache.sqoop.validation.RowCountValidator - Validating data using row counts:
*Source [8427566]* with *Target[8427534]*
[main] DEBUG org.apache.sqoop.validation.AbsoluteValidationThreshold - Absolute Validation
threshold comparing 8427566 with 8427534
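
The mismatch in the log above comes down to which two numbers are compared. The following is a minimal sketch using only the counts reported in this comment. It does not use any Sqoop API; the class and method names (`RowCountCheck`, `mapperCountsMatch`, `absoluteMatch`) are hypothetical and exist only for illustration.

```java
// Hypothetical illustration only: not part of Sqoop, no Sqoop classes used.
public class RowCountCheck {

    /** True when every record read by the mappers was also written. */
    public static boolean mapperCountsMatch(long mapInputRecords, long mapOutputRecords) {
        return mapInputRecords == mapOutputRecords;
    }

    /** Exact-equality check of a source count against a target count. */
    public static boolean absoluteMatch(long sourceCount, long targetCount) {
        return sourceCount == targetCount;
    }

    public static void main(String[] args) {
        long mapInput = 8427534L;        // "Map input records" from the log
        long mapOutput = 8427534L;       // "Map output records" from the log
        long sourceAfterJob = 8427566L;  // source count queried after the job finished

        // Counters taken from the job itself agree with each other.
        System.out.println("mapper counters match: "
                + mapperCountsMatch(mapInput, mapOutput));       // prints true

        // The source has grown by 32 rows since the job started, so an
        // exact comparison against the re-queried count does not hold.
        System.out.println("absolute source/target match: "
                + absoluteMatch(sourceAfterJob, mapOutput));     // prints false
    }
}
```

With these numbers the job-internal counters agree, while the comparison against the re-queried source count differs by 32 rows, which is the behavior described in point 03.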

04. How do I use the Percentage Tolerant validation threshold?
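
For reference, a percentage-tolerant comparison could look roughly like the sketch below. This is only the arithmetic, under stated assumptions: it is a standalone class that does not implement any Sqoop interface, and the names `PercentageTolerance`, `withinTolerance` and the `tolerancePercent` parameter are hypothetical. How such logic would be plugged in via --validation-threshold is not shown here.

```java
// Hypothetical illustration only: not a Sqoop class and not wired into Sqoop.
public class PercentageTolerance {

    /**
     * Returns true when the target count differs from the source count
     * by at most tolerancePercent percent of the source count.
     */
    public static boolean withinTolerance(long source, long target, double tolerancePercent) {
        if (source == 0) {
            // Avoid division by zero: an empty source only matches an empty target.
            return target == 0;
        }
        double diffPercent = Math.abs(source - target) * 100.0 / Math.abs(source);
        return diffPercent <= tolerancePercent;
    }

    public static void main(String[] args) {
        long source = 8427566L;  // source count from the log in this comment
        long target = 8427534L;  // target count from the log in this comment

        // A difference of 32 rows is far below 0.01% of the source count.
        System.out.println(withinTolerance(source, target, 0.01));
    }
}
```

With the counts from the log, a difference of 32 rows out of 8427566 would pass a tolerance of 0.01 percent, whereas the absolute comparison rejects it.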

 

> org.apache.sqoop.validation.RowCountValidator in live RDBMS system
> ------------------------------------------------------------------
>
>                 Key: SQOOP-3317
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3317
>             Project: Sqoop
>          Issue Type: Bug
>            Reporter: Sri Kumaran Thirupathy
>            Priority: Major
>
> org.apache.sqoop.validation.RowCountValidator retrieves the count from the source after the MR job completes. This fails against a live RDBMS.
> org.apache.sqoop.validation.RowCountValidator could retrieve the count during the MR execution phase.
> Also, how do I use Percentage Tolerant? Reference: [https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html]



