sqoop-user mailing list archives

From Gwen Shapira <gshap...@cloudera.com>
Subject Re: Can sqoop validate the data from each database
Date Wed, 13 Aug 2014 20:42:06 GMT
Yeah, I think you are on your own there.

If you manage to come up with something generic, consider contributing
it back into Sqoop :)
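
As a rough illustration of the "read from both databases and compare" script discussed in this thread, here is a minimal Python sketch. It is not a Sqoop feature. The function name diff_sorted_rows and the (key, columns) row shape are hypothetical, and it assumes both sides can be read as streams sorted by the same key in the same order (for example an ORDER BY on the MySQL side and a scan on the HBase side, with matching key encoding).

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical row shape: (row key, {column name: value as string}).
Row = Tuple[str, Dict[str, str]]


def diff_sorted_rows(source: Iterable[Row],
                     target: Iterable[Row]) -> Iterator[Tuple[str, str]]:
    """Yield (key, problem) for each key that differs between the two sides.

    Both inputs must already be sorted by key using the same ordering.
    Rows are streamed, so neither table has to fit in memory.
    """
    src, tgt = iter(source), iter(target)
    s, t = next(src, None), next(tgt, None)
    while s is not None or t is not None:
        if t is None or (s is not None and s[0] < t[0]):
            # Key exists only on the source side.
            yield s[0], "missing in target"
            s = next(src, None)
        elif s is None or t[0] < s[0]:
            # Key exists only on the target side.
            yield t[0], "missing in source"
            t = next(tgt, None)
        else:
            # Same key on both sides: compare the column values.
            if s[1] != t[1]:
                yield s[0], "values differ"
            s, t = next(src, None), next(tgt, None)


# Made-up sample data standing in for rows read from MySQL and HBase.
mysql_rows = [("1", {"name": "a"}), ("2", {"name": "b"}), ("4", {"name": "d"})]
hbase_rows = [("1", {"name": "a"}), ("2", {"name": "B"}), ("3", {"name": "c"})]

for key, problem in diff_sorted_rows(mysql_rows, hbase_rows):
    print(key, problem)
```

If the table keeps changing while the comparison runs, the two reads would also need to be limited to the same stable range of rows; that part is left out of the sketch.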

On Tue, Aug 12, 2014 at 5:49 PM, tobe <tobeg3oogle@gmail.com> wrote:
> Thanks @Gwen.
>
> I want to compare the content between MySQL and HBase. Row counts or
> checksums are not suitable because the values keep increasing. I think I
> have to write a script that reads from both databases and compares them myself.
>
>
> On Wed, Aug 13, 2014 at 1:04 AM, Gwen Shapira <gshapira@cloudera.com> wrote:
>>
>> By data validator you mean comparing entire table contents between
>> HDFS and the database?
>>
>> This is not currently supported by Sqoop validators. Most users
>> implement it by using Sqoop to re-load the table from HDFS to the
>> database and do the comparison within the DB (typically using hashes
>> or checksums).
>>
>> Gwen
>>
>> On Tue, Aug 12, 2014 at 12:56 AM, tobe <tobeg3oogle@gmail.com> wrote:
>> > A beginner's question: can Sqoop validate the values of the data from
>> > each database?
>> >
>> > I have read
>> > http://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html#validation and
>> > found RowCountValidator. But what I want is to verify the values from
>> > each database.
>> >
>> > If it's not supported now, how can I validate the data from the two
>> > databases?
>
>
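
The hash-based comparison mentioned in the thread can also be sketched per row rather than per table, which shows which keys disagree instead of only that something differs. This is a minimal Python illustration, not something Sqoop provides. The helper names row_digest and mismatched_keys are hypothetical, and it assumes each side can be exported as (key, columns) pairs with values already converted to strings in the same way on both sides.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

# Hypothetical row shape: (row key, {column name: value as string}).
Row = Tuple[str, Dict[str, str]]


def row_digest(columns: Dict[str, str]) -> str:
    """Return a SHA-256 digest of a row that ignores column order."""
    h = hashlib.sha256()
    for name in sorted(columns):
        for part in (name, columns[name]):
            data = part.encode("utf-8")
            # Length prefix keeps ("a", "bc") distinct from ("ab", "c").
            h.update(len(data).to_bytes(4, "big"))
            h.update(data)
    return h.hexdigest()


def mismatched_keys(side_a: Iterable[Row], side_b: Iterable[Row]) -> List[str]:
    """Return the sorted keys whose digests differ or that exist on one side only."""
    digests_a = {key: row_digest(cols) for key, cols in side_a}
    digests_b = {key: row_digest(cols) for key, cols in side_b}
    all_keys = digests_a.keys() | digests_b.keys()
    return sorted(k for k in all_keys if digests_a.get(k) != digests_b.get(k))


# Made-up sample data for the two sides of the comparison.
side_a = [("1", {"a": "x"}), ("2", {"a": "y"})]
side_b = [("1", {"a": "x"}), ("2", {"a": "z"}), ("3", {"a": "w"})]
print(mismatched_keys(side_a, side_b))
```

This version holds one digest per row in memory, so for very large tables the digests would more likely be computed and joined inside a database, as described in the thread.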
