sqoop-user mailing list archives

From "Ravi, Chandramouli" <Chandramouli.R...@vantiv.com>
Subject RE: sqoop import for UUID(primary key)
Date Mon, 26 Sep 2016 13:06:00 GMT
I am repeating the same point, but in more detail.

Any other numeric index that splits the data evenly can be used as the split key.
Otherwise, use a single mapper.
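
The choice above can be sketched as a small helper that assembles the import command. This is only an illustration: the function name, the JDBC URL, the table, the column, and the target directory are hypothetical placeholders, and the only Sqoop options used are `--connect`, `--table`, `--target-dir`, `--split-by`, and `--num-mappers`.

```python
def build_sqoop_import_args(connect, table, target_dir,
                            split_column=None, num_mappers=4):
    """Build an argument list for `sqoop import`.

    If a numeric, evenly distributed split column is known, split on it
    across several mappers; otherwise fall back to a single mapper.
    """
    args = ["sqoop", "import",
            "--connect", connect,
            "--table", table,
            "--target-dir", target_dir]
    if split_column is not None:
        # Parallel import over ranges of the numeric column.
        args += ["--split-by", split_column, "--num-mappers", str(num_mappers)]
    else:
        # No suitable numeric column: one mapper, so no split ranges at all.
        args += ["--num-mappers", "1"]
    return args


# Hypothetical values, for illustration only.
print(" ".join(build_sqoop_import_args(
    "jdbc:oracle:thin:@//dbhost:1521/ORCL", "ORDERS", "/data/orders",
    split_column="ORDER_SEQ", num_mappers=8)))
```

The single-mapper fallback is slower, but it avoids split ranges entirely, so the reliability concerns about non-numeric split keys do not arise.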

I have tried a date field as the split key, which is alphanumeric.
Sqoop cannot calculate the split ranges accurately, and I have seen unreadable split range
values when the splits are calculated on Oracle.
So there is no way to know whether the data coming back is good or not.
Even if it looks good, I don't know whether all of the data is coming back or extra data is
coming back.

So I have switched to a different index on a numeric field, which may not be the first-choice
solution but is close to what I need.

If you don’t have a numeric index that gives even splits, try to build one on the source
database.

Thanks,
Chandra

From: Selvam Raman [mailto:selmna@gmail.com]
Sent: Sunday, September 25, 2016 10:15 AM
To: Markus Kemper
Cc: user@sqoop.apache.org
Subject: Re: sqoop import for UUID(primary key)


I have 1 TB of data in the database. The primary keys are alphanumeric.
How can I use Sqoop in this case?

Is it possible to use Sqoop for this import?

Thanks,
Selvam R
+91-97877-87724
On Sep 23, 2016 3:17 PM, "Markus Kemper" <markus@cloudera.com> wrote:
As Ravi noted, non-numeric keys are not reliable and can result in both duplicate as well
as missing rows.  When using a non-numeric key for split-by you should observe a warning in
the debug console output.
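
One way to look for the duplicate and missing rows mentioned above is to compare the keys in the source with the keys that were imported. The sketch below assumes both key lists are small enough to hold in memory, which will not be true for a 1 TB table without sampling or partitioning; the function name and sample keys are made up for illustration.

```python
from collections import Counter


def compare_keys(source_keys, imported_keys):
    """Return (duplicates, missing) for an import.

    duplicates: keys that appear more than once in the imported data
    missing:    keys present in the source but absent from the import
    """
    imported_counts = Counter(imported_keys)
    duplicates = sorted(k for k, n in imported_counts.items() if n > 1)
    missing = sorted(set(source_keys) - set(imported_counts))
    return duplicates, missing


# Made-up keys: "a" was imported twice and "b" never arrived.
duplicates, missing = compare_keys(["a", "b", "c"], ["a", "a", "c"])
print(duplicates, missing)
```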


Markus Kemper
Customer Operations Engineer
www.cloudera.com


On Fri, Sep 23, 2016 at 10:11 AM, Ravi, Chandramouli <Chandramouli.Ravi@vantiv.com> wrote:
It won't work well when the primary key is alphanumeric. I think the data will be skewed or
won't come back as expected, creating unbalanced split files.
Specify a different numeric index as the split key if a numeric primary key is not present.
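
The skew described above can be illustrated with a simplified model. This is not Sqoop's actual text-splitting code; it only assumes that a text key is mapped to a number by its character code and that the resulting range is cut into equal-width pieces. Hex digits `0`-`9` and `a`-`f` sit far apart in that numeric space, so the middle ranges end up empty.

```python
def bucket_by_first_char(keys, num_splits):
    """Count keys per split when splits are equal-width ranges over the
    character code of each key's first character (a simplified model)."""
    codes = [ord(k[0]) for k in keys]
    lo, hi = min(codes), max(codes)
    counts = [0] * num_splits
    width = (hi - lo) / num_splits
    for c in codes:
        if width == 0:
            idx = 0  # all keys start with the same character
        else:
            # Clamp so the maximum value lands in the last split.
            idx = min(int((c - lo) / width), num_splits - 1)
        counts[idx] += 1
    return counts


# One made-up key per leading hex digit, as a UUID-like key might have.
keys = [ch + "-example" for ch in "0123456789abcdef"]
print(bucket_by_first_char(keys, 4))  # → [10, 0, 0, 6]
```

In this toy model two of the four mappers would get no rows at all, while the first gets most of them.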

From: Selvam Raman [mailto:selmna@gmail.com]
Sent: Friday, September 23, 2016 10:09 AM
To: user@sqoop.apache.org
Subject: sqoop import for UUID(primary key)

Hi,

In Sqoop, if I have a numeric primary key and a number of parallel tasks, the import works
by dividing the key range ((max - min) / number of tasks) to pull the data from the table to HDFS.
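
The (max - min) / number of tasks idea can be sketched as follows. This is a simplified model of the description above rather than Sqoop's exact logic, and the function name and key values are made up for illustration.

```python
def numeric_splits(min_key, max_key, num_tasks):
    """Cut the integer key range [min_key, max_key] into num_tasks
    contiguous ranges of roughly equal width.

    Each tuple is (lower, upper); a task reads lower <= key < upper,
    except the last task, which also includes max_key.
    """
    span = max_key - min_key
    # Integer arithmetic keeps the boundaries exact.
    bounds = [min_key + (i * span) // num_tasks for i in range(num_tasks)]
    bounds.append(max_key)
    return [(bounds[i], bounds[i + 1]) for i in range(num_tasks)]


print(numeric_splits(0, 1000, 4))
# → [(0, 250), (250, 500), (500, 750), (750, 1000)]
```

The ranges have equal width, so the tasks get equal work only when the key values are spread evenly between the minimum and the maximum.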

Suppose the primary key is a UUID (an alphanumeric value). How will the load be distributed then?

Thank you for your help.

--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"

 **NOTICE: This e-mail message, including any attachments hereto, is for the sole use of the
intended recipient(s) and may contain confidential and/or privileged information.  If you
are not the intended recipient(s), any unauthorized review, use, copying, disclosure or distribution
is prohibited.  If you are not the intended recipient(s), please contact the sender by reply
e-mail immediately and destroy the original and all copies (including electronic versions)
of this message and any of its attachments.
