sqoop-user mailing list archives

From Markus Kemper <mar...@cloudera.com>
Subject Re: sqoop import for UUID(primary key)
Date Fri, 23 Sep 2016 14:17:39 GMT
As Ravi noted, non-numeric split keys are not reliable and can result in both
duplicate and missing rows.  When you use a non-numeric column for
--split-by, you should see a warning in the debug console output.
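
For reference, the (max - min) / number of tasks split from the original
question can be sketched roughly as below. This is only a simplified
illustration, not Sqoop's actual splitter code; the function names and the
column name "id" are made up for the example. It shows why a numeric key gives
each task a clean, non-overlapping range. With a text key such as a UUID the
boundaries are generated strings instead, so whether each row falls into
exactly one range depends on how the database orders and compares strings.

```python
def numeric_splits(min_val, max_val, num_tasks):
    """Divide [min_val, max_val] into num_tasks contiguous integer ranges.

    Hypothetical helper for illustration only; not taken from Sqoop.
    Returns a list of (low, high) pairs. Every range is half-open
    [low, high) except the last one, which also includes max_val.
    """
    if num_tasks < 1:
        raise ValueError("num_tasks must be at least 1")
    if max_val < min_val:
        raise ValueError("max_val must not be smaller than min_val")
    span = max_val - min_val
    # Integer arithmetic keeps the boundaries exact and reproducible.
    bounds = [min_val + (span * i) // num_tasks for i in range(num_tasks)]
    bounds.append(max_val)
    return [(bounds[i], bounds[i + 1]) for i in range(num_tasks)]


def where_clauses(column, ranges):
    """Build one WHERE condition per task from the ranges above."""
    clauses = []
    for i, (low, high) in enumerate(ranges):
        # Only the last range is closed on the right, so no row is read twice.
        upper_op = "<=" if i == len(ranges) - 1 else "<"
        clauses.append(f"{column} >= {low} AND {column} {upper_op} {high}")
    return clauses


if __name__ == "__main__":
    for clause in where_clauses("id", numeric_splits(1, 1000, 4)):
        print(clause)
```

With min 1, max 1000 and 4 tasks, the sketch produces the boundaries 1, 250,
500, 750 and 1000, so every id belongs to exactly one task.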


Markus Kemper
Customer Operations Engineer


On Fri, Sep 23, 2016 at 10:11 AM, Ravi, Chandramouli <
Chandramouli.Ravi@vantiv.com> wrote:

> It won't work well when the primary key is alphanumeric. I think the data will
> be skewed or won't come back as expected, creating unbalanced split files.
>
> Specify a different numeric index as the split key if no numeric primary key
> is present.
>
>
>
> *From:* Selvam Raman [mailto:selmna@gmail.com]
> *Sent:* Friday, September 23, 2016 10:09 AM
> *To:* user@sqoop.apache.org
> *Subject:* sqoop import for UUID(primary key)
>
>
>
> Hi,
>
>
>
> In Sqoop, if I have a numeric primary key and a number of parallel
> tasks, the load is split as (max - min) / number of tasks to pull the data
> from the table to HDFS.
>
>
>
> Suppose the primary key is a UUID (an alphanumeric value). How will the
> load be distributed?
>
>
>
> Thank you for your help.
>
>
>
> --
>
> Selvam Raman
> "Shun bribery, hold your head high"
>
>
