sqoop-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Abraham Elmahrek <...@cloudera.com>
Subject Re: having trouble with --hive-delims-replacement
Date Wed, 11 Mar 2015 17:03:18 GMT
Hey man,

Maybe regexp_replace + free-form query?

http://sqoop.apache.org/docs/1.4.5/SqoopUserGuide.html#_free_form_query_imports
http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions130.htm

-Abe

On Tue, Mar 10, 2015 at 9:34 AM, Glen Hein <glenhein@gmail.com> wrote:

>
> I have an oracle db were some of the records contain \n and \r characters.
> I
> m trying to use --hive-delims-replacement to convert them to spaces. I've
> had mixed results. It seems that when the \n and \r appear at the beginning
> of the fields data, then they are converted as expected. But when they are
> embedded in the middle of the field data, then they are not being replaced.
> Here's a hexdump showing the unconverted \n and \r characters:
>
> 000000c0  6e 75 6c 6c 01 48 65 6c  6c 6f 2c 0d 0a 20 20 20
> |null.Hello,..   |
> 000000d0  20 20 20 4d 79 20 6e 61  6d 65 20 69 73 20 4d 61  |   My name is
> Ma|
> 000000e0  72 79 20 48 75 6e 74 2c  20 49 20 61 6d 20 31 39  |ry Hunt, I am
> 19|
> 000000f0  20 79 65 61 72 73 20 6f  6c 64 20 61 6e 64 20 6c  | years old
> and l|
> 00000100  69 76 65 20 69 6e 20 43  61 6c 69 66 6f 72 6e 69  |ive in
> Californi|
> 00000110  61 2e 20 49 20 67 72 61  64 75 61 74 65 64 20 68  |a. I
> graduated h|
> 00000120  69 67 68 20 73 63 68 6f  6f 6c 20 6f 6e 20 32 30  |igh school on
> 20|
> 00000130  31 31 20 61 6e 64 20 64  65 63 69 64 65 64 20 74  |11 and
> decided t|
> 00000140  6f 20 74 61 6b 65 20 61  20 79 65 61 72 20 6f 66  |o take a year
> of|
> 00000150  66 20 74 6f 20 73 65 65  20 69 66 20 77 68 61 74  |f to see if
> what|
> 00000160  20 49 20 63 68 6f 73 65  20 61 73 20 61 20 66 75  | I chose as a
> fu|
> 00000170  74 75 72 65 20 63 61 72  65 65 72 20 69 73 20 77  |ture career
> is w|
> 00000180  68 61 74 20 49 20 72 65  61 6c 79 20 77 61 6e 74  |hat I realy
> want|
> 00000190  65 64 20 74 6f 20 64 6f  2e 20 4d 79 20 79 65 61  |ed to do. My
> yea|
> 000001a0  72 20 69 73 20 75 70 20  61 6e 64 20 49 20 73 74  |r is up and I
> st|
> 000001b0  69 6c 6c 20 77 61 6e 74  20 74 6f 20 62 65 20 61  |ill want to
> be a|
> 000001c0  20 74 65 61 63 68 65 72  2e 20 49 20 6c 6f 6f 6b  | teacher. I
> look|
> 000001d0  20 66 6f 77 61 72 64 20  74 6f 20 6c 65 61 72 6e  | foward to
> learn|
> 000001e0  69 6e 67 20 61 6c 6c 20  74 68 61 74 20 49 20 63  |ing all that
> I c|
> 000001f0  61 6e 20 61 6e 64 20 62  65 69 6e 67 20 61 6e 20  |an and being
> an |
> 00000200  65 6c 65 6d 65 6e 74 72  79 20 73 63 68 6f 6f 6c  |elementry
> school|
> 00000210  20 74 65 61 63 68 65 72  2e 01 6e 75 6c 6c 01 31  |
> teacher..null.1|
>
> You can see in the first row of data there is a "01" that starts the
> field, and then a few characters later the 0d and 0a. I'm passing the
> following option to sqoop:
>
>                 <arg>--hive-delims-replacement</arg>
>                 <arg>ABCDEFG</arg>
>
> I've tried several different values for the actual replacement string, but
> the results are always the same.
>
> Is there a better way to replace the unwanted characters?
>
> Thanks,
> Glen
>
>

Mime
View raw message