sqoop-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Glen Hein <glenh...@gmail.com>
Subject Re: having trouble with --hive-delims-replacement
Date Wed, 11 Mar 2015 18:42:27 GMT
I meant to say that the older system is running WITHOUT the extra mapping
arguments.


On Wed, Mar 11, 2015 at 10:20 AM, Glen Hein <glenhein@gmail.com> wrote:

> I've narrowed down the problem to CLOB fields. I did figure out that if I
> add the option "--map-column-java BIO=String
> " (BIO is the name of the CLOB field), then newlines are filtered as
> expected.
>
> We do have an older system running Sqoop 1.3.0 that seems to filter the
> newlines for the CLOB fields with adding the --map-column-java. Do you know
> if 1.4.3 differs from 1.3.0 on how CLOB are handled?
>
> -Glen
>
> On Wed, Mar 11, 2015 at 10:03 AM, Abraham Elmahrek <abe@cloudera.com>
> wrote:
>
>> Hey man,
>>
>> Maybe regexp_replace + free-form query?
>>
>>
>> http://sqoop.apache.org/docs/1.4.5/SqoopUserGuide.html#_free_form_query_imports
>> http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions130.htm
>>
>> -Abe
>>
>> On Tue, Mar 10, 2015 at 9:34 AM, Glen Hein <glenhein@gmail.com> wrote:
>>
>>>
>>> I have an oracle db were some of the records contain \n and \r
>>> characters. I
>>> m trying to use --hive-delims-replacement to convert them to spaces.
>>> I've had mixed results. It seems that when the \n and \r appear at the
>>> beginning of the fields data, then they are converted as expected. But when
>>> they are embedded in the middle of the field data, then they are not being
>>> replaced. Here's a hexdump showing the unconverted \n and \r characters:
>>>
>>> 000000c0  6e 75 6c 6c 01 48 65 6c  6c 6f 2c 0d 0a 20 20 20
>>> |null.Hello,..   |
>>> 000000d0  20 20 20 4d 79 20 6e 61  6d 65 20 69 73 20 4d 61  |   My name
>>> is Ma|
>>> 000000e0  72 79 20 48 75 6e 74 2c  20 49 20 61 6d 20 31 39  |ry Hunt, I
>>> am 19|
>>> 000000f0  20 79 65 61 72 73 20 6f  6c 64 20 61 6e 64 20 6c  | years old
>>> and l|
>>> 00000100  69 76 65 20 69 6e 20 43  61 6c 69 66 6f 72 6e 69  |ive in
>>> Californi|
>>> 00000110  61 2e 20 49 20 67 72 61  64 75 61 74 65 64 20 68  |a. I
>>> graduated h|
>>> 00000120  69 67 68 20 73 63 68 6f  6f 6c 20 6f 6e 20 32 30  |igh school
>>> on 20|
>>> 00000130  31 31 20 61 6e 64 20 64  65 63 69 64 65 64 20 74  |11 and
>>> decided t|
>>> 00000140  6f 20 74 61 6b 65 20 61  20 79 65 61 72 20 6f 66  |o take a
>>> year of|
>>> 00000150  66 20 74 6f 20 73 65 65  20 69 66 20 77 68 61 74  |f to see if
>>> what|
>>> 00000160  20 49 20 63 68 6f 73 65  20 61 73 20 61 20 66 75  | I chose as
>>> a fu|
>>> 00000170  74 75 72 65 20 63 61 72  65 65 72 20 69 73 20 77  |ture career
>>> is w|
>>> 00000180  68 61 74 20 49 20 72 65  61 6c 79 20 77 61 6e 74  |hat I realy
>>> want|
>>> 00000190  65 64 20 74 6f 20 64 6f  2e 20 4d 79 20 79 65 61  |ed to do.
>>> My yea|
>>> 000001a0  72 20 69 73 20 75 70 20  61 6e 64 20 49 20 73 74  |r is up and
>>> I st|
>>> 000001b0  69 6c 6c 20 77 61 6e 74  20 74 6f 20 62 65 20 61  |ill want to
>>> be a|
>>> 000001c0  20 74 65 61 63 68 65 72  2e 20 49 20 6c 6f 6f 6b  | teacher. I
>>> look|
>>> 000001d0  20 66 6f 77 61 72 64 20  74 6f 20 6c 65 61 72 6e  | foward to
>>> learn|
>>> 000001e0  69 6e 67 20 61 6c 6c 20  74 68 61 74 20 49 20 63  |ing all
>>> that I c|
>>> 000001f0  61 6e 20 61 6e 64 20 62  65 69 6e 67 20 61 6e 20  |an and
>>> being an |
>>> 00000200  65 6c 65 6d 65 6e 74 72  79 20 73 63 68 6f 6f 6c  |elementry
>>> school|
>>> 00000210  20 74 65 61 63 68 65 72  2e 01 6e 75 6c 6c 01 31  |
>>> teacher..null.1|
>>>
>>> You can see in the first row of data there is a "01" that starts the
>>> field, and then a few characters later the 0d and 0a. I'm passing the
>>> following option to sqoop:
>>>
>>>                 <arg>--hive-delims-replacement</arg>
>>>                 <arg>ABCDEFG</arg>
>>>
>>> I've tried several different values for the actual replacement string,
>>> but the results are always the same.
>>>
>>> Is there a better way to replace the unwanted characters?
>>>
>>> Thanks,
>>> Glen
>>>
>>>
>>
>

Mime
View raw message