sqoop-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From redshift-etl-user <redshift....@gmail.com>
Subject Re: Null values and escaping
Date Sun, 05 Jan 2014 13:45:30 GMT

Clicking on the link only gets me to a book presenting the book, so I'm not
sure which note you're referring to, but perhaps it's this: "If you want to
use \N to encode missing values, then you need to specify \\N on the
command line; \ is a special escape string character in Java that will be
interpreted by the compiler."

The problem is that even if you do this, you still won't be able to
distinguish between the literal string "\N" and the null-string "\N" in the
output text file. In other words, unless you know there's a certain string
that doesn't occur in the input data, there doesn't seem to be a way to
unambiguously define a null value in the output text file.


On Sun, Jan 5, 2014 at 5:57 AM, Abraham Elmahrek <abe@cloudera.com> wrote:

> Ah now I understand. I think you can get the desired effect of using '\N'
> by providing the arguments --null-string '\\N' and --null-non-string '\\N'
> without setting "escaped-by". Take a look at the note at the bottom of this
> page from the "apache sqoop cookbook":
> http://books.google.com/books?id=bxBnjitgIAYC&pg=PT36&lpg=PT36&dq=sqoop+escape+by+null-string&source=bl&ots=JKuOI3l5Px&sig=WlDF4aWA_kTbM9lbgLmYXHUK6Uo&hl=en&sa=X&ei=0OPIUuzHLNHkoATMyYG4AQ&ved=0CEAQ6AEwAg#v=onepage&q=sqoop%20escape%20by%20null-string&f=false
> .
> -Abe
> On Sat, Jan 4, 2014 at 6:06 PM, redshift-etl-user <redshift.etl@gmail.com>wrote:
>> Abe,
>> Sure - the problem is that whatever I specify as the null-string, say X,
>> I don't see how I can distinguish between that X and an actual string X in
>> the resulting text file. Any ideas?
>> Thanks.
>> On Jan 4, 2014 1:47 AM, "Abraham Elmahrek" <abe@cloudera.com> wrote:
>>> Hey There,
>>> Have you tried the --null-string option? See
>>> http://sqoop.apache.org/docs/1.4.4/SqoopUserGuide.html#idp3491496 for
>>> more details. It should change null string values to what ever string you
>>> specify.
>>> -Abe
>>> On Fri, Jan 3, 2014 at 4:34 AM, redshift-etl-user <
>>> redshift.etl@gmail.com> wrote:
>>>> I'm importing from a DB into a text file, and I need to distinguish
>>>> between null and non-null strings. Is there a combination of parameters
>>>> (i.e. escaped-by, enclosed-by, and null-string) that yields unambiguous
>>>> output strings? With the default options "null-string" is "null", and so
>>>> there's no way of distinguishing between a null string and the string
>>>> "null" in the output file.
>>>> One solution to this would be to avoid escaping the specified null
>>>> string. That way we could specify "escaped-by" as "\" and "null-string" as
>>>> "\N" and get "\N" in the output as opposed to "\\N" for null strings. That
>>>> way it's guaranteed to be different from any non-null string.
>>>> In the generated code's toString() method this would mean changing from
>>>> __sb.append(FieldFormatter.escapeAndEnclose(STRING==null?"\\N":STRING,
>>>> delimiters));
>>>> to
>>>> __sb.append(STRING==null?"\\N":FieldFormatter.escapeAndEnclose(STRING,
>>>> delimiters));
>>>> Thoughts? Any ideas for a workaround?
>>>>  Thanks!

View raw message