sqoop-dev mailing list archives

From "Jarek Jarcec Cecho (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-1495) EnclosedBy and EscapedBy set to \000 are not ignored
Date Tue, 02 Dec 2014 15:50:13 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14231671#comment-14231671 ]

Jarek Jarcec Cecho commented on SQOOP-1495:
-------------------------------------------

I see the problem [~petehannam]. Sadly, I'm afraid the current patch would completely disable
the ability to use {{\000}} as an enclose/escape character, which is not an acceptable solution
because it would break backward compatibility. We should provide a way to still allow
{{\000}} to be used explicitly while not treating it as the default case. Perhaps we could
change the fields from {{char}} to {{Character}}, which allows {{null}} values, and use
{{null}} to detect whether a value has been entered?
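
To make the suggestion concrete, here is a minimal sketch of how nullable delimiter fields could distinguish "not set" from an explicit {{\000}}. The class {{NullableDelimiters}} and its method names are hypothetical and are not Sqoop's actual {{DelimiterSet}} API; the sketch only assumes Java's boxed {{Character}} type, which can hold {{null}}.

```java
// Hypothetical sketch, not the real Sqoop DelimiterSet.
final class NullableDelimiters {
    // null means "the user did not set this option";
    // '\000' then remains usable as an explicitly chosen character.
    private Character enclosedBy;
    private Character escapedBy;

    void setEnclosedBy(Character c) { this.enclosedBy = c; }
    void setEscapedBy(Character c) { this.escapedBy = c; }

    // True only when an enclosing character was explicitly set and matches c.
    boolean isEnclosingChar(char c) {
        return enclosedBy != null && enclosedBy.charValue() == c;
    }

    // True only when an escape character was explicitly set and matches c.
    boolean isEscapeChar(char c) {
        return escapedBy != null && escapedBy.charValue() == c;
    }
}
```

With this shape, a freshly constructed object never matches {{\000}}, while calling {{setEnclosedBy('\000')}} makes {{\000}} an active enclosing character, which is the backward-compatible behaviour asked for above.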

> EnclosedBy and EscapedBy set to \000 are not ignored
> ----------------------------------------------------
>
>                 Key: SQOOP-1495
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1495
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.4.5
>            Reporter: Peter Hannam
>            Priority: Minor
>         Attachments: patch.diff
>
>
> In {{DelimiterSet}} there is the following comment above two option variables:
> {code:java}
> // If these next two fields are '\000', then they are ignored.
> private char enclosedBy;
> private char escapedBy;
> {code}
> We just found a problem with this whilst doing a Sqoop export without setting the parameters
> for enclosing or escaping (i.e. they're left as the default \000). Looking at the code in
> {{RecordParser}}, it appears that although the comment says they would be ignored if set to
> \000, they actually aren't.
> For some reason some of the records we're trying to export have \000 in a column. This
> is fine as long as the \000 isn't just before the delimiter.
> This is fine {{foo\000bar|moo}} - two columns are exported.
> This isn't fine {{foo\000|bar}} - only one column is exported.
> Looking through {{RecordParser}}, the problem is that our \000 character is being assumed
> to be an enclosing character, so the parser then assumes the delimiter is part of a value.
> {{enclosedBy}} is \000 by default, which should mean it is ignored, but when a \000 is
> encountered in the data it is still picked up.
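
The reported symptom can be reproduced with a small standalone sketch. This is not the code from {{RecordParser}}; {{SplitSketch}}, {{split}} and the {{ignoreNul}} flag are hypothetical names. It assumes only that a special character is compared against the input without first checking whether it is the "unset" value {{\000}}, and it uses escape handling for the illustration because that is the simplest way to show a {{\000}} swallowing the delimiter that follows it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a field splitter, not Sqoop's RecordParser.
final class SplitSketch {
    // Splits line on delim; a character equal to escapedBy makes the next
    // character literal. When ignoreNul is true, an escapedBy of '\000'
    // is treated as "unset" and never matches the input.
    static List<String> split(String line, char delim, char escapedBy, boolean ignoreNul) {
        boolean escapeActive = !(ignoreNul && escapedBy == '\000');
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean escaped = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (escaped) {
                cur.append(c);          // previous char was the escape: take c literally
                escaped = false;
            } else if (escapeActive && c == escapedBy) {
                escaped = true;         // swallow the escape character itself
            } else if (c == delim) {
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }
}
```

Without the guard, {{split("foo\000bar|moo", '|', '\000', false)}} yields two fields, while {{split("foo\000|bar", '|', '\000', false)}} yields a single field because the delimiter after {{\000}} is taken literally. Passing {{true}} for {{ignoreNul}} gives two fields for the second input.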



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
