sqoop-dev mailing list archives

From "Younos Aboulnaga (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-2312) Problem when exporting files that has \n as part as the content columns
Date Fri, 22 May 2015 03:44:17 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555526#comment-14555526 ]

Younos Aboulnaga commented on SQOOP-2312:
-----------------------------------------

This problem also occurs with the other vertical whitespace characters, such as form feed
and vertical tab.

I am not sure whether this is addressed in Sqoop2, especially since the CSVIntermediateFormat
wiki page (https://cwiki.apache.org/confluence/display/SQOOP/Intermediate+Data+Format+API)
mentions only \n and \r as vertical whitespace characters.
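
As a rough illustration of which characters are involved, the Java sketch below treats the
following set as vertical whitespace: line feed, vertical tab, form feed, carriage return,
NEL (U+0085), and the Unicode line and paragraph separators. The class and method names are
hypothetical and this is not Sqoop code; it only shows how a field could be checked or
cleaned before it is written to a delimited file.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Sqoop.
class VerticalWhitespace {
    // LF, VT, FF, CR, NEL, LINE SEPARATOR, PARAGRAPH SEPARATOR
    private static final Pattern VERTICAL =
        Pattern.compile("[\\n\\x0B\\f\\r\\x85\\u2028\\u2029]");

    // True if the field contains at least one vertical whitespace character.
    static boolean contains(String field) {
        return VERTICAL.matcher(field).find();
    }

    // Replace every vertical whitespace character with the given literal text.
    static String replace(String field, String replacement) {
        return VERTICAL.matcher(field)
                       .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String comment = "Pecem\n\f(Sao Goncalo do Amarante)";
        System.out.println(contains(comment));
        System.out.println(replace(comment, " "));
    }
}
```

Replacing these characters changes the stored data, so whether that is acceptable depends
on the column.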

> Problem when exporting files that has \n as part as the content columns
> -----------------------------------------------------------------------
>
>                 Key: SQOOP-2312
>                 URL: https://issues.apache.org/jira/browse/SQOOP-2312
>             Project: Sqoop
>          Issue Type: Bug
>          Components: connectors/generic
>         Environment: Sqoop 1.4.6-rc1
>            Reporter: Henrique Andrade
>            Priority: Critical
>
> I have exported some data related to our customers from my SQL Server.
> One of the columns contains customer comments, and this is the data in it:
> "Pecém\n" +
>                         "                                \n" +
>                         "								(São Gonçalo do Amarante)
> The problem is that Sqoop breaks the record at this point, and the rest of the process
> fails.
> I tried different options, such as --lines-terminated-by with a different character (ˆ),
> but it looks like the Hadoop library does not accept that and treats all 29,000 records
> as a single record.
>     "--fields-terminated-by", "|",
>     "--lines-terminated-by", "ˆ",
>     "--enclosed-by", "'",
>     "--escaped-by", "\\"};
> I have read in some threads that the only --lines-terminated-by character that was
> accepted was \n. Has this changed in version 1.4.6?
> Is there a way to keep the content of the columns from breaking the import?
>  
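
The failure mode in the quoted report can be reproduced in isolation with a small Java
sketch. It assumes a reader that ends a record at every \n, which is a simplification and
not Sqoop's or Hadoop's actual parsing code; the class name, method names, and the sample
id value are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a naive line-oriented reader, not Sqoop's parser.
class RecordSplitDemo {
    // Join fields with '|' and terminate the record with '\n'.
    static String toRecord(String... fields) {
        return String.join("|", fields) + "\n";
    }

    // Treat every '\n' as the end of a record.
    static List<String> splitOnNewline(String data) {
        List<String> records = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < data.length(); i++) {
            if (data.charAt(i) == '\n') {
                records.add(data.substring(start, i));
                start = i + 1;
            }
        }
        if (start < data.length()) {
            records.add(data.substring(start));
        }
        return records;
    }

    public static void main(String[] args) {
        // One logical row whose comment column contains two embedded newlines.
        String comment = "Pecem\n    \n\t(Sao Goncalo do Amarante)";
        String data = toRecord("1", comment);
        // Prints 3: the single row is read back as three fragments.
        System.out.println(splitOnNewline(data).size());
    }
}
```

A single row with two embedded newlines comes back as three fragments, none of which has
the expected number of fields.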



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
