sqoop-dev mailing list archives

From "Jarek Jarcec Cecho (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-1245) Varchar fields encoding is corrupted during import when snappy used
Date Mon, 02 Dec 2013 16:19:45 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836648#comment-13836648 ]

Jarek Jarcec Cecho commented on SQOOP-1245:
-------------------------------------------

I believe that Sqoop always outputs all {{String}}-based columns in UTF-8. Would you mind attaching
the following three files?

* A MySQL dump of your table with a few (roughly 10) rows that reproduce the issue
* The imported text file
* The imported compressed text file
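
As a quick sanity check before attaching the files, the sketch below tests whether a file's bytes are strictly valid UTF-8. It is only an illustration: the helper name {{first_invalid_utf8_offset}} and the sample rows are made up, and the compressed file is assumed to have been decompressed to raw bytes first.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first malformed UTF-8 sequence, or None."""
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8.
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


# Hypothetical sample rows, not taken from the reporter's table.
utf8_row = "café,1\n".encode("utf-8")
latin1_row = "café,1\n".encode("iso-8859-1")

print(first_invalid_utf8_offset(utf8_row))    # valid UTF-8
print(first_invalid_utf8_offset(latin1_row))  # raw ISO-8859-1 byte is rejected
```

Note that this only catches raw ISO-8859-1 bytes. Text that was mis-decoded and then written out again as UTF-8 would still pass this check.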

> Varchar fields encoding is corrupted during import when snappy used
> -------------------------------------------------------------------
>
>                 Key: SQOOP-1245
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1245
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.4.3
>         Environment: CDH 4.4. 1.4.3+62
>            Reporter: Sergey
>
> Here is a MySQL table DDL:
> {code}
> CREATE TABLE `item_info` (
>   `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
>   `shop_id` int(11) unsigned NOT NULL,
>   `internal_id` int(10) unsigned DEFAULT NULL,
>   `name` varchar(1024) NOT NULL,
>   `prefix` varchar(255) NOT NULL DEFAULT '',
>   PRIMARY KEY (`id`)
> ) ENGINE=InnoDB AUTO_INCREMENT=1727331768 DEFAULT CHARSET=utf8
> {code}
> When "--as-textfile" is used, the import works perfectly.
> When "--compression-codec org.apache.hadoop.io.compress.SnappyCodec" is specified,
> all varchar fields are corrupted. It looks like they are encoded as "ISO-8859-1".
> So there is no way to import varchar fields containing non-ASCII characters with compression enabled.
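
The symptom described in the report would be consistent with UTF-8 bytes being interpreted as ISO-8859-1 somewhere along the way, although that is only a hypothesis at this point. The following sketch reproduces that kind of corruption in isolation and shows one way to recognise it. The function name {{looks_double_encoded}} and the sample value are illustrative assumptions and have nothing to do with Sqoop's internals.

```python
def looks_double_encoded(text: str) -> bool:
    """Heuristic: True if text looks like UTF-8 bytes mis-decoded as ISO-8859-1."""
    try:
        # Undo the suspected mistake: recover the raw bytes, then decode as UTF-8.
        repaired = text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text holds characters outside ISO-8859-1, or the recovered
        # bytes are not valid UTF-8, so this pattern does not apply.
        return False
    # Pure ASCII round-trips unchanged and is not counted as corrupted.
    return repaired != text


original = "Товар"  # hypothetical non-ASCII value for a varchar column
garbled = original.encode("utf-8").decode("iso-8859-1")

print(repr(garbled))
print(looks_double_encoded(garbled))   # True
print(looks_double_encoded(original))  # False
```

Running a check like this against a value from the compressed import and the same value from the plain text import would show whether the two files differ in the way the report suggests.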



--
This message was sent by Atlassian JIRA
(v6.1#6144)
