sqoop-user mailing list archives

From Marcus Truscello <marcus.trusce...@gmail.com>
Subject --hive-import with --fields-terminated-by value over 127
Date Thu, 17 Dec 2015 17:49:42 GMT
This isn't so much a bug report as a feature request.

With sqoop, one can specify a --fields-terminated-by value greater than 127
using octal notation and it will work correctly.  The resulting file will
have the correct delimiter.

However, if you include the --hive-import option, the delimiter causes an
error during the Hive import, even though the file retains the correct
delimiter.  This is the region of code responsible for the error:
https://github.com/apache/sqoop/blob/f19e2a523579db8c28a96febfd3cf35a5d58adc6/src/java/org/apache/sqoop/hive/TableDefWriter.java#L278-L300

However, Hive supports delimiters with byte values between 128 and 255,
just not in the octal escape form.  Instead, they must be specified as
negative values (two's complement, signed char).  For example, value 254 in
octal would normally be FIELDS TERMINATED BY '\0376' which is an error in
Hive, but FIELDS TERMINATED BY '-2' (254 - 256 = -2) works correctly.

I believe that sqoop's --hive-import function should convert the
--fields-terminated-by value into a form usable by Hive even if the value
is greater than 127.  Values greater than 255 should probably still be an
error.
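
As a rough illustration of the conversion I have in mind, here is a minimal
Java sketch.  The class and method names (HiveDelimiterSketch,
toHiveDelimiter) are hypothetical and not part of Sqoop, and the three-digit
octal escape used for values up to 127 is an assumption about the format
rather than a copy of the existing code.  Values from 128 to 255 are
narrowed to a signed byte and written in decimal; anything outside 0-255 is
rejected.

```java
// Hypothetical helper, not part of Sqoop: maps a delimiter's byte value to
// the text that would go inside FIELDS TERMINATED BY '...'.
final class HiveDelimiterSketch {

    static String toHiveDelimiter(int value) {
        if (value < 0 || value > 255) {
            // Values that do not fit in a single byte remain an error.
            throw new IllegalArgumentException(
                "Delimiter " + value + " is outside the range 0-255");
        }
        if (value <= 127) {
            // Assumed format: backslash plus three octal digits, e.g. \001.
            return String.format("\\%03o", value);
        }
        // Narrowing to byte reinterprets 128..255 as -128..-1
        // (two's complement), which is then written in decimal.
        return Integer.toString((byte) value);
    }

    public static void main(String[] args) {
        System.out.println(toHiveDelimiter(1));    // octal escape form
        System.out.println(toHiveDelimiter(254));  // signed decimal form
    }
}
```

With this sketch, a --fields-terminated-by value of 254 would be written to
the Hive table definition as '-2', the form described above as working.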


Thanks for your time and consideration.
-Marcus
