drill-user mailing list archives

From Jacques Nadeau <jacq...@dremio.com>
Subject Re: UTF conversion issue with gz files
Date Wed, 26 Aug 2015 02:23:33 GMT
Yes, please file an issue. Right now, the text reader assumes UTF-8.
It would need an enhancement to support alternative character sets.
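The double-byte symptom reported below follows from that UTF-8 assumption. In UTF-16LE, each ASCII character is stored as the character byte followed by a 0x00 byte. Decoding those bytes as UTF-8 still succeeds, because 0x00 is valid UTF-8, but every real character then carries a trailing NUL. The Python sketch below shows this. It assumes the reader decodes the bytes as UTF-8. The helper `substring_1based` is hypothetical and only mimics SQL's 1-based SUBSTRING; it is not Drill's implementation.

```python
def substring_1based(text: str, start: int, length: int) -> str:
    """SQL-style SUBSTRING: 1-based start position, fixed length."""
    return text[start - 1:start - 1 + length]

# A date field as stored in a UTF-16LE file: each ASCII character is
# followed by a 0x00 byte.
raw = "2015-08-24".encode("utf-16-le")

# Decoding as UTF-8 (the reader's assumption) succeeds, but a NUL now
# follows every real character.
as_utf8 = raw.decode("utf-8")
print(repr(substring_1based(as_utf8, 1, 4)))   # '2\x000\x00' -> displays as "20"

# Decoding with the correct charset yields the expected four characters.
as_utf16 = raw.decode("utf-16-le")
print(repr(substring_1based(as_utf16, 1, 4)))  # '2015'
```

The embedded NULs would plausibly also make a CAST of such a value to DATE fail, which matches the behavior described in the question.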

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Mon, Aug 24, 2015 at 9:05 AM, Edmon Begoli <ebegoli@gmail.com> wrote:

> We are unable to process files that OSX identifies as character set
> UTF-16LE. After unzipping them and converting to UTF-8, we were able to
> process one file. There are CONVERT_TO and CONVERT_FROM functions that
> appear to address the issue, but we were unable to make them work on either
> the gzipped or the unzipped version of the UTF-16 file. We were able to use
> CONVERT_FROM, but when we tried to wrap its result in a CAST to a date, or
> to anything else, it failed. Working with the data natively exposed its
> double-byte nature (SUBSTRING(..., 1, 4) returns only the first two
> characters).
>
> Is there a fix for this or should I file it as an issue?
>
> I cannot post the data because it is proprietary in nature, but I might be
> able to try to re-create the data for release testing and
> development purposes.
>
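The workaround described in the question (decompress, convert to UTF-8, then query) can be scripted so that each file does not have to be converted by hand. Below is a minimal Python sketch. It assumes the input is a gzipped UTF-16LE text file that may begin with a byte-order mark. The function name `transcode_gz_utf16le_to_utf8` and the file paths are hypothetical. The sketch streams the data, so the whole file is never held in memory. It preserves line endings and drops a leading BOM, which would otherwise end up in the first field.

```python
import gzip
import shutil


def transcode_gz_utf16le_to_utf8(src_path: str, dst_path: str) -> None:
    """Re-encode a gzipped UTF-16LE text file as a gzipped UTF-8 file."""
    # Text mode ("rt"/"wt") decodes/encodes on the fly; newline="" disables
    # newline translation so \r\n and \n are copied unchanged.
    with gzip.open(src_path, "rt", encoding="utf-16-le", newline="") as src, \
         gzip.open(dst_path, "wt", encoding="utf-8", newline="") as dst:
        # Drop a leading byte-order mark (U+FEFF) if one is present.
        first = src.read(1)
        if first and first != "\ufeff":
            dst.write(first)
        # Stream the remainder in chunks.
        shutil.copyfileobj(src, dst)
```

For example, `transcode_gz_utf16le_to_utf8("export.csv.gz", "export_utf8.csv.gz")` writes a UTF-8 copy that can be queried in place of the original. Converting up front avoids relying on CONVERT_FROM at query time, at the cost of storing a second copy of the data.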
