hive-issues mailing list archives

From "Jesus Camacho Rodriguez (JIRA)" <>
Subject [jira] [Updated] (HIVE-13275) Add a toString method to BytesRefArrayWritable
Date Tue, 31 May 2016 20:00:14 GMT


Jesus Camacho Rodriguez updated HIVE-13275:
    Target Version/s: 2.2.0  (was: 2.1.0)

> Add a toString method to BytesRefArrayWritable
> ----------------------------------------------
>                 Key: HIVE-13275
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>          Components: File Formats, Serializers/Deserializers
>    Affects Versions: 1.1.0
>            Reporter: Harsh J
>            Assignee: Harsh J
>            Priority: Trivial
>         Attachments: HIVE-13275.000.patch
> RCFileInputFormat cannot be used externally with Hadoop Streaming today, because Streaming generally relies on the K/V pairs being able to emit text representations (via toString()).
> Since BytesRefArrayWritable has no toString() method, using RCFileInputFormat prints the default object representation, which is not useful.
> Also, unlike SequenceFiles, RCFiles store multiple "values" per row (i.e. an array), so it's important to output them in a valid/parseable manner, as opposed to choosing a simple joining delimiter over the string representations of the inner elements.
> I propose adding a standardised CSV formatting of the array data, such that users of Streaming can then parse the results in their own script. Since we have OpenCSV as a dependency already, we can make use of it for this purpose.
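
To illustrate the proposal, here is a minimal sketch of a CSV-style text representation for one row of column bytes. It is not the attached patch: the class CsvRowSketch and its methods are hypothetical, the row is modelled as a plain byte[][] instead of a BytesRefArrayWritable, the column bytes are assumed to be UTF-8 text, and the quoting is written by hand (every field quoted, embedded quotes doubled) where the issue suggests using OpenCSV.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a toString() on a row of column values.
public class CsvRowSketch {

    // Wrap a field in double quotes and double any embedded quotes,
    // so commas and quotes inside a value stay parseable.
    public static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Render one row (an array of column byte arrays) as a single CSV line.
    // Assumption: each column holds UTF-8 encoded text.
    public static String toCsvLine(byte[][] columns) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < columns.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(quote(new String(columns[i], StandardCharsets.UTF_8)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[][] row = {
            "a,b".getBytes(StandardCharsets.UTF_8),
            "say \"hi\"".getBytes(StandardCharsets.UTF_8),
            "plain".getBytes(StandardCharsets.UTF_8)
        };
        // prints: "a,b","say ""hi""","plain"
        System.out.println(toCsvLine(row));
    }
}
```

A simple joining delimiter would turn the first column above into two fields on the consumer side. Quoting keeps the column boundaries intact, so a Streaming script can recover them with any standard CSV parser.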

This message was sent by Atlassian JIRA
