Hi,
I have a) confirmed this behavior with more data and the latest 1.3 and b)
submitted a test file to the Jira ticket.
This affects all string-based data fetched from Avro files (at least for me).
I think this should be considered a blocker for 1.3.
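
A minimal Java sketch of the kind of buffer reuse that would produce exactly
the output in the ticket quoted below. This is not Drill's or Avro's actual
code: the class and its methods (StaleBufferDemo, load, readBuggy, readFixed)
are hypothetical, and it assumes the cause is a reused byte buffer that is
read at its full capacity instead of at the length of the current value.

```java
import java.nio.charset.StandardCharsets;

public class StaleBufferDemo {
    // Hypothetical buffer reused between rows (an assumption, not Drill code).
    private byte[] buffer = new byte[0];
    // Number of bytes in the buffer that belong to the current value.
    private int length = 0;

    // Copy a new value into the buffer; it only grows, so bytes past
    // 'length' are left over from an earlier, longer value.
    public void load(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > buffer.length) {
            buffer = new byte[bytes.length];
        }
        System.arraycopy(bytes, 0, buffer, 0, bytes.length);
        length = bytes.length;
    }

    // Suspected bug pattern: decodes the whole buffer and ignores 'length'.
    public String readBuggy() {
        return new String(buffer, StandardCharsets.UTF_8);
    }

    // Decodes only the bytes that belong to the current value.
    public String readFixed() {
        return new String(buffer, 0, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        StaleBufferDemo demo = new StaleBufferDemo();
        demo.load("Invitation KIF KBH");
        demo.load("Ordinarie pris");
        System.out.println(demo.readBuggy()); // Ordinarie pris KBH
        System.out.println(demo.readFixed()); // Ordinarie pris
    }
}
```

If this is what is happening, reading with an explicit length (as in
readFixed) should be enough to make the stale trailing characters go away.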
Regards,
-Stefán
On Tue, Nov 10, 2015 at 2:40 PM, Stefán Baxter (JIRA) <jira@apache.org>
wrote:
> Stefán Baxter created DRILL-4056:
> ------------------------------------
>
> Summary: Avro deserialization
> Key: DRILL-4056
> URL: https://issues.apache.org/jira/browse/DRILL-4056
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Other
> Affects Versions: 1.3.0
> Environment: Ubuntu 15.04 - Oracle Java
> Reporter: Stefán Baxter
> Fix For: 1.3.0
>
>
> I have an Avro file that supports the following data/schema:
> {"field":"some", "classification":{"variant":"Gæst"}}
>
> When I select 10 rows from this file I get:
> +---------------------+
> | EXPR$0 |
> +---------------------+
> | Gæst |
> | Voksen |
> | Voksen |
> | Invitation KIF KBH |
> | Invitation KIF KBH |
> | Ordinarie pris KBH |
> | Ordinarie pris KBH |
> | Biljetter 200 krBH |
> | Biljetter 200 krBH |
> | Biljetter 200 krBH |
> +---------------------+
>
> The bug is that field values are deserialized incorrectly: when a row's
> value is shorter than the previous row's, the trailing characters of the
> previous value are retained.
>
> The sql query:
> "select s.classification.variant variant from dfs.<some> as s limit 10;"
>
> As a result, "Ordinarie pris" becomes "Ordinarie pris KBH" because the
> previous row had the longer value "Invitation KIF KBH".
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>