drill-dev mailing list archives

From "Daniel Barclay (Drill) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-3955) Possible bug in creation of Drill columns for HBase column families
Date Tue, 20 Oct 2015 00:16:27 GMT
Daniel Barclay (Drill) created DRILL-3955:

             Summary: Possible bug in creation of Drill columns for HBase column families
                 Key: DRILL-3955
                 URL: https://issues.apache.org/jira/browse/DRILL-3955
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Daniel Barclay (Drill)

If all of the rows read by a given {{HBaseRecordReader}} have no HBase columns in a given
HBase column family, {{HBaseRecordReader}} doesn't create a Drill column for that HBase column
family.

Later, in a {{ProjectRecordBatch}}'s {{setupNewSchema}}, because no Drill column exists for
that HBase column family, that {{setupNewSchema}} creates a dummy Drill column using the usual
{{NullableIntVector}} type.  In particular, it is not a map vector like the one
{{HBaseRecordReader}} creates when it sees an HBase column family.
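
The two steps above can be sketched with a toy model. This is only an illustration, not Drill
code: {{ColType}}, {{readerSchema}} and {{project}} are hypothetical stand-ins for the vector
types, {{HBaseRecordReader}}'s column creation, and {{setupNewSchema}}'s dummy-column fallback.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: these types stand in for Drill's vectors and are not Drill APIs.
class DummyColumnSketch {
    enum ColType { MAP, NULLABLE_INT }

    // Mimics the reader: a column is created only for families that appear in some row.
    static Map<String, ColType> readerSchema(List<Set<String>> familiesPerRow) {
        Map<String, ColType> schema = new LinkedHashMap<>();
        for (Set<String> families : familiesPerRow) {
            for (String family : families) {
                schema.put(family, ColType.MAP);
            }
        }
        return schema;
    }

    // Mimics the projection step: a referenced column that the reader did not
    // create is filled in with a dummy nullable-int column.
    static Map<String, ColType> project(Map<String, ColType> input, List<String> referenced) {
        Map<String, ColType> out = new LinkedHashMap<>();
        for (String name : referenced) {
            out.put(name, input.getOrDefault(name, ColType.NULLABLE_INT));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> referenced = List.of("f1", "f2");
        // Scan A sees a row with columns in family f2; scan B sees none.
        Map<String, ColType> a =
            project(readerSchema(List.of(Set.of("f1"), Set.of("f1", "f2"))), referenced);
        Map<String, ColType> b =
            project(readerSchema(List.of(Set.of("f1"))), referenced);
        System.out.println("scan A: " + a);
        System.out.println("scan B: " + b);
    }
}
```

In this model the type of {{f2}} depends only on which rows a scan happened to read: {{MAP}}
for scan A and {{NULLABLE_INT}} for scan B.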

Should {{HBaseRecordReader}} and/or something around setting up for reading HBase (including
setting up that {{ProjectRecordBatch}}) make sure that all HBase column families are represented
with map vectors so that {{setupNewSchema}} doesn't create a dummy field of type {{NullableIntVector}}?
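
A minimal sketch of what that could look like, again as a toy model rather than Drill code.
It assumes the reader can get the table's declared column families up front; {{ColType}} and
{{schemaFor}} are hypothetical names introduced only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: ColType and schemaFor are hypothetical, not Drill APIs.
class EagerFamilySketch {
    enum ColType { MAP }

    // Build the schema from the declared column families instead of from the rows,
    // so every family gets a map column even if no row read has columns in it.
    static Map<String, ColType> schemaFor(List<String> declaredFamilies,
                                          List<Set<String>> familiesPerRow) {
        Map<String, ColType> schema = new LinkedHashMap<>();
        for (String family : declaredFamilies) {
            schema.put(family, ColType.MAP);
        }
        // Rows may only use declared families; they never add top-level columns.
        for (Set<String> families : familiesPerRow) {
            if (!schema.keySet().containsAll(families)) {
                throw new IllegalArgumentException("undeclared column family in " + families);
            }
        }
        return schema;
    }

    public static void main(String[] args) {
        List<String> declared = List.of("f1", "f2");
        // Two scans that see different rows still end up with the same schema.
        Map<String, ColType> a = schemaFor(declared, List.of(Set.of("f1", "f2")));
        Map<String, ColType> b = schemaFor(declared, List.of(Set.of("f1")));
        System.out.println(a.equals(b));
    }
}
```

With the schema fixed by the declared families, a projection step would find a map column for
every family and would have no missing column to replace with a dummy.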

The problem is that, currently, when an HBase table is read in two separate fragments, one
fragment (seeing rows with columns in the column family) can get a map vector for the column
family while the other (seeing only rows with no columns in the column family) can get the
{{NullableIntVector}}.  Downstream code that receives the two batches ends up with an unresolved
conflict, yielding IndexOutOfBoundsExceptions as in DRILL-3954.

It's not clear whether there is only one bug (downstream code doesn't correctly resolve
{{NullableIntVector}} dummy fields; DRILL-TBD) or two (the HBase reading code should set up a
Drill column for every HBase column family, regardless of whether it has any columns in the
rows that were read, and downstream code doesn't correctly resolve {{NullableIntVector}} dummy
fields, a resolution that applies to sources other than just HBase).

This message was sent by Atlassian JIRA
