drill-user mailing list archives

From "MOIS Martin (MORPHO)" <martin.m...@morpho.com>
Subject Columnar data model for JSON stored in HBase column?
Date Wed, 13 May 2015 09:13:36 GMT
Hello,

I am currently evaluating Apache Drill and have a few questions about the implementation
details of the HBase storage plugin.

The documentation explains that Drill optimizes storage and execution by using an in-memory
data model that is hierarchical and columnar (http://drill.apache.org/docs/performance/).
I understand the term "columnar" as it is described in the "Dremel" paper (http://research.google.com/pubs/pub36632.html).

In my use case I have an HBase table that stores JSON data in a single column:

Put put = new Put(Bytes.toBytes("my-rowkey..."));
put.add(Bytes.toBytes("filterable"), Bytes.toBytes("filterable"),
    Bytes.toBytes("{\"firstName\": \"Martin\", \"lastName\": \"Mois\", ...}"));

As far as I understand, I have to convert the column value to JSON in order to
query it:

0: jdbc:drill:> select t.json.dateOfBirth from (select convert_from(p.filterable.filterable,
'JSON') json from hbase.person p) t;
+------------+
|   EXPR$0   |
+------------+
| 2007-02-04 |
...
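
For illustration, here is a minimal sketch of why the conversion step is needed. HBase's
Bytes.toBytes(String) encodes a string as UTF-8, so the cell holds only opaque bytes; the
JSON structure becomes visible to a query only after those bytes are decoded and parsed.
The class and method names below are hypothetical and use only the JDK (no HBase or Drill
dependency), and the sample document is made up from the field names shown above.

```java
import java.nio.charset.StandardCharsets;

class JsonCellSketch {
    // Mirrors what HBase's Bytes.toBytes(String) does: encode the string as UTF-8.
    static byte[] toCellValue(String json) {
        return json.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes the raw cell bytes back into JSON text (still a string, not yet parsed).
    static String fromCellValue(byte[] value) {
        return new String(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical sample document; the values are for illustration only.
        String json = "{\"firstName\": \"Martin\", \"dateOfBirth\": \"2007-02-04\"}";
        byte[] cell = toCellValue(json);
        System.out.println(cell.length + " bytes stored in the cell");
        System.out.println(fromCellValue(cell));
    }
}
```

Decoding alone yields a string; a field such as dateOfBirth is addressable only after the
text has been parsed as JSON, which is the part convert_from(..., 'JSON') takes care of in
the query above.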

If I now append a condition, I get the following error message:

select t.json.dateOfBirth from (select convert_from(p.filterable.filterable, 'JSON') json
from hbase.im_t_person p) t where t.json.dateOfBirth = '2014-09-07';
Query failed: SYSTEM ERROR: Unexpected exception during fragment initialization: null


[a2c6cdd8-e5bb-45ab-bd2a-39e728492e58 on trafodion.local:31010]
Error: exception while executing query: Failure while executing query. (state=,code=0)

The same happens when I create a view for the query above and set filter conditions on this
view.

With the above use case in mind, I have the following questions:

1. Is it possible to query the JSON data inside a column of an HBase table with conditions?

2. When I query an HBase table, does Apache Drill create a columnar data structure
in memory for the JSON data contained in the HBase column? Is this in-memory structure reused
by similar queries on the view?

3. If the column family "person" has been created with compression enabled, when does
decompression happen? Once, while the in-memory structure is built, or again for each
query?

4. Assuming another process updates a row in my HBase table while the query
is running, how does Apache Drill sync the in-memory structure with updates made to the underlying
HBase storage?

Please note that data conversion using the option 'store.format' as explained in the section
"Data Type Conversion" (http://drill.apache.org/docs/data-type-conversion/) is not an option,
as I want to use Apache Drill as some kind of OLAP system where I can query the data ad-hoc
without any further data conversions.

Is there any documentation (other than the source code itself) that explains these kinds
of implementation details?

Best Regards,
Martin Mois
