drill-user mailing list archives

From Steven Phillips <sphill...@maprtech.com>
Subject Re: Querying wide rows with Drill
Date Tue, 11 Nov 2014 19:46:33 GMT
There are two new features that were recently added to the master branch
that could be useful here. The first is what we call "mappify", which
turns a map with a wide set of columns into an array of key-value pairs.
For example:

column f:

{"a": "valA", "b": "valB"}
{"c": "valC", "d": "valD"}

would become

[{"key": "a", "value": "valA"}, {"key":"b", "value":"valB"}]
[{"key": "c", "value": "valC"}, {"key":"d", "value":"valD"}]

You could then use the new "flatten" operator to break the arrays into
multiple rows:

{"key": "a", "value": "valA"}
{"key":"b", "value":"valB"}
{"key": "c", "value": "valC"}
{"key":"d", "value":"valD"}
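
To make the two steps concrete, here is a rough Python sketch of the transformation on the example data above. It only models the semantics; it is not Drill code, and the function names `mappify` and `flatten` are simply borrowed from the description for illustration.

```python
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def mappify(wide: Row) -> List[Row]:
    """Turn one wide map into an array of {"key", "value"} pairs."""
    return [{"key": k, "value": v} for k, v in wide.items()]


def flatten(arrays: Iterable[List[Row]]) -> Iterator[Row]:
    """Emit one output row per array element."""
    for array in arrays:
        for element in array:
            yield element


# Column f from the example: two records, each with its own set of columns.
column_f = [
    {"a": "valA", "b": "valB"},
    {"c": "valC", "d": "valD"},
]

pairs = [mappify(record) for record in column_f]  # one array per input record
rows = list(flatten(pairs))                       # one row per key/value pair

for row in rows:
    print(row)
```

Once the data is in this shape, every pair is an ordinary row, so the usual filters, joins and aggregates can be applied to the "key" and "value" columns.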

For this to really work well in your case, I think we need to be able to
push the "mappify" operation into the scan. In other words, we need the
hbase scan to output the records in the desired key/value format.
Currently, hbase scan will output in the normal, sparse column schema, and
then a separate operator would convert it.

One way to do this would be to write a new HBaseRecordReader that outputs
in the key/value mode, and then have a System/session option to set which
mode to use.
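
As a rough illustration of the difference between the two modes, here is a toy Python model of a scan that can emit either shape. Everything in it is hypothetical: the `scan` function, the `key_value_mode` flag and the `row_key` field are made up for this sketch and are not part of Drill or the HBase reader.

```python
from typing import Any, Dict, Iterator, List, Tuple

# A stored cell, modeled as (row key, column qualifier, value).
Cell = Tuple[str, str, Any]


def scan(cells: List[Cell], key_value_mode: bool) -> Iterator[Dict[str, Any]]:
    """Toy scan: emit either sparse wide records or key/value records."""
    if key_value_mode:
        # Key/value mode: one record per cell, fixed schema.
        for row_key, qualifier, value in cells:
            yield {"row_key": row_key, "key": qualifier, "value": value}
        return
    # Sparse mode: one record per row key, one column per qualifier.
    records: Dict[str, Dict[str, Any]] = {}
    for row_key, qualifier, value in cells:
        records.setdefault(row_key, {"row_key": row_key})[qualifier] = value
    yield from records.values()


cells = [
    ("r1", "a", "valA"),
    ("r1", "b", "valB"),
    ("r2", "c", "valC"),
    ("r2", "d", "valD"),
]

sparse = list(scan(cells, key_value_mode=False))
key_value = list(scan(cells, key_value_mode=True))
```

In key/value mode the schema stays fixed at three columns no matter how many distinct qualifiers the row has, which is the reason to produce that format in the scan rather than convert afterwards.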

On Tue, Nov 11, 2014 at 11:08 AM, Kyrill Alyoshin <kyrill007@gmail.com>
wrote:

> Guys,
> We have some time series data stored in a "wide row" in mapr-db. The column
> is the timestamp, the value is some number. Is there any
> documentation/blogs/links as to how we could query it with Drill? I mean I
> realize that we can issue a "select * from..." but is it possible to do
> more? I mean this row has other columns (in other column families).
> Would it be possible to create a join between the columns in the main
> "named" column family and the "wide" one so that we get multiple rows (as
> if the wide row was a separate table)?
> Is it possible to apply function predicates to "wide" data? Say, we want
> the latest value or the average one?
> If none of this is possible, where would we need to dig in the Drill code
> base to add these features?
> Thank you!
> -Kyrill

 Steven Phillips
 Software Engineer

