drill-user mailing list archives

From Steven Phillips <sphill...@maprtech.com>
Subject Re: Querying wide rows with Drill
Date Tue, 11 Nov 2014 20:45:15 GMT
To clarify, when I said a new HBaseRecordReader, I was referring to the
Drill class that reads data using the HBase client and writes into the
ValueVectors. In the current implementation, we have a vector for each
column, so a sparse table could end up with potentially millions of
vectors, which would not be efficient at all.

In the new implementation, we would simply have a RepeatedMapVector, with a
Key and a Value vector nested inside. You are correct that this will work
without any special support from the DB layer.
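
To make the difference between the two layouts concrete, here is a rough
sketch. It is not Drill code: plain Python lists stand in for ValueVectors,
and the function names and sample rows are made up for illustration. It only
shows why the per-column layout grows with the number of distinct columns,
while the key/value layout keeps a fixed set of vectors.

```python
# Illustrative sketch only: Python lists stand in for Drill's ValueVectors.
# All names here are hypothetical and are not part of any Drill or HBase API.
from typing import Dict, List, Optional, Tuple

Row = Dict[str, bytes]  # one HBase row: column qualifier -> cell value


def to_column_vectors(rows: List[Row]) -> Dict[str, List[Optional[bytes]]]:
    """Per-column layout: one vector per distinct column, padded with None."""
    columns: Dict[str, List[Optional[bytes]]] = {}
    for i, row in enumerate(rows):
        for qualifier, value in row.items():
            # A column seen for the first time gets a vector as long as the batch.
            vector = columns.setdefault(qualifier, [None] * len(rows))
            vector[i] = value
    return columns


def to_repeated_map(rows: List[Row]) -> Tuple[List[int], List[str], List[bytes]]:
    """Key/value layout: row offsets plus a single key and a single value vector."""
    offsets = [0]
    keys: List[str] = []
    values: List[bytes] = []
    for row in rows:
        for qualifier, value in sorted(row.items()):
            keys.append(qualifier)
            values.append(value)
        # Entries for row i live in keys[offsets[i]:offsets[i + 1]].
        offsets.append(len(keys))
    return offsets, keys, values


rows = [
    {"c1": b"a", "c9": b"b"},
    {"c4": b"c"},
    {"c2": b"d", "c7": b"e", "c8": b"f"},
]
column_vectors = to_column_vectors(rows)
offsets, keys, values = to_repeated_map(rows)
print(len(column_vectors), "column vectors vs. 3 vectors (offsets, keys, values)")
```

With the sample rows, the per-column layout already needs one vector per
distinct qualifier, mostly filled with nulls, while the key/value layout
stays at three vectors no matter how many distinct columns appear.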

On Tue, Nov 11, 2014 at 12:37 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> On Tue, Nov 11, 2014 at 1:46 PM, Steven Phillips <sphillips@maprtech.com>
> wrote:
> > For this to really work well in your case, I think we need to be able to
> > push the "mappify" operation into the scan. In other words, we need the
> > hbase scan to output the records in the desired key/value format.
> > Currently, hbase scan will output in the normal, sparse column schema,
> > and then a separate operator would convert it.
> >
> > One way to do this would be to write a new HBaseRecordReader that outputs
> > in the key/value mode, and then have a System/session option to set which
> > mode to use.
> >
> Actually, I think that what you suggest would be plenty fast even without
> any special support in the DB layer.  The key limitation is rows per second
> retrieved from the DB, not rows per second processed by drill.
> This is *very* exciting.

 Steven Phillips
 Software Engineer

