drill-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Question regarding to Drill
Date Mon, 10 Jun 2013 08:50:04 GMT
On Mon, Jun 10, 2013 at 10:35 AM, AnilKumar B <akumarb2010@gmail.com> wrote:

> Hi,
> I went through the Drill documentation and am going through the source code. I
> have a few questions regarding Drill. Can anyone help me understand it
> better?
> 1) How are Drill aggregations real time? It is going to scan all the
> records anyway, right? What exactly does it optimize compared to
> MapReduce-based Hive (considering the index feature)?

Real-time is often used in a bit of a sloppy fashion.  The meaning with
respect to Drill is "ad hoc, interactive queries".

> 2) For aggregations, isn't cube materialization a better solution?
> For example, something like HBase-Lattice.

Cubes are fine if you know what you are doing ahead of time.  They still
require a pass over the data.  Nothing prevents Drill from creating and/or
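
To illustrate the trade-off, here is a minimal Python sketch. The helper names `build_cube` and `query_cube` are hypothetical, and this is not how HBase-Lattice or Drill implement anything. Building the cube is itself a full pass over the rows, and the cube can only answer questions about the dimensions picked ahead of time; anything else needs another scan of the raw data.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dims, measure):
    """One full pass over `rows`, summing `measure` for every subset of `dims`."""
    cube = defaultdict(float)
    for row in rows:
        for r in range(len(dims) + 1):
            # combinations() keeps the order of `dims`, so keys are canonical.
            for group in combinations(dims, r):
                key = (group, tuple(row[d] for d in group))
                cube[key] += row[measure]
    return dict(cube)


def query_cube(cube, dims, filters):
    """SUM(measure) for equality filters, answered from the cube alone."""
    missing = set(filters) - set(dims)
    if missing:
        # Not materialized ahead of time: only a new scan could answer this.
        raise KeyError(f"dimensions not in cube: {sorted(missing)}")
    group = tuple(d for d in dims if d in filters)
    return cube.get((group, tuple(filters[d] for d in group)), 0.0)
```

With `rows` holding sales records, `build_cube(rows, ["region", "product"], "sales")` answers per-region and per-product totals from the cube, while a filter on a column such as `day` raises `KeyError` because that dimension was never materialized. The number of groupings grows as 2^d in the number of dimensions d, which is one reason they have to be chosen carefully in advance.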

> 3) What exactly are the real use cases for Drill? Whenever we say interactive,
> the queries mostly involve aggregations, and aggregations definitely cannot
> be real time when we scan the whole raw data.

Aggregation is a fine use case.  There are many others as well.  For
instance, incremental co-occurrence counting.  Or, with special UDFs, the
inner loop of many machine learning applications.
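
A minimal Python sketch of incremental co-occurrence counting, assuming the data arrives in batches of "baskets" (for example, the items touched in one user session). `CooccurrenceCounter` is a hypothetical name and not part of Drill; in Drill this logic would be expressed as a query or UDF. The point is that each new batch is scanned once and merged into the running counts, so earlier data is never rescanned.

```python
from collections import Counter
from itertools import combinations


class CooccurrenceCounter:
    """Running counts of how often two items appear in the same basket."""

    def __init__(self):
        self.pairs = Counter()  # (item_a, item_b) with item_a < item_b -> count
        self.items = Counter()  # item -> number of baskets containing it

    def update(self, baskets):
        # Only the new batch is scanned; existing counts are kept.
        for basket in baskets:
            unique = sorted(set(basket))  # dedupe, canonical order for pairs
            self.items.update(unique)
            self.pairs.update(combinations(unique, 2))

    def count(self, a, b):
        return self.pairs[tuple(sorted((a, b)))]
```

Items must be mutually comparable (strings or numbers, say) because pairs are stored in sorted order so that ("milk", "bread") and ("bread", "milk") share one counter.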

Drill has an especially flexible scanner API, which will allow scanning
across data sources.
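
The scanner API itself is not shown in this thread, so the following is only a rough Python illustration of the idea, assuming a hypothetical `Scanner` interface with a `scan()` method that yields records as dicts. `CsvScanner`, `JsonLinesScanner`, and `sum_by` are made-up names, not Drill classes. Because the aggregation depends only on the interface, one query can run over records coming from different kinds of sources.

```python
import csv
import io
import json
from typing import Dict, Iterable, Iterator, Protocol

Record = Dict[str, object]


class Scanner(Protocol):
    """Hypothetical scanner interface: anything that yields records."""

    def scan(self) -> Iterator[Record]: ...


class CsvScanner:
    def __init__(self, text: str):
        self.text = text

    def scan(self) -> Iterator[Record]:
        # DictReader turns each CSV row into a dict keyed by the header.
        yield from csv.DictReader(io.StringIO(self.text))


class JsonLinesScanner:
    def __init__(self, text: str):
        self.text = text

    def scan(self) -> Iterator[Record]:
        for line in self.text.splitlines():
            if line.strip():
                yield json.loads(line)


def sum_by(scanners: Iterable[Scanner], key: str, value: str) -> Dict[str, float]:
    """GROUP BY `key`, SUM(`value`) over the union of all sources."""
    totals: Dict[str, float] = {}
    for scanner in scanners:
        for rec in scanner.scan():
            k = str(rec[key])
            totals[k] = totals.get(k, 0.0) + float(rec[value])
    return totals
```

For example, summing `amount` by `user` over a CSV source and a JSON-lines source yields a single set of totals, with values from both sources merged under the same keys.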

Not sure what you are getting at, though, so I may have misinterpreted
something you said.
