Thanks Ted.
What I exactly thought was that pre-computing the aggregations, like cubes,
might be better. But as you mentioned, that is only true if I know the
queries ahead of time.
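For illustration, the kind of cube pre-computation I had in mind looks roughly like this (a minimal Python sketch of my own; the names and data are illustrative, not HBase-Lattice's or Drill's API):

```python
# Hypothetical sketch of cube-style pre-aggregation: roll the measure up
# into every grouping set (the 2^len(dims) cuboids) in one pass, so later
# point queries are dictionary lookups instead of full scans.
from collections import defaultdict
from itertools import combinations

rows = [
    {"region": "east", "product": "a", "sales": 10},
    {"region": "east", "product": "b", "sales": 5},
    {"region": "west", "product": "a", "sales": 7},
]

def build_cube(rows, dims, measure):
    cube = defaultdict(int)
    for r in rows:
        # Add this row's measure to every cuboid it belongs to.
        for k in range(len(dims) + 1):
            for group in combinations(dims, k):
                key = (group, tuple(r[d] for d in group))
                cube[key] += r[measure]
    return cube

cube = build_cube(rows, ["region", "product"], "sales")
# Point query answered without rescanning the raw rows:
print(cube[(("region",), ("east",))])  # 15
```

The trade-off Ted describes is visible here: the single pass over `rows` is still required up front, and the set of dimensions must be fixed before that pass.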
On Mon, Jun 10, 2013 at 2:20 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> On Mon, Jun 10, 2013 at 10:35 AM, AnilKumar B <akumarb2010@gmail.com>
> wrote:
>
> > Hi,
> >
> > I went through the Drill documentation and I am going through the
> > source code. I have a few questions regarding Drill. Can anyone help
> > me understand it better?
> >
> > 1) How are Drill's aggregations real time? It is going to scan all
> > the records anyway, right? What exactly does it optimize compared to
> > MapReduce-based Hive (considering the index feature)?
> >
>
> Real-time is often used in a bit of a sloppy fashion. The meaning with
> respect to Drill is "ad hoc, interactive queries".
>
>
> > 2) For aggregations, isn't cube materialization a better solution?
> > For example, an HBase-Lattice kind of solution.
> >
>
> Cubes are fine if you know what you are doing ahead of time. They still
> require a pass over the data. Nothing prevents Drill from creating
> and/or using cubes.
>
> > 3) What exactly are the real use cases for Drill? Whenever we say
> > interactive, it mostly involves aggregations, and aggregations cannot
> > really be real time when we scan the whole raw data.
> >
>
> Aggregation is a fine use case. There are many others as well. For
> instance, incremental cooccurrence counting. Or, with special UDFs, the
> inner loop of many machine learning applications.
>
> Drill has an especially flexible scanner API which will allow cross data
> source scanning.
>
> Not sure what you are getting at, though, so I may have misinterpreted
> something you said.
>
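The incremental cooccurrence counting Ted mentions makes sense to me as a case where counts stay current without rescanning. A minimal sketch of the idea (my own illustration, not Drill code):

```python
# Minimal sketch of incremental cooccurrence counting: as each new
# transaction arrives, bump the count for every unordered item pair,
# so pair counts are always up to date with no full rescan.
from collections import Counter
from itertools import combinations

cooccur = Counter()

def observe(transaction):
    # Dedupe and sort so each unordered pair has one canonical key.
    for a, b in combinations(sorted(set(transaction)), 2):
        cooccur[(a, b)] += 1

observe(["milk", "bread", "eggs"])
observe(["milk", "bread"])
print(cooccur[("bread", "milk")])  # 2
```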