drill-dev mailing list archives

From Xun Zhou <shawn.x.z...@gmail.com>
Subject Re: Dremel and Google Analytics
Date Thu, 15 Nov 2012 05:47:21 GMT
Why don't they pre-aggregate only the standard report set and compute
the 'custom report' at runtime on top of column-oriented storage, say
Bigtable? As you said, a custom report selects only 5 dimensions at a
time, so IMHO 'column families' in Bigtable can help scan less data in
practice.
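
A minimal sketch of the idea above, assuming a toy in-memory layout with
one list per dimension. This is not Bigtable's API; the dimension names
and the custom_report helper are hypothetical and only illustrate that a
report over a few dimensions needs to read only those columns.

```python
from collections import Counter

# Hypothetical column-oriented layout: one list per dimension,
# aligned by row index (row i is one visit).
columns = {
    "country": ["US", "US", "DE", "DE", "US"],
    "browser": ["Chrome", "Firefox", "Chrome", "Chrome", "Chrome"],
    "device": ["desktop", "mobile", "desktop", "desktop", "mobile"],
    "source": ["search", "direct", "search", "email", "search"],
}


def custom_report(columns, dimensions):
    """Count visits grouped by the chosen dimensions.

    Only the columns named in `dimensions` are read; the others are
    never touched, which is the saving a column layout offers.
    """
    selected = [columns[name] for name in dimensions]
    return Counter(zip(*selected))


report = custom_report(columns, ["country", "browser"])
for key, visits in sorted(report.items()):
    print(key, visits)
```

Here the report touches 2 of the 4 columns. In a real column store the
saving would depend on how dimensions are grouped on disk.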

On Wed, Nov 14, 2012 at 1:25 AM, Asaf Mesika <asaf.mesika@gmail.com> wrote:
> Interesting.
> Analytics offers drilling up to 5 dimensions deep, with your choice of
> dimensions out of a few tens. That's quite a lot of combinations for them to
> pre-aggregate, so it seems they would pay a heavy storage penalty for such
> pre-calculation.
> Regarding large data sets: when you are using the app you are focused on one
> domain, so the data set is only as large as that site's traffic. As I
> understand it, they have 20k-50k machines, so I thought they could spread the
> data across them and run Dremel on top of it. They could optimize by doing
> some first-level aggregations along all sorts of dimensions and then run
> Dremel on top of that, which makes the data set smaller by 10x at the very
> least.
>
> Asaf
>
> On 13 Nov 2012, at 17:51, David Gruzman <david@bigdatacraft.com> wrote:
>
>> As far as I know, it is not. It is heavy sampling and pre-calculation.
>> If you process large data sets, the result of the aggregation will also
>> be large, which is something Dremel is not intended to support. It is
>> designed to build a small derivative over a large dataset.
>> David
>>
>> On Tue, Nov 13, 2012 at 5:36 PM, Mesika, Asaf <asaf.mesika@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Do you know if Google Analytics is powered by Dremel?
>>>
>>> Thanks,
>>>
>>> Asaf
>>>
>>>
>
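
A rough count for the point quoted above about pre-aggregating every
drill-down combination. The thread only says "a few tens" of dimensions,
so the value 30 below is an assumption, and count_dimension_subsets is a
hypothetical helper. It counts unordered dimension sets and ignores how
many distinct values each dimension has.

```python
from math import comb


def count_dimension_subsets(total_dimensions, max_depth):
    """Number of unordered dimension sets of size 1..max_depth."""
    return sum(comb(total_dimensions, k) for k in range(1, max_depth + 1))


# 30 is an assumed stand-in for "a few tens" of dimensions;
# 5 is the maximum drill-down depth mentioned in the thread.
subsets = count_dimension_subsets(30, 5)
print(subsets)
```

With these assumed numbers there are 174,436 dimension sets, each of
which would need its own pre-aggregated table per site.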
