drill-dev mailing list archives

From David Gruzman <da...@bigdatacraft.com>
Subject Re: A list of questions on Dremel (or Apache Drill)'s columnar storage
Date Mon, 27 Aug 2012 18:03:25 GMT
Hi Min,
I will try to address your questions.
1. Logically Dremel needs three files per column: one for the data, one for
the repetition levels, and one for the definition levels. At the same time,
some kind of PAX format can be used to keep the columns inside one file. It
can be mentioned that Google charges by the size of the scanned columns -
something which supports a columnar layout.
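To make the three streams concrete, here is a minimal sketch of striping one optional, repeated top-level field into the value / repetition-level / definition-level columns the Dremel paper describes. The record shape and the field name `tags` are hypothetical, and real Dremel (or Drill) handles arbitrary nesting depth; this only shows the one-level case.

```python
def stripe_repeated_field(records, field="tags"):
    """Stripe one optional, repeated top-level field into the three
    column streams Dremel keeps: values, repetition levels, and
    definition levels. Simplified: one level of nesting only."""
    values, rep_levels, def_levels = [], [], []
    for rec in records:
        items = rec.get(field) or []
        if not items:
            # Field absent: a NULL placeholder with definition level 0
            # records that the record exists but the field does not.
            values.append(None)
            rep_levels.append(0)
            def_levels.append(0)
        else:
            for i, v in enumerate(items):
                values.append(v)
                # Repetition level 0 starts a new record; 1 means
                # "another value of the same repeated field".
                rep_levels.append(0 if i == 0 else 1)
                # Definition level 1: the optional field is present.
                def_levels.append(1)
    return values, rep_levels, def_levels
```

For example, striping `[{"tags": ["a", "b"]}, {}, {"tags": ["c"]}]` gives values `["a", "b", None, "c"]`, repetition levels `[0, 1, 0, 0]`, and definition levels `[1, 1, 0, 1]` - three parallel streams that could live in three files per column, or side by side in one PAX-style file.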
2. Dremel has a very high data scan rate - I would speculate about 1 GB per
second per node. It is not trivial to move this amount of data over the
network, so I would assume data locality. I also think that the data is
distributed in small chunks over a large number of nodes. Regarding data
locality - I would speculate that the data of all tenants is stored on all
(or almost all) nodes. I would note that GFS is going down the path of
reducing its block size to as little as 1 MB, so even 1 GB of data can be
distributed over a 1000-node cluster.
3. Our experiments with pre-fetching were based on mmap. When you mmap a
file you can hint to the OS about your access pattern, and if you tell the
OS you are going to scan the file, it does the pre-fetching for you.
Regarding GFS support - I would expect that this happens when the data is
collocated and read directly from the local disk. BTW - HDFS is also
starting to get the capability of direct reads from local files.
With best regards,

On Mon, Aug 27, 2012 at 8:40 PM, Min Zhou <coderplay@gmail.com> wrote:

> Hi all,
> I was every excited that you guys decided to start  Apache Drill, an open
> source
> version of Google's Dremel.  I was a contributor of Apache Hive, and
> skilled in Hadoop
> related development. We have a nearly 3000-nodes cluster in production, one
> of the
> largest cluster of the world.
> Dremel has become more and more popular since Google's BigQuery was
> released. I took an interest in it nearly two years ago. This paper
> (http://research.google.com/pubs/pub36632.html) describes how Dremel
> organizes records into nested columnar data, but there is almost no
> information about how Dremel stores those columns. I have many questions
> on this point.
>    1. Is it one file for each column?
>    2. It seems that Dremel has no restriction that the data must be stored
>    on local disk; GFS or Bigtable could also be the target storage. If it
>    is in GFS, how does Dremel retrieve records from different nodes?
>    How is data locality guaranteed?
>    3. The paper referred to "The blocks in each stripe are prefetched
>    asynchronously; the read-ahead cache typically achieves hit rates of
>    95%." Does GFS support async prefetching?
> Have you considered the questions above? What are your answers?
> BTW, could I join you guys to start such a cool project?
> Thanks,
> Min
> --
> My research interests are distributed systems, parallel computing and
> bytecode based virtual machine.
> My profile:
> http://www.linkedin.com/in/coderplay
> My blog:
> http://coderplay.javaeye.com
