drill-dev mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: A list of questions on Dremel (or Apache Drill)'s columnar storage
Date Mon, 27 Aug 2012 22:40:41 GMT
On Mon, Aug 27, 2012 at 2:03 PM, David Gruzman <david@bigdatacraft.com> wrote:

> ...

> 2. Dremel has very high data scan rate. I would speculate about 1 GB per
> second per node. It is not trivial to get this amount of data over network.

With dual 10G links and something like MapR with an efficient I/O system,
this becomes feasible.
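
As a rough sanity check on that claim, the arithmetic can be sketched as
below. This is only a back-of-the-envelope estimate: it assumes decimal
units (1 GB = 8 Gbit), uses the raw line rate of the links, and ignores
protocol overhead and any other traffic on the node. The function and
variable names are made up for illustration.

```python
# Back-of-the-envelope check: do dual 10 Gbit/s links leave room for a
# 1 GB/s per-node scan rate? Raw line rate only; protocol overhead and
# competing traffic are ignored (assumption).

LINKS = 2                # dual links, as in the reply above
LINK_GBIT_PER_S = 10.0   # 10G Ethernet line rate
SCAN_GB_PER_S = 1.0      # speculated per-node scan rate from the quoted text


def raw_capacity_gb_per_s(links, gbit_per_link):
    """Aggregate raw capacity in gigabytes per second (8 bits per byte)."""
    return links * gbit_per_link / 8.0


capacity = raw_capacity_gb_per_s(LINKS, LINK_GBIT_PER_S)
utilization = SCAN_GB_PER_S / capacity

print(f"raw capacity: {capacity} GB/s")
print(f"share of raw capacity used by the scan: {utilization:.0%}")
```

On these assumptions the scan would use well under the raw capacity of
the two links, which is why the rate looks feasible on paper.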

> So I would assume data locality.

Data locality is still a good thing since top of rack switches can still be
a bottleneck.

> 3. Our experiments with pre-fetching were based on MMAP options. When you
> MMAP a file you can hint the access pattern to the OS. If you tell the OS
> you are going to scan the file, it does the pre-fetching.
> Regarding GFS support - I would expect that it happens when data is
> collocated and read directly from the disk. BTW - HDFS is also starting to
> get the capability of direct reads from local files.

MapR supports mmap directly against clustered files.  Hadoop can sort of
kind of do this if the file is local or in distributed cache.  There are
very scary security questions that are very difficult to answer if you do
this with Hadoop.
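
For reference, the mmap-plus-hint approach described in the quoted text
can be sketched as follows for an ordinary local file. This is a minimal
illustration, not what MapR, HDFS or Dremel actually do: it assumes a
Unix-like OS and Python 3.8 or later (where `mmap.madvise` exists), and
the function name `count_newlines_sequential` is made up for the example.
The hint is advisory, so the kernel is free to ignore it.

```python
import mmap
import os


def count_newlines_sequential(path, chunk_size=1 << 20):
    """Scan a local file through mmap, hinting sequential access to the OS.

    Counts newline bytes as a stand-in for real per-record work.
    """
    total = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            return 0  # an empty file cannot be mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Tell the kernel we intend to read front to back so it can
            # read ahead more aggressively. Guarded because madvise and
            # MADV_SEQUENTIAL are not available on every platform.
            if hasattr(mm, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
                mm.madvise(mmap.MADV_SEQUENTIAL)
            # Walk the mapping in fixed-size chunks.
            for offset in range(0, size, chunk_size):
                chunk = mm[offset:offset + chunk_size]
                total += chunk.count(b"\n")
    return total
```

Calling `count_newlines_sequential("/path/to/local/file")` returns the
number of newline bytes in the file. Whether the hint helps in practice
depends on the kernel, the file system and the storage underneath.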
