drill-dev mailing list archives

From Min Zhou <coderp...@gmail.com>
Subject Re: A list of questions on Dremel (or Apache Drill)'s columnar storage
Date Tue, 28 Aug 2012 13:07:32 GMT
On Tue, Aug 28, 2012 at 7:39 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> Can't do variable block size in vanilla hadoop.  That is part of the whole
> namenode legacy.
>
Exactly. HDFS doesn't support variable block sizes. There is an HDFS JIRA
requesting that feature (HDFS-2362). After all, variable block sizes would
make things more complex. It seems we face a tradeoff: locality versus
simplicity.
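To make the tradeoff concrete, here is a toy sketch in plain Python (not
Hadoop code; the block size, column count, and field width are made-up
numbers). With a fixed block size and one file per column, the fields of a
single row fall into blocks of three different files, which HDFS may place
on different nodes; a row-group layout packs all of a row's fields into one
block:

```python
BLOCK_SIZE = 128 * 1024 * 1024  # a fixed HDFS block size, e.g. 128 MB
FIELD_WIDTH = 8                 # hypothetical fixed-width column, 8 bytes
NUM_COLUMNS = 3                 # hypothetical table with 3 columns

def block_index(byte_offset, block_size=BLOCK_SIZE):
    """Index of the fixed-size block containing a given byte offset."""
    return byte_offset // block_size

row = 20_000_000

# One-file-per-column: row r's field sits at offset r * FIELD_WIDTH in each
# column file, but each file is split into blocks independently, so the three
# fields land in blocks of three *different* files (possibly different nodes).
per_column_blocks = [block_index(row * FIELD_WIDTH) for _ in range(NUM_COLUMNS)]
print(per_column_blocks)  # same block index, but in 3 separate files

# Row-group layout (Dremel/Parquet style): a horizontal slice of all columns
# is packed contiguously, so rebuilding the row touches a single block of a
# single file.
row_group_block = block_index(row * NUM_COLUMNS * FIELD_WIDTH)
print(row_group_block)
```

The point of the sketch is only that with a file per column the filesystem
has no way to know the three block replicas belong together, whereas a
row-group layout makes co-location automatic at a fixed block size.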

On Tue, Aug 28, 2012 at 2:56 AM, Min Zhou <coderplay@gmail.com> wrote:
>
> > 1. If it's one data file for each column, data locality is difficult to
> >    guarantee when rebuilding a row from column files, unless GFS can
> >    keep all fields from the same row in files on the same node.
> >    Moreover, the data block can't be a fixed size like
> >    1MB/64MB/128MB, because
> >
>


Regards,
Min
-- 
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.

My profile:
http://www.linkedin.com/in/coderplay
My blog:
http://coderplay.javaeye.com
