Hi, all,
Thanks to all of you, it's a good discussion. Some further questions:
1. If it's one data file for each column, data locality is difficult to
guarantee when rebuilding a row from column files, unless GFS can
keep all fields of the same row in files on the same node.
Moreover, data blocks can't be a fixed
size like 1MB/64MB/128MB, cuz
2. mmap would be a good option for data prefetching, but I think
fadvise would be better for this, since recent versions of
HDFS already support it.
3. I agree with Ted, rack switches might be a bottleneck if data
locality can't be guaranteed.
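
To make point 2 a bit more concrete, here is a minimal sketch of
fadvise-style prefetching on a plain local file. It only shows the POSIX
call through Python's os module; it is not HDFS code, and the function
name and the chunk size are made up for illustration.

```python
import os


def read_with_readahead_hint(path, offset, length, chunk_size=1 << 20):
    """Read `length` bytes starting at `offset`, after hinting the kernel
    that this byte range will be needed soon.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # posix_fadvise is only exposed on some Unix platforms, so guard it.
        if hasattr(os, "posix_fadvise"):
            # Advise sequential access, then ask for the range to be
            # prefetched into the page cache. Both are hints, not guarantees.
            os.posix_fadvise(fd, offset, length, os.POSIX_FADV_SEQUENTIAL)
            os.posix_fadvise(fd, offset, length, os.POSIX_FADV_WILLNEED)

        chunks = []
        pos = offset
        remaining = length
        while remaining > 0:
            data = os.pread(fd, min(chunk_size, remaining), pos)
            if not data:  # reached end of file before `length` bytes
                break
            chunks.append(data)
            pos += len(data)
            remaining -= len(data)
        return b"".join(chunks)
    finally:
        os.close(fd)
```

The hint is advisory: the kernel may ignore it, and the read returns the
same bytes either way. The difference from mmap is that nothing has to be
mapped into the reader's address space just to trigger read-ahead.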
Thanks,
Min
On Tue, Aug 28, 2012 at 11:46 AM, Dharm Raj <dharmrajbaliyan@gmail.com> wrote:
> After going through the Dremel & Dryad papers, here is my understanding --
>
> 1. Columnar storage is chosen so that columns of a record that a query
> does not need can be skipped, hence less IO.
> 2. All values of a field are kept together to improve retrieval efficiency.
> From this my understanding is that if that particular field is required in
> query, all values can be fetched in one seek efficiently.
> 3. There is no detail in the paper about how to store values, repetition
> levels & definition levels. As David said, it can be done by having separate
> files for values, repetition levels & definition levels. On top of this we
> need to index records so that we can seek to the right position and fetch
> only the desired values, or read more and discard later.
> 4. I agree on the data locality part with Ted and Camuel. It is desired but
> not mandatory. The Dremel paper states that Dremel has the ability to access
> local data or data in GFS or other stores like BigTable.
> 5. Dremel and Dryad both mention a similar way to retrieve data using a
> serving tree, where each node acts (independently) as an operator or runs
> some custom code. A user-submitted query is translated into a DAG of
> execution. Dryad states that relational algebra can be expressed as a DAG.
> General graphs are more complicated to implement and need to take care of
> cycles during execution. Hence Dryad chose a DAG as its query execution model.
>
>
> Please share your understanding of this to enhance (correct) mine.
>
> Regards,
> Dharm
>
>
> On Tue, Aug 28, 2012 at 4:40 AM, Camuel Gilyadov <camuel@gmail.com> wrote:
>
> > On Mon, Aug 27, 2012 at 8:40 PM, Min Zhou <coderplay@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > I was very excited that you guys decided to start Apache Drill, an
> > > open-source version of Google's Dremel. I was a contributor to Apache
> > > Hive, and am skilled in Hadoop-related development. We have a nearly
> > > 3000-node cluster in production, one of the largest clusters in the
> > > world.
> > >
> > > Dremel has become more and more popular since Google's BigQuery was
> > > released. I took an interest in it nearly two years ago. This paper
> > > (http://research.google.com/pubs/pub36632.html) describes how Dremel
> > > organizes records into nested columnar data, but there's almost no
> > > information about how Dremel stores those columns. I have many
> > > questions on this point.
> > >
> > >
> > > 1. Is it one file for each column?
> > >
> >
> > I think it is a less important implementation detail. What is important
> > is that you don't incur IO for non-projected columns.
> >
> > > 2. It seems that Dremel has no restriction that data must be stored on
> > > local disk, GFS or Bigtable; any of them could be the target storage.
> > > If in GFS, how does Dremel retrieve records from different nodes?
> > > How is data locality guaranteed?
> > >
> >
> > Data locality is not mandatory. It is clearly written that data is either
> > local or accessed remotely. Search the Dremel paper or slide deck for
> > "in-situ" and "local".
> >
> >
> > > 3. The paper says that "The blocks in each stripe are prefetched
> > > asynchronously; the read-ahead cache typically achieves hit rates of
> > > 95%." Does GFS support async prefetching?
> > >
> > >
> > > Have you considered the questions above? What are your answers?
> > >
> > > BTW, could I join you guys in starting such a cool project?
> > >
> >
> > It is open to everyone
> >
> >
> > >
> > >
> > > Thanks,
> > > Min
> > >
> > > --
> > > My research interests are distributed systems, parallel computing and
> > > bytecode based virtual machine.
> > >
> > > My profile:
> > > http://www.linkedin.com/in/coderplay
> > > My blog:
> > > http://coderplay.javaeye.com
> > >
> >
>
--
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.
My profile:
http://www.linkedin.com/in/coderplay
My blog:
http://coderplay.javaeye.com