drill-dev mailing list archives

From Jacques Nadeau <jacq...@apache.org>
Subject Drill Incubator report: can someone review and add to the wiki?
Date Wed, 05 Jun 2013 14:42:53 GMT
Hey y'all...

Ellen and I put something together.  I need someone who has edit
privileges to post it.  It is attached below:
Apache: Project Drill


Apache Drill is a distributed system for interactive analysis of
large-scale datasets that is based on Google's Dremel. Its goal is to
efficiently process nested data, scale to 10,000 servers or more and
to be able to process petabytes of data and trillions of records in

Drill has been incubating since 2012-08-11.

Three Issues to Address in Move to Graduation:

1. Continue to attract new developers with a variety of skills and viewpoints
2. Develop community skills and knowledge by building some releases
3. Demonstrate community robustness by rotating project tasks among
multiple project members

Issues to Call to Attention of PMC or ASF Board:


How community has developed since last report:

Mailing list discussions:

There has been active participation in discussions on the developer
mailing list, including from new participants and developers. A few
people have participated on the users list, but most activity takes
place on the developer mailing list.

Activity summary:

June 2013 (to 5 June), 29 (mainly jira; some discussion)
May 2013, 135 (jira; focused discussions)
April 2013, 188 (jira; focused discussions)
March 2013, 260 (jira; focused discussions)

Topics discussed on the dev mailing list included, but were not limited to:

•	Evolution of logical plan syntax with addition of operators
including the Value and Union Distinct operators
•	Advantages and disadvantages of Parquet versus ORC
•	ValueVector construct and requirements
•	The relative performance of Janino based compilation versus
•	Initial development of execution engine environment
•	Discussion of various types of large array and off heap data
structure libraries
•	RPC protocol and framework


For details of code commits, see http://bit.ly/14YPXN9 and http://bit.ly/19IyID1
There has been great progress around both evolution of the reference
interpreter and

In the last three months, there have been many commits including:
•	Initial implementation of RPC framework
•	Base client and Zookeeper based client abstraction
•	SQL parser with JDBC driver
•	Distributed query scheduling framework
•	ValueVector implementations
•	Large number of reference interpreter tests and fixes

Community Interactions

There is now a weekly Drill hangout, conducted remotely through Google
Hangouts on Tuesday mornings at 9am Pacific Time, to keep core developers
in contact in real time despite geographical separation.  Results from
these discussions are shared with the discussion list through meeting
minutes and all are welcome to attend.  This has been helpful in
speeding development and averages attendance of 8-10 developers each


There have been presentations from community members at conferences,
meet-ups and through the weekly Google hangout.

Sample presentations:
•	Introduction to Apache Drill, Bay Area Analytics Group 2 April 2013
by Tomer Shiran
•	Interactive Ad hoc query at scale: talk at Hadoop User Group UK by
•	Apache Drill Technical Overview: talk at Google Hangout, May 22 by
Jacques Nadeau available at http://slidesha.re/123mSDh
•	Drill Technical update @April 16 Hangout by Jacques Nadeau available
at http://slidesha.re/ZDBvWP
•	Drill Dissection at NoSQL matters (April) @mhausenblas video
available at http://bit.ly/13Ffk7b
•	All You Need to Know About Drill, talk during Big Data Week #bdw13
by Michael Hausenblas on 26 April http://bit.ly/17L1rD
•	Deep Dive into Drill Implementation 3 June at Berlin Buzzwords by
Ted Dunning and Michael Hausenblas


Slides from Drill presentations posted online, such as on SlideShare,
receive a large and increasing number of views.


An invited interview with Ted Dunning in an O’Reilly white paper by
Mike Barlow titled “Real Time Big Data Analytics: Emerging
Architecture” discussed Apache Drill; there have been a number of blog

Social Networking

The @ApacheDrill Twitter account is active and has grown to 362 followers.

How project has developed since last report:

1. Wiki has been updated regularly
2. Significant code drops have been checked in from a number of developers
3. Significant design documents have been created and discussed
4. Additional non-code contributors have become active and are being encouraged

Please check this [ ] when you have filled in the report for Drill.

Ted Dunning: [ ](drill)
Grant Ingersoll: [ ](drill)
Isabel Drost: [ ](drill)
