drill-user mailing list archives

From Tom Seddon <mr.tom.sed...@gmail.com>
Subject Distributed Drill question
Date Wed, 30 Oct 2013 09:43:00 GMT

I would like to know more about how Drill's parallel processing of queries
relates, if at all, to the parallel nature of a data source such as
Hadoop.  Am I correct in thinking that if a Drill cluster is querying data
from a Hadoop cluster, the drillbits are unaware of where the data
resides in HDFS, since their interaction is through the NameNode?  If this is
the case, how does scaling Drill out help performance if it always has
to route through the NameNode?

Sorry if this is a silly question.  I've tried to find the answer by
reading the documentation and the mailing list, but I'm still not clear on


