drill-user mailing list archives

From Manu Mukundan <manu.mukun...@prevalent.ai>
Subject Clarification regarding Apache drill setup
Date Fri, 16 Aug 2019 04:56:36 GMT

My name is Manu and I am working as a Bigdata architect in a small startup company in Kochi,
India. Our new project involves visualizing large volumes of unstructured data in cloud storage
(it can be S3, Azure Blob Storage or Google Cloud Storage). We are planning to use Apache
Drill as the SQL query execution engine so that we will be cloud agnostic. Unfortunately, some
key questions remain unanswered before we can move ahead with Drill as our platform. I hope
you can provide some clarity; it will be much appreciated.

  1.  When setting up the Drill cluster in a prod environment to query data ranging from several
gigabytes to a few terabytes hosted in S3/Blob Storage/Cloud Storage, what are the considerations
for disk space? I understand Drillbits make use of data locality, but how does that work
in the case of cloud storage like S3? Will the entire data from S3 be moved to the Drill cluster
before query processing starts?
  2.  Is it possible to use S3 or other cloud storage solutions, rather than local disk, for the
data spilled by the Sort, Hash Aggregate, and Hash Join operators?
  3.  Is it OK to run a Drill production cluster without Hadoop? Is just a ZooKeeper quorum enough?

I totally understand how busy you can be, but if you get a chance, please help me get
clarity on these items. It will be really helpful.

Thanks again!
Manu Mukundan
Bigdata Architect,
Prevalent AI,
