drill-user mailing list archives

From <he...@augerdata.com.au>
Subject Successful (and not so successful) Production use cases for drill?
Date Fri, 21 Aug 2020 04:01:11 GMT
Hi all,


Could some of the users who have deployed drill in production, whether at
small/medium or enterprise firms, share their use cases and experiences?


What problems was drill meant to solve?


Was it successful?


What was/is drill mostly used for at your corporation?


What was tried but wasn't taken up by users?


Has it found a niche, or a core group of heavy users? What are their roles?



I've been working in reporting, data warehousing, business intelligence,
data engineering(?) for a bit over 15 years now (the name of the field seems
to rebrand every 5 or so years - or the lifecycle of 2 failed enterprise data
projects - but that's a theory for another time). For the last 5 or so I have
been trying to understand why 70-80% of projects never achieve their aims. It
doesn't seem to matter if they're run by really smart (and expensive!) people
using best-in-class tools and processes. Their failure rate might be closer
to 70%, but that's still pretty terrible.


I have a couple of theories as to why, and have tested them over the last 5
or so years.


One part is reducing the gap between project inception and production
quality data output. Going live quickly creates enthusiasm and a feedback
loop for iterating on the models, which in turn creates a sense of
engagement.


Getting rid of a thick ETL process that takes months or more of dev and
refactoring before hitting production is one component. Using ~70% of the
project resources on the plumbing - leaving very little for the complex data
model iterations - just creates a tech demo, not a commercially useful
solution. I don't think this is a technology problem; it applies whether
you're using traditional on-prem ETL tools or the current data engineering
scripts and cron jobs, just in the cloud.


The least unsuccessful data engineering approach I've seen is the ELT
logical data mart pattern: landing the source data as close to a 1:1 format
as possible in a relational-like data store and leveraging MPP dbs via
views and CTASes to create a conformed star schema. Then, using the star
schemas as building blocks, create the complex (and actually useful) models.
Something like this can be up in a few weeks and still cover the majority of
user-facing features a full data pipeline/ETL would have (snapshots +
transactional facts, inferred members, type 1 dims only - almost everyone
double joins a type 2 dim to get the current record anyway). While these
projects aren't always (or even usually) 100% successes, they at least
deliver something useful or fail quickly, which is useful in itself.
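
To make the pattern above a bit more concrete, here is a minimal sketch in
Python. It is only an illustration under loud assumptions: sqlite3 stands in
for the MPP database, and every table, view and column name (raw_customers,
raw_orders, dim_customer, fact_order, mart_sales_by_region) is hypothetical
and not taken from any real project. It lands two sources 1:1 as text,
builds a type 1 dimension (with inferred members) and a transactional fact
as views, then materialises one downstream model with a CTAS.

```python
import sqlite3


def build_logical_mart(conn):
    """Sketch of an ELT logical data mart; all object names are hypothetical."""
    # 1. Land the source data as close to 1:1 as possible (everything as text).
    conn.executescript("""
        CREATE TABLE raw_customers (customer_id TEXT, name TEXT, region TEXT);
        CREATE TABLE raw_orders (order_id TEXT, customer_id TEXT,
                                 order_date TEXT, amount TEXT);
    """)
    conn.executemany(
        "INSERT INTO raw_customers VALUES (?, ?, ?)",
        [("c1", "Acme", "APAC"), ("c2", "Globex", "EMEA")],
    )
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?, ?)",
        [
            ("o1", "c1", "2020-08-01", "100.50"),
            ("o2", "c1", "2020-08-02", "49.50"),
            ("o3", "c2", "2020-08-02", "20"),
            ("o4", "c9", "2020-08-03", "5"),  # customer missing from source
        ],
    )

    # 2. Conform the landed data into a star schema using views only,
    # 3. then materialise a downstream model with CREATE TABLE AS (CTAS).
    conn.executescript("""
        -- Type 1 dimension: current attributes only, no history.
        CREATE VIEW dim_customer AS
        SELECT customer_id AS customer_key, name AS customer_name, region
        FROM raw_customers
        UNION ALL
        -- Inferred members: keys seen in facts but absent from the source.
        SELECT DISTINCT o.customer_id, 'Unknown', 'Unknown'
        FROM raw_orders o
        WHERE o.customer_id NOT IN (SELECT customer_id FROM raw_customers);

        -- Transactional fact: typing and casting happen here, not on load.
        CREATE VIEW fact_order AS
        SELECT order_id,
               customer_id AS customer_key,
               order_date,
               CAST(amount AS REAL) AS amount
        FROM raw_orders;

        -- Downstream model built from the star schema building blocks.
        CREATE TABLE mart_sales_by_region AS
        SELECT d.region, SUM(f.amount) AS total_amount
        FROM fact_order f
        JOIN dim_customer d ON d.customer_key = f.customer_key
        GROUP BY d.region;
    """)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_logical_mart(conn)
    query = "SELECT region, total_amount FROM mart_sales_by_region ORDER BY region"
    for row in conn.execute(query):
        print(row)
```

Because the dimension and fact are only views over the landed tables,
changing the model means editing a view definition rather than rebuilding a
pipeline, which is where the short time to a first usable version would come
from in this sketch.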


The first part of this - getting all the data into a single spot - still
sucks, and is probably more fiddly than 10 years ago because it's all flat
files and APIs now vs on-premise db->db transfers.


This is where I *think* drill might help me, but I just want to check if
this is how it's actually being used by others. It would be nice if it could
replace the MPP altogether.

