drill-user mailing list archives

From Boaz Ben-Zvi <bben-...@mapr.com>
Subject Re: Best architecture
Date Wed, 22 Feb 2017 01:06:38 GMT
 Hi Users,

     Drill CAN use the disk when running out of memory (a.k.a. spill to disk).
Currently only the Sort operator supports spilling, hence you’d need to force a Merge Join
for joins, or a Streaming Aggregation for aggregations (both rely on sorted input).
But we are currently working on expanding this functionality to other operators (Hash Aggregate,
Hash Join, windowing, etc.)
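A minimal sketch of how a session could be steered toward the sort-based operators. It assumes Drill's `planner.enable_hashjoin` and `planner.enable_hashagg` options, so verify these names against the documentation for your Drill version. The helper `alter_session_statements` is hypothetical: it only builds the `ALTER SESSION` statements, which you would then run in sqlline or over JDBC.

```python
# Hypothetical helper: builds Drill ALTER SESSION statements as strings.
# Assumption: these planner option names exist in your Drill version.
SPILL_FRIENDLY_OPTIONS = {
    "planner.enable_hashjoin": False,  # disable Hash Join so a Merge Join is planned
    "planner.enable_hashagg": False,   # disable Hash Aggregate so a Streaming Aggregate is planned
}


def alter_session_statements(options):
    """Return one ALTER SESSION statement per option, sorted by option name."""
    stmts = []
    for name in sorted(options):
        value = options[name]
        # SQL booleans are lowercase; strings are single-quoted via repr().
        literal = str(value).lower() if isinstance(value, bool) else repr(value)
        stmts.append(f"ALTER SESSION SET `{name}` = {literal};")
    return stmts


if __name__ == "__main__":
    for stmt in alter_session_statements(SPILL_FRIENDLY_OPTIONS):
        print(stmt)
```

Disabling the hash operators trades their usual speed for plans whose Sort steps can spill to disk when memory runs out.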

Also, Drill does not work with “raw storage” (i.e., it does not manage storage pages itself); Drill
needs the storage to be a file system, HBase, Hive, etc.
BTW, Drill supports the Apache Parquet storage format, which is columnar – and may suit
your needs.
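A sketch of how raw files could be rewritten as Parquet, assuming Drill's `store.format` session option and a CREATE TABLE AS (CTAS) into the writable `dfs.tmp` workspace. The helper `parquet_ctas`, the file path `/data/raw/orders.csv`, the table name and the column aliases are all hypothetical. Drill exposes a header-less CSV file as a `columns` array, which is why the columns are selected by index.

```python
# Hypothetical helper: builds the Drill statements for a CTAS that writes Parquet.
# Assumption: the target workspace (here dfs.tmp) is writable.
def parquet_ctas(target_table, select_sql):
    """Return the statements that store a query result as Parquet files."""
    return [
        "ALTER SESSION SET `store.format` = 'parquet';",
        f"CREATE TABLE {target_table} AS {select_sql};",
    ]


# Hypothetical path and columns, for illustration only.
QUERY = (
    "SELECT t.columns[0] AS id, t.columns[1] AS amount "
    "FROM dfs.`/data/raw/orders.csv` t"
)

if __name__ == "__main__":
    for stmt in parquet_ctas("dfs.tmp.`orders_parquet`", QUERY):
        print(stmt)
```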


On 2/21/17, 2:21 PM, "Nicolas Paris" <niparisco@gmail.com> wrote:

    Join CSV, JSON, databases.
    Your needs look like ETL processes. I am not sure Drill is well suited for
    such a goal. AFAIK, it is not able to work on disk when out of memory.
    Moreover, such tasks usually need some procedural code. I am not
    sure UDFs are very flexible.
    For such a use case, I would use an ETL tool such as Talend and load MonetDB
    directly with it.
    On 19 Feb 2017 at 18:02, Gustavo Brian wrote:
    > Hi there,
    > I'm a newbie to this, so I apologize if I'm asking something senseless :)
    > Thanks for this amazing product. I'm planning to use it as the main query
    > engine for data analysis. My plan is to have a raw storage where I can drop
    > different types of documents (CSV, JSON, ...) as they are produced by the
    > apps, then use Drill to query them and join them against an SQL database to produce
    > enriched data to drop into a columnar storage: MonetDB, Druid, ...
    > My question is: is there a preferred storage engine for this raw storage?
    > Can Drill take advantage of other engines like Hadoop or YARN?
    > Thanks in advance
