drill-user mailing list archives

From Stefán Baxter <ste...@activitystream.com>
Subject Re: Batch load of unstructured data in Drill
Date Fri, 09 Dec 2016 06:26:01 GMT

Have you considered batching them up into a well-defined directory
structure and using directory pruning as part of your queries?

I ask because our batch processes do that. Data is arranged into Hour,
Day, Month, Quarter and Year directory structures, which we then roll up
in different ways based on volume (from H->*->Y).
We then use simple directory pruning to decide which data is applicable
to each query.
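
To make the idea concrete, here is a minimal sketch of how such a layout and a pruned query could fit together. It is an illustration only, not our actual batch code: the root path /data/events, the nested year/month/day/hour layout, the helper names batch_dir and pruned_query, and the use of the default `dfs` storage plugin are all assumptions. The only Drill feature it relies on is the implicit dir0, dir1, dir2 columns, which expose the subdirectory levels below the queried path.

```python
from datetime import datetime
from pathlib import PurePosixPath

# Hypothetical root directory for all batches (an assumption for this sketch).
BASE = PurePosixPath("/data/events")


def batch_dir(ts):
    """Return the year/month/day/hour directory where a batch for `ts` would be written."""
    return BASE / f"{ts:%Y}" / f"{ts:%m}" / f"{ts:%d}" / f"{ts:%H}"


def pruned_query(year, month=None, day=None):
    """Build a Drill query string that filters on the implicit dirN columns.

    dir0 is the first directory level below BASE (year here), dir1 the
    second (month), dir2 the third (day). The day filter is only applied
    when a month is also given.
    """
    filters = [f"dir0 = '{year:04d}'"]
    if month is not None:
        filters.append(f"dir1 = '{month:02d}'")
    if month is not None and day is not None:
        filters.append(f"dir2 = '{day:02d}'")
    # `dfs` is assumed to be the storage plugin that exposes BASE.
    return f"SELECT * FROM dfs.`{BASE}` WHERE " + " AND ".join(filters)


if __name__ == "__main__":
    print(batch_dir(datetime(2016, 12, 9, 6, 26, 1)))
    print(pruned_query(2016, 12))
```

New batches are simply written as new files under the matching directory, and a filter on the dirN columns lets Drill skip directories that cannot match.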

Hope this helps,

On Thu, Dec 8, 2016 at 5:13 PM, Alexander Reshetov
<alexander.v.reshetov@gmail.com> wrote:

> By the way, is it possible to append data to a Parquet data source?
> I'm looking for a way to append new rows to existing data so that
> every query execution sees the new rows.
> Surely it's possible with plain JSON, but I want a more efficient binary
> format that gives quicker reads (and faster query execution).
> On Wed, Dec 7, 2016 at 4:08 PM, Alexander Reshetov
> <alexander.v.reshetov@gmail.com> wrote:
> > Hello,
> >
> > I want to load batches of unstructured data in Drill. Mostly JSON data.
> >
> > Is there any batch API or other options to do so?
> >
> >
> > Thanks.
