drill-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Drill performance question
Date Mon, 30 Oct 2017 16:33:52 GMT
Also, on a practical note, Parquet will likely crush CSV on performance.
Columnar. Compressed. Binary.  All that.
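
For what it's worth, a CTAS along these lines would do the CSV-to-Parquet
conversion and the date partitioning in one pass. This is just a sketch: the
paths and names (events_csv, events_parquet, event_time, metric_name,
metric_value) are made up for illustration, and it assumes a headerless CSV
read through Drill's columns[] array:

USE dfs.tmp;
-- have the CTAS below write Parquet
ALTER SESSION SET `store.format` = 'parquet';

-- hypothetical layout: columns[0] = timestamp, columns[1] = name, columns[2] = value
CREATE TABLE dfs.tmp.`events_parquet`
PARTITION BY (`yr`, `mo`)
AS SELECT
  EXTRACT(YEAR  FROM CAST(columns[0] AS TIMESTAMP)) AS `yr`,
  EXTRACT(MONTH FROM CAST(columns[0] AS TIMESTAMP)) AS `mo`,
  CAST(columns[0] AS TIMESTAMP)                      AS event_time,
  columns[1]                                         AS metric_name,
  CAST(columns[2] AS DOUBLE)                         AS metric_value
FROM dfs.`/data/events_csv`;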



On Mon, Oct 30, 2017 at 9:30 AM, Saurabh Mahapatra <
saurabhmahapatra94@gmail.com> wrote:

> Hi Charles,
>
> Can you share some query patterns on this data? More specifically, the
> number of columns you are retrieving out of the total, and the filters on
> the time dimension itself (ranges and granularities)?
>
> How much of the workload is ad hoc and how much is not?
>
> Best,
> Saurabh
>
> On Mon, Oct 30, 2017 at 9:27 AM, Charles Givre <cgivre@gmail.com> wrote:
>
> > Hello all,
> > I have a dataset consisting of about 16 GB of CSV files.  I am looking to
> > do some time-series analysis of this data, and I created a view, but when
> > I started doing aggregate queries using components of the date, the
> > performance was disappointing.  Would it be better to do a CTAS and
> > partition by components of the date?  If so, would Parquet be the best
> > format?
> > Would anyone have other suggestions of things I could do to improve
> > performance?
> > Thanks,
> > — C
>
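
Querying the partitioned table with a filter on the partition columns then
lets Drill prune whole directories instead of scanning all 16 GB of CSV.
Again only a sketch, using the same made-up names as above:

SELECT metric_name, SUM(metric_value) AS total
FROM dfs.tmp.`events_parquet`
WHERE `yr` = 2017 AND `mo` IN (9, 10)
GROUP BY metric_name;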
