drill-dev mailing list archives

From salim achouche <sachouc...@gmail.com>
Subject Re: Why Parquet file size is bigger in Linux
Date Sun, 02 Sep 2018 00:22:32 GMT
Sreekanth, can you tell us how you are creating the Parquet files (Drill
CTAS or another application)?

- Parquet files can get larger if:
  o Multiple row-groups are included per parquet file
  o Compression is not being used
- You can inspect these files with the Parquet Tools utility:
  <https://github.com/apache/parquet-mr/tree/master/parquet-tools>
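To see why both points matter, here is a rough illustration in Python using only the standard library. It does not write Parquet. It applies zlib (DEFLATE, the algorithm behind Parquet's GZIP codec) to a made-up low-cardinality column. It first compares the uncompressed size with the compressed size, then compresses the same data in 1,000 independent slices. The slices are a loose stand-in for many small row groups, each of which gets its own dictionaries and compression context. The column values and the chunk count are assumptions chosen for illustration only.

```python
import random
import zlib

# Hypothetical column data: a low-cardinality string column, the kind of
# data where encoding and compression shrink files the most.
random.seed(0)
values = [random.choice(["NEW", "OPEN", "CLOSED", "PENDING"]) for _ in range(100_000)]
raw = "\n".join(values).encode("utf-8")


def compressed_size(data: bytes) -> int:
    """Size after DEFLATE via zlib (the algorithm behind Parquet's GZIP codec)."""
    return len(zlib.compress(data, 6))


def chunked_compressed_size(data: bytes, n_chunks: int) -> int:
    """Compress n_chunks slices independently, loosely mimicking many row
    groups that each start with a fresh compression context."""
    step = -(-len(data) // n_chunks)  # ceiling division
    return sum(compressed_size(data[i:i + step]) for i in range(0, len(data), step))


uncompressed = len(raw)
one_chunk = compressed_size(raw)
many_chunks = chunked_compressed_size(raw, 1000)

print(f"uncompressed:            {uncompressed:>8} bytes")
print(f"compressed, 1 chunk:     {one_chunk:>8} bytes")
print(f"compressed, 1000 chunks: {many_chunks:>8} bytes")
```

Turning compression off gives up most of the reduction, and splitting the same rows into many independently compressed pieces adds back per-piece overhead. On the real files, the Parquet Tools `meta` command lists the row groups and the codec of each column chunk, which makes it easy to compare the Linux and Windows outputs side by side.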

On Sat, Sep 1, 2018 at 4:43 PM Sreekanth Jonnalagadda <
sreekanth.jon@gmail.com> wrote:

> Team,
>     I have installed Drill in distributed mode on a Linux cluster. It is
> working fine with no issues and works amazingly. What I have noticed is
> that the Parquet files are 5 times bigger than the Parquet files produced
> on Windows for the same data pull. I tried setting the file size parameter
> to 512, but that did not help.
> Can you please tell me whether this is a defect or whether I am missing
> some configuration?
> Much appreciated, it's an amazing tool.
> Thanks,
> Sreekanth Jonnalagadda

