drill-dev mailing list archives

From Andries Engelbrecht <aengelbre...@mapr.com>
Subject Re: Parquet files size
Date Thu, 29 Jun 2017 15:21:48 GMT
With limited memory and what seems to be higher concurrency, you may want to reduce the number of minor
fragments (threads) per query per node.
See if you can reduce planner.width.max_per_node on the cluster without too much impact
on response times.
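This option can be changed at runtime with an ALTER statement; a minimal sketch (the value 4 is only an illustration, tune it for your cluster):

```sql
-- Cap the number of minor fragments (threads) per query per node.
-- The default is derived from the node's core count; a lower value
-- reduces concurrent memory pressure at the cost of parallelism.
-- The value 4 here is just an example.
ALTER SYSTEM SET `planner.width.max_per_node` = 4;
```

Using ALTER SESSION instead lets you try a value on a single session before committing it system-wide.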

Slightly smaller (512MB) parquet files may also help, but restructuring the data is usually harder
than changing system settings.
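If you can rewrite the data with CTAS from Drill itself, the writer's target block size can be lowered first; a sketch, assuming a rewrite via CREATE TABLE AS (the table and workspace names below are placeholders):

```sql
-- Target ~512 MB parquet row groups when writing new files.
-- 536870912 bytes = 512 MB.
ALTER SESSION SET `store.parquet.block-size` = 536870912;

-- Hypothetical rewrite of an existing dataset into smaller files;
-- dfs.tmp.`events_resized` and dfs.data.`events` are example names.
CREATE TABLE dfs.tmp.`events_resized` AS
SELECT * FROM dfs.data.`events`;
```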


On 6/29/17, 7:39 AM, "François Méthot" <fmethot78@gmail.com> wrote:

      I am investigating an issue where we started getting Out of Heap space
    errors when querying parquet files in Drill 1.10. It is currently set to 8GB
    heap and 20GB off-heap. We can't spare more.
    We usually query 0.7 to 1.2 GB parquet files; recently we have been more on
    the 1.2 GB side, for the same number of files.
    It now fails on simple
       select bunch of fields.... where ....needle in haystack type of params.
    Drill is configured with the old reader
        because of this bug: DRILL-5435 (Limit causes Mem Leak).
        I have temporarily set the max number of large queries to 2 instead of 10,
    and it has helped so far.
    My questions:
    Could parquet file size be related to those new exceptions?
    Would reducing the max file size help improve the robustness of queries in Drill
    (at the expense of having more files to scan)?
