drill-dev mailing list archives

From Andries Engelbrecht <aengelbre...@mapr.com>
Subject Re: Parquet files size
Date Thu, 29 Jun 2017 15:21:48 GMT
With limited memory and what appears to be higher concurrency, you may want to reduce the number of minor
fragments (threads) per query per node.
See if you can lower planner.width.max_per_node on the cluster without too much impact
on response times.
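
As a rough sketch (the value 4 below is only an illustration; the right setting depends on cores and
concurrent workload), the option can be lowered per session first to gauge the impact on response
times, then applied cluster-wide:

    ALTER SESSION SET `planner.width.max_per_node` = 4;
    -- once validated, make it cluster-wide:
    ALTER SYSTEM SET `planner.width.max_per_node` = 4;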

Slightly smaller (512 MB) Parquet files may also help, but restructuring the data is usually harder
than changing system settings.
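
If rewriting the data is an option, one way to aim for roughly 512 MB files is to cap the Parquet
block size before a CTAS. This is only a sketch; the table paths are placeholders for your own:

    ALTER SESSION SET `store.parquet.block-size` = 536870912;  -- 512 MB
    CREATE TABLE dfs.tmp.`mydata_512mb` AS SELECT * FROM dfs.data.`mydata`;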

--Andries



On 6/29/17, 7:39 AM, "François Méthot" <fmethot78@gmail.com> wrote:

    Hi,
    
      I am investigating an issue where we started getting Out of Heap space
    errors when querying Parquet files in Drill 1.10. It is currently set to 8 GB
    heap and 20 GB off-heap. We can't spare more.
    
    We usually query 0.7 to 1.2 GB Parquet files; recently we have been more on
    the 1.2 GB side, for the same number of files.
    
    It now fails on a simple
       select <bunch of fields> ... where <needle-in-haystack type of params>.
    
    
    Drill is configured with the old reader:
        store.parquet.use_new_reader = false
        because of bug DRILL-5435 (LIMIT causes a memory leak).
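
    For reference, a sketch of how that option is toggled from sqlline, assuming
    store.parquet.use_new_reader is the intended option name:
        ALTER SYSTEM SET `store.parquet.use_new_reader` = false;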
    
        I have temporarily set the maximum number of large queries to 2 instead of 10;
    it has helped so far.
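
    For reference, a sketch of the queueing options involved, assuming the exec.queue.*
    throttling options are what was tuned here:
        ALTER SYSTEM SET `exec.queue.enable` = true;
        ALTER SYSTEM SET `exec.queue.large` = 2;  -- was 10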
    
    My questions:
    Could Parquet file size be related to these new exceptions?
    Would reducing the maximum file size help improve query robustness in Drill
    (at the expense of having more files to scan)?
    
    Thanks
    Francois
    
