drill-user mailing list archives

From Paul Rogers <par0...@yahoo.com.INVALID>
Subject Re: Question about foreman restart
Date Tue, 07 Jan 2020 18:57:19 GMT
Hi Nitin,

Thanks for letting us know about the OOM issues. These are serious and we should focus on
finding the cause and fixing them. In general, it is the goal of the Drill project that Drill
suffer no OOM errors on a cluster configured properly for your target workload.

Thank you for filing a JIRA ticket. The stack trace in that ticket describes a connection
shut down. Your e-mail mentioned an OOM error. Can you attach a stack trace or log entry that
led you to believe you were getting an OOM error? How many queries are running at the time
of the error?

As you know, Drill uses two kinds of memory: heap and off-heap (AKA "direct" or "unsafe").
Generally, you want much more off-heap than heap memory. But, until we know which kind is
being exhausted, it is hard to say what to adjust.
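
One rough way to tell the two apart is to look at the exception text in the Drillbit log. The
sketch below is illustrative only. The "Java heap space", "GC overhead limit exceeded" and
"Direct buffer memory" strings are standard JVM messages. Treating any line that mentions
"OutOfMemoryException" as a Drill direct-memory allocator failure is an assumption, and the
function name classify_oom is made up for this example.

```python
from typing import Optional


def classify_oom(line: str) -> Optional[str]:
    """Classify one log line as a 'heap' or 'direct' OOM.

    Returns 'unclear' for an OOM of unknown kind and None for non-OOM lines.
    """
    if "java.lang.OutOfMemoryError" in line:
        # Standard JVM messages: these name the exhausted pool explicitly.
        if "Direct buffer memory" in line:
            return "direct"
        if "Java heap space" in line or "GC overhead limit exceeded" in line:
            return "heap"
        return "unclear"
    if "OutOfMemoryException" in line:
        # Assumption: this is Drill's own allocator exception, raised when
        # off-heap (direct) memory cannot be allocated.
        return "direct"
    return None


if __name__ == "__main__":
    sample = "java.lang.OutOfMemoryError: Java heap space"
    print(classify_oom(sample))
```

Running this over the log lines around the time of a failure should at least show which of the
two memory settings is worth looking at first.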

If a Drillbit fails, all queries anywhere on the cluster will fail. The reason is simple:
all queries are distributed across all nodes. This is why we must find and fix the underlying
OOM error.

On a 64 GB machine, if you are running only Drill, you can give most of the memory to Drill
itself. Determine how much your OS and other processes need. Then, split the rest between heap
and off-heap. It is very likely you have already customized the Drill memory settings: it
is the first thing everyone does when deploying. [1] Check your settings.
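
As a worked example of that split, here is a minimal sketch. The 8 GB reserved for the OS and
the 25% heap share are placeholder assumptions, not recommendations: the right numbers depend
on your workload. The function name split_memory is hypothetical. The resulting values are
what you would put in DRILL_HEAP and DRILL_MAX_DIRECT_MEMORY in drill-env.sh, as described in [1].

```python
def split_memory(total_gb: int, reserved_gb: int, heap_fraction: float = 0.25):
    """Split the memory left after OS needs into (heap_gb, direct_gb).

    heap_fraction is an assumed share for heap; the remainder goes off-heap,
    following the rule of thumb that off-heap should be much larger than heap.
    """
    available = total_gb - reserved_gb
    if available <= 0:
        raise ValueError("nothing left for Drill after reserving OS memory")
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1")
    heap = round(available * heap_fraction)
    direct = available - heap
    return heap, direct


if __name__ == "__main__":
    # 64 GB machine, 8 GB assumed for the OS and other processes.
    heap_gb, direct_gb = split_memory(64, 8)
    print(f"heap: {heap_gb}G, direct: {direct_gb}G")
```

With these assumed inputs, 56 GB is left for Drill, split as 14 GB heap and 42 GB direct.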

Until we know if you are running out of heap vs. off-heap, it is hard to suggest which setting
to adjust. If it is heap memory that is affected, then you can increase the heap memory setting
to see what effect that has on Drillbit lifetime.

- Paul

[1] http://drill.apache.org/docs/configuring-drill-memory/


On Tuesday, January 7, 2020, 08:45:46 AM PST, Nitin Pawar <nitinpawar432@gmail.com> wrote:
Hello Team,
We recently upgraded from drill-1.13 to drill-1.16
and have started to notice lots of OOM issues. It is the same setup with
only the binaries changed.
Until we figure out what the issue is, we want to keep restarting
drillbits with cron jobs.

My question is: *If a drillbit is restarted, would the queries with this
node as foreman be resubmitted automatically?*

Also, we have 64 GB RAM machines. Can someone recommend memory settings for
this environment?

Nitin Pawar  