drill-user mailing list archives

From "Kunal Khatua" <ku...@apache.org>
Subject Re: CTAS memory leak
Date Thu, 30 Aug 2018 18:04:02 GMT
Scott 

I think I can explain why you are getting the OutOfMemory.

Drill essentially has two pools of memory: the standard JVM heap and the Netty-managed direct
memory. When Drill reads a JSON document, the JSON parser libraries it uses first deserialize
the document into Java heap objects. Drill then converts those objects into its internal
representation in the direct memory space. The issue you are seeing is most likely that
the initial deserialization step is consuming a very large amount of heap memory.

So, the options you have are:
1. Reduce the size of the individual units of the dataset (I'm assuming it is one giant JSON
document within the source file).
2. Increase the heap, possibly at the cost of direct memory (say, 12GB Xmx and 6GB direct).
3. Reduce the parallelization, so that fewer JSON files are read and materialized in heap
memory at a given time.
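
As a rough check on option 2, the sketch below adds up the memory that the JVM flags allow all drillbits on one server to claim and compares it with physical RAM. The figures come from the setup quoted later in this thread (4 drillbits per server, 48G RAM). The function name is hypothetical, and treating heap plus direct memory as each drillbit's footprint is a simplifying assumption; real usage also includes JVM overhead such as thread stacks.

```python
def committed_memory_gb(drillbits: int, heap_gb: int, direct_gb: int) -> int:
    """Upper bound (in GB) that the JVM flags allow all drillbits to claim.

    Assumption: footprint per drillbit is approximated as heap + direct.
    """
    return drillbits * (heap_gb + direct_gb)


RAM_GB = 48      # physical RAM per server, as stated in the thread
DRILLBITS = 4    # drillbits per server, as stated in the thread

for label, heap, direct in [
    ("current (-Xmx8G, direct 10G)", 8, 10),
    ("suggested (-Xmx12G, direct 6G)", 12, 6),
]:
    total = committed_memory_gb(DRILLBITS, heap, direct)
    verdict = "exceeds" if total > RAM_GB else "fits within"
    print(f"{label}: {total} GB {verdict} {RAM_GB} GB RAM")
```

Under these assumptions both layouts allow up to 72 GB on a 48 GB server, which would be consistent with the swapping reported, so the number of drillbits per server may matter as much as the heap/direct split.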

~ Kunal



On 8/29/2018 5:10:55 PM, Boaz Ben-Zvi <bben-zvi@mapr.com> wrote:
Hi Scott,

1. "swaps and then crashes" - do you mean an Out-Of-Memory error?

2. Version 1.14 is available now, with several memory control
improvements (e.g., Hash Join spilling, output batch sizing)

3. Direct memory is only 10G - why not go higher? This is where most of
Drill's in-memory data is held (not so much the stack and heap).

4. You may want to increase the memory available to each query on each node;
the default (2GB) is too conservative (i.e., too low).

E.g., to go to 8GB, run:

alter session set `planner.memory.max_query_memory_per_node` = 8589934592;
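
The value in that statement is a byte count: 8 GB expressed as 8 × 1024³. A minimal sketch of the conversion, with a hypothetical helper name, in case a different limit is wanted:

```python
GIB = 1024 ** 3  # bytes in one GiB


def max_query_memory_statement(gib: int) -> str:
    """Build the ALTER SESSION statement for a per-node query memory limit.

    The option name is the one quoted in the message above; the helper
    itself is only an illustration.
    """
    return (
        "alter session set `planner.memory.max_query_memory_per_node` = "
        f"{gib * GIB};"
    )


print(max_query_memory_statement(8))
```

For 8 this prints the same statement as above, with the value 8589934592.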

Thanks,

Boaz

On 8/29/18 4:09 PM, scott wrote:
> Hi all,
> I've got a problem using the CREATE TABLE AS (CTAS) option that I was hoping
> someone could help with. I am trying to create Parquet files from existing JSON
> files using this method. It works on smaller datasets, but when I try this
> on a large dataset, Drill will take up all memory on my servers until it
> swaps and then crashes. I'm running version 1.12 on CentOS 7. I've got my
> drillbits set to -Xmx8G, which seems to work for most queries and it does
> not exceed that limit by much, but when I do the CTAS, it just keeps
> growing without bounds.
> I run 4 drillbits on each server with these settings: -Xms8G -Xmx8G
> -XX:MaxDirectMemorySize=10G on a server that has 48G RAM.
> Has anyone else experienced this? Are there any workarounds you can suggest?
>
> Thanks for your time,
> Scott
>

