drill-user mailing list archives

From Paul Rogers <par0...@yahoo.com.INVALID>
Subject Re: How to resolve error- "One or more nodes ran out of memory while executing the query"?
Date Thu, 25 Oct 2018 23:32:43 GMT
Hi Ashish,

In general, you should give Drill less memory than your total system memory, else swapping
will occur and performance will suffer greatly.

In your case, you should *decrease* memory so that the total (along with the 1 GB for code
cache) is under your 8 GB total, while also allowing memory for your OS, etc.
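
For example, on an 8 GB machine a split along these lines would keep the total within bounds (illustrative numbers only; adjust for whatever else the machine runs):

    # assumes ~1 GB of the 6 GB cap goes to code cache and other JVM overhead
    export DRILLBIT_MAX_PROC_MEM=6G
    export DRILL_HEAP=2G
    export DRILL_MAX_DIRECT_MEMORY=3G

That leaves about 2 GB free for the OS and anything else running on the laptop.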

Note that Drill is a big data query engine and is designed to run on large servers in a cluster;
running it on a laptop with 8 GB is handy for testing and development, but is not really the
primary use case for doing real queries on large data sets.

You did not mention the size of your files. If you are running out of memory, it may be because
your inputs are larger than the memory available to Drill and so there is insufficient space
in memory to hold the data when doing the join.

If your data is significantly smaller than, say, 4 GB, then there may be some other issue.
You can perhaps look at the query profile (available in the Drill web UI, typically at
http://localhost:8047/profiles), or give us a bit more detail about your data, such as
file sizes.


Also, in your join condition, how many rows are expected to join? If all of the columns[0]
values are "Fred", say, then you are doing a Cartesian join in which every row in one table
is matched with every row in the other. Can you provide a bit more detail about your data?
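
A quick way to check is to count how often each join key repeats; a query along these lines (reusing the path from your query) would show the heaviest keys:

    -- count how many rows share each join key; the top entries reveal skew
    SELECT join_key, COUNT(*) AS cnt
    FROM (
      SELECT columns[0] AS join_key
      FROM dfs.`C:\Users\admin\Desktop\DRILL_FILES\csvFileParquet\TBL_MOREDATA-20180924181406.tar.gz`
    ) t
    GROUP BY join_key
    ORDER BY cnt DESC
    LIMIT 10;

If a handful of keys account for most of the rows, the join output can explode even on modest inputs.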

A question for others on this list: isn't the join now supposed to spill if memory is constrained?
What might Ashish look for to see why spilling is not happening? Might the plan be using a
form of join that does not spill? How would we diagnose such an issue? With an EXPLAIN PLAN?
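
For instance, prefixing the query with EXPLAIN would show which join operator the planner chose:

    -- show the plan; look for HashJoin vs. NestedLoopJoin in the output
    EXPLAIN PLAN FOR
    SELECT * FROM
    dfs.`C:\Users\admin\Desktop\DRILL_FILES\csvFileParquet\TBL_MOREDATA-20180924181406.tar.gz` AS Table0
    INNER JOIN
    dfs.`C:\Users\admin\Desktop\DRILL_FILES\csvFileParquet\TBL_MOREDATA1-20180924181406.tar.gz` AS Table1
    ON Table0.columns[0] = Table1.columns[0];

If the plan shows a NestedLoopJoin, that operator holds data in memory and, as far as I know, does not spill; the spill-to-disk support added for the hash join in 1.14 only applies when a HashJoin operator is chosen.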

Thanks,
- Paul

 

On Thursday, October 25, 2018, 12:40:18 AM PDT, Ashish Pancholi <apancholi@chambal.com> wrote:
I am using Apache Drill version 1.14 on a Windows system and running
Drill using the command:

    sqlline.bat -u "jdbc:drill:zk=local"


I am trying to execute a join query on two compressed and archived
CSV files.

Query:

    SELECT * FROM
    dfs.`C:\Users\admin\Desktop\DRILL_FILES\csvFileParquet\TBL_MOREDATA-20180924181406.tar.gz`
    AS Table0 INNER JOIN
    dfs.`C:\Users\admin\Desktop\DRILL_FILES\csvFileParquet\TBL_MOREDATA1-20180924181406.tar.gz`
    AS Table1 ON Table0.columns[0] = Table1.columns[0]


But an out-of-memory error occurred:

    org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR:
    One or more nodes ran out of memory while executing the query. Unable to
    allocate buffer of size 131072 (rounded from 86104) due to memory limit
    (630194176). Current allocation: 630108434
    Fragment 0:0
    [Error Id: 585c0644-5fd5-446e-b9b3-d48e0771eb2a on DESKTOP-SM3E3KM:31010]


To resolve the issue, I tried to update the `config\drill-env.sh` file, but the issue
remains the same. It looks like the changes to the script are not being picked up: I set
the DIRECT MEMORY beyond the system memory (RAM), yet Drill starts up peacefully every
time, never complaining that the configured memory exceeds what is available. This
suggests the changes are not taking effect.

    export DRILLBIT_MAX_PROC_MEM=12G
    export DRILL_HEAP=2G
    export DRILL_MAX_DIRECT_MEMORY=10G


These totals exceed my system's main memory, which is only 8 GB.

*Please help me to resolve the out-of-memory error.* I have also run the
queries below, following the troubleshooting instructions, but the issue
remains the same.


  -    alter session set `planner.enable_hashagg` = false;
  -    alter session set `planner.enable_hashjoin` = false;
  -    alter session set `planner.width.max_per_node` = 3;
  -    alter system set `planner.width.max_per_query` = 100;
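
One way to confirm whether such settings actually took effect is to query the options table from the same session, for example:

    -- list current planner option values as seen by this session
    SELECT * FROM sys.options WHERE name LIKE 'planner.%';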

-- 
Thanks
Ashish Pancholi
  