drill-user mailing list archives

From rahul challapalli <challapallira...@gmail.com>
Subject Re: Query hangs on planning
Date Thu, 01 Sep 2016 18:21:56 GMT
While planning, Drill uses heap memory, and 2 GB of heap should be sufficient
for what you mentioned. This looks like a bug to me. Can you raise a JIRA for
it? It would also be super helpful if you could attach the data set used.

Rahul
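
For reference, the `planner.width.max_per_node` values in the configurations
quoted below are ordinary Drill options; a minimal sketch of how such a value
is typically set, assuming the standard ALTER SESSION / ALTER SYSTEM syntax:

    -- per session; 6 and 23 are the values mentioned in this thread
    ALTER SESSION SET `planner.width.max_per_node` = 6;
    -- or cluster-wide
    ALTER SYSTEM SET `planner.width.max_per_node` = 23;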

On Wed, Aug 31, 2016 at 9:14 AM, Oscar Morante <spacepluk@gmail.com> wrote:

> Sure,
> This is what I remember:
>
> * Failure
>    - embedded mode on my laptop
>    - drill memory: 2 GB / 4 GB (heap/direct)
>    - cpu: 4 cores (+ hyperthreading)
>    - `planner.width.max_per_node=6`
>
> * Success
>    - AWS cluster: 2x c3.8xlarge
>    - drill memory: 16 GB / 32 GB
>    - cpu: limited by Kubernetes to 24 cores
>    - `planner.width.max_per_node=23`
>
> I'm too busy right now to test again, but I'll try to provide better info
> as soon as I can.
>
>
>
> On Wed, Aug 31, 2016 at 05:38:53PM +0530, Khurram Faraaz wrote:
>
>> Can you please share the number of cores on the setup where the query
>> hung, compared to the number of cores on the setup where the query went
>> through successfully? And the memory details for the two scenarios.
>>
>> Thanks,
>> Khurram
>>
>> On Wed, Aug 31, 2016 at 4:50 PM, Oscar Morante <spacepluk@gmail.com>
>> wrote:
>>
>>> For the record, I think this was just bad memory configuration after all.
>>> I retested on bigger machines and everything seems to be working fine.
>>>
>>>
>>> On Tue, Aug 09, 2016 at 10:46:33PM +0530, Khurram Faraaz wrote:
>>>
>>>> Oscar, can you please report a JIRA with the required steps to reproduce
>>>> the OOM error. That way someone from the Drill team will take a look and
>>>> investigate.
>>>>
>>>> For others interested here is the stack trace.
>>>>
>>>> 2016-08-09 16:51:14,280 [285642de-ab37-de6e-a54c-378aaa4ce50e:foreman] ERROR o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, exiting. Information message: Unable to handle out of memory condition in Foreman.
>>>> java.lang.OutOfMemoryError: Java heap space
>>>>     at java.util.Arrays.copyOfRange(Arrays.java:2694) ~[na:1.7.0_111]
>>>>     at java.lang.String.<init>(String.java:203) ~[na:1.7.0_111]
>>>>     at java.lang.StringBuilder.toString(StringBuilder.java:405) ~[na:1.7.0_111]
>>>>     at org.apache.calcite.util.Util.newInternal(Util.java:785) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>>>     at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:251) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>>>     at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:808) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>>>     at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>>>     at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:404) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>     at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:343) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>     at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:240) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>     at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:290) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>     at org.apache.drill.exec.planner.sql.handlers.ExplainHandler.getPlan(ExplainHandler.java:61) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>     at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:94) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>     at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:978) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>     at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:257) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_111]
>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_111]
>>>>     at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111]
>>>>
>>>> Thanks,
>>>> Khurram
>>>>
>>>> On Tue, Aug 9, 2016 at 7:46 PM, Oscar Morante <spacepluk@gmail.com>
>>>> wrote:
>>>>
>>>>> Yeah, when I uncomment only the `upload_date` lines (a dir0 alias),
>>>>> explain succeeds within ~30s.  Enabling any of the other lines
>>>>> triggers the failure.
>>>>>
>>>>> This is a log with the `upload_date` lines and `usage <> 'Test'` enabled:
>>>>> https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022b3c55e
>>>>>
>>>>> The client times out around here (~1.5 hours):
>>>>> https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022b3c55e#file-drillbit-log-L178
>>>>>
>>>>> And it still keeps running for a while until it dies (~2.5 hours):
>>>>> https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022b3c55e#file-drillbit-log-L178
>>>>>
>>>>> The memory settings for this test were:
>>>>>
>>>>>    DRILL_HEAP="4G"
>>>>>    DRILL_MAX_DIRECT_MEMORY="8G"
>>>>>
>>>>> This is on a laptop with 16G and I should probably lower it, but it
>>>>> seems a bit excessive for such a small query.  And I think I got the
>>>>> same results on a 2-node cluster with 8/16.  I'm gonna try again on
>>>>> the cluster to make sure.
>>>>>
>>>>> Thanks,
>>>>> Oscar
>>>>>
>>>>>
>>>>> On Tue, Aug 09, 2016 at 04:13:17PM +0530, Khurram Faraaz wrote:
>>>>>
>>>>>> You mentioned "*But if I uncomment the where clause then it runs for a
>>>>>> couple of hours until it runs out of memory.*"
>>>>>>
>>>>>> Can you please share the OutOfMemory details from drillbit.log and the
>>>>>> value of DRILL_MAX_DIRECT_MEMORY?
>>>>>>
>>>>>> Can you also try retaining just the line `upload_date = '2016-08-01'`
>>>>>> in your where clause and check whether the explain succeeds?
>>>>>>
>>>>>> Thanks,
>>>>>> Khurram
>>>>>>
>>>>>> On Tue, Aug 9, 2016 at 4:00 PM, Oscar Morante <spacepluk@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi there,
>>>>>>>
>>>>>>> I've been stuck with this for a while and I'm not sure if I'm
>>>>>>> running into a bug or I'm just doing something very wrong.
>>>>>>>
>>>>>>> I have this stripped-down version of my query:
>>>>>>> https://gist.github.com/spacepluk/9ab1e1a0cfec6f0efb298f023f4c805b
>>>>>>>
>>>>>>> The data is just a single file with one record (1.5K).
>>>>>>>
>>>>>>> Without changing anything, explain takes ~1 sec on my machine.  But
>>>>>>> if I uncomment the where clause then it runs for a couple of hours
>>>>>>> until it runs out of memory.
>>>>>>>
>>>>>>> Also if I uncomment the where clause *and* take out the join, then
>>>>>>> it takes around 30s to plan.
>>>>>>>
>>>>>>> Any ideas?
>>>>>>> Thanks!
>>>>>>>
>>>>>>>
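
The query itself is only in the linked gist, so the following is purely a
hypothetical sketch of the query shape being discussed (a dir0 directory
alias such as `upload_date`, the `usage <> 'Test'` predicate, and a join),
not Oscar's actual query; table paths, keys, and column names are invented.

    -- Hypothetical illustration only; paths, keys, and columns are made up.
    EXPLAIN PLAN FOR
    SELECT e.dir0 AS upload_date,      -- dir0: Drill's implicit top-level directory column
           e.`usage`,
           d.some_field                -- hypothetical column from the joined table
    FROM   dfs.`/data/events` e        -- hypothetical directory-partitioned data
    JOIN   dfs.`/data/dimensions` d    -- hypothetical second data source
      ON   e.some_key = d.some_key
    WHERE  e.dir0 = '2016-08-01'       -- the upload_date filter Khurram suggested isolating
      AND  e.`usage` <> 'Test';        -- the usage predicate mentioned in the thread

A constant filter on dir0 is the case Drill can partition-prune at planning
time, which is presumably why isolating that predicate was a useful test.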
