drill-user mailing list archives

From Khurram Faraaz <kfar...@maprtech.com>
Subject Re: Query hangs on planning
Date Wed, 31 Aug 2016 12:08:53 GMT
Can you please share the number of cores on the setup where the query hung,
compared with the number of cores on the setup where the query completed
successfully, along with the memory details for both scenarios?
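One way to collect these numbers on each Linux node is a small shell helper like the one below. This is a sketch under assumptions, not a Drill tool: the function name report_resources is hypothetical, /proc/meminfo is Linux-specific, and the helper only echoes DRILL_HEAP and DRILL_MAX_DIRECT_MEMORY if they are exported in the current shell (it prints "unset" otherwise, for example when they are set only inside drill-env.sh).

```shell
#!/bin/sh
# Hypothetical helper (not part of Drill): report CPU cores, physical memory
# and the Drill memory variables visible in the current shell, so the setup
# where the query hung can be compared with the one where it succeeded.
report_resources() {
  # nproc prints the number of processing units available to this process
  echo "cores=$(nproc)"
  # MemTotal in /proc/meminfo is reported in kB (Linux only)
  echo "mem_total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
  # Only visible if exported in this shell; values set inside drill-env.sh
  # are not seen here unless that file has been sourced
  echo "DRILL_HEAP=${DRILL_HEAP:-unset}"
  echo "DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-unset}"
}

report_resources
```

Running it on both machines and pasting the output side by side makes the comparison straightforward.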

Thanks,
Khurram

On Wed, Aug 31, 2016 at 4:50 PM, Oscar Morante <spacepluk@gmail.com> wrote:

> For the record, I think this was just bad memory configuration after all.
> I retested on bigger machines and everything seems to be working fine.
>
>
> On Tue, Aug 09, 2016 at 10:46:33PM +0530, Khurram Faraaz wrote:
>
>> Oscar, can you please file a JIRA with the steps required to reproduce
>> the OOM error? That way someone from the Drill team can take a look and
>> investigate.
>>
>> For anyone else interested, here is the stack trace:
>>
>> 2016-08-09 16:51:14,280 [285642de-ab37-de6e-a54c-378aaa4ce50e:foreman] ERROR o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, exiting. Information message: Unable to handle out of memory condition in Foreman.
>> java.lang.OutOfMemoryError: Java heap space
>>        at java.util.Arrays.copyOfRange(Arrays.java:2694) ~[na:1.7.0_111]
>>        at java.lang.String.<init>(String.java:203) ~[na:1.7.0_111]
>>        at java.lang.StringBuilder.toString(StringBuilder.java:405) ~[na:1.7.0_111]
>>        at org.apache.calcite.util.Util.newInternal(Util.java:785) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>        at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:251) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>        at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:808) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>        at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>>        at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:404) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>        at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:343) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>        at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:240) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>        at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:290) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>        at org.apache.drill.exec.planner.sql.handlers.ExplainHandler.getPlan(ExplainHandler.java:61) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>        at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:94) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>        at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:978) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>        at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:257) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_111]
>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_111]
>>        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111]
>>
>> Thanks,
>> Khurram
>>
>> On Tue, Aug 9, 2016 at 7:46 PM, Oscar Morante <spacepluk@gmail.com> wrote:
>>
>>> Yeah, when I uncomment only the `upload_date` lines (a dir0 alias),
>>> explain succeeds within ~30s. Enabling any of the other lines triggers
>>> the failure.
>>>
>>> This is a log with the `upload_date` lines and `usage <> 'Test'` enabled:
>>> https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022b3c55e
>>>
>>> The client times out around here (~1.5 hours):
>>> https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022b3c55e#file-drillbit-log-L178
>>>
>>> And the query keeps running for a while after that until it dies (~2.5 hours):
>>> https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022b3c55e#file-drillbit-log-L178
>>>
>>> The memory settings for this test were:
>>>
>>>    DRILL_HEAP="4G"
>>>    DRILL_MAX_DIRECT_MEMORY="8G"
>>>
>>> This is on a laptop with 16G of RAM, so I should probably lower these
>>> settings, but that much memory seems excessive for such a small query.
>>> I think I got the same results on a 2-node cluster with 8/16. I'm going
>>> to try again on the cluster to make sure.
>>>
>>> Thanks,
>>> Oscar
>>>
>>>
>>> On Tue, Aug 09, 2016 at 04:13:17PM +0530, Khurram Faraaz wrote:
>>>
>>> You mentioned "*But if I uncomment the where clause then it runs for a
>>>> couple of hours until it runs out of memory.*"
>>>>
>>>> Can you please share the OutOfMemory details from drillbit.log and the
>>>> value of DRILL_MAX_DIRECT_MEMORY?
>>>>
>>>> Can you also try keeping just the upload_date = '2016-08-01' line in
>>>> your where clause and check whether the explain succeeds?
>>>>
>>>> Thanks,
>>>> Khurram
>>>>
>>>> On Tue, Aug 9, 2016 at 4:00 PM, Oscar Morante <spacepluk@gmail.com> wrote:
>>>>
>>>>> Hi there,
>>>>>
>>>>> I've been stuck on this for a while and I'm not sure whether I'm
>>>>> running into a bug or just doing something very wrong.
>>>>>
>>>>> I have this stripped-down version of my query:
>>>>> https://gist.github.com/spacepluk/9ab1e1a0cfec6f0efb298f023f4c805b
>>>>>
>>>>> The data is just a single file with one record (1.5K).
>>>>>
>>>>> Without changing anything, explain takes ~1 sec on my machine. But if I
>>>>> uncomment the where clause then it runs for a couple of hours until it
>>>>> runs out of memory.
>>>>>
>>>>> Also, if I uncomment the where clause *and* take out the join, then it
>>>>> takes around 30s to plan.
>>>>>
>>>>> Any ideas?
>>>>> Thanks!
>>>>>
> --
> Oscar Morante
> "Self-education is, I firmly believe, the only kind of education there is."
>                                                          -- Isaac Asimov.
>
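For reference, the two memory settings quoted in the thread (DRILL_HEAP="4G" and DRILL_MAX_DIRECT_MEMORY="8G") are set in conf/drill-env.sh on each drillbit. Because the stack trace shows "java.lang.OutOfMemoryError: Java heap space" thrown inside Calcite's VolcanoPlanner during planning, it is the heap setting, not direct memory, that bounds this failure. The lines below are a sketch of that file's memory section, assuming the ${VAR:-default} form so an already-exported value takes precedence; the values are the ones from the thread, not a sizing recommendation.

```shell
# Sketch of the memory lines in $DRILL_HOME/conf/drill-env.sh.
# Values are the ones quoted in the thread, not a recommendation.
# ${VAR:-default} keeps any value already exported in the environment.

# JVM heap: query planning (Calcite's VolcanoPlanner in the Foreman) allocates
# here, so "java.lang.OutOfMemoryError: Java heap space" points at this setting.
export DRILL_HEAP=${DRILL_HEAP:-"4G"}

# Off-heap (direct) memory, used mainly for query execution buffers.
export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"8G"}
```

drill-env.sh is read when the drillbit starts, so each drillbit needs a restart for a change here to take effect.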
