drill-user mailing list archives

From Oscar Morante <spacep...@gmail.com>
Subject Re: Query hangs on planning
Date Sun, 25 Sep 2016 16:16:14 GMT
Hi Rahul,
I'm still very busy :(  But I haven't forgotten about this.  I'll open a 
JIRA with a proper test-case as soon as I get the chance.


On Thu, Sep 01, 2016 at 12:03:43PM -0700, Zelaine Fong wrote:
>Ah ... yes, you're right.  I forgot that was off heap.
>
>-- Zelaine
>
>On Thu, Sep 1, 2016 at 11:41 AM, Sudheesh Katkam <skatkam@maprtech.com>
>wrote:
>
>> That setting is for off-heap memory. The earlier case hit the heap memory
>> limit.
>>
>> > On Sep 1, 2016, at 11:36 AM, Zelaine Fong <zfong@maprtech.com> wrote:
>> >
>> > One other thing ... have you tried tuning the planner.memory_limit
>> > parameter?  Based on the earlier stack trace, you're hitting a memory limit
>> > during query planning.  So, tuning this parameter should help that.  The
>> > default is 256 MB.
>> >
>> > -- Zelaine
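
For reference, bumping that option is a one-liner in sqlline; the value is
given in bytes, and the 512 MB below is only an illustrative number, not a
recommendation:

    -- raise the planner's memory limit for the current session
    -- (536870912 bytes = 512 MB, an arbitrary example value)
    ALTER SESSION SET `planner.memory_limit` = 536870912;

    -- inspect the current value
    SELECT * FROM sys.options WHERE name = 'planner.memory_limit';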
>> >
>> > On Thu, Sep 1, 2016 at 11:21 AM, rahul challapalli <
>> > challapallirahul@gmail.com> wrote:
>> >
>> >> While planning we use heap memory. 2GB of heap should be sufficient for
>> >> what you mentioned. This looks like a bug to me. Can you raise a JIRA for
>> >> the same? And it would be super helpful if you can also attach the data
>> >> set used.
>> >>
>> >> Rahul
>> >>
>> >> On Wed, Aug 31, 2016 at 9:14 AM, Oscar Morante <spacepluk@gmail.com>
>> >> wrote:
>> >>
>> >>> Sure,
>> >>> This is what I remember:
>> >>>
>> >>> * Failure
>> >>>   - embedded mode on my laptop
>> >>>   - drill memory: 2Gb/4Gb (heap/direct)
>> >>>   - cpu: 4cores (+hyperthreading)
>> >>>   - `planner.width.max_per_node=6`
>> >>>
>> >>> * Success
>> >>>   - AWS Cluster 2x c3.8xlarge
>> >>>   - drill memory: 16Gb/32Gb
>> >>>   - cpu: limited by kubernetes to 24cores
>> >>>   - `planner.width.max_per_node=23`
>> >>>
>> >>> I'm too busy right now to test again, but I'll try to provide better
>> >>> info as soon as I can.
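
For anyone trying to reproduce the two setups above, the width override is
an ordinary session option; something like the following is assumed:

    -- per-node parallelism cap used in the runs above
    -- (6 on the laptop, 23 on the AWS cluster)
    ALTER SESSION SET `planner.width.max_per_node` = 6;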
>> >>>
>> >>>
>> >>>
>> >>> On Wed, Aug 31, 2016 at 05:38:53PM +0530, Khurram Faraaz wrote:
>> >>>
>> >>>> Can you please share the number of cores on the setup where the query
>> >>>> hung, as compared to the number of cores on the setup where the query
>> >>>> went through successfully, along with the memory details from the two
>> >>>> scenarios?
>> >>>>
>> >>>> Thanks,
>> >>>> Khurram
>> >>>>
>> >>>> On Wed, Aug 31, 2016 at 4:50 PM, Oscar Morante <spacepluk@gmail.com>
>> >>>> wrote:
>> >>>>
>> >>>>> For the record, I think this was just bad memory configuration after
>> >>>>> all.  I retested on bigger machines and everything seems to be working
>> >>>>> fine.
>> >>>>>
>> >>>>>
>> >>>>> On Tue, Aug 09, 2016 at 10:46:33PM +0530, Khurram Faraaz wrote:
>> >>>>>
>> >>>>>> Oscar, can you please report a JIRA with the required steps to
>> >>>>>> reproduce the OOM error. That way someone from the Drill team will
>> >>>>>> take a look and investigate.
>> >>>>>>
>> >>>>>> For others interested here is the stack trace.
>> >>>>>>
>> >>>>>> 2016-08-09 16:51:14,280 [285642de-ab37-de6e-a54c-378aaa4ce50e:foreman]
>> >>>>>> ERROR o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred,
>> >>>>>> exiting. Information message: Unable to handle out of memory condition in Foreman.
>> >>>>>> java.lang.OutOfMemoryError: Java heap space
>> >>>>>>       at java.util.Arrays.copyOfRange(Arrays.java:2694) ~[na:1.7.0_111]
>> >>>>>>       at java.lang.String.<init>(String.java:203) ~[na:1.7.0_111]
>> >>>>>>       at java.lang.StringBuilder.toString(StringBuilder.java:405) ~[na:1.7.0_111]
>> >>>>>>       at org.apache.calcite.util.Util.newInternal(Util.java:785) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>> >>>>>>       at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:251) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>> >>>>>>       at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:808) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>> >>>>>>       at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:303) ~[calcite-core-1.4.0-drill-r16-PATCHED.jar:1.4.0-drill-r16-PATCHED]
>> >>>>>>       at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:404) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>> >>>>>>       at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:343) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>> >>>>>>       at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:240) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>> >>>>>>       at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:290) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>> >>>>>>       at org.apache.drill.exec.planner.sql.handlers.ExplainHandler.getPlan(ExplainHandler.java:61) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>> >>>>>>       at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:94) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>> >>>>>>       at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:978) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>> >>>>>>       at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:257) ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>> >>>>>>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_111]
>> >>>>>>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_111]
>> >>>>>>       at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111]
>> >>>>>>
>> >>>>>> Thanks,
>> >>>>>> Khurram
>> >>>>>>
>> >>>>>> On Tue, Aug 9, 2016 at 7:46 PM, Oscar Morante <spacepluk@gmail.com>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>>> Yeah, when I uncomment only the `upload_date` lines (a dir0 alias),
>> >>>>>>> explain succeeds within ~30s.  Enabling any of the other lines
>> >>>>>>> triggers the failure.
>> >>>>>>>
>> >>>>>>> This is a log with the `upload_date` lines and `usage <> 'Test'`
>> >>>>>>> enabled:
>> >>>>>>> https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022b3c55e
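
For context, `dir0` is Drill's implicit column for the first directory level
under the queried path, so the `upload_date` alias above generally follows
this pattern (the table path and date here are made up, not the actual query
from the gist):

    -- purely illustrative; dfs.`/data/events` and the date are hypothetical
    SELECT dir0 AS upload_date, t.*
    FROM dfs.`/data/events` t
    WHERE dir0 = '2016-08-01';  -- lets the planner prune to one date directory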
>> >>>>>>>
>> >>>>>>> The client times out around here (~1.5 hours):
>> >>>>>>> https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022b3c55e#file-drillbit-log-L178
>> >>>>>>>
>> >>>>>>> And it still keeps running for a while until it dies (~2.5 hours):
>> >>>>>>> https://gist.github.com/spacepluk/d7ac11c0de6859e4bd003d2022b3c55e#file-drillbit-log-L178
>> >>>>>>>
>> >>>>>>> The memory settings for this test were:
>> >>>>>>>
>> >>>>>>>   DRILL_HEAP="4G"
>> >>>>>>>   DRILL_MAX_DIRECT_MEMORY="8G"
>> >>>>>>>
>> >>>>>>> This is on a laptop with 16G and I should probably lower it, but it
>> >>>>>>> seems a bit excessive for such a small query.  And I think I got the
>> >>>>>>> same results on a 2 node cluster with 8/16.  I'm gonna try again on
>> >>>>>>> the cluster to make sure.
>> >>>>>>>
>> >>>>>>> Thanks,
>> >>>>>>> Oscar
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Tue, Aug 09, 2016 at 04:13:17PM +0530, Khurram Faraaz wrote:
>> >>>>>>>
>> >>>>>>>> You mentioned "*But if I uncomment the where clause then it runs for
>> >>>>>>>> a couple of hours until it runs out of memory.*"
>> >>>>>>>>
>> >>>>>>>> Can you please share the OutOfMemory details from drillbit.log and
>> >>>>>>>> the value of DRILL_MAX_DIRECT_MEMORY?
>> >>>>>>>>
>> >>>>>>>> Can you also try to see what happens if you retain just this one
>> >>>>>>>> predicate, upload_date = '2016-08-01', in your where clause, and
>> >>>>>>>> check whether the explain succeeds?
>> >>>>>>>>
>> >>>>>>>> Thanks,
>> >>>>>>>> Khurram
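
A planning-only check along the lines suggested above could look like this
(again with a hypothetical table path, since the real query lives in the
gist):

    -- EXPLAIN exercises only the planner, which is where the OOM happens
    EXPLAIN PLAN FOR
    SELECT dir0 AS upload_date, COUNT(*)
    FROM dfs.`/data/events`
    WHERE dir0 = '2016-08-01'
    GROUP BY dir0;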
>> >>>>>>>>
>> >>>>>>>> On Tue, Aug 9, 2016 at 4:00 PM, Oscar Morante <spacepluk@gmail.com>
>> >>>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>> Hi there,
>> >>>>>>>>>
>> >>>>>>>>> I've been stuck with this for a while and I'm not sure if I'm
>> >>>>>>>>> running into a bug or I'm just doing something very wrong.
>> >>>>>>>>>
>> >>>>>>>>> I have this stripped-down version of my query:
>> >>>>>>>>> https://gist.github.com/spacepluk/9ab1e1a0cfec6f0efb298f023f4c805b
>> >>>>>>>>>
>> >>>>>>>>> The data is just a single file with one record (1.5K).
>> >>>>>>>>>
>> >>>>>>>>> Without changing anything, explain takes ~1sec on my machine.  But
>> >>>>>>>>> if I uncomment the where clause then it runs for a couple of hours
>> >>>>>>>>> until it runs out of memory.
>> >>>>>>>>>
>> >>>>>>>>> Also if I uncomment the where clause *and* take out the join, then
>> >>>>>>>>> it takes around 30s to plan.
>> >>>>>>>>>
>> >>>>>>>>> Any ideas?
>> >>>>>>>>> Thanks!
>> >>>>>>>>>
>> >>>>>>>>>
>> >>
>>
>>

-- 
Oscar Morante
"Self-education is, I firmly believe, the only kind of education there is."
                                                          -- Isaac Asimov.
