drill-dev mailing list archives

From "Artem Trush (Jira)" <j...@apache.org>
Subject [jira] [Resolved] (DRILL-7801) Changing the scope of output_batch_size
Date Wed, 18 Nov 2020 10:06:00 GMT

     [ https://issues.apache.org/jira/browse/DRILL-7801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Artem Trush resolved DRILL-7801.
--------------------------------
      Reviewer:   (was: Paul Rogers)
    Resolution: Fixed

> Changing the scope of output_batch_size
> ---------------------------------------
>
>                 Key: DRILL-7801
>                 URL: https://issues.apache.org/jira/browse/DRILL-7801
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.14.0, 1.15.0, 1.16.0, 1.17.0
>            Reporter: Artem Trush
>            Assignee: Artem Trush
>            Priority: Major
>
> The {{*drill.exec.memory.operator.output_batch_size*}} parameter caused problems with the execution speed of certain queries. In particular, it led to situations where the number of batches equaled the number of records, for example 99890 batches for 99890 records.
> After comparing Drill 1.13, where the query runs in a few minutes, with 1.16, where it runs for a few hours, I came to the following conclusion.
> The problem lies in how the size of the record batch passed between operators is determined.
> For example, let's take a look at *{{ProjectRecordBatch}}*.
> Suppose an incoming batch arrives from another operator with 2000 records inside.
> *Drill 1.13*
> The function *_doWork_* contains simple logic and is called every time the Project operator receives an incoming batch.
> In short, the outgoing batch size depends only on the incoming batch size, and in most cases the two are equal: 2000 records come in, 2000 go out.
> {code:java}
> final int outputRecords = projector.projectRecords(0, incomingRecordCount, 0);{code}
> As we can see, outputRecords depends only on incomingRecordCount.
> *Drill 1.16*
> Now there is a memoryManager that takes the output_batch_size option as a parameter.
> Let's look at the doWork function again.
> First, we have this:
> {code:java}
> //calculate the output row count
> memoryManager.update();{code}
> Inside, we have the following (getOutputBatchSize() is the configured option value and batchSizer.rowCount() is the incoming batch size):
> {code:java}
> //if rowWidth is not zero, set the output row count in the sizer
> setOutputRowCount(getOutputBatchSize(), rowWidth);
> // if more rows can be allowed than the incoming row count, then set the
> // output row count to the incoming row count.
> outPutRowCount = Math.min(getOutputRowCount(), batchSizer.rowCount()); {code}
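The calculation quoted above can be modelled roughly as follows. This is a simplified sketch under stated assumptions, not Drill's code: `outputRowCount` is a hypothetical helper, the 16 MiB budget and the row widths are illustrative values, and the real memoryManager may apply further limits that are not modelled here.

```java
public class OutputRowCountSketch {

    // Hypothetical helper (not part of Drill's API): how many rows fit into the
    // byte budget, never fewer than one and never more than the incoming batch holds.
    static int outputRowCount(int outputBatchSizeBytes, int rowWidthBytes, int incomingRowCount) {
        if (rowWidthBytes <= 0) {
            // Row width unknown: pass the incoming batch through unchanged.
            return incomingRowCount;
        }
        int rowsThatFit = Math.max(1, outputBatchSizeBytes / rowWidthBytes);
        return Math.min(rowsThatFit, incomingRowCount);
    }

    public static void main(String[] args) {
        int budget = 16 * 1024 * 1024; // illustrative byte budget (assumed value)

        // Narrow rows: the budget allows more rows than arrive, so the count is capped at 2000.
        System.out.println(outputRowCount(budget, 1024, 2000));             // prints 2000
        // Wider rows: only 512 rows fit, so each outgoing batch shrinks.
        System.out.println(outputRowCount(budget, 32 * 1024, 2000));        // prints 512
        // A row wider than the whole budget: one row per outgoing batch.
        System.out.println(outputRowCount(budget, 32 * 1024 * 1024, 2000)); // prints 1
    }
}
```

The last case corresponds to the situation described at the top of the issue, where the number of batches equals the number of records.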
> Back to the function _*doWork*_.
> memoryManager.update() determines the value that is then read into the variable maxOuputRecordCount here:
> {code:java}
> int maxOuputRecordCount = memoryManager.getOutputRowCount();{code}
> And here is the main difference between 1.13 and 1.16, which uses the new parameter:
> {code:java}
> final int outputRecords = projector.projectRecords(this.incoming, 0, maxOuputRecordCount, 0);
> {code}
> If maxOuputRecordCount is smaller than the incoming batch size, the number of output records per batch decreases and the number of batches increases. So 2000 records come in and go out as 600, 600, 600, ... or in chunks of some other size, depending on output_batch_size.
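The splitting described above can be illustrated with a small model. It is only a sketch: `split` is a hypothetical function that mimics one incoming batch being projected in chunks of at most `maxOutputRows`, and it is not how Drill's Project operator is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSplitSketch {

    // Hypothetical model: sizes of the outgoing batches produced from one incoming
    // batch when each outgoing batch may hold at most maxOutputRows rows.
    static List<Integer> split(int incomingRows, int maxOutputRows) {
        if (maxOutputRows < 1) {
            throw new IllegalArgumentException("maxOutputRows must be at least 1");
        }
        List<Integer> outgoing = new ArrayList<>();
        for (int start = 0; start < incomingRows; start += maxOutputRows) {
            outgoing.add(Math.min(maxOutputRows, incomingRows - start));
        }
        return outgoing;
    }

    public static void main(String[] args) {
        // Limit not smaller than the incoming batch: one batch in, one batch out.
        System.out.println(split(2000, 2000));     // prints [2000]
        // Limit of 600 rows: one incoming batch becomes four outgoing batches.
        System.out.println(split(2000, 600));      // prints [600, 600, 600, 200]
        // Limit of 1 row: as many batches as records.
        System.out.println(split(2000, 1).size()); // prints 2000
    }
}
```

Because every operator downstream receives the smaller batches, the per-batch overhead is paid once per outgoing batch rather than once per incoming batch.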
> As you can see, in both cases the number of output batches is never smaller than the number of incoming batches. The same rule holds in every operator that uses a memoryManager.
> This leads to a situation where the number of batches grows excessively. A simple solution to this problem is to increase the value of *{{drill.exec.memory.operator.output_batch_size}}*. However, because this option is set at the system level, changing it results in *{color:#FF0000}{{RESOURCE ERROR: One or more nodes ran out of memory}}{color}* in other queries.
> My suggestion is to change the scope of *{{drill.exec.memory.operator.output_batch_size}}* from system only to system and session. This would allow the option to be increased only for problematic queries, without affecting all the others. I don't see any reason to prevent this change. If you know of any negative effects of changing the scope of this parameter, please share them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)