drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aman Sinha" <asi...@maprtech.com>
Subject Re: Review Request 20741: Cost model to include distribution cost and enable new plans
Date Thu, 01 May 2014 01:54:34 GMT


> On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillFilterRelBase.java,
line 58
> > <https://reviews.apache.org/r/20741/diff/1/?file=569043#file569043line58>
> >
> >     I think we should probably add a cost settings/config object.  Could hang off
PlannerSettings or used to initialize our RelOptCostFactory.  I'd like to avoid magic numbers
being spread throughout the code so we can change them in a local place and evaluate the impacts

Pls see my response to Jinfeng's comments on the same issue. I have modified the formula to
not use the hardcoded value. The other part about externalizing the cost factors through PlannerSettings
needs some thought..we should only externalize selected factors.    


> On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java,
line 60
> > <https://reviews.apache.org/r/20741/diff/1/?file=569057#file569057line60>
> >
> >     As above, let's primarily do this check in the rule rather than waiting for
exceptions.  Exceptions are expensive.  Additionally, we have warning messages that are misleading
because an alternative plan may avoid an equijoin.

Agreed.  I have added a new JoinPruleBase class that performs the checks for cartesian join
and non-equijoins.  Both HashJoinPrule and MergeJoinPrule extend from this class. If the conditional
checks fail, the onMatch() will return without creating the transform request.  


> On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashAggPrule.java,
line 72
> > <https://reviews.apache.org/r/20741/diff/1/?file=569056#file569056line72>
> >
> >     why did you disable single grouping key?

That was just for testing.. I have uncommented it. 


> On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamAggPrule.java,
line 80
> > <https://reviews.apache.org/r/20741/diff/1/?file=569068#file569068line80>
> >
> >     same as hash agg, why removed?

That was just for testing.. I have uncommented it. 


> On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamAggPrule.java,
line 84
> > <https://reviews.apache.org/r/20741/diff/1/?file=569068#file569068line84>
> >
> >     maybe note that this is incomplete and thus commented out.

done. 


> On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamAggPrule.java,
line 149
> > <https://reviews.apache.org/r/20741/diff/1/?file=569068#file569068line149>
> >
> >     I think this is the same as StreamAgg, can we move to shared location?

Yes.  I created an abstract base class AggPruleBase and moved this utility function there.
 Both HashAggr and StreamingAggr extend from this base class.  


> On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashAggPrule.java,
line 39
> > <https://reviews.apache.org/r/20741/diff/1/?file=569056#file569056line39>
> >
> >     Can you add a matches which checks the aggregates and fails to match on the
case where there is a distinct aggregate?

Done. onMatch() will return without creating a transform request if any of the aggregate exprs
contains a distinct.  


> On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillProjectRelBase.java,
line 63
> > <https://reviews.apache.org/r/20741/diff/1/?file=569044#file569044line63>
> >
> >     Let's try to include this magic number also in the centralized cost config so
we can explore its impact.  As above, ideally we would pull this from an understanding of
the expressions.

Added a constant DrillCostBase.projectCpuCost with a factor of 4 * baseCpuCost to model the
cost of projecting expressions and columns.  Ideally, we would want to analyze the expression
to determine cost accurately, but for now I have changed the formala to be:   
   cpuCost = numProjectedExpressions * rowCount * projectCpuCost.


> On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashAggPrel.java,
line 59
> > <https://reviews.apache.org/r/20741/diff/1/?file=569055#file569055line59>
> >
> >     Ideally, this shouldn't be hit.  Rule should fail to match.

Done (see below for HashAggPrule).


> On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java,
line 88
> > <https://reviews.apache.org/r/20741/diff/1/?file=569057#file569057line88>
> >
> >     please centralize magic number.

Done.


> On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ScanPrel.java,
line 75
> > <https://reviews.apache.org/r/20741/diff/1/?file=569064#file569064line75>
> >
> >     I could be wrong but I thought the third number was disk io.  Why would this
be 0?  Also, we should probably delegate this down to groupscan as different scan types have
different costs.  For example, Parquet is substantially more CPU intensive.  CSV, more disk
intensive.

After discussing this, kept the formula as-is and added the following comments: 
    // Even though scan is reading from disk, in the currently generated plans all plans will
    // need to read the same amount of data, so keeping the disk io cost 0 is ok for now.
 
    // In the future we might consider alternative scans that go against projections or 
    // different compression schemes etc that affect the amount of data read. Such alternatives
    // would affect both cpu and io cost. 


> On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/HashToMergeExchange.java,
line 87
> > <https://reviews.apache.org/r/20741/diff/1/?file=569041#file569041line87>
> >
> >     This looks like old, unused code.  Do we need this?

All exchanges implement this method which is defined in the Exchange interface class but not
getting used.  After discussion, I took it out from all the exchanges and the interface. 


> On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/MergeJoinPrule.java,
line 80
> > <https://reviews.apache.org/r/20741/diff/1/?file=569062#file569062line80>
> >
> >     This brings to mind that we should probably have settings to enable disable
various types of exchanges.  E.g. supports_broadcast_exchange

We have talked about this...it's not part of these costing changes but will address as part
of a separate JIRA. 


> On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashAggPrule.java,
line 57
> > <https://reviews.apache.org/r/20741/diff/1/?file=569056#file569056line57>
> >
> >     probably add note here about multi phase aggregate

You mean for the hash aggregate rule in general, not for this particular statement, right
?  this statement (line 57) is part of the following conditional block which is about non-grouped
(plain) aggregation, not necessarily multi-phase aggrs. I will add a general comment about
multi-phase for this rule.   
      if (aggregate.getGroupSet().isEmpty()) {
        toDist = DrillDistributionTrait.SINGLETON;
        traits = call.getPlanner().emptyTraitSet().plus(Prel.DRILL_PHYSICAL).plus(toDist);
        createTransformRequest(call, aggregate, input, traits);


> On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/DrillDistributionTraitDef.java,
line 75
> > <https://reviews.apache.org/r/20741/diff/1/?file=569053#file569053line75>
> >
> >     How does OrderedPartitionExchange destroy orderedness?  Same for broadcast?

OrderedPartitionExchange and BroadcastExchange both create a RandomReceiver instance, not
a MergingReceiver.  So one would have to add a Sort enforcer after the exchange to provide
sortedness. I suppose your point is that one could theoretically create a BroadcastExchange
with a MergingReceiver and preserve sortedness. Currently, we don't do that though. 


> On April 29, 2014, 5:11 a.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashToMergeExchangePrel.java,
line 88
> > <https://reviews.apache.org/r/20741/diff/1/?file=569059#file569059line88>
> >
> >     Can we update to new utility method?  Also, are we sure this is true.  I thought
hash exchange supported all selection vectors.

Modified to use the utility method in PrelUtil. The modified code is the same as is present
in HashToRandomExchange. 


- Aman


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20741/#review41681
-----------------------------------------------------------


On April 26, 2014, 1:54 a.m., Aman Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20741/
> -----------------------------------------------------------
> 
> (Updated April 26, 2014, 1:54 a.m.)
> 
> 
> Review request for drill, Jacques Nadeau and Jinfeng Ni.
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> 1. Added DrillCostBase that includes distribution (network) cost.
> 2. Added cost formulas for computeSelfCost() for various Exchange operators, joins, aggregations
etc. 
> 3. Added Prels for HashJoin, HashAggregate and generate plans for those.  
> 4. Added exchange Prels: BroadcastExchangePrel and a new type called HashToMergeExchangePrel.

> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/ops/QueryContext.java 7e3b63d 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/HashToMergeExchange.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableConfig.java
98892c0 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillFilterRelBase.java
955729b 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillProjectRelBase.java
cf3d188 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillScanRelBase.java
b370352 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillScreenRelBase.java
51ed442 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillCostBase.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelOptCost.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelOptCostFactory.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRuleSets.java
1492a28 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/BroadcastExchangePrel.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/DrillDistributionTrait.java
b75fb40 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/DrillDistributionTraitDef.java
c2ebb7a 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/FilterPrel.java
0fc3abd 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashAggPrel.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashAggPrule.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrule.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashToMergeExchangePrel.java
PRE-CREATION 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashToRandomExchangePrel.java
e5c9661 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/MergeJoinPrel.java
978a531 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/MergeJoinPrule.java
8298e50 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java
e6e99c0 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ScanPrel.java a945129

>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/SingleMergeExchangePrel.java
d9431cc 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/SortPrel.java 344be4e

>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamAggPrel.java
c2880da 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamAggPrule.java
a561a61 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/TopNPrel.java e981a45

>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/UnionExchangePrel.java
f89cbaa 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillSqlWorker.java
8892a8f 
>   exec/java-exec/src/test/java/org/apache/drill/TestTpchSingleMode.java 1ccb65c 
>   exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestHashJoin.java
ba067e2 
>   exec/java-exec/src/test/resources/join/hj_exchanges.json PRE-CREATION 
>   exec/java-exec/src/test/resources/queries/tpch/01.sql f33416f 
> 
> Diff: https://reviews.apache.org/r/20741/diff/
> 
> 
> Testing
> -------
> 
> Existing unit tests.  Some TPCH tests show memory leaks but that is not directly related
to the costing changes...however those leaks would have to be resolved before enabling the
new plans.  However, I wanted to get this review request out to get feedback.  
> 
> 
> Thanks,
> 
> Aman Sinha
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message