trafodion-codereview mailing list archives

From DaveBirdsall <...@git.apache.org>
Subject [GitHub] incubator-trafodion pull request: [TRAFODION-25] First implementat...
Date Fri, 28 Aug 2015 17:22:44 GMT
Github user DaveBirdsall commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/68#discussion_r38222760
  
    --- Diff: core/sql/optimizer/costmethod.cpp ---
    @@ -13997,15 +13997,188 @@ CostMethodFastExtract::computeOperatorCostInternal(RelExpr* op,
     /**********************************************************************/
     
     //<pb>
    -// ----QUICKSEARCH FOR DP2Insert........................................
    +// ----QUICKSEARCH FOR HbaseInsert........................................
     
     /**********************************************************************/
     /*                                                                    */
    -/*                      CostMethodDP2Insert                           */
    +/*                      CostMethodHbaseInsert                         */
     /*                                                                    */
     /**********************************************************************/
     
    -#pragma nowarn(262)   // warning elimination
    +// -----------------------------------------------------------------------
    +// CostMethodHbaseInsert::cacheParameters()
    +// -----------------------------------------------------------------------
    +void CostMethodHbaseInsert::cacheParameters(RelExpr* op, const Context * myContext)
    +{
    +  CostMethod::cacheParameters(op, myContext);
    +
    +  HbaseInsert* insOp = (HbaseInsert *)op;
    +
    +  CMPASSERT(partFunc_ != NULL);
    +  NodeMap * nodeMap = (NodeMap *)partFunc_->getNodeMap();
    +  if (nodeMap)
    +    activePartitions_ = (CostScalar)nodeMap->getNumActivePartitions();
    +  else
    +    // Occasionally (e.g., regress/fullstack2/test023, the insert/select
    +    // from t023t1 into t023t2 using a transpose operator), we get
    +    // a ReplicateNoBroadcastPartitioningFunction lacking a node map.
    +    // In this case we'll just use the number of partitions from the
    +    // partitioning function itself -- which is probably an ESP count.
    +    activePartitions_ = (CostScalar)partFunc_->getCountOfPartitions();
    +
    +  // The number of asynchronous streams is USUALLY the # of active parts.
    +  countOfAsynchronousStreams_ = activePartitions_;
    +} // CostMethodHbaseInsert::cacheParameters()
    +
    +
    +
    +// -----------------------------------------------------------------------
    +// CostMethodHbaseInsert::computeOperatorCostInternal()
    +// -----------------------------------------------------------------------
    +Cost* CostMethodHbaseInsert::computeOperatorCostInternal(RelExpr* op,
    +  const Context* myContext,
    +  Lng32& countOfStreams)
    +{
    +  cacheParameters(op, myContext);
    +  estimateDegreeOfParallelism();
    +
    +  // ------------------------------------------------------
    +  // Save our current estimated degree of parallelism in
    +  // the 'out' parameter; we might revise it below.
    +  // ------------------------------------------------------
    +  countOfStreams = countOfStreams_;
    +
    +  CostScalar currentCpus =
    +    (CostScalar)myContext->getPlan()->getPhysicalProperty()->getCurrentCountOfCPUs();
    +  activeCpus_ = MINOF(countOfAsynchronousStreams_, currentCpus);
    +
    +  // update count of streams; the caller of the method uses this value
    +  if ((countOfAsynchronousStreams_ > 0) &&
    +      (countOfAsynchronousStreams_ < countOfStreams)
    +      )
    +    countOfStreams = (Lng32)countOfAsynchronousStreams_.getValue();
    +
    +
    +  streamsPerCpu_ =
    +    (countOfAsynchronousStreams_ / activeCpus_).getCeiling();
    +
    +  CostScalar noOfProbesPerStream(csOne);
    +
    +  // Determine the number of probes per stream. Use this number as
    +  // the number of rows to insert (this is "per-stream" costing).
    +
    +  noOfProbesPerStream =
    +    (noOfProbes_ / countOfAsynchronousStreams_).minCsOne();
    +
    +  // ************************************************************
    +  // Compute the write/read cost for the insert
    +  //
    +  // ************************************************************
    +
    +  // ---------------------------------------------------------------------
    +  // Synthesize the cost vectors.
    +  // ---------------------------------------------------------------------
    +  SimpleCostVector cvFR;
    +  SimpleCostVector cvLR;
    +
    +  // For now, we don't bother to estimate CPU time, I/O time, transfer 
    +  // time or idle time, since we really are only supporting the new 
    +  // cost model.
    +  //
    +  // Future possible improvements:
    +  //
    +  // 1. Take into account HBase memstore insertion cost. The memstore
    +  // uses a Red-Black tree, so inserting n rows costs O(n * log(n)). To
    +  // model this correctly, we'd need to take into account the number
    +  // of HBase regions rather than the number of ESPs, that is, to
    +  // divide the number of probes by the number of HBase regions to find
    +  // n. This cost will be paid no matter how many inserting streams
    +  // there are, so by itself this may not be interesting. It would only
    +  // be interesting if there were a choice in the plan between inserting
    +  // and not inserting (e.g. if we were considering bypassing the
    +  // memstore, or if we were considering storing an intermediate result,
    +  // neither of which is a choice we examine today).
    --- End diff --
    
    I agree completely; this is what I was trying to say in the comments. Thanks for the tip on NATable::getRegionsBeginKey(). I'll add that to the comments on the next check-in.
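The stream arithmetic in cacheParameters() and computeOperatorCostInternal() can be sketched outside the optimizer. This is a minimal, hypothetical sketch, not Trafodion code: it assumes plain doubles in place of CostScalar, and that minCsOne() clamps a value to at least one and getCeiling() rounds up, as their names suggest. InsertStreamEstimate and estimateInsertStreams() are illustrative names, and the zero guards are an addition of the sketch.

```cpp
#include <algorithm>
#include <cmath>
#include <optional>

// Hypothetical, simplified mirror of the values computed in the diff.
// Plain doubles stand in for CostScalar.
struct InsertStreamEstimate {
  double activePartitions = 0;  // activePartitions_
  double asyncStreams = 0;      // countOfAsynchronousStreams_
  double activeCpus = 0;        // activeCpus_
  long countOfStreams = 0;      // the 'out' parameter, possibly revised
  double streamsPerCpu = 0;     // streamsPerCpu_
  double probesPerStream = 0;   // noOfProbesPerStream
};

InsertStreamEstimate estimateInsertStreams(
    std::optional<double> nodeMapActivePartitions,  // empty: no node map
    double partFuncPartitionCount,  // partitioning function's partition count
    double currentCpus,             // CPUs from the physical property
    long estimatedStreams,          // countOfStreams_ before revision
    double probes) {                // noOfProbes_
  InsertStreamEstimate e;

  // Prefer the node map; fall back to the partitioning function's count.
  e.activePartitions = nodeMapActivePartitions.value_or(partFuncPartitionCount);

  // Asynchronous streams are assumed to equal the active partitions.
  e.asyncStreams = e.activePartitions;
  e.activeCpus = std::min(e.asyncStreams, currentCpus);

  // Only lower the caller's stream count, never raise it.
  e.countOfStreams = estimatedStreams;
  if (e.asyncStreams > 0 && e.asyncStreams < estimatedStreams)
    e.countOfStreams = static_cast<long>(e.asyncStreams);

  // Guards against dividing by zero (assumption; not shown in the diff).
  const double streams = std::max(e.asyncStreams, 1.0);
  const double cpus = std::max(e.activeCpus, 1.0);

  e.streamsPerCpu = std::ceil(streams / cpus);          // like getCeiling()
  e.probesPerStream = std::max(probes / streams, 1.0);  // like minCsOne()
  return e;
}
```

With a node map reporting 8 active partitions, 4 CPUs, a prior estimate of 16 streams and 1000 probes, the sketch lowers the stream count to 8 and yields 2 streams per CPU and 125 probes per stream.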

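Future improvement 1 in the diff, together with the suggestion of using NATable::getRegionsBeginKey() to find the region count, can be illustrated with a rough model. This is a hypothetical sketch: memstoreInsertComparisons() is an invented name, and it assumes the region count is already known (the signature of getRegionsBeginKey() is not shown here, so the sketch simply takes a count), that rows spread evenly across regions, and that each memstore insert costs about log2(n) comparisons, consistent with the comment's n log n estimate.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper, not part of Trafodion: approximate comparison count
// for inserting `rows` rows spread evenly over `regions` HBase regions,
// assuming each region's memstore insert costs about log2(n) comparisons,
// so roughly n * log2(n) comparisons per region.
double memstoreInsertComparisons(double rows, std::size_t regions) {
  // Treat an unknown (zero) region count as a single region.
  const double r = regions == 0 ? 1.0 : static_cast<double>(regions);
  const double n = rows / r;  // rows landing in each region's memstore
  if (n <= 1.0) return 0.0;   // fewer than two rows per region: no comparisons
  return r * n * std::log2(n);
}
```

For 1024 rows, one region gives 1024 * 10 = 10240 comparisons and four regions give 4 * 256 * 8 = 8192. The function takes no stream count at all, which matches the comment's point that this cost is paid no matter how many inserting streams there are.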

