drill-dev mailing list archives

From weijie tong <tongweijie...@gmail.com>
Subject Re: How to generate hash code for each build side one of the hash join columns
Date Fri, 01 Jun 2018 05:40:42 GMT
Hi Boaz:

  Your proposal is valuable, even though I have already implemented the
dynamic code-generation logic. If a ``` long hash64(int index, long seed) ```
method is added to ValueVector, it will also help others implement
storage-plugin-specific filter logic using the pushed-down Bloom
filter. For HashJoin and HashAggregate, the methods ```double
hash32AsDouble(int index, int seed) ``` and ```int hash32(int index, int
seed)``` will also need to be added to ValueVector. If no one objects,
I would be glad to take on this work.
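A minimal sketch of what such per-vector hash methods might look like, assuming a toy `IntColumn` backed by an `int[]` in place of a real Drill `ValueVector`. `HashableVector`, `IntColumn`, and the mixing function (MurmurHash3's 64-bit finalizer `fmix64`, not a full murmur3_64) are illustrative assumptions, not Drill APIs. This sketch returns `int` from `hash32AsDouble`, matching the `IntHolder` output of the generated hash32AsDouble code quoted in this thread, rather than the `double` return type written above.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of type-specific hash methods living on the vector itself.
 * Not Drill code: the names and the mixing function are assumptions.
 */
public final class HashableVectorSketch {
  private HashableVectorSketch() {}

  /** Hashing hooks in the spirit of the proposed ValueVector methods. */
  public interface HashableVector {
    long hash64(int index, long seed);

    int hash32(int index, int seed);

    /** Hashes the value widened to double, as the generated hash32AsDouble code does. */
    int hash32AsDouble(int index, int seed);
  }

  /** A toy INT column backed by an int[]. */
  public static final class IntColumn implements HashableVector {
    private final int[] values;

    public IntColumn(int... values) {
      this.values = Arrays.copyOf(values, values.length);
    }

    @Override
    public long hash64(int index, long seed) {
      return fmix64(values[index] ^ seed);
    }

    @Override
    public int hash32(int index, int seed) {
      return fold(fmix64(values[index] ^ (long) seed));
    }

    @Override
    public int hash32AsDouble(int index, int seed) {
      long bits = Double.doubleToLongBits((double) values[index]);
      return fold(fmix64(bits ^ (long) seed));
    }
  }

  /** MurmurHash3 64-bit finalizer: a bijective bit mixer. */
  static long fmix64(long k) {
    k ^= k >>> 33;
    k *= 0xff51afd7ed558ccdL;
    k ^= k >>> 33;
    k *= 0xc4ceb9fe1a85ec53L;
    k ^= k >>> 33;
    return k;
  }

  /** Folds a 64-bit hash into 32 bits by XOR-ing its two halves. */
  static int fold(long h) {
    return (int) (h ^ (h >>> 32));
  }
}
```

Because each vector type would override these methods, callers need no type-specific generated code, and the seed parameter lets multi-column keys be hashed by chaining one column's result into the next.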

   By the way, let me share my thoughts on the scan side's filter logic using
the BloomFilter. The scan-side filtering I intend to do operates on the
materialized ValueVector, not during the process of constructing the
ValueVector from the original storage-format data. The reason is that doing
the check during that step would hurt the performance of materializing the
original deep storage-format data into ValueVectors.
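The scan-side approach described above (filter after materialization) could look roughly like the following sketch. It assumes the materialized column is an `int[]` and models the Bloom filter as a `LongPredicate` that answers "might contain this hash"; `MaterializedScanFilter` and `select` are hypothetical names. Surviving rows are recorded as indexes, similar in spirit to a Drill selection vector, so the vector itself is not rewritten.

```java
import java.util.Arrays;
import java.util.function.IntToLongFunction;
import java.util.function.LongPredicate;

/**
 * Hypothetical scan-side runtime filter applied to an already materialized
 * column. The Bloom filter is abstracted as a LongPredicate over 64-bit hashes.
 */
public final class MaterializedScanFilter {
  private MaterializedScanFilter() {}

  /**
   * Returns the indexes of rows whose key hash may be present in the filter.
   * False positives are possible (Bloom filter semantics); false negatives are not,
   * provided the build and probe sides use the same hash function and seed.
   */
  public static int[] select(int[] keys, int rowCount, IntToLongFunction hash,
                             LongPredicate mightContain) {
    int[] selection = new int[rowCount];
    int selected = 0;
    for (int row = 0; row < rowCount; row++) {
      if (mightContain.test(hash.applyAsLong(keys[row]))) {
        selection[selected++] = row; // keep this row
      }
    }
    return Arrays.copyOf(selection, selected);
  }
}
```

Because the predicate only sees hashes, the scan has to hash with exactly the same function and seed that the build side used when populating the filter.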

On Fri, Jun 1, 2018 at 3:22 AM Boaz Ben-Zvi <bben-zvi@mapr.com> wrote:

>  Hi Weijie,
>
>     Another option is to totally avoid the generated code.
> We were considering the idea of replacing the generated code used for
> computing hash values with “real Java” code.
>
> This idea is analogous to the usage of the copyEntry() method in the
> ValueVector interface (that Paul added last year).
> See an example of using the copyEntry() (via the appendRow() in
> VectorContainer) in the new Hash-Join-Spill code.
> Basically no need to generate “type specific” code, as the virtual
> copyEntry() method does the “type specific” work.
>
> Similarly we could have a hash64() method in ValueVector, which would
> perform the “type specific” computation.
> (One difference from copyEntry() – the hash64() would also need to take
> the “seed” parameter, which is the hash value produced by the previous
> hash).
> And similar to appendRow(), there would be evalHash() iterating over the
> key columns.
> (And one difference from appendRow() – it needs to iterate only over the key
> columns; these are the first columns, and their number can be found from the
> config: e.g., htConfig.getKeyExprsBuild().size().)
>
>    With such an implementation, that evalHash() could be used anywhere (e.g.,
> to match the Bloom filters on the left side of the join).
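A rough sketch of the `evalHash()` idea, assuming each key column exposes a virtual `hash64(rowIndex, seed)` as proposed. `ColumnHasher` stands in for `ValueVector` and, like `EvalHashSketch`, is hypothetical. As described, the key columns are taken to be the first `keyCount` columns, and each column's hash is passed as the seed to the next one, so no type-specific code has to be generated.

```java
import java.util.List;

/** Hypothetical sketch of a non-generated evalHash() over the key columns. */
public final class EvalHashSketch {
  private EvalHashSketch() {}

  /** Stand-in for a ValueVector that knows how to hash its own entries. */
  @FunctionalInterface
  public interface ColumnHasher {
    long hash64(int rowIndex, long seed);
  }

  /**
   * Hashes one row over the first keyCount columns (e.g. the value of
   * htConfig.getKeyExprsBuild().size()), chaining each column's hash as the
   * seed of the next. Columns after keyCount are never touched.
   */
  public static long evalHash(List<? extends ColumnHasher> columns, int keyCount,
                              int rowIndex, long initialSeed) {
    if (keyCount < 0 || keyCount > columns.size()) {
      throw new IllegalArgumentException("keyCount out of range: " + keyCount);
    }
    long hash = initialSeed;
    for (int col = 0; col < keyCount; col++) {
      // virtual dispatch does the "type specific" work, as copyEntry() does for copies
      hash = columns.get(col).hash64(rowIndex, hash);
    }
    return hash;
  }
}
```

The same loop could serve both building the Bloom filter on the build side and probing it on the other side, as long as both use the same initial seed.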
>
>        Thanks,
>
>              Boaz
>
>
> On 5/30/18, 7:49 PM, "weijie tong" <tongweijie178@gmail.com> wrote:
>
>     Hi Aman:
>
>       Thanks for your tips. I have rebased onto the latest code from the master
>     branch. Yes, the spill-to-disk feature did change the original
>     implementation, and I have adjusted my implementation to the new
>     feature. But as you say, integration will be somewhat challenging, as I
>     noticed the spill-to-disk feature will continue to tune its
>     implementation's performance.
>
>       The BloomFilter is implemented natively in Drill, not with an external
>     library. It implements the algorithm from the paper you
>     mentioned.
>
>
>     On Thu, May 31, 2018 at 1:56 AM Aman Sinha <amansinha@apache.org> wrote:
>
>     > Hi Weijie,
>     > I was hoping you could leverage the existing methods, so it's good that you
>     > found the ones that work for your use case.
>     > One thing I want to point out (maybe you're already aware): the Hash Join
>     > code has changed significantly in the master branch due to the
>     > spill-to-disk feature.
>     > So, this may pose some integration challenges for your run-time join
>     > pushdown feature.
>     > Also, one other question/clarification: for the bloom filter itself, are
>     > you implementing it natively in Drill or using an external library?
>     >
>     > -Aman
>     >
>     > On Tue, May 29, 2018 at 8:23 PM, weijie tong <tongweijie178@gmail.com>
>     > wrote:
>     >
>     > > I found ClassGenerator's nestEvalBlock(JBlock block) and unNestEvalBlock(),
>     > > which have the same effect as the change I made to the ClassGenerator. So I
>     > > have dropped my ClassGenerator change, and I hope this can help someone
>     > > else.
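To see why nesting helps, here is a toy model of the mechanism, not Drill's `ClassGenerator`: expression code is always appended to whatever `getEvalBlock()` returns, so pushing the if-branch before adding the expression and popping it afterwards keeps the column read and the hash inside the branch. `ToyClassGenerator`, its `Block` type, and `addHashExpr()` are hypothetical and only emit source text as strings; the nest/unnest method names merely mirror the ones mentioned above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of nested eval blocks; emits Java-like source text. Not Drill's ClassGenerator. */
public final class ToyClassGenerator {

  /** A block holds statements (String) and nested blocks (Block). */
  public static final class Block {
    private final List<Object> items = new ArrayList<>();

    public void add(String statement) {
      items.add(statement);
    }

    /** Appends "if (condition) { ... }" and returns the still-empty then-block. */
    public Block ifThen(String condition) {
      Block then = new Block();
      items.add("if (" + condition + ") {");
      items.add(then);
      items.add("}");
      return then;
    }

    void render(StringBuilder out, String indent) {
      for (Object item : items) {
        if (item instanceof Block) {
          ((Block) item).render(out, indent + "  ");
        } else {
          out.append(indent).append(item).append('\n');
        }
      }
    }
  }

  private final Block evalBody = new Block();
  private final Deque<Block> nested = new ArrayDeque<>();

  /** Innermost nested block if any, otherwise the method body. */
  public Block getEvalBlock() {
    return nested.isEmpty() ? evalBody : nested.peek();
  }

  public void nestEvalBlock(Block block) {
    nested.push(block);
  }

  public void unNestEvalBlock() {
    nested.pop();
  }

  /** Stand-in for addExpr(): the read + hash code goes wherever getEvalBlock() points. */
  public String addHashExpr(String vector, String resultVar) {
    getEvalBlock().add("int " + resultVar + " = hash32((double) " + vector
        + ".get(incomingRowIdx), seedValue);");
    return resultVar;
  }

  /** Generates a dispatch body with one if-branch per key vector. */
  public static String generate(List<String> keyVectors) {
    ToyClassGenerator cg = new ToyClassGenerator();
    for (int i = 0; i < keyVectors.size(); i++) {
      Block ifBlock = cg.getEvalBlock().ifThen("fieldId == " + i);
      cg.nestEvalBlock(ifBlock); // expression code now lands in the branch
      String hash = cg.addHashExpr(keyVectors.get(i), "hash" + i);
      cg.getEvalBlock().add("return " + hash + ";");
      cg.unNestEvalBlock(); // back to the method body
    }
    cg.getEvalBlock().add("return 0;");
    StringBuilder out = new StringBuilder();
    cg.evalBody.render(out, "");
    return out.toString();
  }
}
```

Without the nest/unnest pair, `addHashExpr()` would append to the method body instead, which is the misplaced-code symptom described in this thread.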
>     > >
>     > > On Tue, May 29, 2018 at 1:53 PM weijie tong <tongweijie178@gmail.com>
>     > > wrote:
>     > >
>     > > > The code formatting was not nice. Here it is again:
>     > > >
>     > > > private void setupGetBuild64Hash(ClassGenerator<HashTable> cg, MappingSet incomingMapping,
>     > > >     VectorAccessible batch, LogicalExpression[] keyExprs, TypedFieldId[] buildKeyFieldIds)
>     > > >     throws SchemaChangeException {
>     > > >   cg.setMappingSet(incomingMapping);
>     > > >   if (keyExprs == null || keyExprs.length == 0) {
>     > > >     cg.getEvalBlock()._return(JExpr.lit(0));
>     > > >     return; // nothing to hash; avoid emitting a second (unreachable) return below
>     > > >   }
>     > > >   String seedValue = "seedValue";
>     > > >   String fieldId = "fieldId";
>     > > >   LogicalExpression seed = ValueExpressions.getParameterExpression(seedValue,
>     > > >       Types.required(TypeProtos.MinorType.INT));
>     > > >
>     > > >   LogicalExpression fieldIdParamExpr = ValueExpressions.getParameterExpression(fieldId,
>     > > >       Types.required(TypeProtos.MinorType.INT));
>     > > >   HoldingContainer fieldIdParamHolder = cg.addExpr(fieldIdParamExpr);
>     > > >   int i = 0;
>     > > >   for (LogicalExpression expr : keyExprs) {
>     > > >     TypedFieldId targetTypeFieldId = buildKeyFieldIds[i];
>     > > >     ValueExpressions.IntExpression targetBuildFieldIdExp = new ValueExpressions.IntExpression(
>     > > >         targetTypeFieldId.getFieldIds()[0], ExpressionPosition.UNKNOWN);
>     > > >
>     > > >     JFieldRef targetBuildSideFieldId = cg.addExpr(targetBuildFieldIdExp,
>     > > >         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue();
>     > > >     JBlock ifBlock = cg.getEvalBlock()
>     > > >         ._if(fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then();
>     > > >     // Make the if-branch (an inner block of the eval block) the block that
>     > > >     // getEvalBlock() returns, so the generated read/hash code lands inside it.
>     > > >     cg.setCustomizedEvalInnerBlock(ifBlock);
>     > > >     LogicalExpression hashExpression =
>     > > >         HashPrelUtil.getHashExpression(expr, seed, incomingProbe != null);
>     > > >     LogicalExpression materializedExpr = ExpressionTreeMaterializer.materializeAndCheckErrors(
>     > > >         hashExpression, batch, context.getFunctionRegistry());
>     > > >     HoldingContainer hash = cg.addExpr(materializedExpr,
>     > > >         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND);
>     > > >     ifBlock._return(hash.getValue());
>     > > >     // Reset the customized block to null so getEvalBlock() again returns
>     > > >     // the real eval block.
>     > > >     cg.setCustomizedEvalInnerBlock(null);
>     > > >     i++;
>     > > >   }
>     > > >   cg.getEvalBlock()._return(JExpr.lit(0));
>     > > > }
>     > > >
>     > > > public long getBuild64HashCodeInner(int incomingRowIdx, int seedValue, int fieldId)
>     > > >     throws SchemaChangeException
>     > > > {
>     > > >     {
>     > > >         IntHolder fieldId12 = new IntHolder();
>     > > >         fieldId12 .value = fieldId;
>     > > >         if (fieldId12 .value == constant14 .value) {
>     > > >             IntHolder out18 = new IntHolder();
>     > > >             {
>     > > >                 out18 .value = vv15 .getAccessor().get((incomingRowIdx));
>     > > >             }
>     > > >             IntHolder seedValue19 = new IntHolder();
>     > > >             seedValue19 .value = seedValue;
>     > > >             //---- start of eval portion of hash32AsDouble function. ----//
>     > > >             IntHolder out20 = new IntHolder();
>     > > >             {
>     > > >                 final IntHolder out = new IntHolder();
>     > > >                 IntHolder in = out18;
>     > > >                 IntHolder seed = seedValue19;
>     > > >
>     > > >                 Hash32WithSeedAsDouble$IntHash_eval: {
>     > > >                     out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) in.value, seed.value);
>     > > >                 }
>     > > >
>     > > >                 out20 = out;
>     > > >             }
>     > > >             //---- end of eval portion of hash32AsDouble function. ----//
>     > > >             return out20 .value;
>     > > >         }
>     > > >         return 0;
>     > > >     }
>     > > > }
>     > > >
>     > > > On Tue, May 29, 2018 at 1:47 PM weijie tong <tongweijie178@gmail.com>
>     > > > wrote:
>     > > >
>     > > >> Hi Paul:
>     > > >>
>     > > >>  Thanks for your enthusiasm. I have already learned this technique, which you
>     > > >> mentioned to me in another mail thread. It's really helpful; thanks for
>     > > >> your valuable work.
>     > > >>
>     > > >>   I have now solved this tough problem by adding a customized JBlock
>     > > >> member field to the ClassGenerator. When you want getEvalBlock() of
>     > > >> the ClassGenerator to return an inner customized JBlock, you set this
>     > > >> member; when you want the method to return the eval block itself, you
>     > > >> reset this member to null.
>     > > >>
>     > > >>   Here is my changed setup method:
>     > > >>
>     > > >>
>     > > >> private void setupGetBuild64Hash(ClassGenerator<HashTable> cg, MappingSet incomingMapping,
>     > > >>     VectorAccessible batch, LogicalExpression[] keyExprs, TypedFieldId[] buildKeyFieldIds)
>     > > >>     throws SchemaChangeException {
>     > > >>   cg.setMappingSet(incomingMapping);
>     > > >>   if (keyExprs == null || keyExprs.length == 0) {
>     > > >>     cg.getEvalBlock()._return(JExpr.lit(0));
>     > > >>     return; // nothing to hash; avoid emitting a second (unreachable) return below
>     > > >>   }
>     > > >>   String seedValue = "seedValue";
>     > > >>   String fieldId = "fieldId";
>     > > >>   LogicalExpression seed = ValueExpressions.getParameterExpression(seedValue,
>     > > >>       Types.required(TypeProtos.MinorType.INT));
>     > > >>
>     > > >>   LogicalExpression fieldIdParamExpr = ValueExpressions.getParameterExpression(fieldId,
>     > > >>       Types.required(TypeProtos.MinorType.INT));
>     > > >>   HoldingContainer fieldIdParamHolder = cg.addExpr(fieldIdParamExpr);
>     > > >>   int i = 0;
>     > > >>   for (LogicalExpression expr : keyExprs) {
>     > > >>     TypedFieldId targetTypeFieldId = buildKeyFieldIds[i];
>     > > >>     ValueExpressions.IntExpression targetBuildFieldIdExp = new ValueExpressions.IntExpression(
>     > > >>         targetTypeFieldId.getFieldIds()[0], ExpressionPosition.UNKNOWN);
>     > > >>
>     > > >>     JFieldRef targetBuildSideFieldId = cg.addExpr(targetBuildFieldIdExp,
>     > > >>         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue();
>     > > >>     JBlock ifBlock = cg.getEvalBlock()
>     > > >>         ._if(fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then();
>     > > >>     // Make the if-branch (an inner block of the eval block) the block that
>     > > >>     // getEvalBlock() returns, so the generated read/hash code lands inside it.
>     > > >>     cg.setCustomizedEvalInnerBlock(ifBlock);
>     > > >>     LogicalExpression hashExpression =
>     > > >>         HashPrelUtil.getHashExpression(expr, seed, incomingProbe != null);
>     > > >>     LogicalExpression materializedExpr = ExpressionTreeMaterializer.materializeAndCheckErrors(
>     > > >>         hashExpression, batch, context.getFunctionRegistry());
>     > > >>     HoldingContainer hash = cg.addExpr(materializedExpr,
>     > > >>         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND);
>     > > >>     ifBlock._return(hash.getValue());
>     > > >>     // Reset the customized block to null so getEvalBlock() again returns
>     > > >>     // the real eval block.
>     > > >>     cg.setCustomizedEvalInnerBlock(null);
>     > > >>     i++;
>     > > >>   }
>     > > >>   cg.getEvalBlock()._return(JExpr.lit(0));
>     > > >> }
>     > > >>
>     > > >>
>     > > >>  The corresponding generated code:
>     > > >>
>     > > >>     public long getBuild64HashCodeInner(int incomingRowIdx, int seedValue, int fieldId)
>     > > >>         throws SchemaChangeException
>     > > >>     {
>     > > >>         {
>     > > >>             IntHolder fieldId12 = new IntHolder();
>     > > >>             fieldId12 .value = fieldId;
>     > > >>             if (fieldId12 .value == constant14 .value) {
>     > > >>                 IntHolder out18 = new IntHolder();
>     > > >>                 {
>     > > >>                     out18 .value = vv15 .getAccessor().get((incomingRowIdx));
>     > > >>                 }
>     > > >>                 IntHolder seedValue19 = new IntHolder();
>     > > >>                 seedValue19 .value = seedValue;
>     > > >>                 //---- start of eval portion of hash32AsDouble function. ----//
>     > > >>                 IntHolder out20 = new IntHolder();
>     > > >>                 {
>     > > >>                     final IntHolder out = new IntHolder();
>     > > >>                     IntHolder in = out18;
>     > > >>                     IntHolder seed = seedValue19;
>     > > >>
>     > > >>                     Hash32WithSeedAsDouble$IntHash_eval: {
>     > > >>                         out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) in.value, seed.value);
>     > > >>                     }
>     > > >>
>     > > >>                     out20 = out;
>     > > >>                 }
>     > > >>                 //---- end of eval portion of hash32AsDouble function. ----//
>     > > >>                 return out20 .value;
>     > > >>             }
>     > > >>             return  0;
>     > > >>         }
>     > > >>     }
>     > > >>
>     > > >>
>     > > >>
>     > > >>   Some other explanations:
>     > > >>   1st: The if check won't hurt performance, as I invoke this
>     > > >> method column by column, so it's branch-prediction friendly.
>     > > >>   2nd: I will use murmur3_64, not murmur3_32, since the efficient
>     > > >> bloom filter algorithm needs a 64-bit hash code to avoid collisions.
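On the 64-bit point: one common reason a Bloom filter wants a 64-bit hash is that a single hash can be split into two 32-bit halves and combined by Kirsch-Mitzenmacher double hashing to derive all k probe positions, instead of computing k independent hashes. The sketch below illustrates that generic technique; it is an assumption for illustration, not necessarily the algorithm Drill's BloomFilter implements, and `Hash64BloomFilter` is a hypothetical name.

```java
/**
 * Minimal Bloom filter over precomputed 64-bit hashes (a generic sketch, not
 * Drill's BloomFilter). Probe i uses h1 + i * h2, where h1 and h2 are the low
 * and high 32-bit halves of one 64-bit hash (Kirsch-Mitzenmacher double hashing).
 */
public final class Hash64BloomFilter {
  private final long[] words;
  private final long numBits;
  private final int numHashes;

  public Hash64BloomFilter(int requestedBits, int numHashes) {
    if (requestedBits <= 0 || numHashes <= 0) {
      throw new IllegalArgumentException("requestedBits and numHashes must be positive");
    }
    this.words = new long[(int) ((requestedBits + 63L) >>> 6)];
    this.numBits = words.length * 64L;
    this.numHashes = numHashes;
  }

  public void insert(long hash64) {
    int h1 = (int) hash64;
    int h2 = (int) (hash64 >>> 32);
    for (int i = 1; i <= numHashes; i++) {
      long bit = bitIndex(h1 + i * h2);
      words[(int) (bit >>> 6)] |= 1L << bit; // long shift uses the low 6 bits of bit
    }
  }

  public boolean mightContain(long hash64) {
    int h1 = (int) hash64;
    int h2 = (int) (hash64 >>> 32);
    for (int i = 1; i <= numHashes; i++) {
      long bit = bitIndex(h1 + i * h2);
      if ((words[(int) (bit >>> 6)] & (1L << bit)) == 0) {
        return false; // definitely never inserted
      }
    }
    return true; // possibly inserted (false positives are possible)
  }

  private long bitIndex(int combined) {
    // flip negative values to non-negative, then reduce modulo the bit count
    int nonNegative = combined < 0 ? ~combined : combined;
    return nonNegative % numBits;
  }
}
```

With only a 32-bit hash, the two "independent" halves would not exist, so the probes would have to be derived from fewer bits, which raises the collision rate as the filter grows.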
>     > > >>
>     > > >> On Tue, May 29, 2018 at 12:37 PM Paul Rogers <par0328@yahoo.com.invalid>
>     > > >> wrote:
>     > > >>
>     > > >>> Hi Weijie,
>     > > >>>
>     > > >>> Seeing the discussion about the details of JCodeModel suggests you may
>     > > >>> be trying to debug your generated code at the level of the code generator.
>     > > >>>
>     > > >>> Some time ago we added the ability to step through the generated code.
>     > > >>> Look for the following line in the generator code:
>     > > >>>
>     > > >>>
>     > > >>>     // Uncomment out this line to debug the generated code.
>     > > >>>
>     > > >>> //    cg.saveCodeForDebugging(true);
>     > > >>>
>     > > >>>
>     > > >>> Uncomment the code line and Drill will save each generated file to a
>     > > >>> configured location (which, if I recall correctly, is /tmp/drill/codegen,
>     > > >>> though it may have changed after Tim's test directory changes.)
>     > > >>>
>     > > >>> Then, set a breakpoint in the template setup() method and you can step
>     > > >>> directly into the generated doSetup() method. Same for the eval() method.
>     > > >>>
>     > > >>> This way, you can not only see the generated code, you can step through
>     > > >>> it. I've found this to be a far easier way to understand the generated code
>     > > >>> than the older techniques folks have used (look at byte codes, use print
>     > > >>> statements, brute force reasoning, etc.)
>     > > >>>
>     > > >>> Tim, Boaz and others have used this technique more recently and can
>     > > >>> probably give you additional pointers.
>     > > >>>
>     > > >>> Thanks,
>     > > >>> - Paul
>     > > >>>
>     > > >>>
>     > > >>>
>     > > >>>     On Monday, May 28, 2018, 8:52:19 PM PDT, weijie tong <tongweijie178@gmail.com> wrote:
>     > > >>>
>     > > >>>  @aman thanks for your reply. "For the ifBlock, do you need an _else()
>     > > >>> block also?" I provide a default return at the end of the method, so I
>     > > >>> don't need the _else() block. I have noticed that the IfExpression
>     > > >>> evaluation method in EvaluationVisitor also uses JConditional, but that
>     > > >>> also doesn't match my requirement. I think the key point here is that
>     > > >>> FunctionHolderExpression and ValueVectorReadExpression put their
>     > > >>> corresponding generated code into the eval method's JBlock, not into our
>     > > >>> specific ifBlock, which is an inner block of the eval method's JBlock.
>     > > >>>
>     > > >>> So it seems I should either change the ClassGenerator to let
>     > > >>> getEvalBlock() return the ifBlock (more precisely, the JConditional's then
>     > > >>> block), or implement special FunctionHolderExpression and
>     > > >>> ValueVectorReadExpression classes with corresponding visit methods in
>     > > >>> EvaluationVisitor to generate the special code blocks. I hope someone
>     > > >>> who is familiar with this part of the code can point out whether there
>     > > >>> are easier or different ways to achieve this.
>     > > >>>
>     > > >>> To make the discussion more precise, here is the generated code from
>     > > >>> the previous setupGetBuild64Hash method:
>     > > >>>
>     > > >>>     public long getBuild64HashCodeInner(int incomingRowIdx, int seedValue, int fieldId)
>     > > >>>         throws SchemaChangeException
>     > > >>>     {
>     > > >>>         {
>     > > >>>             IntHolder fieldId16 = new IntHolder();
>     > > >>>             fieldId16 .value = fieldId;
>     > > >>>             if (fieldId16 .value == constant18 .value) {
>     > > >>>                 return out24 .value;
>     > > >>>             }
>     > > >>>             IntHolder out22 = new IntHolder();
>     > > >>>             {
>     > > >>>                 out22 .value = vv19 .getAccessor().get((incomingRowIdx));
>     > > >>>             }
>     > > >>>             IntHolder seedValue23 = new IntHolder();
>     > > >>>             seedValue23 .value = seedValue;
>     > > >>>             //---- start of eval portion of hash32AsDouble function. ----//
>     > > >>>             IntHolder out24 = new IntHolder();
>     > > >>>             {
>     > > >>>                 final IntHolder out = new IntHolder();
>     > > >>>                 IntHolder in = out22;
>     > > >>>                 IntHolder seed = seedValue23;
>     > > >>>
>     > > >>>                 Hash32WithSeedAsDouble$IntHash_eval: {
>     > > >>>                     out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) in.value, seed.value);
>     > > >>>                 }
>     > > >>>
>     > > >>>                 out24 = out;
>     > > >>>             }
>     > > >>>             //---- end of eval portion of hash32AsDouble function. ----//
>     > > >>>             if (fieldId16 .value == constant18 .value) {
>     > > >>>                 return out26 .value;
>     > > >>>             }
>     > > >>>             IntHolder seedValue25 = new IntHolder();
>     > > >>>             seedValue25 .value = seedValue;
>     > > >>>             //---- start of eval portion of hash32AsDouble function. ----//
>     > > >>>             IntHolder out26 = new IntHolder();
>     > > >>>             {
>     > > >>>                 final IntHolder out = new IntHolder();
>     > > >>>                 IntHolder in = out22;
>     > > >>>                 IntHolder seed = seedValue25;
>     > > >>>
>     > > >>>                 Hash32WithSeedAsDouble$IntHash_eval: {
>     > > >>>                     out.value = org.apache.drill.exec.expr.fn.impl.HashHelper.hash32((double) in.value, seed.value);
>     > > >>>                 }
>     > > >>>
>     > > >>>                 out26 = out;
>     > > >>>             }
>     > > >>>             //---- end of eval portion of hash32AsDouble function. ----//
>     > > >>>             return  0;
>     > > >>>         }
>     > > >>>     }
>     > > >>>
>     > > >>>
>     > > >>> On Tue, May 29, 2018 at 10:51 AM Aman Sinha <amansinha@apache.org>
>     > > >>> wrote:
>     > > >>>
>     > > >>> > sorry, the previous email was incomplete.
>     > > >>> > For the ifBlock, do you need an _else() block also?
>     > > >>> >
>     > > >>> > I have sometimes found that 'JConditional' is a good way to break down
>     > > >>> > the logic further. Please see example usages of JConditional here [1].
>     > > >>> >
>     > > >>> > -Aman
>     > > >>> >
>     > > >>> > [1]
>     > > >>> > https://www.programcreek.com/java-api-examples/?api=com.sun.codemodel.JBlock
>     > > >>> >
>     > > >>> > On Mon, May 28, 2018 at 7:46 PM, Aman Sinha <amansinha@apache.org>
>     > > >>> > wrote:
>     > > >>> >
>     > > >>> > > Hi Weijie,
>     > > >>> > > It would be a little cumbersome to debug such issues over email,
>     > > >>> > > since one has to look at the generated code output and debug iteratively.
>     > > >>> > > A couple of thoughts I have that might help:
>     > > >>> > >
>     > > >>> > > For this particular if-then block, should you also
>     > > >>> > > JBlock ifBlock =
>     > > >>> > > cg.getEvalBlock()._if(fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then();
>     > > >>> > >
>     > > >>> > >
>     > > >>> > >
>     > > >>> > > On Mon, May 28, 2018 at 4:17 AM, weijie tong <tongweijie178@gmail.com>
>     > > >>> > > wrote:
>     > > >>> > >
>     > > >>> > >> Hi All:
>     > > >>> > >>  While implementing the JPPD feature
>     > > >>> > >> (https://issues.apache.org/jira/browse/DRILL-6385), I got blocked by
>     > > >>> > >> this problem: how to get the hash code of each build-side hash join
>     > > >>> > >> column through the dynamically generated Java code. I hope someone can
>     > > >>> > >> give some advice.
>     > > >>> > >>
>     > > >>> > >>    I propose adding the methods below to HashTableTemplate:
>     > > >>> > >>
>     > > >>> > >> public long getBuild64HashCode(int incomingRowIdx, int seedValue, int fieldId)
>     > > >>> > >>     throws SchemaChangeException {
>     > > >>> > >>   return getBuild64HashCodeInner(incomingRowIdx, seedValue, fieldId);
>     > > >>> > >> }
>     > > >>> > >>
>     > > >>> > >> protected abstract long getBuild64HashCodeInner(
>     > > >>> > >>     @Named("incomingRowIdx") int incomingRowIdx,
>     > > >>> > >>     @Named("seedValue") int seedValue,
>     > > >>> > >>     @Named("fieldId") int fieldId)
>     > > >>> > >>     throws SchemaChangeException;
>     > > >>> > >>
>     > > >>> > >>
>     > > >>> > >>    The high-level code that invokes getBuild64HashCode is in
>     > > >>> > >> HashJoinBatch's executeBuildPhase():
>     > > >>> > >>
>     > > >>> > >> // create runtime filter
>     > > >>> > >> if (cycleNum == 0 && enableRuntimeFilter) {
>     > > >>> > >>   // create runtime filter and send out async
>     > > >>> > >>   int condFieldIndex = 0;
>     > > >>> > >>   for (BloomFilter bloomFilter : bloomFilters) {
>     > > >>> > >>     // VV: hash every build-side row of this key column into its filter
>     > > >>> > >>     for (int ind = 0; ind < currentRecordCount; ind++) {
>     > > >>> > >>       // getBuild64HashCode takes (row, seed, fieldId); a seed of 0 is assumed here
>     > > >>> > >>       long hashCode = partitions[0].getBuild64HashCode(ind, 0, condFieldIndex);
>     > > >>> > >>       bloomFilter.insert(hashCode);
>     > > >>> > >>     }
>     > > >>> > >>     condFieldIndex++;
>     > > >>> > >>   }
>     > > >>> > >>   // TODO send out async
>     > > >>> > >> }
>     > > >>> > >>
>     > > >>> > >>
>     > > >>> > >>  As you know, the abstract method getBuild64HashCodeInner needs to
>     > > >>> > >> calculate the hash code of the build-side column selected by the fieldId
>     > > >>> > >> input parameter. To achieve this, I plan to generate a separate code
>     > > >>> > >> section for each column's ValueVector, using an if statement on the
>     > > >>> > >> column id to select the right section. The method that generates the
>     > > >>> > >> dynamic code is below:
>     > > >>> > >>
>     > > >>> > >> private void setupGetBuild64Hash(ClassGenerator<HashTable> cg, MappingSet incomingMapping,
>     > > >>> > >>     VectorAccessible batch, LogicalExpression[] keyExprs, TypedFieldId[] buildKeyFieldIds)
>     > > >>> > >>     throws SchemaChangeException {
>     > > >>> > >>   cg.setMappingSet(incomingMapping);
>     > > >>> > >>   if (keyExprs == null || keyExprs.length == 0) {
>     > > >>> > >>     cg.getEvalBlock()._return(JExpr.lit(0));
>     > > >>> > >>     return; // nothing to hash; avoid emitting a second (unreachable) return below
>     > > >>> > >>   }
>     > > >>> > >>   String seedValue = "seedValue";
>     > > >>> > >>   String fieldId = "fieldId";
>     > > >>> > >>   LogicalExpression seed = ValueExpressions.getParameterExpression(seedValue,
>     > > >>> > >>       Types.required(TypeProtos.MinorType.INT));
>     > > >>> > >>
>     > > >>> > >>   LogicalExpression fieldIdParamExpr = ValueExpressions.getParameterExpression(fieldId,
>     > > >>> > >>       Types.required(TypeProtos.MinorType.INT));
>     > > >>> > >>   HoldingContainer fieldIdParamHolder = cg.addExpr(fieldIdParamExpr);
>     > > >>> > >>   int i = 0;
>     > > >>> > >>   for (LogicalExpression expr : keyExprs) {
>     > > >>> > >>     TypedFieldId targetTypeFieldId = buildKeyFieldIds[i];
>     > > >>> > >>     ValueExpressions.IntExpression targetBuildFieldIdExp = new ValueExpressions.IntExpression(
>     > > >>> > >>         targetTypeFieldId.getFieldIds()[0], ExpressionPosition.UNKNOWN);
>     > > >>> > >>     JFieldRef targetBuildSideFieldId = cg.addExpr(targetBuildFieldIdExp,
>     > > >>> > >>         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue();
>     > > >>> > >>     JBlock ifBlock = cg.getEvalBlock()
>     > > >>> > >>         ._if(fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then();
>     > > >>> > >>
>     > > >>> > >>     LogicalExpression hashExpression =
>     > > >>> > >>         HashPrelUtil.getHashExpression(expr, seed, incomingProbe != null);
>     > > >>> > >>     LogicalExpression materializedExpr = ExpressionTreeMaterializer.materializeAndCheckErrors(
>     > > >>> > >>         hashExpression, batch, context.getFunctionRegistry());
>     > > >>> > >>     HoldingContainer hash = cg.addExpr(materializedExpr,
>     > > >>> > >>         ClassGenerator.BlkCreateMode.FALSE);
>     > > >>> > >>
>     > > >>> > >>     ifBlock._return(hash.getValue());
>     > > >>> > >>     i++;
>     > > >>> > >>   }
>     > > >>> > >>   cg.getEvalBlock()._return(JExpr.lit(0));
>     > > >>> > >> }
>     > > >>> > >>
>     > > >>> > >> But unfortunately, the generated code is not what I expected. The code
>     > > >>> > >> that reads the ValueVector and calculates the hash code of the read value
>     > > >>> > >> does not stay in the if block. So how can I make the related code stay in
>     > > >>> > >> the if block?
>     > > >>> > >>
>     > > >>> > >
>     > > >>> > >
>     > > >>> >
>     > > >>
>     > > >>
>     > >
>     >
>
>
>
