I found the answer: the maximum RecordBatch size is 2^16 records, as defined by
RecordBatch.MAX_BATCH_SIZE.
Some questions about SelectionVector2 and SelectionVector4:
>
I want to create a SelectionVector4 or SelectionVector2 to represent the
filtered ScanBatch and avoid a memory copy. But I found that ProjectBatch does
not support SelectionVector4, and SelectionVector2's record count is only
char-sized (2 bytes). So why does ProjectBatch not support SelectionVector4?
The same question applies to FilterBatch's SelectionVector2, which also only
supports a 2-byte record count.
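Since a single batch holds at most 2^16 rows, a 16-bit unsigned entry can already address every row of one batch; as far as I can tell, a 4-byte entry only pays off when one selection has to span several batches. The sketch below shows that index arithmetic with hypothetical helper names (`SelectionIndexSketch`, `toSv2Entry`, `toSv4Entry`); it is not Drill's SelectionVector2/SelectionVector4 code, and the SV4 layout used here (batch index in the upper 16 bits, row offset in the lower 16 bits) is an assumption for illustration.

```java
// Hypothetical illustration of selection-vector index widths; not Drill's
// SelectionVector2/SelectionVector4 implementation.
final class SelectionIndexSketch {
  // 2^16: the per-batch row limit cited from RecordBatch.MAX_BATCH_SIZE.
  static final int MAX_BATCH_SIZE = 1 << 16;

  private SelectionIndexSketch() {}

  // SV2-style entry: a Java char is an unsigned 16-bit value (0..65535),
  // which covers every row offset of one batch.
  static char toSv2Entry(int rowIndex) {
    checkRange(rowIndex, "row index");
    return (char) rowIndex;
  }

  // SV4-style entry (assumed layout): batch index in the upper 16 bits,
  // row offset in the lower 16 bits, so one vector can point into many batches.
  static int toSv4Entry(int batchIndex, int rowIndex) {
    checkRange(batchIndex, "batch index");
    checkRange(rowIndex, "row index");
    return (batchIndex << 16) | rowIndex;
  }

  static int sv4BatchIndex(int entry) {
    return entry >>> 16;   // unsigned shift recovers the upper 16 bits
  }

  static int sv4RowIndex(int entry) {
    return entry & 0xFFFF; // mask keeps the lower 16 bits
  }

  private static void checkRange(int value, String what) {
    if (value < 0 || value >= MAX_BATCH_SIZE) {
      throw new IllegalArgumentException(what + " does not fit in 16 bits: " + value);
    }
  }
}
```

For example, `toSv4Entry(3, 42)` round-trips through `sv4BatchIndex` and `sv4RowIndex` back to 3 and 42, while `toSv2Entry(65536)` is rejected because such a row cannot exist in a single batch.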
>
>
Hi Boaz:
>>
Your proposal is valuable, though I have already implemented the dynamic
code-generation logic. If a ``` long hash64(int index, long seed) ```
method is added to ValueVector, it will also help others implement
filter logic in specific storage plugins using the pushed-down Bloom
filter. For HashJoin and HashAggregate, ValueVector will also need the
methods ```int hash32AsDouble(int index, int seed)``` and
```int hash32(int index, int seed)```. If no one objects, I will be happy
to take on this work.
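A minimal sketch of what such per-vector hash methods could look like, assuming a hypothetical `HashableVector` interface and an `IntColumn` backed by an `int[]` (neither is Drill's ValueVector API). The mixing step is a SplitMix64-style finalizer used purely as a stand-in; Drill's own hash helpers appear to be MurmurHash3-based (the thread mentions murmur3_32/murmur3_64), so real hash values would differ. `hash32AsDouble` returns `int` here, matching the `IntHolder` output of the generated `hash32AsDouble` code elsewhere in this thread.

```java
// Hypothetical sketch: names and hash function are illustrative, not Drill's API.
interface HashableVector {
  int getValueCount();

  long hash64(int index, long seed);

  int hash32(int index, int seed);

  // Hashes the value widened to double; a floating-point column using the
  // same scheme would then hash equal numeric keys identically.
  int hash32AsDouble(int index, int seed);
}

final class IntColumn implements HashableVector {
  private static final long GOLDEN = 0x9E3779B97F4A7C15L;
  private final int[] values;

  IntColumn(int... values) {
    this.values = values.clone();
  }

  @Override
  public int getValueCount() {
    return values.length;
  }

  // The type-specific work lives in the vector, so callers need no generated code.
  @Override
  public long hash64(int index, long seed) {
    return mix64(values[index] ^ (seed * GOLDEN));
  }

  @Override
  public int hash32(int index, int seed) {
    long h = hash64(index, seed);
    return (int) (h ^ (h >>> 32)); // fold 64 bits into 32
  }

  @Override
  public int hash32AsDouble(int index, int seed) {
    long bits = Double.doubleToLongBits((double) values[index]);
    long h = mix64(bits ^ (seed * GOLDEN));
    return (int) (h ^ (h >>> 32));
  }

  // SplitMix64 finalizer: a bijective 64-bit mixer (stand-in for MurmurHash3).
  static long mix64(long z) {
    z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
    z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
    return z ^ (z >>> 31);
  }
}
```

A storage plugin holding a pushed-down Bloom filter could then call `hash64(row, seed)` on any vector without knowing its concrete type.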
>>
Btw, I will share my thoughts about scan-side filtering with the
BloomFilter. The scan-side filter logic I intend to implement filters the
already materialized ValueVectors; it does not run while the ValueVectors are
being constructed from the original storage-format data. The reason is that
doing the check during materialization of the deep storage-format data into
ValueVectors would hurt performance.
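The scan-side idea can be sketched as follows, assuming a column already materialized into an array and a pushed-down filter exposed as a hash function plus a `mightContain`-style probe (both passed in as plain functional interfaces; `MaterializedScanFilter` is a hypothetical name, not a Drill class). Surviving row offsets are recorded in a 2-byte selection array instead of copying values, in the spirit of the SelectionVector2 approach raised at the top of this thread.

```java
import java.util.Arrays;
import java.util.function.IntToLongFunction;
import java.util.function.LongPredicate;

// Hypothetical sketch: filter an already-materialized INT column with a
// pushed-down membership test and record surviving row offsets, SV2-style.
final class MaterializedScanFilter {
  static final int MAX_BATCH_SIZE = 1 << 16; // rows per batch, per RecordBatch.MAX_BATCH_SIZE

  private MaterializedScanFilter() {}

  static char[] select(int[] column, int rowCount,
                       IntToLongFunction hasher, LongPredicate mightContain) {
    if (rowCount < 0 || rowCount > column.length || rowCount > MAX_BATCH_SIZE) {
      throw new IllegalArgumentException("bad row count: " + rowCount);
    }
    char[] selection = new char[rowCount];
    int selected = 0;
    for (int row = 0; row < rowCount; row++) {
      // With a Bloom filter probe (no false negatives), only rows that
      // cannot match the build side are dropped.
      if (mightContain.test(hasher.applyAsLong(column[row]))) {
        selection[selected++] = (char) row; // row < 2^16, fits in a char
      }
    }
    return Arrays.copyOf(selection, selected);
  }
}
```

A toy call such as `select(new int[] {1, 2, 3, 4, 6}, 5, v -> v, h -> h % 2 == 0)` keeps rows 1, 3 and 4. Filtering after materialization leaves the reader's decode path untouched; the trade-off is that rejected rows were still decoded.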
>>
>>
>>> Hi Weijie,
>>>
>>> Another option is to totally avoid the generated code.
>>> We were considering the idea of replacing the generated code used for
>>> computing hash values with “real java” code.
>>>
>>> This idea is analogous to the usage of the copyEntry() method in the
>>> ValueVector interface (that Paul added last year).
>>> See an example of using the copyEntry() (via the appendRow() in
>>> VectorContainer) in the new HashJoinSpill code.
>>> Basically, there is no need to generate “type specific” code, as the
>>> virtual copyEntry() method does the “type specific” work.
>>>
>>> Similarly we could have a hash64() method in ValueVector, which would
>>> perform the “type specific” computation.
>>> (One difference from copyEntry() – the hash64() would also need to take
>>> the “seed” parameter, which is the hash value produced by the previous
>>> hash).
>>> And similar to appendRow(), there would be an evalHash() method iterating
>>> over the key columns.
>>> (One difference from appendRow(): we need to iterate only over the key
>>> columns; these are the first columns, and their number can be found from
>>> the config, e.g., htConfig.getKeyExprsBuild().size().)
>>>
>>> With such an implementation, evalHash() could be used anywhere
>>> (e.g., to match the Bloom filters on the left side of the join).
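Boaz's suggestion can be sketched in plain Java without generated code. This is a minimal illustration under assumed names (`KeyColumn`, `LongKeyColumn`, `StringKeyColumn`, `RowHasher`), not Drill's ValueVector or VectorContainer API: each column type implements its own `hash64(row, seed)`, and `evalHash` walks only the leading key columns, feeding each hash into the next column as the seed. The SplitMix64-style mixer is a stand-in for Drill's real hash functions.

```java
import java.util.List;

// Hypothetical sketch: virtual dispatch replaces type-specific generated code.
interface KeyColumn {
  long hash64(int row, long seed);
}

final class Mix {
  static final long GOLDEN = 0x9E3779B97F4A7C15L;

  private Mix() {}

  // SplitMix64 finalizer, used as a stand-in hash mixer.
  static long mix64(long z) {
    z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
    z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
    return z ^ (z >>> 31);
  }
}

final class LongKeyColumn implements KeyColumn {
  private final long[] values;

  LongKeyColumn(long... values) { this.values = values.clone(); }

  @Override
  public long hash64(int row, long seed) {
    return Mix.mix64(values[row] ^ (seed * Mix.GOLDEN));
  }
}

final class StringKeyColumn implements KeyColumn {
  private final String[] values;

  StringKeyColumn(String... values) { this.values = values.clone(); }

  @Override
  public long hash64(int row, long seed) {
    String s = values[row];
    long h = seed ^ s.length();
    for (int i = 0; i < s.length(); i++) {
      h = Mix.mix64(h * Mix.GOLDEN + s.charAt(i));
    }
    return Mix.mix64(h);
  }
}

final class RowHasher {
  private RowHasher() {}

  // Key columns come first; numKeys plays the role of
  // htConfig.getKeyExprsBuild().size() in the message above.
  static long evalHash(List<? extends KeyColumn> columns, int numKeys, int row) {
    long seed = 0;
    for (int i = 0; i < numKeys; i++) {
      seed = columns.get(i).hash64(row, seed); // previous hash seeds the next column
    }
    return seed;
  }
}
```

Because `evalHash` only needs the column list and the key count, the same routine could serve both the build side and a probe-side Bloom filter check.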
>>>
>>> Thanks,
>>>
>>> Boaz
>>>
>>>
>>>
>>> Hi Aman:
>>>
>>> Thanks for your tips. I have rebased onto the latest code from the master
>>> branch. Yes, the spill-to-disk feature did change the original
>>> implementation, and I have adjusted my implementation to the new feature.
>>> But as you say, integrating it will be somewhat challenging, since I
>>> noticed the spill-to-disk feature will continue to tune its
>>> implementation's performance.
>>>
>>> The BloomFilter was implemented natively in Drill, not with an external
>>> library. It implements the algorithm from the paper you mentioned.
>>>
>>>
>>>
>>>
>>>
>>>
>>> > Hi Weijie,
>>> > I was hoping you could leverage the existing methods, so it's good that
>>> > you found the ones that work for your use case.
>>> > One thing I want to point out (maybe you're already aware): the Hash Join
>>> > code has changed significantly in the master branch due to the
>>> > spill-to-disk feature, so this may pose some integration challenges for
>>> > your runtime join pushdown feature.
>>> > Also, one other question/clarification: for the Bloom filter itself, are
>>> > you implementing it natively in Drill or using an external library?
>>> >
>>> > Aman
>>> >
>>> >
>>> > > I found ClassGenerator's nestEvalBlock(JBlock block) and
>>> > > unNestEvalBlock(), which have the same effect as my change to
>>> > > ClassGenerator. So I have dropped my ClassGenerator change; I hope this
>>> > > helps someone else.
>>> > >
>>> > >
>>> > > > The code formatting was not nice, so here it is again:
>>> > > >
>>> > > > private void setupGetBuild64Hash(ClassGenerator<HashTable> cg,
>>> > > >     MappingSet incomingMapping, VectorAccessible batch,
>>> > > >     LogicalExpression[] keyExprs, TypedFieldId[] buildKeyFieldIds)
>>> > > >     throws SchemaChangeException {
>>> > > >   cg.setMappingSet(incomingMapping);
>>> > > >   if (keyExprs == null || keyExprs.length == 0) {
>>> > > >     cg.getEvalBlock()._return(JExpr.lit(0));
>>> > > >     return; // no key columns: nothing else to generate
>>> > > >   }
>>> > > >   String seedValue = "seedValue";
>>> > > >   String fieldId = "fieldId";
>>> > > >   LogicalExpression seed = ValueExpressions.getParameterExpression(
>>> > > >       seedValue, Types.required(TypeProtos.MinorType.INT));
>>> > > >
>>> > > >   LogicalExpression fieldIdParamExpr = ValueExpressions.getParameterExpression(
>>> > > >       fieldId, Types.required(TypeProtos.MinorType.INT));
>>> > > >   HoldingContainer fieldIdParamHolder = cg.addExpr(fieldIdParamExpr);
>>> > > >   int i = 0;
>>> > > >   for (LogicalExpression expr : keyExprs) {
>>> > > >     TypedFieldId targetTypeFieldId = buildKeyFieldIds[i];
>>> > > >     ValueExpressions.IntExpression targetBuildFieldIdExp =
>>> > > >         new ValueExpressions.IntExpression(
>>> > > >             targetTypeFieldId.getFieldIds()[0], ExpressionPosition.UNKNOWN);
>>> > > >
>>> > > >     JFieldRef targetBuildSideFieldId = cg.addExpr(targetBuildFieldIdExp,
>>> > > >         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue();
>>> > > >     JBlock ifBlock = cg.getEvalBlock()._if(
>>> > > >         fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then();
>>> > > >     // Make getEvalBlock() return this inner block (nested inside the
>>> > > >     // eval block) instead of the eval block itself.
>>> > > >     cg.setCustomizedEvalInnerBlock(ifBlock);
>>> > > >     LogicalExpression hashExpression =
>>> > > >         HashPrelUtil.getHashExpression(expr, seed, incomingProbe != null);
>>> > > >     LogicalExpression materializedExpr =
>>> > > >         ExpressionTreeMaterializer.materializeAndCheckErrors(
>>> > > >             hashExpression, batch, context.getFunctionRegistry());
>>> > > >     HoldingContainer hash = cg.addExpr(materializedExpr,
>>> > > >         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND);
>>> > > >     ifBlock._return(hash.getValue());
>>> > > >     // Reset the customized block to null so that getEvalBlock()
>>> > > >     // returns the real eval block again.
>>> > > >     cg.setCustomizedEvalInnerBlock(null);
>>> > > >     i++;
>>> > > >   }
>>> > > >   cg.getEvalBlock()._return(JExpr.lit(0));
>>> > > > }
>>> > > >
>>> > > > public long getBuild64HashCodeInner(int incomingRowIdx, int seedValue, int fieldId)
>>> > > >     throws SchemaChangeException
>>> > > > {
>>> > > >     {
>>> > > >         IntHolder fieldId12 = new IntHolder();
>>> > > >         fieldId12 .value = fieldId;
>>> > > >         if (fieldId12 .value == constant14 .value) {
>>> > > >             IntHolder out18 = new IntHolder();
>>> > > >             {
>>> > > >                 out18 .value = vv15 .getAccessor().get((incomingRowIdx));
>>> > > >             }
>>> > > >             IntHolder seedValue19 = new IntHolder();
>>> > > >             seedValue19 .value = seedValue;
>>> > > >             // start of eval portion of hash32AsDouble function. //
>>> > > >             IntHolder out20 = new IntHolder();
>>> > > >             {
>>> > > >                 final IntHolder out = new IntHolder();
>>> > > >                 IntHolder in = out18;
>>> > > >                 IntHolder seed = seedValue19;
>>> > > >
>>> > > >                 Hash32WithSeedAsDouble$IntHash_eval: {
>>> > > >                     out.value = org.apache.drill.exec.expr.fn.impl.HashHelper
>>> > > >                         .hash32((double) in.value, seed.value);
>>> > > >                 }
>>> > > >
>>> > > >                 out20 = out;
>>> > > >             }
>>> > > >             // end of eval portion of hash32AsDouble function. //
>>> > > >             return out20 .value;
>>> > > >         }
>>> > > >         return 0;
>>> > > >     }
>>> > > > }
>>> > > >
>>> > > >> Hi Paul:
>>> > > >>
>>> > > >> Thanks for your enthusiasm. I have picked up this technique, which you
>>> > > >> mentioned to me in another mail thread. It's really helpful; thanks
>>> > > >> for your valuable work.
>>> > > >>
>>> > > >> I have now solved this tough problem by adding a customized JBlock
>>> > > >> member field to ClassGenerator. When you want ClassGenerator's
>>> > > >> getEvalBlock() to return an inner customized JBlock, you set this
>>> > > >> member; when you want it to return the eval block itself, you reset
>>> > > >> this member to null.
>>> > > >>
>>> > > >> Here is my changed setup method:
>>> > > >>
>>> > > >>
>>> > > >> private void setupGetBuild64Hash(ClassGenerator<HashTable> cg,
>>> > > >>     MappingSet incomingMapping, VectorAccessible batch,
>>> > > >>     LogicalExpression[] keyExprs, TypedFieldId[] buildKeyFieldIds)
>>> > > >>     throws SchemaChangeException {
>>> > > >>   cg.setMappingSet(incomingMapping);
>>> > > >>   if (keyExprs == null || keyExprs.length == 0) {
>>> > > >>     cg.getEvalBlock()._return(JExpr.lit(0));
>>> > > >>     return; // no key columns: nothing else to generate
>>> > > >>   }
>>> > > >>   String seedValue = "seedValue";
>>> > > >>   String fieldId = "fieldId";
>>> > > >>   LogicalExpression seed = ValueExpressions.getParameterExpression(
>>> > > >>       seedValue, Types.required(TypeProtos.MinorType.INT));
>>> > > >>
>>> > > >>   LogicalExpression fieldIdParamExpr = ValueExpressions.getParameterExpression(
>>> > > >>       fieldId, Types.required(TypeProtos.MinorType.INT));
>>> > > >>   HoldingContainer fieldIdParamHolder = cg.addExpr(fieldIdParamExpr);
>>> > > >>   int i = 0;
>>> > > >>   for (LogicalExpression expr : keyExprs) {
>>> > > >>     TypedFieldId targetTypeFieldId = buildKeyFieldIds[i];
>>> > > >>     ValueExpressions.IntExpression targetBuildFieldIdExp =
>>> > > >>         new ValueExpressions.IntExpression(
>>> > > >>             targetTypeFieldId.getFieldIds()[0], ExpressionPosition.UNKNOWN);
>>> > > >>
>>> > > >>     JFieldRef targetBuildSideFieldId = cg.addExpr(targetBuildFieldIdExp,
>>> > > >>         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue();
>>> > > >>     JBlock ifBlock = cg.getEvalBlock()._if(
>>> > > >>         fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then();
>>> > > >>     // Make getEvalBlock() return this inner block (nested inside the
>>> > > >>     // eval block) instead of the eval block itself.
>>> > > >>     cg.setCustomizedEvalInnerBlock(ifBlock);
>>> > > >>     LogicalExpression hashExpression =
>>> > > >>         HashPrelUtil.getHashExpression(expr, seed, incomingProbe != null);
>>> > > >>     LogicalExpression materializedExpr =
>>> > > >>         ExpressionTreeMaterializer.materializeAndCheckErrors(
>>> > > >>             hashExpression, batch, context.getFunctionRegistry());
>>> > > >>     HoldingContainer hash = cg.addExpr(materializedExpr,
>>> > > >>         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND);
>>> > > >>     ifBlock._return(hash.getValue());
>>> > > >>     // Reset the customized block to null so that getEvalBlock()
>>> > > >>     // returns the real eval block again.
>>> > > >>     cg.setCustomizedEvalInnerBlock(null);
>>> > > >>     i++;
>>> > > >>   }
>>> > > >>   cg.getEvalBlock()._return(JExpr.lit(0));
>>> > > >> }
>>> > > >>
>>> > > >> The corresponding generated code:
>>> > > >>
>>> > > >> public long getBuild64HashCodeInner(int incomingRowIdx, int seedValue, int fieldId)
>>> > > >>     throws SchemaChangeException
>>> > > >> {
>>> > > >>     {
>>> > > >>         IntHolder fieldId12 = new IntHolder();
>>> > > >>         fieldId12 .value = fieldId;
>>> > > >>         if (fieldId12 .value == constant14 .value) {
>>> > > >>             IntHolder out18 = new IntHolder();
>>> > > >>             {
>>> > > >>                 out18 .value = vv15 .getAccessor().get((incomingRowIdx));
>>> > > >>             }
>>> > > >>             IntHolder seedValue19 = new IntHolder();
>>> > > >>             seedValue19 .value = seedValue;
>>> > > >>             // start of eval portion of hash32AsDouble function. //
>>> > > >>             IntHolder out20 = new IntHolder();
>>> > > >>             {
>>> > > >>                 final IntHolder out = new IntHolder();
>>> > > >>                 IntHolder in = out18;
>>> > > >>                 IntHolder seed = seedValue19;
>>> > > >>
>>> > > >>                 Hash32WithSeedAsDouble$IntHash_eval: {
>>> > > >>                     out.value = org.apache.drill.exec.expr.fn.impl.HashHelper
>>> > > >>                         .hash32((double) in.value, seed.value);
>>> > > >>                 }
>>> > > >>
>>> > > >>                 out20 = out;
>>> > > >>             }
>>> > > >>             // end of eval portion of hash32AsDouble function. //
>>> > > >>             return out20 .value;
>>> > > >>         }
>>> > > >>         return 0;
>>> > > >>     }
>>> > > >> }
>>> > > >>
>>> > > >> Some further explanation:
>>> > > >> 1st: The if check won't hurt performance: I invoke this method column
>>> > > >> by column, so it is branch-prediction friendly.
>>> > > >> 2nd: I will use murmur3_64, not murmur3_32, since the efficient Bloom
>>> > > >> filter algorithm needs a 64-bit hash code to avoid collisions.
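One reason a single 64-bit hash is convenient for a Bloom filter: it can be split into two 32-bit halves h1 and h2, and the k probe positions derived as h1 + i*h2 (the Kirsch-Mitzenmacher double-hashing scheme). The sketch below is a generic illustration with a hypothetical `Bloom64` class over `java.util.BitSet`; it is not necessarily the algorithm from the paper Weijie implemented, and it is not Drill's BloomFilter class.

```java
import java.util.BitSet;

// Hypothetical sketch: a Bloom filter keyed by precomputed 64-bit hashes.
final class Bloom64 {
  private final BitSet bits;
  private final int numBits;
  private final int numProbes;

  Bloom64(int numBits, int numProbes) {
    if (numBits <= 0 || numProbes <= 0) {
      throw new IllegalArgumentException("numBits and numProbes must be positive");
    }
    this.bits = new BitSet(numBits);
    this.numBits = numBits;
    this.numProbes = numProbes;
  }

  // Probe i = h1 + i * h2 (mod numBits), with h1/h2 the low/high 32 bits.
  private int position(long hash64, int i) {
    int h1 = (int) hash64;
    int h2 = (int) (hash64 >>> 32);
    // int overflow wraps around; floorMod keeps the index non-negative
    return Math.floorMod(h1 + i * h2, numBits);
  }

  void insert(long hash64) {
    for (int i = 0; i < numProbes; i++) {
      bits.set(position(hash64, i));
    }
  }

  // false means "definitely absent"; true means "possibly present".
  boolean mightContain(long hash64) {
    for (int i = 0; i < numProbes; i++) {
      if (!bits.get(position(hash64, i))) {
        return false;
      }
    }
    return true;
  }
}
```

On the build side each key hash goes through `insert`; on the probe or scan side, rows whose hash fails `mightContain` can be dropped. With only a 32-bit hash there is no independent upper half to derive h2 from, which is one way to read the murmur3_64 choice above.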
>>> > > >>
>>> > > >>> Hi Weijie,
>>> > > >>>
>>> > > >>> Seeing the discussion about the details of JCodeModel suggests you may
>>> > > >>> be trying to debug your generated code at the level of the code
>>> > > >>> generator.
>>> > > >>>
>>> > > >>> Some time ago we added the ability to step through the generated code.
>>> > > >>> Look for the following line in the generator code:
>>> > > >>>
>>> > > >>>
>>> > > >>> // Uncomment out this line to debug the generated code.
>>> > > >>>
>>> > > >>> // cg.saveCodeForDebugging(true);
>>> > > >>>
>>> > > >>>
>>> > > >>> Uncomment the code line and Drill will save each generated file to a
>>> > > >>> configured location (which, if I recall correctly, is
>>> > > >>> /tmp/drill/codegen, though it may have changed after Tim's test
>>> > > >>> directory changes).
>>> > > >>>
>>> > > >>> Then, set a breakpoint in the template setup() method and you can step
>>> > > >>> directly into the generated doSetup() method. Same for the eval()
>>> > > >>> method.
>>> > > >>>
>>> > > >>> This way, you can not only see the generated code, you can step
>>> > > >>> through it. I've found this to be a far easier way to understand the
>>> > > >>> generated code than the older techniques folks have used (look at byte
>>> > > >>> codes, use print statements, brute force reasoning, etc.)
>>> > > >>>
>>> > > >>> Tim, Boaz and others have used this technique more recently and can
>>> > > >>> probably give you additional pointers.
>>> > > >>>
>>> > > >>> Thanks,
>>> > > >>> Paul
>>> > > >>>
>>> > > >>> @aman thanks for your reply. "For the ifBlock, do you need an _else()
>>> > > >>> block also?" I have a default return at the end of the method, so I
>>> > > >>> don't need the _else() block. I have noticed that the IfExpression
>>> > > >>> evaluation method in EvaluationVisitor also uses JConditional, but
>>> > > >>> that doesn't match my requirement either. I think the key point here
>>> > > >>> is that FunctionHolderExpression and ValueVectorReadExpression put
>>> > > >>> their generated code into the eval method's JBlock, not into our
>>> > > >>> specific ifBlock, which is an inner block of the eval method's JBlock.
>>> > > >>>
>>> > > >>> So it seems I should either change ClassGenerator so that
>>> > > >>> getEvalBlock() returns the ifBlock (more precisely, the JConditional's
>>> > > >>> then block), or implement special FunctionHolderExpression and
>>> > > >>> ValueVectorReadExpression classes with corresponding visit methods in
>>> > > >>> EvaluationVisitor to generate the special code blocks. I hope someone
>>> > > >>> familiar with this part of the code can point out whether there are
>>> > > >>> easier or different ways to achieve this.
>>> > > >>>
>>> > > >>> To make the discussion more concrete, here is the code generated by
>>> > > >>> the previous setupGetBuild64Hash method:
>>> > > >>>
>>> > > >>> public long getBuild64HashCodeInner(int incomingRowIdx, int seedValue, int fieldId)
>>> > > >>>     throws SchemaChangeException
>>> > > >>> {
>>> > > >>>     {
>>> > > >>>         IntHolder fieldId16 = new IntHolder();
>>> > > >>>         fieldId16 .value = fieldId;
>>> > > >>>         if (fieldId16 .value == constant18 .value) {
>>> > > >>>             return out24 .value;
>>> > > >>>         }
>>> > > >>>         IntHolder out22 = new IntHolder();
>>> > > >>>         {
>>> > > >>>             out22 .value = vv19 .getAccessor().get((incomingRowIdx));
>>> > > >>>         }
>>> > > >>>         IntHolder seedValue23 = new IntHolder();
>>> > > >>>         seedValue23 .value = seedValue;
>>> > > >>>         // start of eval portion of hash32AsDouble function. //
>>> > > >>>         IntHolder out24 = new IntHolder();
>>> > > >>>         {
>>> > > >>>             final IntHolder out = new IntHolder();
>>> > > >>>             IntHolder in = out22;
>>> > > >>>             IntHolder seed = seedValue23;
>>> > > >>>
>>> > > >>>             Hash32WithSeedAsDouble$IntHash_eval: {
>>> > > >>>                 out.value = org.apache.drill.exec.expr.fn.impl.HashHelper
>>> > > >>>                     .hash32((double) in.value, seed.value);
>>> > > >>>             }
>>> > > >>>
>>> > > >>>             out24 = out;
>>> > > >>>         }
>>> > > >>>         // end of eval portion of hash32AsDouble function. //
>>> > > >>>         if (fieldId16 .value == constant18 .value) {
>>> > > >>>             return out26 .value;
>>> > > >>>         }
>>> > > >>>         IntHolder seedValue25 = new IntHolder();
>>> > > >>>         seedValue25 .value = seedValue;
>>> > > >>>         // start of eval portion of hash32AsDouble function. //
>>> > > >>>         IntHolder out26 = new IntHolder();
>>> > > >>>         {
>>> > > >>>             final IntHolder out = new IntHolder();
>>> > > >>>             IntHolder in = out22;
>>> > > >>>             IntHolder seed = seedValue25;
>>> > > >>>
>>> > > >>>             Hash32WithSeedAsDouble$IntHash_eval: {
>>> > > >>>                 out.value = org.apache.drill.exec.expr.fn.impl.HashHelper
>>> > > >>>                     .hash32((double) in.value, seed.value);
>>> > > >>>             }
>>> > > >>>
>>> > > >>>             out26 = out;
>>> > > >>>         }
>>> > > >>>         // end of eval portion of hash32AsDouble function. //
>>> > > >>>         return 0;
>>> > > >>>     }
>>> > > >>> }
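The misplacement in the generated code above (each `return outNN .value;` emitted before `outNN` is declared) can be reproduced with a toy model. In this sketch, `ToyBlock` and `ToyGenerator` are hypothetical stand-ins, not JCodeModel or Drill's ClassGenerator: expression code is always appended to whatever `getEvalBlock()` returns, so unless the if block becomes the current eval block, the read and hash statements land after it. The `nestEvalBlock`/`unNestEvalBlock` names mirror the ClassGenerator methods mentioned earlier in the thread, but the stack behind them here is an assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model, not JCodeModel: a block holds statements (String) or nested blocks.
final class ToyBlock {
  private final String header;                        // null for the method body itself
  private final List<Object> items = new ArrayList<>();

  ToyBlock(String header) { this.header = header; }

  void add(String statement) { items.add(statement); }

  ToyBlock ifThen(String condition) {
    ToyBlock then = new ToyBlock("if (" + condition + ")");
    items.add(then);
    return then;
  }

  String render() {
    StringBuilder sb = new StringBuilder();
    render(sb, "");
    return sb.toString();
  }

  private void render(StringBuilder sb, String indent) {
    String inner = indent;
    if (header != null) {
      sb.append(indent).append(header).append(" {\n");
      inner = indent + "  ";
    }
    for (Object item : items) {
      if (item instanceof ToyBlock) {
        ((ToyBlock) item).render(sb, inner);
      } else {
        sb.append(inner).append(item).append('\n');
      }
    }
    if (header != null) {
      sb.append(indent).append("}\n");
    }
  }
}

final class ToyGenerator {
  private final ToyBlock evalBlock = new ToyBlock(null);
  private final Deque<ToyBlock> nested = new ArrayDeque<>();

  // Expression code goes to the innermost nested block, else to the eval block.
  ToyBlock getEvalBlock() { return nested.isEmpty() ? evalBlock : nested.peek(); }

  void nestEvalBlock(ToyBlock block) { nested.push(block); }

  void unNestEvalBlock() { nested.pop(); }

  // Stand-in for addExpr(): emits the vector read and the hash call.
  String addHashExpr(int column) {
    getEvalBlock().add("int v" + column + " = vv" + column + ".get(row);");
    getEvalBlock().add("int h" + column + " = hash32(v" + column + ", seed);");
    return "h" + column;
  }

  // Mirrors the shape of setupGetBuild64Hash: one if branch per key column.
  static String generate(int numKeys, boolean nest) {
    ToyGenerator g = new ToyGenerator();
    for (int i = 0; i < numKeys; i++) {
      ToyBlock ifBlock = g.getEvalBlock().ifThen("fieldId == " + i);
      if (nest) g.nestEvalBlock(ifBlock);
      String hash = g.addHashExpr(i);
      ifBlock.add("return " + hash + ";");
      if (nest) g.unNestEvalBlock();
    }
    g.getEvalBlock().add("return 0;");
    return g.evalBlock.render();
  }
}
```

With `generate(1, false)` the branch contains only `return h0;` and the read and hash statements follow it, the same shape as the generated method above; with `generate(1, true)` the read, the hash and the return all sit inside the branch.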
>>> > > >>>
>>> > > >>> > Sorry, the previous email was incomplete.
>>> > > >>> > For the ifBlock, do you need an _else() block also?
>>> > > >>> >
>>> > > >>> > I have sometimes found that 'JConditional' is a good way to break
>>> > > >>> > down the logic further. Please see example usages of JConditional
>>> > > >>> > here [1].
>>> > > >>> >
>>> > > >>> > Aman
>>> > > >>> >
>>> > > >>> > [1]
>>> > > >>> > https://urldefense.proofpoint.com/v2/url?u=https3A__www.programcreek.com_java2Dapi2Dexamples_3Fapi3Dcom&d=DwIFaQ&c=cskdkSMqhcnjZxdQVpwTXg&r=EqulKDxxEDCX6zbp1AZAa1iAPQGgCioAqgDp7DE2BU&m=doaiFF3edu9prktKvLSIoNdmzt_nV6nzCtF_ZGQRBk&s=O2Th00tVjOSHTLlOn_lFp8JiUlh_FueCbHs8giRVS3k&e=
>>> > > >>> > (com.sun.codemodel.JBlock examples)
>>> > > >>> >
>>> > > >>> >
>>> > > >>> > > Hi Weijie,
>>> > > >>> > > It would be a little cumbersome to debug such issues over email,
>>> > > >>> > > since one has to look at the generated code output and debug
>>> > > >>> > > iteratively. A couple of thoughts that might help:
>>> > > >>> > >
>>> > > >>> > > For this particular if-then block, should you also
>>> > > >>> > > JBlock ifBlock = cg.getEvalBlock()._if(
>>> > > >>> > >     fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then();
>>> > > >>> > >
>>> > > >>> > >> Hi All:
>>> > > >>> > >> While implementing the JPPD feature
>>> > > >>> > >> (https://urldefense.proofpoint.com/v2/url?u=https3A__issues.apache.org_jira_browse_DRILL2D6385&d=DwIFaQ&c=cskdkSMqhcnjZxdQVpwTXg&r=EqulKDxxEDCX6zbp1AZAa1iAPQGgCioAqgDp7DE2BU&m=doaiFF3edu9prktKvLSIoNdmzt_nV6nzCtF_ZGQRBk&s=FIkIkgR6E_qJADP1J55y11SgJZD8NyPaNv_AeTabiaY&e=),
>>> > > >>> > >> I was blocked by this problem: how do I get the hash code of each
>>> > > >>> > >> build-side hash join column through the dynamically generated Java
>>> > > >>> > >> code? I hope someone can give some advice.
>>> > > >>> > >>
>>> > > >>> > >> I intend to add the methods below to HashTableTemplate:
>>> > > >>> > >>
>>> > > >>> > >> public long getBuild64HashCode(int incomingRowIdx, int seedValue,
>>> > > >>> > >>     int fieldId) throws SchemaChangeException {
>>> > > >>> > >>   return getBuild64HashCodeInner(incomingRowIdx, seedValue, fieldId);
>>> > > >>> > >> }
>>> > > >>> > >>
>>> > > >>> > >> protected abstract long getBuild64HashCodeInner(
>>> > > >>> > >>     @Named("incomingRowIdx") int incomingRowIdx,
>>> > > >>> > >>     @Named("seedValue") int seedValue,
>>> > > >>> > >>     @Named("fieldId") int fieldId)
>>> > > >>> > >>     throws SchemaChangeException;
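The split between the public template method and the abstract `getBuild64HashCodeInner` can be illustrated without code generation. In this sketch, `BuildHashTemplate` and `HandWrittenBuildHash` are hypothetical names, the `@Named` annotations and `SchemaChangeException` are left out to keep it self-contained, and the subclass is written by hand to show the shape the generated class is meant to have: one if branch per key column, each containing its own vector read and hash call. The SplitMix64-style mixer is a stand-in for Drill's hash functions.

```java
// Hypothetical sketch of the template/generated-class split.
abstract class BuildHashTemplate {
  public long getBuild64HashCode(int incomingRowIdx, int seedValue, int fieldId) {
    return getBuild64HashCodeInner(incomingRowIdx, seedValue, fieldId);
  }

  // In Drill this body would be produced by code generation.
  protected abstract long getBuild64HashCodeInner(int incomingRowIdx, int seedValue, int fieldId);
}

// Hand-written stand-in for the generated subclass, for two key columns.
final class HandWrittenBuildHash extends BuildHashTemplate {
  private static final long GOLDEN = 0x9E3779B97F4A7C15L;
  private final int[] intKey;    // build-side key column with field id 0
  private final long[] longKey;  // build-side key column with field id 1

  HandWrittenBuildHash(int[] intKey, long[] longKey) {
    this.intKey = intKey.clone();
    this.longKey = longKey.clone();
  }

  @Override
  protected long getBuild64HashCodeInner(int incomingRowIdx, int seedValue, int fieldId) {
    if (fieldId == 0) {
      // the read and the hash stay inside this column's branch
      return mix64(intKey[incomingRowIdx] ^ (seedValue * GOLDEN));
    }
    if (fieldId == 1) {
      return mix64(longKey[incomingRowIdx] ^ (seedValue * GOLDEN));
    }
    return 0; // unknown field id: same default as the generated method
  }

  private static long mix64(long z) {
    z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
    z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
    return z ^ (z >>> 31);
  }
}
```

Because the build loop calls this with a fixed `fieldId` for a whole column, the same branch is taken on every call, which is the branch-prediction point made in this thread.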
>>> > > >>> > >>
>>> > > >>> > >>
>>> > > >>> > >> The high-level code that invokes getBuild64HashCode is in
>>> > > >>> > >> HashJoinBatch's executeBuildPhase():
>>> > > >>> > >>
>>> > > >>> > >> // create runtime filter
>>> > > >>> > >> if (cycleNum == 0 && enableRuntimeFilter) {
>>> > > >>> > >>   // create runtime filter and send out async
>>> > > >>> > >>   int condFieldIndex = 0;
>>> > > >>> > >>   for (BloomFilter bloomFilter : bloomFilters) {
>>> > > >>> > >>     //VV
>>> > > >>> > >>     for (int ind = 0; ind < currentRecordCount; ind++) {
>>> > > >>> > >>       // 0 is an assumed seedValue; the method takes (row, seed, fieldId)
>>> > > >>> > >>       long hashCode = partitions[0].getBuild64HashCode(ind, 0, condFieldIndex);
>>> > > >>> > >>       bloomFilter.insert(hashCode);
>>> > > >>> > >>     }
>>> > > >>> > >>     condFieldIndex++;
>>> > > >>> > >>   }
>>> > > >>> > >>   // TODO send out async
>>> > > >>> > >> }
>>> > > >>> > >>
>>> > > >>> > >>
>>> > > >>> > >> As you know, the abstract method getBuild64HashCodeInner needs to
>>> > > >>> > >> calculate the hash code of the build-side column selected by the
>>> > > >>> > >> fieldId input parameter. To achieve this, I plan to generate a
>>> > > >>> > >> separate code section for each column's ValueVector and use if
>>> > > >>> > >> statements to choose between those sections by column id. The
>>> > > >>> > >> method that generates the dynamic code is below:
>>> > > >>> > >>
>>> > > >>> > >> private void setupGetBuild64Hash(ClassGenerator<HashTable> cg,
>>> > > >>> > >>     MappingSet incomingMapping, VectorAccessible batch,
>>> > > >>> > >>     LogicalExpression[] keyExprs, TypedFieldId[] buildKeyFieldIds)
>>> > > >>> > >>     throws SchemaChangeException {
>>> > > >>> > >>   cg.setMappingSet(incomingMapping);
>>> > > >>> > >>   if (keyExprs == null || keyExprs.length == 0) {
>>> > > >>> > >>     cg.getEvalBlock()._return(JExpr.lit(0));
>>> > > >>> > >>     return; // no key columns: nothing else to generate
>>> > > >>> > >>   }
>>> > > >>> > >>   String seedValue = "seedValue";
>>> > > >>> > >>   String fieldId = "fieldId";
>>> > > >>> > >>   LogicalExpression seed = ValueExpressions.getParameterExpression(
>>> > > >>> > >>       seedValue, Types.required(TypeProtos.MinorType.INT));
>>> > > >>> > >>
>>> > > >>> > >>   LogicalExpression fieldIdParamExpr = ValueExpressions.getParameterExpression(
>>> > > >>> > >>       fieldId, Types.required(TypeProtos.MinorType.INT));
>>> > > >>> > >>   HoldingContainer fieldIdParamHolder = cg.addExpr(fieldIdParamExpr);
>>> > > >>> > >>   int i = 0;
>>> > > >>> > >>   for (LogicalExpression expr : keyExprs) {
>>> > > >>> > >>     TypedFieldId targetTypeFieldId = buildKeyFieldIds[i];
>>> > > >>> > >>     ValueExpressions.IntExpression targetBuildFieldIdExp =
>>> > > >>> > >>         new ValueExpressions.IntExpression(
>>> > > >>> > >>             targetTypeFieldId.getFieldIds()[0], ExpressionPosition.UNKNOWN);
>>> > > >>> > >>     JFieldRef targetBuildSideFieldId = cg.addExpr(targetBuildFieldIdExp,
>>> > > >>> > >>         ClassGenerator.BlkCreateMode.TRUE_IF_BOUND).getValue();
>>> > > >>> > >>     JBlock ifBlock = cg.getEvalBlock()._if(
>>> > > >>> > >>         fieldIdParamHolder.getValue().eq(targetBuildSideFieldId))._then();
>>> > > >>> > >>
>>> > > >>> > >>     LogicalExpression hashExpression =
>>> > > >>> > >>         HashPrelUtil.getHashExpression(expr, seed, incomingProbe != null);
>>> > > >>> > >>     LogicalExpression materializedExpr =
>>> > > >>> > >>         ExpressionTreeMaterializer.materializeAndCheckErrors(
>>> > > >>> > >>             hashExpression, batch, context.getFunctionRegistry());
>>> > > >>> > >>     HoldingContainer hash = cg.addExpr(materializedExpr,
>>> > > >>> > >>         ClassGenerator.BlkCreateMode.FALSE);
>>> > > >>> > >>
>>> > > >>> > >>     ifBlock._return(hash.getValue());
>>> > > >>> > >>     i++;
>>> > > >>> > >>   }
>>> > > >>> > >>   cg.getEvalBlock()._return(JExpr.lit(0));
>>> > > >>> > >> }
>>> > > >>> > >>
>>> > > >>> > >> Unfortunately, the generated code is not what I expected. The code
>>> > > >>> > >> that reads the ValueVector and calculates the hash code of the read
>>> > > >>> > >> value does not stay in the if block. How can I make the related code
>>> > > >>> > >> stay in the if block?
