trafodion-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dave Birdsall <dave.birds...@esgyn.com>
Subject RE: trying to understand MDAM found something weird, is that a bug...
Date Tue, 08 Mar 2016 18:07:02 GMT
Hi,

Ah, you were concerned with the sometimes duplicate and sometimes
contradictory predicates involving K1 and _sys_HostVarHi0 in the
mdam_disjuncts.

I don't know the details of how these get generated but will speculate they
get there as a result of conversion to disjunctive normal form.

In any case, they are not a burden at run-time. At run-time when the MDAM
network is constructed, the duplicates are eliminated and the contradictory
predicates are found and removed. So, once we start MDAM traversal, all
these redundant and contradictory predicates have been removed and do not
affect scan time.

Dave

-----Original Message-----
From: Dave Birdsall [mailto:dave.birdsall@esgyn.com]
Sent: Tuesday, March 8, 2016 9:43 AM
To: 'dev@trafodion.incubator.apache.org'
<dev@trafodion.incubator.apache.org>
Subject: RE: trying to understand MDAM found something weird, is that a
bug...

Hi Eric,

Unfortunately the red color doesn't come through on the dev list. Can you
supply which text you're interested in?

Dave

-----Original Message-----
From: Hans Zeller [mailto:hans.zeller@esgyn.com]
Sent: Tuesday, March 8, 2016 9:24 AM
To: dev <dev@trafodion.incubator.apache.org>
Subject: Re: trying to understand MDAM found something weird, is that a
bug...

Hi Eric,

You probably know all of this, but just to recap the entire process: To
partition the scan into ranges for parallel execution, this "partition key
predicate" gets added:

 (k1, k2) >= (:_sys_hostVarLo0, :_sys_hostVarLo1) and
  case when :_sys_hostVarExclRange
          then (k1, k2) < (:_sys_hostVarHi0, :_sys_HostVarHi1)
          else (k1, k2) <= (:_sys_hostVarHi0, :_sys_HostVarHi1)

The case expression handles the case of the last range that needs to include
the highest key value, while all other ranges exclude the high range. At
runtime, each ESP gets a different set of these host variables, so that it
reads a different range.

These multi-valued comparisons then get converted into an equivalent form of
ANDs and ORs. That is shown as the executor predicate.

Then, that predicate, plus the user predicate (k2 between 10 and 20) get
transformed into disjunctive normal form. I assume it's possible that during
this process some disjuncts get created that always evaluate to TRUE or
FALSE. We could try to detect those, but it's probably not worth the
trouble. Dave may have more insights into this.

I didn't go though the exercise of doing these steps by hand and validating
them all.

Hans

On Tue, Mar 8, 2016 at 9:01 AM, Eric Owhadi <eric.owhadi@esgyn.com> wrote:

> Hello Trafodioneers,
>
>
>
> Trying to learn more about mdam, so tried the following on my deb
> build…
>
>
>
> create table t132helper (a int not null, primary key(a));
>
> insert into t132helper values(1);
>
> create table t132 (k1 int not null, k2 int not null, a int not null, b
> int not null,
>
>      c
> char(1000),
> primary key (k1,k2)) ATTRIBUTES ALIGNED FORMAT ;
>
> upsert using load
>
> into t132
>
>   select  x1000*1000+ x10000*10000 + x100000*100000 ,
>
>           x1+x10*10+x100*100,
>
>           x1+x10*10+x100*100+ x1000*1000+ x10000*10000 +
> x100000*100000 ,
>
>           x1+x10*10+x100*100+ x1000*1000+ x10000*10000 +
> x100000*100000 ,
>
>           'yo bro'
>
>
>
>   from t132helper
>
> transpose 0,1,2,3,4,5,6,7,8,9 as x1
>
> transpose 0,1,2,3,4,5,6,7,8,9 as x10
>
> transpose 0,1,2,3,4,5,6,7,8,9 as x100
>
> transpose 0,1,2,3,4,5,6,7,8,9 as x1000
>
> transpose 0,1,2,3,4,5,6,7,8,9 as x10000
>
> transpose 0,1,2,3,4,5,6,7,8,9 as x100000;
>
> update statistics for table t132 on every column;
>
>
>
>
>
> then I do:
>
> explain select avg(a) from t132 where k2 between 10 and 20;
>
> ------------------------------------------------------------------
> PLAN SUMMARY
>
> MODULE_NAME .............. DYNAMICALLY COMPILED
>
> STATEMENT_NAME ........... XX
>
> PLAN_ID .................. 212324155011172566
>
> ROWS_OUT ................. 1
>
> EST_TOTAL_COST .......... 17.64
>
> STATEMENT ................ select avg(a) from t132 where k2 between 10
> and 20;
>
>
>
>
>
> ------------------------------------------------------------------
> NODE LISTING
>
> ROOT ======================================  SEQ_NO 5        ONLY CHILD 4
>
> REQUESTS_IN .............. 1
>
> ROWS_OUT ................. 1
>
> EST_OPER_COST ............ 0
>
> EST_TOTAL_COST .......... 17.64
>
> DESCRIPTION
>
>   max_card_est ........... 1
>
>   fragment_id ............ 0
>
>   parent_frag ............ (none)
>
>   fragment_type .......... master
>
>   statement_index ........ 0
>
>   affinity_value ......... 0
>
>   max_max_cardinalit  11,990
>
>   total_overflow_size .... 0.00 KB
>
>   esp_2_node_map ......... (\NSK:-1:-1)
>
>   xn_access_mode ......... read_only
>
>   xn_autoabort_interval    0
>
>   auto_query_retry ....... enabled
>
>   plan_version ....... 2,600
>
>   embedded_arkcmp ........ used
>
>   IS_SQLCI ............... ON
>
>   LDAP_USERNAME
>
>   ObjectUIDs ............. 3134597362287934668
>
>  select_list ............
> cast(cast(cast((cast((cast((cast(sum(sum(TRAFODION.S
>
>                              CH.T132.A))) * 10000 ...0)) /
> cast(sum(count(1
>
>                              ))))) / 10000 ...0))))
>
>
>
>
>
> SORT_PARTIAL_AGGR_ROOT ====================  SEQ_NO 4        ONLY CHILD 3
>
> REQUESTS_IN .............. 1
>
> ROWS_OUT ................. 1
>
> EST_OPER_COST ............ 0.01
>
> EST_TOTAL_COST .......... 17.64
>
> DESCRIPTION
>
>   max_card_est ........... 1
>
>   fragment_id ............ 0
>
>   parent_frag ............ (none)
>
>   fragment_type .......... master
>
>   aggregates ............. sum(sum(TRAFODION.SCH.T132.A)), sum(count(1
> ))
>
>
>
>
>
> ESP_EXCHANGE ==============================  SEQ_NO 3        ONLY CHILD 2
>
> REQUESTS_IN .............. 1
>
> ROWS_OUT ................. 1
>
> EST_OPER_COST ............ 0.01
>
> EST_TOTAL_COST .......... 17.64
>
> DESCRIPTION
>
>   max_card_est ........... 1
>
>   fragment_id ............ 2
>
>   parent_frag ............ 0
>
>   fragment_type .......... esp
>
>   buffer_size ........ 6,250
>
>   record_length ......... 24
>
>   parent_processes ....... 1
>
>   child_processes ........ 2
>
>   child_partitioning_func  range partitioned 2 ways on
> (TRAFODION.SCH.T132.K1,
>
>                              TRAFODION.SCH.T132.K2) with
> boundaries(c(<min>)
>
>                              c(<min>) ;c(281000) c(571) )
>
>   seamonster_query ....... no
>
>   seamonster_exchange .... no
>
>
>
>
>
> SORT_PARTIAL_AGGR_LEAF ====================  SEQ_NO 2        ONLY CHILD 1
>
> REQUESTS_IN .............. 1
>
> ROWS_OUT ................. 1
>
> EST_OPER_COST ............ 0.01
>
> EST_TOTAL_COST .......... 17.64
>
> DESCRIPTION
>
>   max_card_est ........... 1
>
>   fragment_id ............ 2
>
>   parent_frag ............ 0
>
>   fragment_type .......... esp
>
>   aggregates ............. sum(TRAFODION.SCH.T132.A), count(1 )
>
>
>
>
>
> TRAFODION_SCAN ============================  SEQ_NO 1        NO CHILDREN
>
> TABLE_NAME ............... T132
>
> REQUESTS_IN .............. 1
>
> ROWS_OUT ............ 11,990
>
> EST_OPER_COST ........... 17.64
>
> EST_TOTAL_COST .......... 17.64
>
> DESCRIPTION
>
>   max_card_est ...... 11,990
>
>   fragment_id ............ 2
>
>   parent_frag ............ 0
>
>   fragment_type .......... esp
>
>   scan_type .............. subset scan limited by mdam of table
>
>                              TRAFODION.SCH.T132
>
>   object_type ............ Trafodion
>
>   cache_size ......... 5,995
>
>   probes ................. 1
>
>   rows_accessed ..... 11,990
>
>   column_retrieved ....... #1:1
>
>   key_columns ............ K1, K2
>
>   executor_predicates .... ((K1 < \:_sys_HostVarHi0) or (K1 =
>
>                              \:_sys_HostVarHi0) and
> case(if_then_else((0 <>
>
>                              \:_sys_hostVarExclRange),
>
>                              (K2 < \:_sys_HostVarHi1),
>
>                              (K2 <= \:_sys_HostVarHi1)))) and ((K1 >
>
>                              \:_sys_HostVarLo0) or (K1 =
> \:_sys_HostVarLo0) and
>
>                              (K2 >= \:_sys_HostVarLo1)) and ((K1 <
>
>                              \:_sys_HostVarHi0) or (K1 =
> \:_sys_HostVarHi0) and
>
>                              (K2 <= \:_sys_HostVarHi1))
>
>   mdam_disjunct .......... (K1 < \:_sys_HostVarHi0) and (K1 >
>
>                              \:_sys_HostVarLo0) and (K1 <
> \:_sys_HostVarHi0)
> -> duplicate identical
>
>                              and (K2 >= 10) and (K2 <= 20)
>
>   mdam_disjunct .......... (K1 < \:_sys_HostVarHi0) and (K1 >
>
>                              \:_sys_HostVarLo0) and (K1 =
> \:_sys_HostVarHi0)
> -> K1 cannot be < and = to the same value at the same time, so all
> -> this
> expression is always false
>
>                              and (K2 <= \:_sys_HostVarHi1) and (K2 >=
> 10) and
>
>                              (K2 <= 20)
>
>   mdam_disjunct .......... (K1 < \:_sys_HostVarHi0) and (K1 =
> ->duplicate identical
>
>                              \:_sys_HostVarLo0) and (K2 >=
> \:_sys_HostVarLo1)
>
>                              and (K1 < \:_sys_HostVarHi0) and (K2 >=
> 10) and
>
>                              (K2 <= 20)
>
>   mdam_disjunct .......... (K1 < \:_sys_HostVarHi0) and (K1 = -> K1
> cannot be < and = to the same value at the same time, so all this
> expression is always false
>
>                              \:_sys_HostVarLo0) and (K2 >=
> \:_sys_HostVarLo1)
>
>                              and (K1 = \:_sys_HostVarHi0) and (K2 <=
>
>                              \:_sys_HostVarHi1) and (K2 >= 10) and (K2
> <=
> 20)
>
>   mdam_disjunct .......... (K1 = \:_sys_HostVarHi0) and (K1 >-> K1
> cannot be < and = to the same value at the same time, so all this
> expression is always false
>
>                              \:_sys_HostVarLo0) and (K1 <
> \:_sys_HostVarHi0)
>
>                              and (K2 >= 10) and (K2 <= 20)
>
>   mdam_disjunct .......... (K1 = \:_sys_HostVarHi0) and (K1 >
>
>                              \:_sys_HostVarLo0) and (K1 =
> \:_sys_HostVarHi0)-> duplicate identical
>
>                              and (K2 <= \:_sys_HostVarHi1) and (K2 >=
> 10) and
>
>                              (K2 <= 20)
>
>   mdam_disjunct .......... (K1 = \:_sys_HostVarHi0) and (K1 =
>
>                              \:_sys_HostVarLo0) and (K2 >=
> \:_sys_HostVarLo1)
>
>                              and (K1 < \:_sys_HostVarHi0) and (K2 >=
> 10)
> and->
> K1 cannot be < and = to the same value at the same time, so all this
> expression is always false
>
>                              (K2 <= 20)
>
>   mdam_disjunct .......... (K1 = \:_sys_HostVarHi0) and (K1 =
> ->duplicate identical
>
>                              \:_sys_HostVarLo0) and (K2 >=
> \:_sys_HostVarLo1)
>
>                              and (K1 = \:_sys_HostVarHi0) and (K2 <=
>
>                              \:_sys_HostVarHi1) and (K2 >= 10) and (K2
> <=
> 20)
>
>   part_key_predicates .... (K2 >= 10) and (K2 <= 20)
>
>
>
> --- SQL operation complete.
>
>
>
>
>
> So I am not sure I fully understand how to read an mdam_disjunct, but
> I am sure I am struggling to make sense out of the stuff in red?
>
> Can someone help me understand please,
>
> Thanks,
> Eric
>

Mime
View raw message