trafodion-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dave Birdsall <dave.birds...@esgyn.com>
Subject RE: Question: Multi-column histograms for key columns on salted tables
Date Fri, 12 Feb 2016 18:19:37 GMT
Hi Qifan,

Thanks! So, we were making a trade-off of reducing the number of
multi-column histograms on salted tables gaining a reduction in UPDATE STATS
overhead.

Since that time, some improvements have been made to UPDATE STATS elapsed
time particularly for large tables. (I'm thinking primarily of JIRA
TRAFODION-1740, where we increase the degree of parallelism used by the
sampling query.) Perhaps this mitigates all or in part the original concern?

I'd like to propose leaving the CQD in the code, but changing the default to
'ON'. Does this sound reasonable?

Hi all,

A bit more background for others new to this issue.

What we are describing here is the default behavior of UPDATE STATISTICS ON
EVERY KEY and UPDATE STATISTICS ON EVERY COLUMN. These two ON clauses are
syntactic sugars. Without salting (and before salting), the behavior of ON
EVERY KEY was: create a single-column histogram on each key column, and
multi-column histograms for prefixes of the key (so, the first two columns,
the first three, the first four, and up to five). ON EVERY COLUMN was
similar: It does a single column histogram on every column, along with the
key prefixes that ON EVERY KEY provides.

For salted tables, this behavior was changed to only do a single
multi-column histogram on the entire primary key.

Now, one could still get the old behavior by either flipping a CQD, or by
specifying the histograms directly (e.g., UPDATE STATSITICS ON ("_SALT_",A),
("_SALT_",A,B), etc.)

So we are only talking about what semantics to associate with a syntactic
sugar. (A very convenient syntactic sugar though.)

The trade-off made was to reduce UPDATE STATS overhead for salted tables by
doing fewer multi-column histograms by default, at the price of possibly bad
query plans (which is the complaint of JIRA TRAFODION-1467). The question
here is, was this the right trade-off? And now that UPDATE STATS elapsed
times have improved for large tables, should we trade-off the other way?
That's the proposal I make above.

Thanks,

Dave

-----Original Message-----
From: Qifan Chen [mailto:qifan.chen@esgyn.com]
Sent: Friday, February 12, 2016 9:15 AM
To: dev <dev@trafodion.incubator.apache.org>
Subject: Re: Question: Multi-column histograms for key columns on salted
tables

Hi Dave,

The intension was to reduce the load on US and Barry and I figured the best
way is to use the CQD to control it.

thanks --Qifan

On Fri, Feb 12, 2016 at 11:08 AM, Dave Birdsall <dave.birdsall@esgyn.com>
wrote:

> Hi,
>
>
>
> Reaching out to the development community to learn a little history.
>
>
>
> For a single-partitioned table T, with primary key columns A, B, C, if
> I do an UPDATE STATISTICS ON EVERY KEY (or ON EVERY COLUMN), I will
> get multi-column histograms for prefixes of the key. So, I’ll get a
> multi-column histogram for (A,B) and for (A,B,C).
>
>
>
> However, if I now salt that table, then an additional column,
> “_SALT_”, is prepended to the primary key.
>
>
>
> And the behavior of UPDATE STATISTICS is different. Instead of getting
> multi-column histograms for (“_SALT_”,A), (“_SALT_”,A,B), and
> (“_SALT_”,A,B,C), I get only (“_SALT_”,A,B,C).
>
>
>
> Indeed, JIRA TRAFODION-1467 complains about this very behavior. See
> https://issues.apache.org/jira/browse/TRAFODION-1467.
>
>
>
> Now, I was looking into fixing this JIRA, and I found the following
> code in
> sql/ustat/hs_parser.cpp:
>
>
>
>         // For salted table, generate only the longest MC for the key
> (subject
>
>         // to max cols determined above) unless a cqd is set to gen
> all MCs of
>
>         // allowable sizes.
>
>         if
> (CmpCommon::getDefault(USTAT_ADD_SALTED_KEY_PREFIXES_FOR_MC) == DF_OFF
> &&
>
>             hs_globals->objDef->getColNum("_SALT_", FALSE) >= 0)
>
>           minMCGroupSz = numKeys;
>
>
>
> And indeed this CQD has the default value of ‘OFF’.
>
>
>
> So it appears this behavior is intentional.
>
>
>
> My question is, why? On the surface it seems a bit arbitrary to me.
>
>
>
> Depending on the answer, I’ll address the JIRA in one of the following
> ways:
>
>
>
> 1.       Say, it’s working as intended. And if you want all the
> multi-column histograms, set CQD USTAT_ADD_SALTED_KEY_PREFIXES_FOR_MC ‘ON’
> prior to issuing the UPDATE STATISTICS command.
>
> 2.       Change the default to ‘ON’. And you’ll have to set CQD
> USTAT_ADD_SALTED_KEY_PREFIXES_FOR_MC ‘OFF’ to reduce the number of
> multi-column histograms on key prefixes in salted tables.
>
> 3.       Remove the CQD altogether if there was no good reason for it.
> Which would give us the ‘ON’ behavior all the time.
>
>
>
> Dave
>



--
Regards, --Qifan

Mime
View raw message