db-derby-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Apache Wiki <wikidi...@apache.org>
Subject [Db-derby Wiki] Update of "XmlOptimizerTracing" by RichardHillegas
Date Fri, 13 Dec 2013 18:08:49 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Db-derby Wiki" for change notification.

The "XmlOptimizerTracing" page has been changed by RichardHillegas:

New page:
= Overview =

This web page describes how to use the optional optimizerTracing and optimizerTracingViews
tools to produce and view an xml-formatted trace of the optimizer's analysis of a query. This
page starts out with a short example of how to use the tools. A more detailed explanation
of the tools follows the example.

These tools were introduced by work on [[https://issues.apache.org/jira/browse/DERBY-6211|DERBY-6211]].


= Example =

You load the optimizerTracing tool as follows:

call syscs_util.syscs_register_tool( 'optimizerTracing', true, 'xml'

Then you issue the queries which you want to trace. When you are done tracing queries, you
unload the tool as follows...

call syscs_util.syscs_register_tool( 'optimizerTracing', false, 'myTrace.xml' );

...where myTrace.xml is the name of the file which will hold the trace output. You can then
browse the trace file with your favorite tool. I like using the Firefox browser because it
lets me expand and collapse xml elements in order to highlight important information and suppress

At this point, you can load the optimizerTracingViews tool, which lets you search and filter
the trace output using SQL. The tool installs the planCost view (backed by a table function).
By querying this view, you can display the plan shapes which the optimizer considered and
the costs which the optimizer calculated for those plans. Here's how you load this tool...

call syscs_util.syscs_register_tool( 'optimizerTracingViews', true, 'myTrace.xml' );

...where myTrace.xml is the name of the trace output file dumped by the optimizerTracing tool.
A sample query against planCost might look like this:

select distinct summary, estimatedCost from planCost
where complete and qbID = 1
order by summary;

When you are done querying plan shapes and costs, you unload the planCost view as follows:

call syscs_util.syscs_register_tool( 'optimizerTracingViews', false );

The rest of this web page explains these tools in greater detail.

= Concepts =

Before describing the tools, it's useful to understand a couple concepts which are important
to the Derby optimizer:

 * '''Statement''' - The optimizer performs a separate analysis for each statement. Once a
statement has been optimized, its compiled plan is stored in the Derby statement cache. The
next time you run the statement, compilation will be short-circuited and the already-compiled
plan will simply be read from the statement cache. Short-circuiting applies to optimization
too. If the statement has already been compiled, then optimization will be skipped. You can
force recompilation by adding a vacuous space somewhere in the statement text.
 * '''Query Block''' - The optimizer further divides each statement into one or more query
blocks and then optimizes each query block separately. Query blocks are created in the following
   * Each branch of a UNION is treated as a separate query block.
   * Materialized views are treated as separate query blocks.
   * Materialized subqueries are treated as separate query blocks.
   * Outer joins which can't be transformed into inner joins are treated as separate query
 * '''Optimizable''' - For each query block, the optimizer analyzes the query block's row
sources, called optimizables. These are the tables, table functions, and nested query blocks
which must be joined.
 * '''Join Order''' - For each query block, the optimizer considers all possible left-to-right
orders of the optimizables. A left-to-right join order gives rise to a corresponding left-deep
join tree. Join orders start out as partial, incomplete lists of optimizables and are built
up incrementally. If a partial join order survives short-circuiting (described below), the
optimizer adds another optimizable to the right end of the partial join order and continues
 * '''Slot''' - Each position in the join order is called a slot.
 * '''Conglomerate''' - A table is stored on disk as a number of files, also called conglomerates.
Each table has a heap conglomerate, which is just the raw, unordered set of rows. There is
a separate, btree conglomerate for every index on the table. There is also a separate, btree
conglomerate for every primary, unique, and foreign key on the table.
 * '''Join strategy''' - A join strategy indicates how an optimizable joins to the optimizables
to its left in the join order. Derby currently supports two join strategies: !NestedLoop and
!HashJoin. The !NestedLoop strategy means that the whole optimizable is read for every composite
row which percolates up out of the slots to the left. The !HashJoin strategy means that the
whole optimizable is read once and cached in a temporary table (hopefully in memory) indexed
by some useful key; as each composite row percolates up out of the joined slots to the left,
a key is built from the left row and used to probe into the temporary table on the right.
 * '''Decoration''' - For each slot in a join order, the optimizer considers every possible
combination of conglomerate and join strategy. Conglomerates are only relevant for optimizables
which are tables. Join strategies are only relevant for the right slots of the join order;
the leftmost slot does not have a meaningful join strategy. A ( conglomerate, join strategy
) pair is called a decoration.
 * '''Short-circuiting''' - Most join orders are never completely evaluated. The cost of a
partial join order is always less than the cost of a more complete join order. Once the optimizer
finds the cheapest set of decorations for a partial join order, the optimizer compares that
cost to the the cheapest complete plan found so far. The optimizer abandons the whole join
order if the cheapest, partial decorated join order costs more than the cheapest complete
join order found so far.

= XML Elements =

The optimizerTracing tool produces xml output which includes the following important elements:

 * '''optimizerTrace''' - This is the outermost (top) element.
 * '''statement''' - The top element's children are all statement elements. Each of these
represents a single SQL statement.
 * '''queryBlock''' - The statement element's children are its query blocks. Each query block
is represented by its own queryBlock element.
 * '''joinOrder''' - A queryBlock element has several kinds of child elements. The most important
are the joinOrder elements. Each partial or complete join order considered by the optimizer
is represented by a separate joinOrder element.
 * '''decoration''' - Under the joinOrder element are a series of decoration child elements,
one for each decoration considered for a slot in the join order.
 * '''planCost''' - For every partial or complete join order which it considers, the optimizer
calculates a cost. This is represented as a planCost element, a child of the joinOrder element.
For every query block, the optimizer remembers the cheapest plan found for it. This is represented
as a planCost child of the queryBlock. So planCost elements appear as children of both joinOrder
and queryBlock elements.
 * '''pcSummary''' - Every partial or complete plan can be viewed as a human readable summary.
These are represented as pcSummary elements. They are children of the corresponding planCost

A plan summary (pcSummary element) looks like this:


It shows how the optimizables join to each other, including the details of the chosen decorations.
Tables appear as schema-qualified conglomerate names. The * operator means !NestedLoop and
the # operator means !HashJoin. Parentheses indicate the left-deep nesting of the plan.

= The planCost View =

The optimizerTracingViews tool creates a planCost view on the optimizer trace file. The rows
in the view represent information from the planCost elements in the trace file. That view
has the following shape:

 * '''text''' varchar( 32672 ) - The SQL text of the statement being traced.
 * '''stmtID''' int - An id for the statement. Statements are numbered starting at 1.
 * '''qbID''' int - An id for a query block inside a statement. Query blocks are numbered
starting at 1.
 * '''complete''' boolean - True if the planCost element corresponds to a complete join order,
that is, one representing all of the optimizables in the query block.
 * '''summary''' varchar( 32672 ) - The compact pcSummary text described above.
 * '''type''' varchar( 50 ) - The type of plan being considered. Costs are calculated for
various types of plans, including "withoutSortAvoidance" and "withSortAvoidance". For each
query block, a "bestPlan" is marked.
 * '''estimatedCost''' double - The cost which the optimizer calculated for the plan. Higher
numbers represent more expensive (less desirable) plans.
 * '''estimatedRowCount''' bigint - The number of rows which the optimizer thinks will be
returned by the partial or complete plan.

= Sample Script =

Here is a sample script showing how to use these tools:

connect 'jdbc:derby:trunk/generated/toursdb/toursdb';

call syscs_util.syscs_register_tool( 'optimizerTracing', true, 'xml' );

select * from cities, countries
where cities.country_iso_code = countries.country_iso_code
and 1=2
order by city_name;

select *
from sys.sysschemas s, sys.systables t, sys.syscolumns c
where s.schemaid = t.schemaid
and t.tableid = c.referenceid
and 1=2
order by s.schemaname, t.tablename, c.columnname;

call syscs_util.syscs_register_tool( 'optimizerTracing', false, 'myTrace.xml' );

call syscs_util.syscs_register_tool( 'optimizerTracingViews', true, 'myTrace.xml' );

select distinct summary, estimatedCost from planCost
where complete and qbID = 1
order by summary;

call syscs_util.syscs_register_tool( 'optimizerTracingViews', false );

View raw message