trafodion-dev mailing list archives

From: Dave Birdsall
Subject: RE: A chicken-and-egg issue with metadata upgrade
Date: Fri, 01 Jul 2016 20:51:24 GMT

Hi all,

I went back and repeated my repository upgrade tests more carefully. I
discovered that the code actually drops the repository tables and then
simply creates new ones. If an error occurs, the new tables are dropped. In
either case, none of the old repository data is retained. I'm guessing that
during my first round of unit tests the repository tables were empty to begin
with (I tend to test using sqlci rather than ODBC/JDBC), so the loss went
unnoticed.

It seems that one of the major design goals of upgrade is that on any
failure, things get put back to their original state.

That suggests that I really need to use the same design for repository and
privilege manager tables as we use for metadata: rename the old ones, create
the new (e.g. via vanilla "initialize trafodion"), copy the data from old to
new, then drop the old. In this way, if an error occurs before the "drop the
old" step, we can drop the new tables and rename the old back to what they
were, getting our original state back.
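
A minimal sketch of that rename/create/copy/drop sequence, assuming an in-memory dict stands in for the table store. The names `upgrade_tables`, `make_new`, `copy_row` and the `_OLD` suffix are hypothetical and are not taken from the Trafodion code; the point is only that the old tables survive until every earlier step has succeeded.

```python
class UpgradeError(Exception):
    """Raised when a step of this hypothetical upgrade fails."""


def upgrade_tables(store, names, make_new, copy_row, fail_during_copy=False):
    """Upgrade tables in `store` (a dict: table name -> list of rows).

    Sequence: rename old -> create new -> copy data -> drop old.
    Any failure before the final drop restores the original state.
    """
    renamed = []
    try:
        # 1. Rename each existing table out of the way.
        for name in names:
            store[name + "_OLD"] = store.pop(name)
            renamed.append(name)
        # 2. Create empty tables with the new layout.
        for name in names:
            store[name] = make_new(name)
        # 3. Copy the data from old to new.
        for name in names:
            if fail_during_copy:
                raise UpgradeError("simulated failure while copying " + name)
            store[name].extend(copy_row(row) for row in store[name + "_OLD"])
    except Exception:
        # Roll back: drop any new tables, then rename the old ones back.
        for name in renamed:
            store.pop(name, None)
            store[name] = store.pop(name + "_OLD")
        raise
    # 4. Only after everything succeeded, drop the old tables.
    for name in names:
        del store[name + "_OLD"]
```

If the copy step fails, the except branch leaves `store` exactly as it was before the call, which is the "original state" property described above.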

That corresponds to approach #2 in my original e-mail.

This will take a week or so to do; there's some refactoring to be done.

So, if you would like to do work in the upgrade area, please co-ordinate
with me so we can avoid code conflicts.



-----Original Message-----
From: Dave Birdsall
Sent: Friday, July 1, 2016 11:29 AM
Subject: RE: A chicken-and-egg issue with metadata upgrade

The "initialize trafodion, upgrade" executes in the child tdm_arkcmp
process. The "initialize trafodion" from step d. executes in the grandchild
tdm_arkcmp process. It does not know that it is underneath an "initialize
trafodion, upgrade". Which is probably a good thing from a modularity

-----Original Message-----
From: Roberta Marton
Sent: Friday, July 1, 2016 11:27 AM
Subject: RE: A chicken-and-egg issue with metadata upgrade

Does the "initialize trafodion" performed in step 4 run in the master
executor or in a compiler process?
If in the master executor, are there any session attributes that indicate
that we are doing an upgrade and which state we are executing?
If so, could the privilege manager code check this state and just return
instead of doing the upgrade in step 4?


-----Original Message-----
From: Dave Birdsall
Sent: Friday, July 1, 2016 10:55 AM
Subject: A chicken-and-egg issue with metadata upgrade


This e-mail concerns

Roberta Marton pointed out to me (privately) that I needed to test
“initialize trafodion, upgrade” for the case where privileges had been
previously enabled.

This morning I did this, and I ran into the following failure:

>>initialize trafodion, upgrade;
Metadata Upgrade: started
Version Check: started
  Metadata need to be upgraded from Version 1.0.1 to 2.1.0.
  Upgrade needed for Catalogs,  Privileges, Repository.
Version Check: done
Drop Old Metadata: started
Drop Old Metadata: done
Backup Current Metadata: started
Backup Current Metadata: done
Drop Current Metadata: started
Drop Current Metadata: done
Initialize New Metadata: started
Restore from Old Metadata: started
Restore from Old Metadata: done
Drop Old Metadata: started
Drop Old Metadata: done
Metadata Upgrade: failed

HBase. This could be due to a concurrent transactional ddl operation in
progress on this table.

--- SQL operation failed with errors.


I debugged the failure and analyzed the cause. Here’s what happens.

The “initialize trafodion, upgrade” logic is implemented in
(sqlcomp/CmpSeabaseDDLupgrade.cpp). It implements a state machine that
roughly speaking does this:

a. Checks current version information, determining if the metadata
needs upgrading and if so whether this software knows how to upgrade it

b. Assuming the answers in step a. are both “Yes”, uses HBase snapshots
to make a copy of the existing metadata. The new tables created have “old”

c. Drops the current metadata tables.

d. Does an “INITIALIZE TRAFODION”. So a completely new set of tables
is created. (There is no optimization of creating just the changed tables,
for example. Keeps it simple.)

e. Copies the data from the old metadata tables to the new ones. The
DML to do this is pre-defined (where?)

f. Customizes the new metadata as needed. (examples?)

g. Validates that the copy was successful. At the moment, this step is

h. Deletes information about the old metadata tables from the new
metadata tables using SQL DELETEs on object_uid.

i. Drops the old metadata tables using HBase drop.

j. Updates metadata views as needed.

k. Upgrades privilege manager tables as needed.

l. Updates repository tables as needed.

m. Updates the version info in the VERSION table.

n. Reports success.

The state machine architecture was chosen so that status messages could be
returned to the caller (sqlci or trafci say) as the steps are progressing.
That is, the method CmpSeabaseMDupgrade::executeSeabaseMDupgrade returns to
its caller whenever it has something interesting to report. The SQL executor
then redrives CmpSeabaseMDupgrade::executeSeabaseMDupgrade with another call
to move to the next step or substep.
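
The redrive pattern can be illustrated with a small sketch. This is a much-reduced, hypothetical model: the `Step` names, `UpgradeStateMachine` and `drive` are invented for illustration and do not mirror the actual states in CmpSeabaseDDLupgrade.cpp. Each call performs one step, returns a status message, and leaves the machine positioned at the next step.

```python
from enum import Enum, auto


class Step(Enum):
    VERSION_CHECK = auto()
    BACKUP_METADATA = auto()
    DROP_CURRENT = auto()
    INITIALIZE_NEW = auto()
    DONE = auto()


class UpgradeStateMachine:
    """Hypothetical, much-reduced stand-in for the upgrade state machine."""

    def __init__(self):
        self.step = Step.VERSION_CHECK

    def execute(self):
        """Run one step, advance, and return a status message.

        Returns None once there is nothing left to do.
        """
        if self.step is Step.DONE:
            return None
        message = self.step.name.replace("_", " ").title() + ": done"
        # Move to the next member in definition order.
        members = list(Step)
        self.step = members[members.index(self.step) + 1]
        return message


def drive(machine):
    """Play the role of the caller: redrive until the machine finishes."""
    messages = []
    while (msg := machine.execute()) is not None:
        messages.append(msg)  # a real caller would display this right away
    return messages
```

Because the machine keeps its position between calls, the caller can show each message as soon as it is produced rather than waiting for the whole upgrade to end.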

The things to notice are steps d., e. and k. When we get to step d., we do an
“initialize trafodion” DDL statement. This is a common recursive technique
used in many places in DDL processing. For example, “drop schema cascade”
under the covers executes “drop table” statements for any tables that exist
in a schema.
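
As a rough illustration of that recursive technique, here is a toy dispatcher in which a cascading schema drop re-enters the same entry point once per table. The `execute` function and its statement parsing are hypothetical and far simpler than real DDL processing.

```python
def execute(statement, schemas, log):
    """Tiny hypothetical DDL dispatcher; `schemas` maps schema -> set of tables."""
    log.append(statement)
    words = statement.split()
    if words[:2] == ["drop", "table"]:
        schema, table = words[2].split(".")
        schemas[schema].remove(table)
    elif words[:2] == ["drop", "schema"] and words[-1] == "cascade":
        schema = words[2]
        # Re-enter the dispatcher once for each table in the schema.
        for table in sorted(schemas[schema]):
            execute(f"drop table {schema}.{table}", schemas, log)
        del schemas[schema]
    else:
        raise ValueError("unsupported statement: " + statement)
```

The inner statements have no idea they were issued on behalf of an outer one, which is the same property the nested "initialize trafodion" has during upgrade.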

The “initialize trafodion” logic creates new metadata tables of the proper
shape. But it also tries to create privilege manager tables. The privilege
manager code in PrivMgrMDAdmin::initializeMetadata (sqlcomp/PrivMgrMD.cpp)
looks in the metadata to see if the privilege manager tables exist, and if
they don’t, tries to create them.

And herein lies the problem. This logic is being executed as part of step d.
in the state machine. At this point, new metadata tables have been created,
but they are pristine. Knowledge about other tables that existed at the time
of the upgrade has not been copied into them yet. That happens in step e.

So, the privilege manager tables do in fact exist in HBase but not in the
metadata. So, when PrivMgrMDAdmin::initializeMetadata tries to create the
first one, it gets a 1431 error. Which causes the upgrade to fail.
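
The mismatch can be reproduced in a toy model. In this sketch, `Cluster`, `ObjectExistsInStorage` and `initialize_privmgr` are hypothetical names; the two sets stand in for "what exists in HBase" and "what the metadata knows about", and the exception stands in for the 1431 error.

```python
class ObjectExistsInStorage(Exception):
    """Stands in for the 1431 error described above."""


class Cluster:
    def __init__(self):
        self.storage = set()   # tables physically present (HBase in the mail)
        self.metadata = set()  # tables the metadata knows about

    def create_table(self, name):
        if name in self.storage:
            raise ObjectExistsInStorage(name)
        self.storage.add(name)
        self.metadata.add(name)


def initialize_privmgr(cluster, tables):
    """Create any privilege manager table that the metadata does not list."""
    for name in tables:
        if name not in cluster.metadata:
            cluster.create_table(name)
```

With pristine metadata and pre-existing storage, the existence check passes and the create then fails. Once the old metadata rows are back (step e.), the same call finds the tables and does nothing.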

It looks like the state machine isn’t expecting to upgrade the privilege
manager tables until step k. If we had gotten that far, I think it would
have worked, because the metadata about existing privilege manager tables
would have been copied in step e. to the new metadata tables.

I am guessing this problem wasn’t detected before because there was no
privilege manager upgrade logic at the time of the last Trafodion metadata
upgrade. So this chicken-and-egg problem was not realized.

How to fix it?

I can think of a few approaches.

1.       Change the “initialize trafodion” logic so that if it gets a 1431
error from PrivMgrMDAdmin::initializeMetadata, it simply ignores it and
moves on. I explored this idea some, but don’t like it. The error is
detected a few layers down from the
CmpSeabaseMDupgrade::executeSeabaseMDupgrade logic. Seems like too much risk
of there being detritus lying around that doesn’t get cleaned up.

2.       Treat the privilege manager tables at the same time as the
metadata tables. That is, in step b. make snapshots of the privilege manager
tables, in step c. drop the current ones, and so on. This is fairly major
surgery on the existing code. I’d like to find an easier way.

3.       Add a new option, “initialize trafodion, minimal”, that does the
same thing as “initialize trafodion”, except it skips the privilege manager
step. That allows the state machine to create or upgrade the privilege
manager tables in step k, as the state machine expects. Change step d. to
execute “initialize trafodion, minimal” instead of “initialize trafodion”.
As of this moment, I favor this approach.
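
A rough sketch of how approach 3 changes the ordering, under heavy simplification. All function names, the `minimal` parameter and the table labels are hypothetical, the raised error only imitates the 1431 condition, and step k. is reduced to a create-if-missing call.

```python
def create_privmgr_tables(metadata, storage):
    """Create the privilege manager tables unless the metadata lists them."""
    if "PRIVMGR_TABLES" in metadata:
        return  # already known to the metadata: nothing to create
    if "PRIVMGR_TABLES" in storage:
        # Simulated 1431-style failure: present in storage, absent in metadata.
        raise RuntimeError("simulated 1431: table exists in storage")
    storage.add("PRIVMGR_TABLES")
    metadata.add("PRIVMGR_TABLES")


def initialize_trafodion(metadata, storage, minimal=False):
    """Hypothetical model of the initialize step with a `minimal` switch."""
    metadata.clear()  # new, pristine metadata
    metadata.add("METADATA_TABLES")
    storage.add("METADATA_TABLES")
    if not minimal:
        create_privmgr_tables(metadata, storage)


def upgrade(metadata, storage, minimal_init):
    old_metadata = set(metadata)                          # step b.: back up
    initialize_trafodion(metadata, storage, minimal_init)  # step d.
    metadata |= old_metadata                              # step e.: copy rows
    create_privmgr_tables(metadata, storage)              # step k.
```

With `minimal_init=False` the privilege manager step runs against pristine metadata and fails; with `minimal_init=True` it is deferred until after step e. and finds the existing tables.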

As part of my testing, I unit tested repository upgrade. That works, but the
design for repository upgrade seems similar to that of the privilege manager
code, and I’m not yet sure why it works. I will investigate that before
deciding on an approach.

But I wanted to throw this conundrum out there, to see what other folks

