db-derby-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Apache Wiki <wikidi...@apache.org>
Subject [Db-derby Wiki] Update of "DatabaseCorruption" by MyrnavanLunteren
Date Wed, 18 Dec 2013 22:57:13 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Db-derby Wiki" for change notification.

The "DatabaseCorruption" page has been changed by MyrnavanLunteren:

New page:
= Derby Database Corruption =

Derby is a very stable database.

However, just like with any DBMS, it is still possible to corrupt a database.

This document explores some of the issues related to database corruption, including things
that might cause corruption, and techniques for dealing with a corrupted database.

== Categories of Corruption ==

There are two types of corruption

   * The database does not boot
     *  In this case, on attempting to connect you'll see the error:
        ERROR XJ040: Failed to start database ‘testdb' with class loader sun.misc.Launcher$!AppClassLoader@fee16cf5,
see the next exception for details.
     *  The nested exception has the real error, usually starting with XS e.g.
        ERROR XSDBB: Unknown page format at page Page(11093,Container(0, 1745))
        The only way to recover from such an error is to restore from a good backup or recreate
the database and schema.
   * The database boots, but some other store error is returned.
     *  In this case, sometimes some remedial activity can be attempted, such as dropping
and recreating indexes if the problem is with one of the index files, or (off line) compress.
The first step is to do a Consistency Check. More about this below.

== Possible Causes of Database Corruption ==

The Derby Developers community is always attempting to ensure database corruption is as unlikely
as is possible. However, there are some issues that have shown in the past to lead to database

   * Manual corruption - intentional or unintentional.
     *   As a database physically lives on disk in a directory, it is possible for users to
remove, or update files that are essential to the database. Especially when space is tight,
users may think that certain files can be deleted. This has happened to the files in the log
directory, with people thinking these were only error logs, and thus could be removed. But
the log directory holds the transaction logs, and these are essential for recovery and restore
from backup. In addition, they are used when derby needs to abort a transaction, e.g. in the
case of deadlock situations or other errors. More recent versions of Derby have warning files
in the directories.
     *   Improper shutdown and subsequent manual manipulation. Other corruption situations
have occurred because users copied the database directory around while it was still being
used, and then copied the database directory back, resulting in an inconsistent situation.
Before doing any type of OS level manipulation of the database, one should make sure that
the database, or the Derby engine, was shutdown (use a connect with URL attribute ';shutdown=true').
    * Dual boot. Derby can only correctly handle updates to the database on disk if the JVM
that started it can control the actual writes. If two JVMS manage to access the same database,
corruption is possible. To prevent this, Derby relies on files in the database directory with
extension .lck. Derby will handle creation and removal of such files. Note that on Windows
systems, the OS will not allow delete of any files in use, but it might be possible on other
OSs. If a user manages to delete a .lck file, Derby might not recognize that another instance
is active and then dual boot is possible. This in turn will result in two (embedded) Derby
processes connecting to the same database, which will corrupt the database.

    Note also, that if multiple JVM  processes need to connect to the same database, one can
use the !DerbyNetClient and Derby Network Server.

   * System interference. Again, Derby can only correctly handle updates to the disk if it
has the control to insure when disk writes are guaranteed flushed to disk. Sometimes the system
will interfere with this control;
     *    write cache techniques. Some hardware has a default setting of 'write cache' set
to enabled. This functionality will interfere with Derby's expectation of what is written
when to the disk, and has shown to cause database corruption.
     *   remote disk management. A remote disk is always connected to the local machine using
some OS level write control. So Derby databases should not be placed on a remote disk. Note
that read-only access may be OK.
     *   System/hardware crash. In rare cases a system/hardware crash will happen at an unlucky
moment and writes may not succeed. Various techniques are applied to limit this situation
to the very utmost, but in the end, if software cannot get to the disk, database corruption
is still possible. This is why it is still needed to make backups.  An example of this is
that Derby employs checksums to insure complete pages are written to disk, but if a bad timed
crash happens to only write 1/2 of a Derby page to disk, derby will recognize the checksum
error but the only recover option is to recover from backup at that point.
   * Bugs. In rare occasions, bugs in Derby have been responsible for database corruption.
The Derby Development community is always fixing these, so checking JIRA for these is important.

== Prevention/Preparedness ==

Matching the list of possible causes above, one can do the following to prepare and prevent
database corruption:

   * Do NOT modify the contents of a Derby database directory.
   * If you need to move a database, ensure it is completely shutdown, incl. any network server
   * Switch off the machine's write caching
   * Do not put Derby on a remote disk
   * Check JIRA for fixed corruption bugs, and either get the most current release (or at
least one where the problem has been fixed), or avoid using calls/scenarios that have been
identified as problematic.
   * Ensure you have a backup plan / scenario. (See the manuals for tips, and this web site:

== Salvage ==

If the database has become corrupt, and you do not have a usable backup, there are still a
couple of approaches for salvaging as much as possible from the damaged databases. This is
not a guaranteed mechanism of restoring a database - depending on the corruption, the tools
may not be able to access the database, and even if they do, in the end, there may still be
data missing, and it's impossible to say what. Having a proper backup plan, and restoring
from backup is still a better option.

   * run the !ConsistencyChecker: DatabaseConsistencyCheck
     *   If the corruption is in an index (usually .dat files with names ending in '1' represent
an index), drop it.
     *   select all data from accessible tables and move them into another database using
   * Run off-line compress. This will also rebuild the indexes.
   * Use the optional tools. See below.

=== Optional Tools for working with corrupt databases ===

There are some optional tools that have been written to facilitate crawling a database in
the face of corruption. These are not part of the product distributions, and at time of writing
this (December 2013), they are not even checked-in into the source code. To use them (currently),
you will have to download the various tools from their JIRAs and compile with access to the
derby.jars. These tools are (partly) based on some documentation on the Apache Derby web site:

   * http://db.apache.org/derby/papers/pageformats.html: Derby On Disk Page Format
   * http://db.apache.org/derby/papers/logformats.html: Derby Write Ahead Log Format

The optional tools are found in these JIRAs:

   * https://issues.apache.org/jira/browse/DERBY-6136: create a custom/optional tool for dumping
the data in a corrupted database; dataFileVTI.sql, DataFileVTI.java and RawDBReader.java
   * https://issues.apache.org/jira/browse/DERBY-5201: Create tools for reading the contents
of the seg0 directory; TableSignatureReader.java and DataFileReader.java
   * https://issues.apache.org/jira/browse/DERBY-5195; Create tools for browsing the files
in the database log directory; LogFileReader.java and ControlFileReader.java

Note: currently, to compile LogFileReader.java in trunk, you need to adjust the permissions
of the method org.apache.derby.iapi.services.io.RegisteredFormatIds.TwoByte from private to

==== ListOfTables ====

First it is helpful to understand the connection between tablenames and the .dat filenames.
See this page: http://wiki.apache.org/db-derby/ListFileNamesOfTables. A quick way to go from
container/conglomerate number is to convert the container number from dec to hex. Then add
'c' to the front and '.dat' as extension. If the hex number ends in 0 it is a base table,
and if it ends in 1, it is a btree (index).

==== RawDBReader====

The first of these tools is DataFileVTI and RawDBReader.java of https://issues.apache.org/jira/browse/DERBY-6136DERBY-6136.
The JIRA issue provided a .sql file which can be used to access a database. You have to change
the path in the .sql.

Note, that you *do* need to know the passwords of the various users to access their schemas.
Otherwise, the tool will give an error like this: The exception 'java.lang.Exception: APP
is not the owner of the database

Note also, that if you've built the tools using trunk, and the database was created with an
older Derby version, you need the special alpha-version upgrade flag, e.g.: java -Dderby.database.allowPreReleaseUpgrade=true
org.apache.derby.tools.ij dataFileVTI-1.sql

Which for instance for the syschemas showed this:

    { dataSrc: "unbound", dataSrcObj: { noAutoLoad: true }, render: "code", renderObj: { style:
"border: 1px solid #88c; border-radius: 6px; width: 80%; background-color: #ddf; white-space:
pre; font-family: courier;" }, obj: { body: "ij> select * from sysschemas;<br><br>SCHEMAID|SCHEMANAME|AUTHORIZATIONID<br>-------------------------------------------------------------------<br>c013800d-00f8-5b53-28a9-00000019ed88|SYSIBM|DBA<br>8000000d-00d0-fd77-3ed8-000a0a0b1900|SYS|DBA<br>c013800d-00fb-2641-07ec-000000134f30|SYSCAT|DBA<br>c013800d-00fb-2642-07ec-000000134f30|SYSFUN|DBA<br>c013800d-00fb-2643-07ec-000000134f30|SYSPROC|DBA<br>c013800d-00fb-2644-07ec-000000134f30|SYSSTAT|DBA<br>c013800d-00fb-2647-07ec-000000134f30|NULLID|DBA<br>c013800d-00fb-2648-07ec-000000134f30|SQLJ|DBA<br>c013800d-00fb-2646-07ec-000000134f30|SYSCS_DIAG|DBA<br>c013800d-00fb-2649-07ec-000000134f30|SYSCS_UTIL|DBA<br>80000000-00d2-b38f-4cda-000a0a412c00|APP|APP<br>2faf0139-011a-2edf-f872-00000306a538|TSTSCHEMA|DIDA"
}, id: "ii0" }
    ij> select * from sysschemas;


A second mechanism to access the database is by registering a custom tool. You need to have
a database physically upgraded to at least 10.10 or the syscs_register_tool procedure will
not be available:

call syscs_util.syscs_register_tool('customTool',<register_or_un>,'<name of the class>,'<controlschemaname>',<tableprefix>,

For example:

call syscs_util.syscs_register_tool('customTool',true,'RawDBReader','CONTROL','RAW_', 'c:/10tst/opttools/corruptdb/TST',null,'APP',null);

At this point, you can select contents from the corrupted database.  Tor instance, if there's
a table 'sched' in the schema 'TSTSCHEMA', like so:

select * from RAW_TSTSCHEMA.SCHED;

I found, that a good way to effectively see all schemas and their tables was to run this query
in ij (after issuing maximumdisplaywidth 20;)

ij> SELECT SCHEMANAME, TABLENAME FROM sys.sysschemas s, sys.systables t WHERE s.schemaid
= t.schemaid


and you unregister the tool like so;

call syscs_util.syscs_register_tool ('customTool',false,'RawDBReader','CONTROL','RAW_');

The schemas starting with 'RAW_' will be removed again when you unregister the tool.

See https://issues.apache.org/jira/browse/DERBY-6136: for more examples when the database
is encrypted, or using other authentication mechanisms.

==== TableSignatureReader.java and DataFileReader.java ====

This is from https://issues.apache.org/jira/browse/DERBY-5201. The intention is to read the
===== TableSignatureReader =====

This tool prints out the signature of a table, which is needed for the DataFileReader. This
 tool did not work with just compiled classes for me, it gave a 'java.sql.Exception: No suitable
driver". But it worked with jars (derbyrun.jar) in the classpath. 


java TableSignatureReader connectionURL schemaName tableName


connectionURL e.g. "jdbc:derby:db1"
schemaName case-sensitive schema name, e.g. APP
tableName case-sensitive table name, e.g. T1

Here's an example use...

java TableSignatureReader "jdbc:derby:db1" SYS SYSALIASES

...which prints out the following result:

( "ALIASID" char( 36 ), "ALIAS" varchar( 128 ), "SCHEMAID" char( 36 ), "JAVACLASSNAME" long
varchar, "ALIASTYPE" char( 1 ), "NAMESPACE" char( 1 ), "SYSTEMALIAS" boolean, "ALIASINFO"
serializable, "SPECIFICNAME" varchar( 128 ) )


java DataFileReader $dataFileName [ -v ] [ -d $D ] [ -p $P ] [ -n $N ] [ -e $encryptionAttributes
$serviceProperties ]

-v Verbose. Print out records and slot tables. Field data appears as byte arrays. If you do
not set this flag, the tool just decodes the page headers.
-d Data signature. This makes a verbose printout turn the field data into objects. $D is a
row signature, e.g., "( a int, b varchar( 30 ) )"
-p Starting page. $P is a number which must be at least 1, the first page to read after the
header. Page 0 (the header) is always read.
-n Number of pages to read. $N is a positive number. Defaults to all subsequent pages.
-e If the database is encrypted, you must supply the encryption attributes and the location
of service.properties.

For example, the following command deserializes all of the records in the SYSCONGLOMERATES

java DataFileReader db/seg0/c20.dat -v -d "( a char(36), b char(36), c bigint, d varchar(
128), e boolean, f serializable, g boolean, h char( 36 ) )"

The following command decodes the entire SYSCOLUMNS conglomerate:

java DataFileReader db/seg0/c90.dat -v -d "( a char(36), b char(128), c int, d serializable,
e serializable, f char( 36 ), g bigint, h bigint, i bigint )"

Note the special 'serializable' type in the preceding example. Use 'serializable' for user-defined
types and for the system columns which are objects.

The following example decrypts and deserializes an entire SYSCONGLOMERATES file, dumping the
result into an xml file for inspection:

java DataFileReader wombat/seg0/c20.dat -v -d "( a char(36), b char(36), c bigint, d varchar(
128), e boolean, f serializable, g boolean, h char( 36 ) )" -e Wednesday wombat/service.properties
> z.xml

Note the special 'serializable' type in the preceding example. Use 'serializable' for user-defined
types and for the system columns which are objects.

Here are examples of using this tool on encrypted databases:

java DataFileReader encryptedDB/seg0/c490.dat -v -d "( a varchar( 50 ), b char( 11 ) )" -e
"encryptionKey=abcd1234efab5678" encryptedDB/service.properties > ~/junk/z.xml

java DataFileReader bootpasswordDB/seg0/c490.dat -v -d "( a varchar( 50 ), b char( 11 ) )"
-e "bootPassword=mysecretpassword" bootpasswordDB/service.properties > ~/junk/zz.xml

Other examples of usage:


1) Decode an entire data file, putting the resulting xml in the file z.xml. You can then view
that file using a browser like Firefox, which lets you collapse and expand the elements.

java DataFileReader db/seg0/c20.dat -v -d "( a char(36), b char(36), c bigint, d varchar(
128), e boolean, f serializable, g boolean, h char( 36 ) )" > z.xml


2) Pretty-print the file header:

java DataFileReader db/seg0/c20.dat -n 1
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<allocPage number="0">
<status hexvalue="1">
<containerStatus hexvalue="0">
<extentStatus hexvalue="30000010">
<freePages totalLength="8" bitsThatAreSet="0"/>
<unFilledPages totalLength="8" bitsThatAreSet="1"/>


3) Count the number of pages in a data file:

java DataFileReader db/seg0/c20.dat | grep pageCount <pageCount>9</pageCount>


4) Decode 3 pages, starting at page 2. This one is a little tricky because the header page
is always decoded. So you need to ask for 4 pages (3 data pages plus 1 header page):

java DataFileReader db/seg0/c20.dat -v -p 4 -n 3

==== LogFileReader and ControlFileReader ====

These are from https://issues.apache.org/jira/browse/DERBY-5195; Create tools for browsing
the files in the database log directory.
===== LogFileReader =====


java LogFileReader $logFileName [ -v ] [ -p $P ] [ -n $N ] [ -e $bootPassword $serviceProperties

-v Verbose. Deserialize the logged operations. If you do not set this flag, the tool just
decodes the wrapper headers.
-p Starting position. $P is a positive number, the offset of the first log entry to read.
This causes the tool to skip reading the file header as well.
-n Number of records to read. $N is a non-negative number. If you do not specify this flag,
the tool prints all subsequent log entries.
-e If the database is encrypted, you must supply the boot password and the location of service.properties.

The following example decrypts and deserializes an entire log file, dumping the result into
an xml file for inspection:

java LogFileReader wombat/log/log1.dat -v -e Wednesday wombat/service.properties > z.xml


More examples:

1) Decode an entire log file, putting the resulting xml in the file z.xml. You can then view
that file using a browser like Firefox, which lets you collapse and expand the elements. Because
the -v switch is specified, the contents of the logged operations are deserialized and <details>
elements are populated with the toString() results:

java LogFileReader db/log/log2.dat -v > z.xml


2) Pretty-print a log file header:

java LogFileReader db/log/log2.dat -n 0
<?xml version="1.0" encoding="UTF-8" standalone="no"?>


3) Count the number of entries in a log file:

java LogFileReader db/log/log2.dat | grep recordCount


4) Decode 3 log entries, starting at a given record offset:

java LogFileReader db/log/log2.dat -v -p 29363 -n 3
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<groups hexvalue="180">
<transactionID value="192"/>
<operation type="org.apache.derby.impl.store.raw.data.PurgeOperation">
<details>Page Operation: Page(2,Container(0, 81)) pageVersion 75 : Purge : 36 slots
starting at 37 (recordId=43) (recordId=44) (recordId=45) (recordId=46) (recordId=47) (recordId=48)
(recordId=49) (recordId=50) (recordId=51) (recordId=52) (recordId=53) (recordId=54) (recordId=55)
(recordId=56) (recordId=57) (recordId=58) (recordId=59) (recordId=60) (recordId=61) (recordId=62)
(recordId=63) (recordId=64) (recordId=65) (recordId=66) (recordId=67) (recordId=68) (recordId=69)
(recordId=70) (recordId=71) (recordId=72) (recordId=73) (recordId=74) (recordId=75) (recordId=76)
(recordId=77) (recordId=78)</details>
<groups hexvalue="180">
<transactionID value="192"/>
<operation type="org.apache.derby.impl.store.raw.data.InsertOperation">
<details>Page Operation: Page(1,Container(0, 81)) pageVersion 147 : Insert : Slot=1
<groups hexvalue="112">
<transactionID value="192"/>
<operation type="org.apache.derby.impl.store.raw.xact.EndXact">
<details>EndXact null Committed : transactionStatus = Committed</details>

===== ControlFileReader =====

This tool reads the control file in the log directory and pretty-prints it as an xml file.


java ControlFileReader db/log/log.ctrl
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<flags flags="1">

  This example shows it's from a database created before 10.9 was released (as it has the
BETA flag), built using jars.

  Another example:

java ControlFileReader generated/toursdb/toursdb/log/log.ctrl
<?xml version="1.0" encoding="UTF-8"?><controlFile>
    <flags flags="0"/>

   This example shows this toursdb database was created using 10.10, after the first release
on the branch (no beta flag), but using classes, not jars, so there is no subversion number.

View raw message