On 05.08.2011 16:35, De Meyer Tim wrote:
Derby eating up disk space + how to trigger ReclaimSpace elegantly


We're working on a Java webapp that uses a Derby database (v10.5) in embedded mode.
The first version of the application went live about 2 months ago.
The application is used to create invoices, and we also store the XML of each invoice document (the input data for generating a PDF from a template).
Today, the database is about 1.7 GB on disk.

During these 2 months, we've released some minor upgrades, including database migration scripts (like extra tables for new functionality).
When we ran our latest upgrade, we noticed that the database suddenly shrank to 600 MB.
This upgrade contained a migration script that dropped a no-longer-needed column from practically every table.
The shrink happened right after this script ran.
We did some debugging and hit suspend when the disk size started shrinking; that led us to a Derby class called "ReclaimSpace".

The shrink was a blessing, because the customer was already complaining about the large size on disk :-)
We're afraid the database will start using up unnecessary space again soon, and of course, we're not going to have a similar migration script in every upgrade.
Is there an elegant way to configure Derby to do this cleaning continuously, or to let our webapp instruct Derby to do some cleaning?
We launch the webapp from a small Java Web Start app (it starts Jetty and deploys our WAR file), so it's even OK for us to write some Java code to do it programmatically.

We've found this, but it works on a per-table basis.

SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE and SYSCS_UTIL.SYSCS_COMPRESS_TABLE are the two normal ways to reclaim space in Derby. The latter will also return space to the operating system, but requires a table-level lock. The fact that these procedures operate on one table at a time shouldn't pose any big problem: you can use the database metadata to iterate over the tables of interest.
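A minimal sketch of that metadata loop, in Java over plain JDBC. The class and method names are made up for illustration; it assumes an open Connection to the embedded database and that nothing else is using the tables while it runs, since SYSCS_COMPRESS_TABLE locks each table exclusively:

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: compress every user table of an embedded Derby database.
public class DerbySpaceReclaimer {

    // Procedure signature: SYSCS_COMPRESS_TABLE(SCHEMANAME, TABLENAME, SEQUENTIAL)
    public static final String COMPRESS_CALL =
            "CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE(?, ?, ?)";

    // Collects {schema, table} pairs for all user tables. The names come back as
    // stored in the catalog (usually upper case), which is what the procedure expects.
    public static List<String[]> listUserTables(Connection conn) throws SQLException {
        List<String[]> tables = new ArrayList<>();
        DatabaseMetaData md = conn.getMetaData();
        // The "TABLE" type excludes system tables and views.
        try (ResultSet rs = md.getTables(null, null, "%", new String[] {"TABLE"})) {
            while (rs.next()) {
                tables.add(new String[] {rs.getString("TABLE_SCHEM"), rs.getString("TABLE_NAME")});
            }
        }
        return tables;
    }

    // Compresses each user table in turn and returns how many were processed.
    // Each call takes an exclusive lock on its table, so run this while the app is idle.
    public static int compressAll(Connection conn) throws SQLException {
        List<String[]> tables = listUserTables(conn);
        try (CallableStatement cs = conn.prepareCall(COMPRESS_CALL)) {
            for (String[] t : tables) {
                cs.setString(1, t[0]);
                cs.setString(2, t[1]);
                // Non-zero SEQUENTIAL: slower, but needs less memory and temporary disk space.
                cs.setShort(3, (short) 1);
                cs.execute();
            }
        }
        return tables.size();
    }
}
```

The Web Start launcher could call compressAll(...) once at startup, before Jetty is started, or the webapp could run it on a schedule during quiet hours. Swapping in CALL SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE(?, ?, ?, ?, ?), whose last three arguments are the PURGE_ROWS, DEFRAGMENT_ROWS and TRUNCATE_END flags, gives the in-place variant.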

Some users have apps that can't afford to wait for the cleaning, and opt instead for partitioning tables, say, one per month of data, and then dropping the tables that are older than a certain threshold. Views and/or table functions can be used to allow viewing the data as a whole in queries.
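A sketch of the SQL for one monthly rotation of such a scheme, again in Java. The naming convention (INVOICE_XML_2011_08 and so on) and the helper are hypothetical; it assumes a view named after the base table already exists and that the new month's table has already been created with the same columns. As far as I know, Derby will not drop a table while a view depends on it, so the view is dropped and recreated around the DROP TABLE:

```java
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for a table-per-month layout such as INVOICE_XML_2011_08.
public class MonthlyPartitions {

    // Partition table name for a given month: <BASE>_<YYYY>_<MM>.
    public static String tableFor(String base, YearMonth month) {
        return String.format("%s_%04d_%02d", base, month.getYear(), month.getMonthValue());
    }

    // Statements for one monthly rotation that keeps the `keep` most recent months
    // (including `current`) and drops the single month that just fell out of the window.
    // Assumes the view `base` and the table for `current` already exist.
    public static List<String> rotateSql(String base, YearMonth current, int keep) {
        List<String> sql = new ArrayList<>();
        // The view depends on the partitions, so drop it before dropping a table.
        sql.add("DROP VIEW " + base);
        sql.add("DROP TABLE " + tableFor(base, current.minusMonths(keep)));
        List<String> selects = new ArrayList<>();
        for (int i = keep - 1; i >= 0; i--) {
            selects.add("SELECT * FROM " + tableFor(base, current.minusMonths(i)));
        }
        // UNION ALL (not UNION) avoids a duplicate-eliminating sort over all partitions.
        sql.add("CREATE VIEW " + base + " AS " + String.join(" UNION ALL ", selects));
        return sql;
    }
}
```

Each statement would be executed in order (e.g. with Statement.execute()) once a month. Dropping a whole table should remove its files outright, so no compression step is needed for the expired data.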


By the way, we're running about 10 of these webapps on different desktop PCs.
Each webapp synchronizes its data to a server running a PostgreSQL 9 database.
This means that the PostgreSQL database accumulates all the data of these 10 webapps (some data is shared, so it's not ten times as much).
The size of this PostgreSQL database is less than 1 GB, which we find surprisingly small compared to the Derby database of a single webapp.

Any help on all of this would be more than welcome.

Kind regards,

Tim De Meyer