db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hohl, Gerrit" <g.h...@aurenz.de>
Subject Questions about query execution and optimization
Date Fri, 03 Jul 2020 14:21:45 GMT
Hello everyone,

this week I came across a behaviour of Apache Derby which I couldn't explain to myself.
I'm using version 10.14.2.0 and the structure of the database looks like this:

CREATE TABLE license (
	id BIGINT NOT NULL,
	[...]
);
CREATE TABLE installation (
	id BIGINT NOT NULL,
	[...]
	license_id BIGINT NOT NULL,
	[...]
);
CREATE TABLE log (
	id BIGINT NOT NULL,
	action VARCHAR(255) NOT NULL,
	create_timestamp TIMESTAMP NOT NULL,
	entity_cls VARCHAR(255),
	entity_id BIGINT,
	type INTEGER NOT NULL,
	message VARCHAR(32672) NOT NULL,
	PRIMARY KEY (ID)
);
CREATE INDEX logcreatetimestampindex ON log (create_timestamp);
CREATE INDEX logentityclsentityidindex ON log (entity_cls, entity_id);
ALTER TABLE installation ADD CONSTRAINT fkinstallationlicense FOREIGN KEY (license_id) REFERENCES
license (id);

The log table contains log messages about actions on records in other tables.
Because it should be useable with all tables, there can't be any foreign keys.
Especially as also logs should be kept when the corresponding record is deleted.

Now I wanted to create a query which returns all logs of a certain license and its installations:

SELECT l.*
FROM log l
WHERE ((l.entity_cls = 'License') AND (l.entity_id = ?))
OR ((l.entity_cls = 'Installation) AND (l.entity_id IN (
SELECT i.id FROM installation i WHERE (i.license_id = ?)
)))
ORDER BY l.create_timestamp DESC, l.id DESC;

But that thing took forever (~ 12.5s).
I thought the reason would maybe my index, so I introduced two more:

CREATE INDEX logentityidindex ON log (entity_cls);
CREATE INDEX logentityclsindex ON log (entity_id);

Unfortunately that didn't change much. The query took almost the exact same amount of time.
Next thing was removing the sub-select and directly giving a list of IDs for the installation
records - just for testing.

SELECT l.*
FROM log l
WHERE ((l.entity_cls = 'License') AND (l.entity_id = 123))
OR ((l.entity_cls = 'Installation) AND (l.entity_id IN (234, 345)))
ORDER BY l.create_timestamp DESC, l.id DESC;

Much to my amazement that also didn't change anything.
In black despair I split the query into two while keeping the sub-select:

SELECT l.*
FROM log l
WHERE ((l.entity_cls = 'License') AND (l.entity_id = ?))
ORDER BY l.create_timestamp DESC, l.id DESC;

SELECT l.*
FROM log l
WHERE ((l.entity_cls = 'Installation) AND (l.entity_id IN (
SELECT i.id FROM installation.id WHERE (l.license_id = ?)
)))
ORDER BY l.create_timestamp DESC, l.id DESC;

Unbelievable: The 1st query took ~0.15s while the 2nd query took ~0.2s.
As the sub-select seems not to have any effect how fast or slow the query is, that can be
ignored.
In all cases I only have columns in my WHERE clause which are in the indexes.
But somehow Apache Derby seems not to notice it and not using them, if the WHERE clause gets
too complicated.

Is there anything I haven't seen? Anything I'm doing wrong by structuring the query like I
did?
Somehow I don't get it why the execution time gets 100 times slower just by having everything
in one query.

Regards,
Gerrit


Mime
View raw message