db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hohl, Gerrit" <g.h...@aurenz.de>
Subject AW: AW: AW: Questions about query execution and optimization
Date Tue, 07 Jul 2020 07:02:46 GMT
Hi Rick,


ah, okay, that was the missing piece of puzzle.
And as I mentioned PostgreSQL before: It seems that ORs are a general problem.
https://www.cybertec-postgresql.com/en/avoid-or-for-better-performance/

But it seems there are some techniques or patterns to avoid them. One of them is using an
UNION:

SELECT l.id, l.action, l.create_timestamp, l.entity_cls, l.entity_id, l.type, l.message
FROM log l
WHERE ((l.entity_cls = 'License') AND (l.entity_id = ?))
ORDER BY l.create_timestamp DESC, l.id DESC;

--> Total: 0.187s

SELECT l.id, l.action, l.create_timestamp, l.entity_cls, l.entity_id, l.type, l.message
FROM log l
WHERE ((l.entity_cls = 'Installation') AND (l.entity_id IN (
SELECT i.id FROM installation i WHERE (i.license_id = ?)
)))
ORDER BY l.create_timestamp DESC, l.id DESC;

--> Total: 0.407s

SELECT l.id, l.action, l.create_timestamp, l.entity_cls, l.entity_id, l.type, l.message
FROM log l
WHERE ((l.entity_cls = 'License') AND (l.entity_id = ?))
OR ((l.entity_cls = 'Installation') AND (l.entity_id IN (
SELECT i.id FROM installation i WHERE (i.license_id = ?)
)))
ORDER BY l.create_timestamp DESC, l.id DESC;

--> Total: 23.190s

SELECT l.id, l.action, l.create_timestamp, l.entity_cls, l.entity_id, l.type, l.message
FROM log l
WHERE ((l.entity_cls = 'License') AND (l.entity_id = ?))
UNION
SELECT l.id, l.action, l.create_timestamp, l.entity_cls, l.entity_id, l.type, l.message
FROM log l
WHERE ((l.entity_cls = 'Installation') AND (l.entity_id IN (
SELECT i.id FROM installation i WHERE (i.license_id = ?)
)))
ORDER BY create_timestamp DESC, id DESC;

--> Total: 0.624s

Not bad, I would say. Maybe one remark: I used also aliases in the ORDER BY clause.
But you can't as you work on the result of the UNION which only consists of the columns without
their aliases.

Thanks a lot, Rick. That helped me a lot.


Regards
Gerrit

-----Ursprüngliche Nachricht-----
Von: Rick Hillegas <rick.hillegas@gmail.com> 
Gesendet: Dienstag, 7. Juli 2020 00:48
An: Derby Discussion <derby-user@db.apache.org>; Hohl, Gerrit <g.hohl@aurenz.de>
Betreff: Re: AW: AW: Questions about query execution and optimization

Hi Gerrit,

One thing I notice about your big WHERE clause is that it contains an OR. In general, ORs
are not optimizable. You will end of with full table scans. Please see http://db.apache.org/derby/docs/10.15/tuning/ctunoptimz39106.html

Hope this helps,
-Rick

On 7/6/20 6:21 AM, Hohl, Gerrit wrote:
> Hi Rick,
>
>
> thanks for your message.
>
>
> Thanks also for the link, also I'm not sure I understand its content correctly.
> My understand of indexes so far was that they are used to filter tables when performing
a query.
> Let's say we have a table PERSON containing the name and age of persons. We have an index
on the age column.
> In our query we want all persons which are 30 years old: SELECT * FROM 
> person WHERE age = 30; I would expect that the DBMS - after it analysed the query - will
look for all "30" entries in the age index.
> Those contain pointers on the original records / tuples in the table file.
> Let's say we have 10 matches in the age index, the DBMS will get those 10 matches from
the table file.
> Of course, if all columns we select are already in the index - in the best case we have
"SELECT age FROM person WHERE age = 30;" - the query would be extremely fast.
> And it would be slower if additional columns which are not backed by an index can also
be found in the WHERE clause.
>
> But even set that aside: In my 1st mail I selected the same columns in the big query
as well as in the two smaller queries.
> There was no difference between them in this aspect.
> The difference was in the WHERE clause - which was more complicated in the big query
and a simpler in the two smaller queries.
> But in all versions the same columns were used in the WHERE clause, all of them backed
by indexes.
> So it won't also explain the behaviour.
>
>
> About the XPLAIN style things: I guess you're talking about this here.
> https://db.apache.org/derby/docs/10.14/tuning/ctun_xplain_style.html
> Seems like a neat little project on its own...
>
>
> Regards,
> Gerrit
>
> -----Ursprüngliche Nachricht-----
> Von: Rick Hillegas <rick.hillegas@gmail.com>
> Gesendet: Montag, 6. Juli 2020 14:54
> An: Derby Discussion <derby-user@db.apache.org>; Hohl, Gerrit 
> <g.hohl@aurenz.de>
> Betreff: Re: AW: Questions about query execution and optimization
>
> Hi Gerrit,
>
> I suspect that your query performs poorly because your indexes do not 
> cover the query. That means that you are selecting columns which don't 
> appear in the indexes. In this case, the optimizer knows that Derby 
> cannot satisfy the query by simply reading index pages. Derby also has 
> to read base table pages in order to fetch the other columns. Please 
> see http://db.apache.org/derby/docs/10.15/tuning/ctunoptimz30768.html
>
> Derby query plans are not easy to read. If you are snapshotting plans with XPLAIN style,
then the details go into snapshot tables. In theory, you could write a tool to read those
snapshot results and format them better.
>
> Hope this helps,
> -Rick
>
> On 7/6/20 12:46 AM, Hohl, Gerrit wrote:
>> Hello everyone,
>>
>>
>> me again.
>> In the meantime I tried to tune the table using the
>>
>> 	SYSCS_UTIL.SYSCS_COMPRESS_TABLE
>>
>> and
>>
>> 	SYSCS_UTIL.SYSCS_UPDATE_STATISTICS
>>
>> functions. It didn't have any effect on the query runtime.
>> I also tried the query analyse functions:
>>
>>
>> 	CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1);
>> 	CALL SYSCS_UTIL.SYSCS_SET_STATISTICS_TIMING(1);
>> 	<Query>
>> 	VALUES SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS();
>> 	CALL SYSCS_UTIL.SYSCS_SET_STATISTICS_TIMING(0);
>> 	CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(0);
>>
>> The result was long, localized (German in my case - nice)... and absolutely cryptic.
>> It didn't mention any of the tokens of my SQL query.
>> Instead it was taking about UNIONs and JOINs, also I don't use any. At least not
explicitly.
>> I couldn't figure at all which entry of the analysis belonged to which part of the
SQL query.
>>
>> I remember when I worked with PostgreSQL there was that neat EXPLAIN command.
>> That command had a very good structure of its output.
>> And it seems they still have it in their current version:
>> https://www.postgresql.org/docs/current/using-explain.html
>>
>>
>> Regards,
>> Gerrit
>>
>> -----Ursprüngliche Nachricht-----
>> Von: Hohl, Gerrit <g.hohl@aurenz.de>
>> Gesendet: Freitag, 3. Juli 2020 16:22
>> An: Derby Discussion <derby-user@db.apache.org>
>> Betreff: Questions about query execution and optimization
>>
>> Hello everyone,
>>
>> this week I came across a behaviour of Apache Derby which I couldn't explain to myself.
>> I'm using version 10.14.2.0 and the structure of the database looks like this:
>>
>> CREATE TABLE license (
>> 	id BIGINT NOT NULL,
>> 	[...]
>> );
>> CREATE TABLE installation (
>> 	id BIGINT NOT NULL,
>> 	[...]
>> 	license_id BIGINT NOT NULL,
>> 	[...]
>> );
>> CREATE TABLE log (
>> 	id BIGINT NOT NULL,
>> 	action VARCHAR(255) NOT NULL,
>> 	create_timestamp TIMESTAMP NOT NULL,
>> 	entity_cls VARCHAR(255),
>> 	entity_id BIGINT,
>> 	type INTEGER NOT NULL,
>> 	message VARCHAR(32672) NOT NULL,
>> 	PRIMARY KEY (ID)
>> );
>> CREATE INDEX logcreatetimestampindex ON log (create_timestamp); 
>> CREATE INDEX logentityclsentityidindex ON log (entity_cls, 
>> entity_id); ALTER TABLE installation ADD CONSTRAINT 
>> fkinstallationlicense FOREIGN KEY
>> (license_id) REFERENCES license (id);
>>
>> The log table contains log messages about actions on records in other tables.
>> Because it should be useable with all tables, there can't be any foreign keys.
>> Especially as also logs should be kept when the corresponding record is deleted.
>>
>> Now I wanted to create a query which returns all logs of a certain license and its
installations:
>>
>> SELECT l.*
>> FROM log l
>> WHERE ((l.entity_cls = 'License') AND (l.entity_id = ?)) OR 
>> ((l.entity_cls = 'Installation) AND (l.entity_id IN ( SELECT i.id 
>> FROM installation i WHERE (i.license_id = ?)
>> )))
>> ORDER BY l.create_timestamp DESC, l.id DESC;
>>
>> But that thing took forever (~ 12.5s).
>> I thought the reason would maybe my index, so I introduced two more:
>>
>> CREATE INDEX logentityidindex ON log (entity_cls); CREATE INDEX 
>> logentityclsindex ON log (entity_id);
>>
>> Unfortunately that didn't change much. The query took almost the exact same amount
of time.
>> Next thing was removing the sub-select and directly giving a list of IDs for the
installation records - just for testing.
>>
>> SELECT l.*
>> FROM log l
>> WHERE ((l.entity_cls = 'License') AND (l.entity_id = 123)) OR 
>> ((l.entity_cls = 'Installation) AND (l.entity_id IN (234, 345))) 
>> ORDER BY l.create_timestamp DESC, l.id DESC;
>>
>> Much to my amazement that also didn't change anything.
>> In black despair I split the query into two while keeping the sub-select:
>>
>> SELECT l.*
>> FROM log l
>> WHERE ((l.entity_cls = 'License') AND (l.entity_id = ?)) ORDER BY 
>> l.create_timestamp DESC, l.id DESC;
>>
>> SELECT l.*
>> FROM log l
>> WHERE ((l.entity_cls = 'Installation) AND (l.entity_id IN ( SELECT 
>> i.id FROM installation.id WHERE (l.license_id = ?)
>> )))
>> ORDER BY l.create_timestamp DESC, l.id DESC;
>>
>> Unbelievable: The 1st query took ~0.15s while the 2nd query took ~0.2s.
>> As the sub-select seems not to have any effect how fast or slow the query is, that
can be ignored.
>> In all cases I only have columns in my WHERE clause which are in the indexes.
>> But somehow Apache Derby seems not to notice it and not using them, if the WHERE
clause gets too complicated.
>>
>> Is there anything I haven't seen? Anything I'm doing wrong by structuring the query
like I did?
>> Somehow I don't get it why the execution time gets 100 times slower just by having
everything in one query.
>>
>> Regards,
>> Gerrit
>>

Mime
View raw message