db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rick Hillegas <rick.hille...@gmail.com>
Subject Re: AW: Derby Scheduler and FETCH FIRST question
Date Sat, 27 Jan 2018 21:30:20 GMT
On 1/26/18 12:11 AM, g.hohl@aurenz.de wrote:
>
> Hello Rick,
>
> and thanks for your reply.
>
> I will try what you've written as soon as I have time for it.
>
> Unfortunately I already switch to another project and I don't know 
> when I will get the time to have a look on this again (blame my 
> superiors ;-) ).
>
> 2) is surely a good idea.
>
> 1) Did you also read my 2nd mail?
>
I just responded to that 2nd message. Can you share the CREATE TABLE and 
CREATE INDEX statements associated with this table? They may provide 
some further clues.

Thanks,
-Rick
>
> I also tried using a subselect, so I have a WHERE clause. I had the 
> same idea as you that the scheduler might not recognize the ORDER BY 
> and FETCH FIRST.
>
> It was faster, but still not what I would have expected. I've worked a 
> lot with Borland Interbase / Firebird, MySQL and especially with 
> PostgreSQL.
>
> And PostgreSQL would have done a lot faster than this.
>
> By the way: PostgreSQL also has a more easy to use approach in aspect 
> of analysis: https://www.postgresql.org/docs/9.6/static/using-explain.html
>
> Would be great of Derby would offer something similar.
>
> Regards,
>
> Gerrit
>
> *Von:*Rick Hillegas [mailto:rick.hillegas@gmail.com]
> *Gesendet:* Freitag, 26. Januar 2018 00:39
> *An:* derby-user@db.apache.org
> *Betreff:* Re: Derby Scheduler and FETCH FIRST question
>
> On 1/24/18 4:45 AM, g.hohl@aurenz.de <mailto:g.hohl@aurenz.de> wrote:
>
>     Hello everyone,
>
>     I'm using Apache Derby v10.14.1.0 and having some problems using
>     the FETCH FIRST clauses.
>
>     https://db.apache.org/derby/docs/10.14/ref/rrefsqljoffsetfetch.html
>
>     I'm accessing the database using the Derby Embedded driver.
>
>     I have a table which contains some indexes as well as some fields
>     and a BLOB field. The table is somewhat big (means many rows, ~13 GB).
>
>     I'm using a query like this (timestamp has an index):
>
>     SELECT * FROM history ORDER BY timestamp DESC FETCH FIRST 10 ROWS ONLY
>
>     The query takes ages (about 27 minutes for that ~13 GB table) and
>     I can see how Derby slowly fills up my harddisk.
>
>     And a look in the "tmp" folder of the database shows several
>     ".tmp" files.
>
>     First I get several files having 10 MB, then I get two big files
>     having 5 GB, then the 10 MB files are deleted, then the 5 GB files
>     are deleted and finally I get the result.
>
>     As I thought something is wrong with my application I also did the
>     same query on the same database and table using SQuirreL v3.8.1.
>     But the result is the same.
>
>     I would have expected that the scheduler of Derby would first look
>     at the timestamp column / index (which should be sorted), taking
>     the first 10 values from there and
>
>     finally reading the first 10 rows matching these values.
>
>     Instead it seems that it first processes the " SELECT * FROM
>     history" part (as memory is not sufficient it swaps it to the
>     harddisk), orders it and takes the first 10 elements.
>
>     Is that correct?
>
>     And if that is correct, where is the benefit of FETCH FIRST -
>     beside that maybe not that much data is transferred (maybe only
>     interesting if you use Derby not by the Embedded Driver because of
>     the TCP/IP connection)?
>
>     Regards,
>
>     Gerrit
>
> Hi Gerrit,
>
> Can you share table and index DDL for this problem as well as the 
> query plan which Derby chose for the query? See the section on 
> "Working with RunTimeStatistics" in the Derby Tuning Guide: 
> http://db.apache.org/derby/docs/10.14/tuning/index.html
>
> It may be that Derby did not choose the index. That in turn, may have 
> happened for 2 reasons:
>
> 1) You're selecting all of the columns in the table and there is no 
> filtering WHERE clause. That reduces the likelihood that Derby will 
> pick an indexed access path since the optimizer sees this as a full 
> table scan.
>
> 2) I don't think that any optimizer support was built for the FETCH 
> FIRST clause. That's worth filing a performance bug for. I think that 
> the FETCH FIRST clause is only applied at execution time in order to 
> short-circuit the number of rows which are returned.
>
> Thanks,
>
> -Rick
>


Mime
View raw message