db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rick Hillegas <rick.hille...@gmail.com>
Subject Re: Derby Scheduler and FETCH FIRST question
Date Thu, 25 Jan 2018 23:39:03 GMT
On 1/24/18 4:45 AM, g.hohl@aurenz.de wrote:
> Hello everyone,
> I'm using Apache Derby v10.14.1.0 and having some problems using the 
> FETCH FIRST clauses.
> https://db.apache.org/derby/docs/10.14/ref/rrefsqljoffsetfetch.html
> I'm accessing the database using the Derby Embedded driver.
> I have a table which contains some indexes as well as some fields and 
> a BLOB field. The table is somewhat big (means many rows, ~13 GB).
> I'm using a query like this (timestamp has an index):
> The query takes ages (about 27 minutes for that ~13 GB table) and I 
> can see how Derby slowly fills up my harddisk.
> And a look in the "tmp" folder of the database shows several ".tmp" files.
> First I get several files having 10 MB, then I get two big files 
> having 5 GB, then the 10 MB files are deleted, then the 5 GB files are 
> deleted and finally I get the result.
> As I thought something is wrong with my application I also did the 
> same query on the same database and table using SQuirreL v3.8.1. But 
> the result is the same.
> I would have expected that the scheduler of Derby would first look at 
> the timestamp column / index (which should be sorted), taking the 
> first 10 values from there and
> finally reading the first 10 rows matching these values.
> Instead it seems that it first processes the " SELECT * FROM history" 
> part (as memory is not sufficient it swaps it to the harddisk), orders 
> it and takes the first 10 elements.
> Is that correct?
> And if that is correct, where is the benefit of FETCH FIRST - beside 
> that maybe not that much data is transferred (maybe only interesting 
> if you use Derby not by the Embedded Driver because of the TCP/IP 
> connection)?
> Regards,
> Gerrit
Hi Gerrit,

Can you share table and index DDL for this problem as well as the query 
plan which Derby chose for the query? See the section on "Working with 
RunTimeStatistics" in the Derby Tuning Guide: 

It may be that Derby did not choose the index. That in turn, may have 
happened for 2 reasons:

1) You're selecting all of the columns in the table and there is no 
filtering WHERE clause. That reduces the likelihood that Derby will pick 
an indexed access path since the optimizer sees this as a full table scan.

2) I don't think that any optimizer support was built for the FETCH 
FIRST clause. That's worth filing a performance bug for. I think that 
the FETCH FIRST clause is only applied at execution time in order to 
short-circuit the number of rows which are returned.



View raw message