db-derby-user mailing list archives

From Mike Matrigali <mikem_...@sbcglobal.net>
Date Fri, 28 Oct 2011 20:16:00 GMT
Posting your test to a JIRA issue would be best.  It would be
interesting to post the space table results after each
insert/delete/compress iteration (or every 10, ...).
When do you commit (every row or every 10000)?  Is it multi-threaded?
Does your test always insert rows at one end of the index and delete
them from the other end?  If so, it may be DERBY-5473 (a runtime
issue, not a compress table issue).
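
For reference, a query along these lines (a sketch, assuming the
default APP schema and the SYSCS_DIAG.SPACE_TABLE diagnostic table
function) would capture those numbers after each iteration:

   -- space numbers for the LOCATION heap and each of its indexes;
   -- 'APP' is assumed here as the schema holding LOCATION
   SELECT CONGLOMERATENAME, ISINDEX, NUMALLOCATEDPAGES,
          NUMFREEPAGES, NUMUNFILLEDPAGES
   FROM TABLE (SYSCS_DIAG.SPACE_TABLE('APP', 'LOCATION')) T;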

Inplace compress as currently implemented does not do much for
indexes.  Originally, the problem it solved was row level space
reclamation in base tables; indexes were not an issue.  Lately,
multiple reports of index space issues have come up, so more work in
this area would be profitable.
Currently its only use for indexes is to give back space to the OS if
possible, and that depends on the highest page number of a used page.
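
For reference, a sketch of the full call, assuming the default APP
schema (each of the three flags enables one phase of the work):

   -- 'APP' is assumed as the schema; the flags are PURGE_ROWS,
   -- DEFRAGMENT_ROWS, TRUNCATE_END (non-zero enables each phase)
   CALL SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE('APP', 'LOCATION', 1, 1, 1);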

Here is a discussion of the 3 phases:

1) it does not do purging of index rows.
    The original reason for this is that there are no space savings
    from purging alone in indexes.  "Half filled" pages can't be
    reused the way they are in base tables: each leaf page "owns" a
    range of keys, and when a key is inserted into that range and no
    space is found on the page, the page automatically purges its
    deleted rows at that point, before splitting.

    Given DERBY-5473 it may make sense to implement purging and
    merging to handle cases where we "miss" reclaiming fully empty
    pages.

2) it does not do defragmenting of indexes.

    Especially without a table level lock this is very complicated.
    Moving individual rows is not possible, since a key can only live
    on the leaf page that owns its range.  It might be possible to
    move whole pages, but then all pointers to those pages would also
    have to be fixed up.  Currently, code only exists to do splits
    under row level locking, and this operation would be much more
    complicated than a split.  The process is even more complicated in
    that one cannot look at a page in an index and find, from that
    page alone, all the pointers that need to be updated (children
    carry no parent pointers).

3) it does try to truncate the file of an index if possible, but it
    does no work to manipulate the pages so that the last used page
    in the file is moved earlier in the file.
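
Given all of the above, the only operation today that fully reclaims
index space is the full compress you already tried, which rebuilds
the heap and all indexes into new conglomerates.  A sketch, again
assuming the APP schema:

   -- 'APP' is assumed as the schema; rebuilds the table and all of
   -- its indexes, so the cost grows with the number of rows
   CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE('APP', 'LOCATION', 0);

A non-zero third argument requests sequential mode, which uses less
memory and temporary space at the cost of a slower rebuild.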

Sundar Narayanaswamy wrote:
> Hi,
> I am trying to use the Derby database in embedded mode in an
> application. I wrote a test program with a table (named LOCATION)
> that has 4 columns, as below:
> create table location(id int, num int, addr varchar(40), zip int, 
> primary key(id, zip))
> create index loc_index on location (num)
> I insert 10000 rows into the table, then delete all those rows. I
> then call SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE with just the
> option to purge rows so that the space left behind by deleted rows
> can be reused for future inserts. I have the inserts, deletes and
> compress in a loop. After running through the loop a few times, I am
> noticing that the number of allocated and unfilled pages for the
> primary key keeps growing (causing the database size to grow over
> time). The longer I run the loop, the larger these numbers are. That
> does not happen for the table or the index files, though.
> CONGLOMERATE          NumAllocatedPages  NumFreePages  NumUnFilledPages
> LOCATION                              1           831                 0
> SQL111027234806120                 1342           294               594
> LOC_INDEX                             1           521                 1
> The primary key space continues to grow even when I pass the
> defragment_rows and truncate_end options to the above function.
> CONGLOMERATE          NumAllocatedPages  NumFreePages  NumUnFilledPages
> LOCATION                              1             0                 0
> SQL111027233119770                 1674            47               704
> LOC_INDEX                            13           357                 3
> The SYSCS_UTIL.SYSCS_COMPRESS_TABLE function works well and leaves
> no free/unfilled pages, as expected. However, I am concerned about
> the efficiency (time taken) of the compress function, since there
> could be tens of millions of rows in a production setup. It seems
> that the purge_rows option would work best for my situation. Is this
> function not expected to free up deleted space in primary key files?
> Thanks very much for your help,
> Sundar.
