db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Trejkaz <trej...@trypticon.org>
Subject Indexing speed in presence of BLOB columns
Date Fri, 07 Jan 2011 02:41:49 GMT
Hi all.

I have been doing some performance testing of the time it takes to
create an index on a table with BLOB data.  The schema:

    CREATE TABLE binary (
        id INTEGER NOT NULL,
        binary BLOB(1G) NOT NULL

Each time, 10,000 rows are inserted with data of a given size and then
an index is created at the end:

    CREATE INDEX binary_id ON binary (id)

Times are an average of three runs after some warmup runs which are not counted.

1 kB blobs:
  Index: 567 ms
  Insert: 78 ms

10 kB blobs:
  Insert: 3954 ms
  Index: 515 ms

100 kB blobs:
  Insert: 56307 ms
  Index: 20591 ms

1 MB blobs:
  Insert: 521904 ms
  Index: 122527 ms

Surprisingly, the larger the data which is present in the BLOB column,
the longer it takes to index.  Since the indexing process shouldn't
need to read the BLOB data in order to index the row, I am surprised
that it slows down when more data is present.  What is going on here
exactly?  Is Derby physically copying the BLOB data to a new location
and then deleting the original copy instead of performing some kind of
cheap move operation?  Is there some way to avoid this (which won't
require changing the schema)?


View raw message