db-derby-user mailing list archives

From Kristian Waagan <Kristian.Waa...@Sun.COM>
Subject Re: IndexOutOfBoundsException from client driver during getBlob
Date Mon, 10 Nov 2008 12:20:09 GMT
On 11/10/08 01:27, Daniel Noll wrote:
> Kristian Waagan wrote:
>> I feel we have too little information to create a fix - we don't even 
>> know what the real problem is.
>> The locator values are drawn from a counter, and there is a counter 
>> for each (root) connection. I'm having trouble understanding how we 
>> could get concurrency issues in this case.
>> Also, I think the error you are seeing suggests an invalid locator 
>> value, not a duplicate value.
>> Anything special about your network server setup? (time-slicing, 
>> statement caching, connection pooling)
>> My suggestion is to wait for a while and see if it happens again, or 
>> see if anyone else has suggestions.
> It has happened again.  This time it took 12 hours to happen, which is 
> information I didn't previously have.  If I'm lucky this will help me 
> reproduce it here.  Maybe it's something that takes a long time before 
> it occurs.  Or maybe it's something where the probability is just 
> really low, so it takes an enormous number of attempts before it happens.
> As far as the network server setup itself, it's straightforward.  We're 
> not using connection pooling, due to bugs preventing that from working 
> properly, and everything else is normal as well.

Are the problems you are having with connection pooling logged in Jira?

> I guess I can run a test overnight to see if something similar happens, 
> with tracing turned on.  It's going to generate a lot of output though 
> so I somewhat fear for my disk space. :-)

You can also run the test without logging to see if it can be reproduced 
by a 12 hour run. If so, I think we have two initial options:
  a) Synchronize the access to the counter properly
  b) Add custom logging to the code that fails, to see which value 
causes the failure. If it is one of the invalid locator values, it's a 
strong indication that the problem is indeed the counter.
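To illustrate option (a), here is a minimal sketch of the kind of race a 
per-connection locator counter could suffer if its increment is not 
synchronized, and how an atomic increment avoids it. This is a hypothetical 
standalone example, not the actual Derby code; the class and method names 
are made up for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LocatorCounterSketch {

    // Unsafe variant: ++ on a plain int is a read-modify-write, not
    // atomic, so two threads can draw the same locator value.
    private static int plainCounter = 0;

    static int nextLocatorUnsafe() {
        return ++plainCounter; // race: duplicate values possible
    }

    // Safe variant: AtomicInteger makes the increment a single
    // atomic operation, so every caller gets a distinct value.
    private static final AtomicInteger atomicCounter = new AtomicInteger(0);

    static int nextLocatorSafe() {
        return atomicCounter.incrementAndGet();
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                nextLocatorUnsafe();
                nextLocatorSafe();
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // The atomic counter counts exactly; the plain one typically
        // loses updates under contention (its final value varies).
        System.out.println("atomic = " + atomicCounter.get());
        System.out.println("plain  = " + plainCounter);
    }
}
```

Note that a lost update on the plain counter produces a *duplicate* 
locator, which matches what I said earlier: the error you are seeing 
looks more like an *invalid* value, so this race alone may not explain it.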

The bug I'm thinking of is one with a low probability, so if it happens 
consistently after ~12 hours it sounds more like an overflow problem of 
some kind.

If you can give me some more details about the data and the load, I 
might be able to kick off some test runs of my own:
  - Blob size
  - number of rows in the table
  - number of clients accessing the table concurrently
  - isolation level
  - page cache size
  - any other information you think might be relevant


> Daniel
