phoenix-dev mailing list archives

From "Samarth Jain (JIRA)" <>
Subject [jira] [Commented] (PHOENIX-2408) Update statistics fails to complete
Date Fri, 04 Dec 2015 22:34:11 GMT


Samarth Jain commented on PHOENIX-2408:

Spent the last couple of days trying to figure out what is going on here. On my laptop (1
region server), I loaded a table with 400 million rows distributed over 8 regions. I added
logging in a few places to see what is going on. I see errors like these in my logs on the
server side:

Exception caught in post scanner open for scan: 4. Exception: org.apache.hadoop.hbase.ipc.CallerDisconnectedException:
Aborting on region TESTXYZ,\x04\x00\x00\x00\x00\x00\x00\x00\x00,1449215361195.5fa492cebc9f25b9602ecaf1d4601daf.,
call org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl@3efcf4dd after 121324
ms, since caller disconnected
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(
	at org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.doPostScannerOpen(
	at org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(
	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$
	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$
	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(
	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperationWithResult(
	at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(
	at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(
	at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(
	at org.apache.hadoop.hbase.ipc.RpcExecutor$

It looks like org.apache.hadoop.hbase.ipc.CallerDisconnectedException is a regular IOException
and not a DoNotRetryIOException. As a result, BaseScannerRegionObserver#doPostScannerOpen()
re-throws a regular IOException back to the client, which results in retries. These retries,
however, never succeed, and we end up retrying the default number of times (31).
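
To illustrate why the exception type matters, here is a minimal, self-contained sketch of a
client-side retry loop. It is not the HBase client code: RetrySketch, NonRetriableIOException
(a local stand-in for DoNotRetryIOException) and attemptsUntilGiveUp are hypothetical names,
and the limit of 31 is simply the retry count mentioned above. A plain IOException uses up
every attempt, while the non-retriable subclass stops after the first one.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetrySketch {
    /** Hypothetical local stand-in for HBase's DoNotRetryIOException. */
    public static class NonRetriableIOException extends IOException {
        public NonRetriableIOException(String msg) { super(msg); }
    }

    /**
     * Invokes the call until it succeeds, fails with a non-retriable
     * exception, or maxAttempts is exhausted. Returns the attempts made.
     */
    public static int attemptsUntilGiveUp(Callable<Void> call, int maxAttempts) {
        int attempts = 0;
        while (attempts < maxAttempts) {
            attempts++;
            try {
                call.call();
                return attempts;          // success: stop immediately
            } catch (NonRetriableIOException e) {
                return attempts;          // caller is told not to retry
            } catch (IOException e) {
                // plain IOException: treated as transient, so loop again
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
        return attempts;                  // gave up after maxAttempts
    }

    public static void main(String[] args) {
        Callable<Void> disconnected = () -> { throw new IOException("caller disconnected"); };
        Callable<Void> fatal = () -> { throw new NonRetriableIOException("do not retry"); };
        System.out.println(attemptsUntilGiveUp(disconnected, 31)); // prints 31
        System.out.println(attemptsUntilGiveUp(fatal, 31));        // prints 1
    }
}
```

In this model, a failure that can never succeed on retry only fails fast if it reaches the
client as the non-retriable type.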

One thought I had was that I may be maxing out the I/O on my laptop SSD. But reducing
the number of region server handler threads from the default to 2 (to limit the I/O) didn't help.

> Update statistics fails to complete
> -----------------------------------
>                 Key: PHOENIX-2408
>                 URL:
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>            Assignee: Samarth Jain
>             Fix For: 4.7.0
> On a production cluster, when UPDATE STATISTICS is run, it fails to complete.

