hbase-user mailing list archives

From Brian Jeltema <brian.jelt...@digitalenvoy.net>
Subject Re: snapshot timeouts
Date Wed, 08 Oct 2014 20:23:53 GMT
Thanks for the quick responses. I’ll get back on this later; I discovered that
HBase didn’t restart properly after changing the timeouts, so the second ERROR
may be a side-effect of that.

I also just discovered that the table in question was not pre-split properly,
and the region distribution is screwed up. So I’ll clean up the mess and try
again tomorrow.

Regrets for the possible false alarm

Brian

On Oct 8, 2014, at 3:25 PM, Brian Jeltema <brian.jeltema@digitalenvoy.net> wrote:

> Sorry, I usually include that info. HBase version is 0.98. hbase.rpc.timeout is the default.
> 
> When the ‘ERROR: Call id….’ occurred, there was no stack trace. That was the entire error output.
> 
> Before I increased the snapshot timeout parameters, the timeout I was seeing looked like:
> 
> ERROR: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { ss=Host-bdj table=Host type=FLUSH } had an error.  Procedure Host-bdj { waiting=[] done=[host-22.hdfs.foo.net,60020,1410543068459, host-24.hdfs.foo.net,60020,1412603246174, host-17.hdfs.foo.net,60020,1410543059186, host-19.hdfs.foo.net,60020,1412419924491, host-20.hdfs.foo.net,60020,1412419942143, host-16.hdfs.foo.net,60020,1403178964733, host-15.hdfs.foo.net,60020,1403178962029, host-21.hdfs.foo.net,60020,1403178959748, host-23.hdfs.foo.net,60020,1410543079248, host-18.hdfs.foo.net,60020,1410543061865] }
> 	at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:366)
> 	at org.apache.hadoop.hbase.master.HMaster.isSnapshotDone(HMaster.java:2993)
> 	at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:38245)
> 	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008)
> 	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
> 	at org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:73)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.hadoop.hbase.errorhandling.TimeoutException via timer-java.util.Timer@3097c4e1:org.apache.hadoop.hbase.errorhandling.TimeoutException: Timeout elapsed! Source:Timeout caused Foreign Exception Start:1412792382137, End:1412792442137, diff:60000, max:60000 ms
> 	at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:83)
> 	at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:318)
> 	at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:356)
> 	... 10 more
> Caused by: org.apache.hadoop.hbase.errorhandling.TimeoutException: Timeout elapsed! Source:Timeout caused Foreign Exception Start:1412792382137, End:1412792442137, diff:60000, max:60000 ms
> 	at org.apache.hadoop.hbase.errorhandling.TimeoutExceptionInjector$1.run(TimeoutExceptionInjector.java:67)
> 	at java.util.TimerThread.mainLoop(Timer.java:555)
> 	at java.util.TimerThread.run(Timer.java:505)
> 
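The timeout line in the trace above carries its own arithmetic: End minus Start equals the 60000 ms limit that was in effect before the timeout parameters were raised. As a minimal sketch for checking similar log lines, the snippet below parses those fields and compares the elapsed time with the limit. The helper name `parse_timeout` and the regular expression are illustrative assumptions, not part of HBase.

```python
import re

# Timeout message copied from the stack trace quoted above.
LOG_LINE = (
    "Timeout elapsed! Source:Timeout caused Foreign Exception "
    "Start:1412792382137, End:1412792442137, diff:60000, max:60000 ms"
)

# Hypothetical pattern: matches the four numeric fields of the message.
FIELD_RE = re.compile(r"(Start|End|diff|max):(\d+)")


def parse_timeout(line):
    """Return the Start/End/diff/max fields of a timeout message as ints."""
    return {name: int(value) for name, value in FIELD_RE.findall(line)}


fields = parse_timeout(LOG_LINE)

# Start and End are epoch timestamps in milliseconds.
elapsed_ms = fields["End"] - fields["Start"]

# True when the operation ran for at least the configured limit.
hit_limit = elapsed_ms >= fields["max"]

print(f"elapsed={elapsed_ms} ms, limit={fields['max']} ms, hit_limit={hit_limit}")
```

If the raised timeouts had taken effect, a later failure of this kind would be expected to report a larger `max` value; an unchanged `max` would suggest the new settings were not picked up.
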
> On Oct 8, 2014, at 3:18 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> 
>> Can you give a bit more information:
>> 
>> the release of hbase you're using
>> value for hbase.rpc.timeout (looks like you left it at the default)
>> more of the error (please include stack trace if possible)
>> 
>> Cheers
>> 
>> On Wed, Oct 8, 2014 at 12:09 PM, Brian Jeltema <brian.jeltema@foo.net> wrote:
>> 
>>> I’m trying to snapshot a moderately large table (3 billion rows, but not a
>>> huge amount of data per row).
>>> Those snapshots have been timing out, so I set the following parameters to
>>> relatively large values:
>>> 
>>>    hbase.snapshot.master.timeoutMillis
>>>    hbase.snapshot.region.timeout
>>>    hbase.snapshot.master.timeout.millis
>>> 
>>> A snapshot attempt then resulted in the terse result:
>>> 
>>>    ERROR: Call id=13, waitTime=60060, rpcTimeout=60000
>>> 
>>> A brief review of some of the hbase log files didn’t reveal anything (but
>>> there are many).
>>> How should I pursue getting these snapshots to work?
>>> 
>>> Brian
> 

