trafodion-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Narain Arvind" <narain.arv...@gmail.com>
Subject RE: Trafodion master Daily Test Result - 332 - Still Failing
Date Thu, 15 Sep 2016 16:22:29 GMT
Timeout in core-regress-executor-hdp is during drop of table T106A. 
https://jenkins.esgyn.com/job/core-regress-executor-hdp/369/console

Messages related to transactional.SplitBalancerHelper repeated in Region Server logs.

Sean could you please take a look ? Thanks.

http://traf-testlogs.esgyn.com/Daily-master/332/regress-executor-hdp/hbase-logs/hbase-hbase-regionserver-slave-ahw23.log


		2016-09-15 10:25:01,806 INFO  [RS_CLOSE_REGION-slave-ahw23:16020-2] regionserver.HRegion:
Closed TRAFODION.SCH.T106A,\x00\x00\x00\x01\x00\x00\x00\x00,1473934644524.f32b6c8d8f3932729b9824f51a95a63e.
		2016-09-15 10:25:02,059 INFO  [PriorityRpcServer.handler=8,queue=0,port=16020] transactional.SplitBalanceHelper:
scannersListClear Active Scanner found, ScannerId: 0 Txid: 1054 Region: TRAFODION.SCH.T106A,\x00\x00\x00\x02\x00\x00\x00\x00,1473934644524.39d9ea92d6f2109a682f66f38a795b79.
		… 
		2016-09-15 10:31:37,204 INFO  [regionserver/slave-ahw23.trafodion.org/172.31.3.234:16020.leaseChecker]
regionserver.RSRpcServices: Scanner 12971 lease expired on region TRAFODION.SCH.T106A,\x00\x00\x00\x01\x00\x00\x00\x00,1473934644524.f32b6c8d8f3932729b9824f51a95a63e.
		2016-09-15 10:31:37,205 ERROR [regionserver/slave-ahw23.trafodion.org/172.31.3.234:16020.leaseChecker]
regionserver.RSRpcServices: Closing scanner for TRAFODION.SCH.T106A,\x00\x00\x00\x01\x00\x00\x00\x00,1473934644524.f32b6c8d8f3932729b9824f51a95a63e.
		org.apache.hadoop.hbase.NotServingRegionException: Region TRAFODION.SCH.T106A,\x00\x00\x00\x01\x00\x00\x00\x00,1473934644524.f32b6c8d8f3932729b9824f51a95a63e.
is not online on slave-ahw23.trafodion.org,16020,1473931541282
			at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2898)
			at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2875)
			at org.apache.hadoop.hbase.regionserver.RSRpcServices$ScannerListener.leaseExpired(RSRpcServices.java:285)
			at org.apache.hadoop.hbase.regionserver.Leases.run(Leases.java:121)
			at java.lang.Thread.run(Thread.java:745)
		…
		
		2016-09-15 11:29:49,857 INFO  [PriorityRpcServer.handler=1,queue=1,port=16020] transactional.SplitBalanceHelper:
scannersListClear Active Scanner found, ScannerId: 0 Txid: 1054 Region: TRAFODION.SCH.T106A,\x00\x00\x00\x02\x00\x00\x00\x00,1473934644524.39d9ea92d6f2109a682f66f38a795b79.
		2016-09-15 11:29:49,864 INFO  [PriorityRpcServer.handler=19,queue=1,port=16020] transactional.SplitBalanceHelper:
scannersListClear Active Scanner found, ScannerId: 0 Txid: 1054 Region: TRAFODION.SCH.T106A,\x00\x00\x00\x02\x00\x00\x00\x00,1473934644524.39d9ea92d6f2109a682f66f38a795b79.
		

>From the master logs:
http://traf-testlogs.esgyn.com/Daily-master/332/regress-executor-hdp/hbase-logs/hbase-hbase-master-slave-ahw23.log


		2016-09-15 10:25:01,810 INFO  [AM.ZK.Worker-pool2-t265] master.RegionStates: Transition
{f32b6c8d8f3932729b9824f51a95a63e state=PENDING_CLOSE, ts=1473935101056, server=slave-ahw23.trafodion.org,16020,1473931541282}
to {f32b6c8d8f3932729b9824f51a95a63e state=OFFLINE, ts=1473935101810, server=slave-ahw23.trafodion.org,16020,1473931541282}
		2016-09-15 10:25:01,810 INFO  [AM.ZK.Worker-pool2-t265] master.RegionStates: Offlined f32b6c8d8f3932729b9824f51a95a63e
from slave-ahw23.trafodion.org,16020,1473931541282
		2016-09-15 10:26:31,059 INFO  [slave-ahw23.trafodion.org,16000,1473931529194-org.apache.hadoop.hbase.master.procedure.DisableTableProcedure$BulkDisabler-2]
master.AssignmentManager: Server slave-ahw23.trafodion.org,16020,1473931541282 returned java.io.IOException:
Call to slave-ahw23.trafodion.org/172.31.3.234:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException:
Call id=496, waitTime=90001, operationTimeout=90000 expired. for TRAFODION.SCH.T106A,\x00\x00\x00\x02\x00\x00\x00\x00,1473934644524.39d9ea92d6f2109a682f66f38a795b79.,
try=1 of 10
		java.io.IOException: Call to slave-ahw23.trafodion.org/172.31.3.234:16020 failed on local
exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=496, waitTime=90001,
operationTimeout=90000 expired.
			at org.apache.hadoop.hbase.ipc.RpcClientImpl.wrapException(RpcClientImpl.java:1262)
			at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1230)
			at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
			at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
			at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.closeRegion(AdminProtos.java:23149)
			at org.apache.hadoop.hbase.protobuf.ProtobufUtil.closeRegion(ProtobufUtil.java:1737)
			at org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:809)
			at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1853)
			at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:2571)
			at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:2583)
			at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:2458)
			at org.apache.hadoop.hbase.master.procedure.DisableTableProcedure$BulkDisabler$1.run(DisableTableProcedure.java:534)
			at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
			at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
			at java.lang.Thread.run(Thread.java:745)
		Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=496, waitTime=90001,
operationTimeout=90000 expired.
			at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:70)
			at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1204)
			... 13 more


>From the dtm logs:
http://traf-testlogs.esgyn.com/Daily-master/332/regress-executor-hdp/traf_run/logs/trafodion.dtm.log


		2016-09-15 10:35:08,544 ERROR dtm.HBaseTxClient: Returning from HBaseTxClient:prepareCommit,
txid: 1064 retval: RET_IOEXCEPTION IOException
		org.apache.hadoop.hbase.exceptions.TimeoutIOException: java.util.concurrent.TimeoutException:
The procedure 672 is still running
			at org.apache.hadoop.hbase.client.HBaseAdmin.disableTable(HBaseAdmin.java:1205)
			at org.apache.hadoop.hbase.client.transactional.TransactionManager.disableTable(TransactionManager.java:2959)
			at org.apache.hadoop.hbase.client.transactional.TransactionManager.prepareCommit(TransactionManager.java:1893)
			at org.trafodion.dtm.HBaseTxClient.prepareCommit(HBaseTxClient.java:489)
		Caused by: java.util.concurrent.TimeoutException: The procedure 672 is still running
			at org.apache.hadoop.hbase.client.HBaseAdmin$ProcedureFuture.waitProcedureResult(HBaseAdmin.java:4177)
			at org.apache.hadoop.hbase.client.HBaseAdmin$ProcedureFuture.get(HBaseAdmin.java:4098)
			at org.apache.hadoop.hbase.client.HBaseAdmin.disableTable(HBaseAdmin.java:1201)
			... 3 more

Regards
Arvind


-----Original Message-----
From: steve.varnau@esgyn.com [mailto:steve.varnau@esgyn.com] 
Sent: Thursday, September 15, 2016 4:54 AM
To: dev@trafodion.incubator.apache.org
Subject: Trafodion master Daily Test Result - 332 - Still Failing

Daily Automated Testing master

Jenkins Job:   https://jenkins.esgyn.com/job/Check-Daily-master/332/
Archived Logs: http://traf-testlogs.esgyn.com/Daily-master/332
Bld Downloads: http://traf-builds.esgyn.com

Changes since previous daily build:
[dbirdsall] [TRAFODION-2187] Fix DROP SCHEMA CASCADE when sample tables are present

[anoop.sharma] jira TRAFODION-2184 groupby/orderby construct extensions and enablement

[hzeller] [TRAFODION-2222] Trafodion on HDP 2.3 needs yarn client in class path

[anoop.sharma] jira TRAFODION-2184 additional changes



Test Job Results:

FAILURE core-regress-executor-hdp (2 hr 20 min) SUCCESS build-rh6-master-debug (30 min) SUCCESS
build-rh6-master-release (35 min) SUCCESS core-regress-charsets-cdh (44 min) SUCCESS core-regress-charsets-hdp
(43 min) SUCCESS core-regress-compGeneral-cdh (41 min) SUCCESS core-regress-compGeneral-hdp
(1 hr 1 min) SUCCESS core-regress-core-cdh (1 hr 1 min) SUCCESS core-regress-core-hdp (1 hr
25 min) SUCCESS core-regress-executor-cdh (1 hr 14 min) SUCCESS core-regress-fullstack2-cdh
(10 min) SUCCESS core-regress-fullstack2-hdp (21 min) SUCCESS core-regress-hive-cdh (49 min)
SUCCESS core-regress-hive-hdp (48 min) SUCCESS core-regress-privs1-cdh (50 min) SUCCESS core-regress-privs1-hdp
(53 min) SUCCESS core-regress-privs2-cdh (51 min) SUCCESS core-regress-privs2-hdp (1 hr 20
min) SUCCESS core-regress-qat-cdh (29 min) SUCCESS core-regress-qat-hdp (38 min) SUCCESS core-regress-seabase-cdh
(1 hr 27 min) SUCCESS core-regress-seabase-hdp (1 hr 54 min) SUCCESS core-regress-udr-cdh
(25 min) SUCCESS core-regress-udr-hdp (40 min) SUCCESS jdbc_test-cdh (30 min) SUCCESS jdbc_test-hdp
(48 min) SUCCESS phoenix_part1_T2-cdh (1 hr 7 min) SUCCESS phoenix_part1_T2-hdp (1 hr 19 min)
SUCCESS phoenix_part1_T4-cdh (1 hr 2 min) SUCCESS phoenix_part1_T4-hdp (1 hr 23 min) SUCCESS
phoenix_part2_T2-cdh (1 hr 2 min) SUCCESS phoenix_part2_T2-hdp (1 hr 19 min) SUCCESS phoenix_part2_T4-cdh
(1 hr 0 min) SUCCESS phoenix_part2_T4-hdp (1 hr 19 min) SUCCESS pyodbc_test-cdh (11 min) SUCCESS
pyodbc_test-hdp (27 min)


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message