trafodion-dev mailing list archives

From "Arvind" <narain.arv...@gmail.com>
Subject RE: Trafodion release2.0 Daily Test Result - 23 - Still Failing
Date Fri, 27 May 2016 05:26:58 GMT
Hi Steve,

It does seem to be related to ephemeral ports and/or TCP timeout settings (sockets sitting
in the TIME_WAIT state for 2 minutes or something similar).
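
As a rough back-of-envelope check (all numbers here are assumptions: the 51000-59999 range
mentioned further down in this thread and a ~2 minute TIME_WAIT), the restricted range would
cap the client at roughly 75 short-lived connections per second before binds start failing:

    # Back-of-envelope only; assumptions: 51000-59999 ephemeral range,
    # ~2 minutes spent in TIME_WAIT per closed socket.
    ports = 59999 - 51000 + 1       # 9000 usable ephemeral ports
    time_wait_secs = 120            # assumed TIME_WAIT duration
    print(ports / time_wait_secs)   # ~75 sustainable new sockets per second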

Other logs might indicate how many socket opens this test performs, but the following
log files (especially the .3 file) show that we most probably ran out of ephemeral ports,
which matches what you were suspecting.

http://traf-testlogs.esgyn.com/Requested/57/regress-cm5.4/traf_run/logs/trafodion.hdfs.log.3
http://traf-testlogs.esgyn.com/Requested/57/regress-cm5.4/traf_run/logs/trafodion.hdfs.log.2
http://traf-testlogs.esgyn.com/Requested/57/regress-cm5.4/traf_run/logs/trafodion.hdfs.log.1
http://traf-testlogs.esgyn.com/Requested/57/regress-cm5.4/traf_run/logs/trafodion.hdfs.log

		2016-05-26 17:34:29,178 INFO compress.CodecPool: Got brand-new compressor [.gz]
		2016-05-26 17:36:11,728 INFO hdfs.DFSClient: Exception in createBlockOutputStream
		java.net.BindException: Cannot assign requested address
			at sun.nio.ch.Net.connect0(Native Method)
			at sun.nio.ch.Net.connect(Net.java:484)
			at sun.nio.ch.Net.connect(Net.java:476)
			at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:675)
			at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
			at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
			at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1622)
			at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1420)
			at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1373)
			at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:600)
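
If it helps to confirm this on the VM while the hive test is running, something like the
following (just a sketch; it only reads the standard Linux /proc files) would show the
configured range and how many sockets are sitting in TIME_WAIT:

    # Sketch only: print the configured ephemeral port range and how many IPv4
    # sockets are currently in TIME_WAIT (state "06" in /proc/net/tcp).
    lo, hi = open("/proc/sys/net/ipv4/ip_local_port_range").read().split()
    with open("/proc/net/tcp") as f:
        next(f)  # skip the column-header line
        tw = sum(1 for line in f if line.split()[3] == "06")
    print("ephemeral range %s-%s, TIME_WAIT sockets: %d" % (lo, hi, tw))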

Regards
Arvind

-----Original Message-----
From: Steve Varnau [mailto:steve.varnau@esgyn.com] 
Sent: Thursday, May 26, 2016 12:12 PM
To: dev@trafodion.incubator.apache.org
Subject: RE: Trafodion release2.0 Daily Test Result - 23 - Still Failing

Tested out this theory. I ran PR502 (Selva's memory fix for TEST018) against the hive test,
and it still fails:
https://jenkins.esgyn.com/job/Requested-Test/57/

Then I changed the jenkins config to use the previous VM image and ran it again, and it passed:
https://jenkins.esgyn.com/job/Requested-Test/59/

The only intentional change between those VM images was limiting the range of ephemeral ports.
Perhaps some unintentional change also got in; otherwise I'm stumped as to how that would
cause this problem.
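
(Side note: if it helps rule out other unintentional differences, the effective range on each
image can be checked directly. A tiny sketch, assuming the change was made through the
net.ipv4.ip_local_port_range sysctl:)

    # Sketch only: print the ephemeral port range the running kernel is using, so the
    # two VM images can be compared directly (the new image should show 51000 59999
    # if the port-range restriction really is the only difference).
    print(open("/proc/sys/net/ipv4/ip_local_port_range").read().strip())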

--Steve


> -----Original Message-----
> From: Steve Varnau [mailto:steve.varnau@esgyn.com]
> Sent: Thursday, May 26, 2016 9:11 AM
> To: 'dev@trafodion.incubator.apache.org'
> Subject: RE: Trafodion release2.0 Daily Test Result - 23 - Still 
> Failing
>
> I think the error usually looks like that, or more often it hangs and
> the test times out.
>
> The odd thing is that it started failing on both branches on the same day.
> There were changes on the master branch, but none on the release2.0 branch.
> That is what makes me think the trigger was environmental rather than
> a code change.
>
> I guess I could switch jenkins back to using the previous VM image to 
> see if it goes away.
>
> --Steve
>
>
> > -----Original Message-----
> > From: Sandhya Sundaresan [mailto:sandhya.sundaresan@esgyn.com]
> > Sent: Thursday, May 26, 2016 9:04 AM
> > To: dev@trafodion.incubator.apache.org
> > Subject: RE: Trafodion release2.0 Daily Test Result - 23 - Still 
> > Failing
> >
> > Hi Steve,
> >
> >    The error today is this :
> >
> >  *** ERROR[8448] Unable to access Hbase interface. Call to 
> > ExpHbaseInterface::scanOpen returned error HBASE_OPEN_ERROR(-704).
> > Cause:
> >
> > > java.lang.Exception: Cannot create Table Snapshot Scanner
> >
> > > org.trafodion.sql.HTableClient.startScan(HTableClient.java:1003)
> >
> > We have seen this in the past when there is Java memory pressure.
> >
> > A few days back this same snapshot scan creation failed with the error
> > below; I wonder if anyone can see a pattern here or knows the causes of
> > either of these.
> >
> > >>--snapshot
> >
> > >>execute snp;
> >
> > *** ERROR[8448] Unable to access Hbase interface. Call to 
> > ExpHbaseInterface::scanOpen returned error HBASE_OPEN_ERROR(-704).
> > Cause:
> >
> > java.io.IOException: java.util.concurrent.ExecutionException:
> > org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
> > /bulkload/20160520102824/TRAFODION.HBASE.CUSTOMER_ADDRESS_SNAP111/6695c6f9-4bb5-4ad5-893b-adf07fc8a4b9/data/default/TRAFODION.HBASE.CUSTOMER_ADDRESS/7143c21b40a7bef21768685f7dc18e1c/.regioninfo
> > could only be replicated to 0 nodes instead of minReplication (=1). There
> > are 1 datanode(s) running and no node(s) are excluded in this operation.
> >
> >         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1541)
> >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3289)
> >         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:668)
> >         at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:212)
> >         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:483)
> >         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> >         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
> >         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> >         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
> >         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
> >         at java.security.AccessController.doPrivileged(Native Method)
> >         at javax.security.auth.Subject.doAs(Subject.java:415)
> >         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> >         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)
> >
> > org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegions(ModifyRegionUtils.java:162)
> > org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.cloneHdfsRegions(RestoreSnapshotHelper.java:561)
> > org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.restoreHdfsRegions(RestoreSnapshotHelper.java:237)
> > org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.restoreHdfsRegions(RestoreSnapshotHelper.java:159)
> > org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.copySnapshotForScanner(RestoreSnapshotHelper.java:812)
> > org.apache.hadoop.hbase.client.TableSnapshotScanner.init(TableSnapshotScanner.java:156)
> > org.apache.hadoop.hbase.client.TableSnapshotScanner.<init>(TableSnapshotScanner.java:124)
> > org.apache.hadoop.hbase.client.TableSnapshotScanner.<init>(TableSnapshotScanner.java:101)
> > org.trafodion.sql.HTableClient$SnapshotScanHelper.createTableSnapshotScanner(HTableClient.java:222)
> > org.trafodion.sql.HTableClient.startScan(HTableClient.java:1009)
> >
> > .
> >
> > --- 0 row(s) selected.
> >
> > >>log;
> >
> > Sandhya
> >
> > -----Original Message-----
> > From: Steve Varnau [mailto:steve.varnau@esgyn.com]
> > Sent: Thursday, May 26, 2016 8:49 AM
> > To: dev@trafodion.incubator.apache.org
> > Subject: RE: Trafodion release2.0 Daily Test Result - 23 - Still 
> > Failing
> >
> > This hive regression behavior is still puzzling. However, I just realized
> > one thing that did change just before it started failing, and it is a test
> > environment change common to both branches: the VM image for Cloudera was
> > updated to set a smaller ephemeral port range, to reduce the chance of port
> > conflicts that were occasionally impacting HBase.
> >
> > The range was set to 51000-59999, to avoid default port numbers that the
> > Cloudera distro uses.
> >
> > So how could this possibly be causing disaster in hive/TEST018?  I have no
> > idea.
> >
> > --Steve
> >
> > > -----Original Message-----
> >
> > > From: steve.varnau@esgyn.com [mailto:steve.varnau@esgyn.com]
> >
> > > Sent: Thursday, May 26, 2016 1:36 AM
> >
> > > To: dev@trafodion.incubator.apache.org
> >
> > > Subject: Trafodion release2.0 Daily Test Result - 23 - Still 
> > > Failing
> >
> > >
> >
> > > Daily Automated Testing release2.0
> >
> > >
> >
> > > Jenkins Job:
> > > https://jenkins.esgyn.com/job/Check-Daily-release2.0/23/
> >
> > > Archived Logs: http://traf-testlogs.esgyn.com/Daily-release2.0/23
> >
> > > Bld Downloads: http://traf-builds.esgyn.com
> >
> > >
> >
> > > Changes since previous daily build:
> >
> > > No changes
> >
> > >
> >
> > >
> >
> > > Test Job Results:
> >
> > >
> >
> > > FAILURE core-regress-hive-cdh (55 min)
> > > SUCCESS build-release2.0-debug (24 min)
> > > SUCCESS build-release2.0-release (28 min)
> > > SUCCESS core-regress-charsets-cdh (28 min)
> > > SUCCESS core-regress-charsets-hdp (41 min)
> > > SUCCESS core-regress-compGeneral-cdh (36 min)
> > > SUCCESS core-regress-compGeneral-hdp (45 min)
> > > SUCCESS core-regress-core-cdh (39 min)
> > > SUCCESS core-regress-core-hdp (1 hr 10 min)
> > > SUCCESS core-regress-executor-cdh (56 min)
> > > SUCCESS core-regress-executor-hdp (1 hr 25 min)
> > > SUCCESS core-regress-fullstack2-cdh (13 min)
> > > SUCCESS core-regress-fullstack2-hdp (14 min)
> > > SUCCESS core-regress-hive-hdp (53 min)
> > > SUCCESS core-regress-privs1-cdh (39 min)
> > > SUCCESS core-regress-privs1-hdp (59 min)
> > > SUCCESS core-regress-privs2-cdh (41 min)
> > > SUCCESS core-regress-privs2-hdp (54 min)
> > > SUCCESS core-regress-qat-cdh (16 min)
> > > SUCCESS core-regress-qat-hdp (21 min)
> > > SUCCESS core-regress-seabase-cdh (57 min)
> > > SUCCESS core-regress-seabase-hdp (1 hr 16 min)
> > > SUCCESS core-regress-udr-cdh (28 min)
> > > SUCCESS core-regress-udr-hdp (31 min)
> > > SUCCESS jdbc_test-cdh (22 min)
> > > SUCCESS jdbc_test-hdp (40 min)
> > > SUCCESS phoenix_part1_T2-cdh (56 min)
> > > SUCCESS phoenix_part1_T2-hdp (1 hr 17 min)
> > > SUCCESS phoenix_part1_T4-cdh (46 min)
> > > SUCCESS phoenix_part1_T4-hdp (57 min)
> > > SUCCESS phoenix_part2_T2-cdh (53 min)
> > > SUCCESS phoenix_part2_T2-hdp (1 hr 25 min)
> > > SUCCESS phoenix_part2_T4-cdh (44 min)
> > > SUCCESS phoenix_part2_T4-hdp (1 hr 0 min)
> > > SUCCESS pyodbc_test-cdh (11 min)
> > > SUCCESS pyodbc_test-hdp (23 min)
