spark-user mailing list archives

From Konstantinos Kougios <kostas.koug...@googlemail.com>
Subject Re: spark timesout maybe due to binaryFiles() with more than 1 million files in HDFS
Date Mon, 08 Jun 2015 14:39:39 GMT
No luck, I am afraid. After giving the namenode 16 GB of RAM, I am still
getting an out-of-memory exception, though a slightly different one:

15/06/08 15:35:52 ERROR yarn.ApplicationMaster: User class threw exception: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
     at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1351)
     at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1413)
     at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1524)
     at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1533)
     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:557)
     at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
     at java.lang.reflect.Method.invoke(Method.java:606)
     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
     at com.sun.proxy.$Proxy10.getListing(Unknown Source)
     at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1969)
     at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1952)
     at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:724)
     at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:105)
     at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:755)
     at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:751)
     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
     at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:751)
     at org.apache.hadoop.fs.Globber.listStatus(Globber.java:69)
     at org.apache.hadoop.fs.Globber.glob(Globber.java:217)
     at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1644)
     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:292)
     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264)
     at org.apache.spark.input.StreamFileInputFormat.setMinPartitions(PortableDataStream.scala:47)
     at org.apache.spark.rdd.BinaryFileRDD.getPartitions(BinaryFileRDD.scala:43)
     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
     at scala.Option.getOrElse(Option.scala:120)
     at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)


and on the second retry by Spark, a similar exception:

java.lang.OutOfMemoryError: GC overhead limit exceeded
     at com.google.protobuf.LiteralByteString.toString(LiteralByteString.java:148)
     at com.google.protobuf.ByteString.toStringUtf8(ByteString.java:572)
     at org.apache.hadoop.hdfs.protocol.proto.HdfsProtos$HdfsFileStatusProto.getOwner(HdfsProtos.java:21558)
     at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1413)
     at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1524)
     at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1533)
     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:557)
     at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
     at java.lang.reflect.Method.invoke(Method.java:606)
     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
     at com.sun.proxy.$Proxy10.getListing(Unknown Source)
     at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1969)
     at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1952)
     at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:724)
     at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:105)
     at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:755)
     at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:751)
     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
     at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:751)
     at org.apache.hadoop.fs.Globber.listStatus(Globber.java:69)
     at org.apache.hadoop.fs.Globber.glob(Globber.java:217)
     at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1644)
     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:292)
     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264)
     at org.apache.spark.input.StreamFileInputFormat.setMinPartitions(PortableDataStream.scala:47)
     at org.apache.spark.rdd.BinaryFileRDD.getPartitions(BinaryFileRDD.scala:43)
     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
     at scala.Option.getOrElse(Option.scala:120)
     at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)


Any ideas which part of Hadoop is running out of memory?
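
If I am reading the traces right, it is not one of the Hadoop daemons that runs 
out of heap but the HDFS client inside my own driver: in yarn-cluster mode the 
user class runs inside the YARN ApplicationMaster, and that is where getListing() 
converts the protobuf responses into a huge array of FileStatus objects while 
binaryFiles() computes its partitions. If that is the case, the heap to raise 
would be the driver's rather than the namenode's, something like this (the 8g 
value and the class/jar names are just placeholders):

    spark-submit --master yarn-cluster --driver-memory 8g \
      --class my.Job my-job.jar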
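
The other thing I am considering is listing the top-level subdirectories myself 
and feeding them to binaryFiles() one at a time, so the driver never has to hold 
the listing for all million files at once. A rough sketch of the idea, assuming 
the files sit under subdirectories of a single root (the path and the count() 
placeholder are illustrative, not my actual code):

    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.{SparkConf, SparkContext}

    object BatchedBinaryFiles {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("batched-binaryFiles"))
        // Reuse Spark's Hadoop configuration to talk to HDFS directly.
        val fs = FileSystem.get(sc.hadoopConfiguration)

        // List only the immediate children of the (hypothetical) root directory,
        // instead of letting binaryFiles() glob the whole tree in one go.
        val subDirs = fs.listStatus(new Path("hdfs:///data/files"))
          .filter(_.isDirectory)
          .map(_.getPath.toString)

        // Process one subdirectory at a time so each getListing() stays small.
        subDirs.foreach { dir =>
          val rdd = sc.binaryFiles(dir)
          // ... real processing would go here; count() is just a placeholder action
          println(s"$dir -> ${rdd.count()} files")
        }

        sc.stop()
      }
    }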


