spark-user mailing list archives

From Ewan Leith <ewan.le...@realitymine.com>
Subject RE: spark timesout maybe due to binaryFiles() with more than 1 million files in HDFS
Date Mon, 08 Jun 2015 14:12:14 GMT
Try putting a * on the end of xmlDir, i.e.

xmlDir = hdfs:///abc/def/*

Rather than

xmlDir = hdfs:///abc/def

and see what happens. I don't know why, but that appears to be more reliable for me with S3
as the filesystem.
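
As a rough sketch of that change in Scala (not a confirmed fix): the path hdfs:///abc/def is only the placeholder from above, and the object name, app name and the withTrailingGlob helper are made up for illustration. It assumes the job is launched through spark-submit so the master is supplied there.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BinaryFilesGlobSketch {
  // Hypothetical helper: append a trailing "/*" glob unless one is already there.
  def withTrailingGlob(dir: String): String =
    if (dir.endsWith("/*")) dir else dir.stripSuffix("/") + "/*"

  def main(args: Array[String]): Unit = {
    // Placeholder directory from this message; replace with the real one.
    val xmlDir = withTrailingGlob("hdfs:///abc/def")

    // The master URL is expected to come from spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("binaryFiles-glob-sketch"))
    try {
      // binaryFiles returns an RDD of (file path, PortableDataStream) pairs.
      val xmls = sc.binaryFiles(xmlDir)
      println(s"Read ${xmls.count()} files from $xmlDir")
    } finally {
      sc.stop()
    }
  }
}
```

The only difference from the original call is the path string passed to binaryFiles; whether the glob helps with over a million files on HDFS is untested here.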

I'm also using binaryFiles, but I've tried running the same command with wholeTextFiles and
had the same error.

Ewan

-----Original Message-----
From: Kostas Kougios [mailto:kostas.kougios@googlemail.com] 
Sent: 08 June 2015 15:02
To: user@spark.apache.org
Subject: spark timesout maybe due to binaryFiles() with more than 1 million files in HDFS

I am reading millions of xml files via

val xmls = sc.binaryFiles(xmlDir)

The operation runs fine locally, but on YARN it fails with:

 client token: N/A
 diagnostics: Application application_1433491939773_0012 failed 2 times due to ApplicationMaster for attempt appattempt_1433491939773_0012_000002 timed out. Failing the application.
 ApplicationMaster host: N/A
 ApplicationMaster RPC port: -1
 queue: default
 start time: 1433750951883
 final status: FAILED
 tracking URL: http://controller01:8088/cluster/app/application_1433491939773_0012
 user: ariskk
Exception in thread "main" org.apache.spark.SparkException: Application finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:622)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:647)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

In the logs under hadoops/userlogs I frequently see these messages:

15/06/08 09:15:38 WARN util.AkkaUtils: Error sending message [message = Heartbeat(1,[Lscala.Tuple2;@2b4f336b,BlockManagerId(1, controller01.stratified, 58510))] in 2 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:107)
    at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:195)
    at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:427)

I run my Spark job via spark-submit, and it works for another HDFS directory that contains
only 37k files. Any ideas how to resolve this?




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-timesout-maybe-due-to-binaryFiles-with-more-than-1-million-files-in-HDFS-tp23208.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org For additional commands, e-mail:
user-help@spark.apache.org



