spark-user mailing list archives

From ABHISHEK <abhi...@gmail.com>
Subject Spark Yarn Cluster with Reference File
Date Fri, 23 Sep 2016 07:33:18 GMT
Hello there,

I have a Spark application that refers to an external file, 'abc.drl',
which contains unstructured data.
The application can find this reference file when I run it in local mode,
but in YARN cluster mode it cannot find the file at the specified path.
I tried both a local path and an HDFS path with the --files option, but
neither worked.


What is working?
1. The Spark application runs fine in local mode, as shown below.
In this command the file path is a local path, not an HDFS path.
spark-submit --master local[*] --class "com.abc.StartMain" \
  abc-0.0.1-SNAPSHOT-jar-with-dependencies.jar /home/abhietc/abc/abc.drl

2. I want to run this Spark application on YARN in cluster mode.
For that I used the commands below, but the application cannot find the
reference file abc.drl at the given path. I tried both a local path and
HDFS paths, but none of them worked.

spark-submit --master yarn --deploy-mode cluster \
  --files /home/abhietc/abc/abc.drl --class com.abc.StartMain \
  abc-0.0.1-SNAPSHOT-jar-with-dependencies.jar /home/abhietc/abc/abc.drl

spark-submit --master yarn --deploy-mode cluster \
  --files hdfs://abhietc.com:8020/user/abhietc/abc.drl --class com.abc.StartMain \
  abc-0.0.1-SNAPSHOT-jar-with-dependencies.jar hdfs://abhietc.com:8020/user/abhietc/abc.drl

spark-submit --master yarn --deploy-mode cluster \
  --files hdfs://abc.com:8020/tmp/abc.drl --class com.abc.StartMain \
  abc-0.0.1-SNAPSHOT-jar-with-dependencies.jar hdfs://abc.com:8020/tmp/abc.drl


Error messages:
Surprisingly, we do not perform any write operation on the reference file,
yet the log shows the application trying to write to the file instead of
reading it.
The log also shows a FileNotFoundException for both the HDFS and the local
path.
-------------
16/09/20 14:49:50 ERROR scheduler.JobScheduler: Error running job streaming job 1474363176000 ms.0
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4, abc.com): java.lang.RuntimeException: Unable to write Resource: FileResource[file=hdfs:/abc.com:8020/user/abhietc/abc.drl]
        at org.drools.compiler.kie.builder.impl.KieFileSystemImpl.write(KieFileSystemImpl.java:71)
        at com.hmrc.taxcalculator.KieSessionFactory$.getNewSession(KieSessionFactory.scala:49)
        at com.hmrc.taxcalculator.KieSessionFactory$.getKieSession(KieSessionFactory.scala:21)
        at com.hmrc.taxcalculator.KieSessionFactory$.execute(KieSessionFactory.scala:27)
        at com.abc.StartMain$$anonfun$main$1$$anonfun$4.apply(TaxCalculatorMain.scala:124)
        at com.abc.StartMain$$anonfun$main$1$$anonfun$4.apply(TaxCalculatorMain.scala:124)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: hdfs:/abc.com:8020/user/abhietc/abc.drl (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:146)
        at org.drools.core.io.impl.FileSystemResource.getInputStream(FileSystemResource.java:123)
        at org.drools.compiler.kie.builder.impl.KieFileSystemImpl.write(KieFileSystemImpl.java:58)
        ... 19 more
--------------
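
One possible reading of the trace, offered as a sketch and not as a confirmed
diagnosis: the frames under "Caused by" show FileSystemResource opening the
path through java.io.FileInputStream, which only knows the local file system
of the node running the task. If the hdfs:// argument is handed to
java.io.File unchanged, the "//" is collapsed, which would match the
"hdfs:/abc.com:8020/..." spelling in the log. The class and method names
below (HdfsPathProbe, asLocalPath, opensLocally) are made up for
illustration; only the java.io calls are real, and the printed values assume
a Unix-like system.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

final class HdfsPathProbe {
    // Returns the path string java.io.File keeps for the given argument.
    // On Unix-like systems duplicate slashes are collapsed during normalization.
    static String asLocalPath(String arg) {
        return new File(arg).getPath();
    }

    // Returns true only if the argument can be opened as a file on the
    // local file system of the machine running this code.
    static boolean opensLocally(String arg) {
        try (FileInputStream in = new FileInputStream(new File(arg))) {
            return true;
        } catch (IOException e) {
            // FileNotFoundException is a subclass of IOException.
            return false;
        }
    }

    public static void main(String[] args) {
        String uri = "hdfs://abc.com:8020/user/abhietc/abc.drl";
        System.out.println(asLocalPath(uri));   // hdfs:/abc.com:8020/user/abhietc/abc.drl
        System.out.println(opensLocally(uri));  // false
    }
}
```

If that reading is right, the "Unable to write Resource" wording would come
from Drools copying the rule file into its own KieFileSystem, not from any
write to abc.drl itself. Spark's documentation describes --files as placing
the listed files in the working directory of each executor, so passing the
bare name abc.drl as the argument may be worth trying; that is an untested
suggestion.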
Cheers,
Abhishek
