spark-user mailing list archives

From Gaurav1809 <gauravhpan...@gmail.com>
Subject [Spark-Submit] Where to store data files while running job in cluster mode?
Date Fri, 29 Sep 2017 10:01:27 GMT
Hi All,

I have a multi-node Spark cluster (1 master, 2 workers). The job reads CSV file
data and works fine when run in local mode (local[*]). However, when the same
job is run in cluster mode (spark://HOST:PORT), it cannot read the file. How
should I reference the files, or where should I store them? Currently the CSV
data file is on the master (from where the job is submitted).

The following code works fine in local mode but not in cluster mode.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession
      .builder()
      .appName("SampleFlightsApp")
      .master("spark://masterIP:7077") // use .master("local[*]") for local mode
      .getOrCreate()

    val flightDF = spark.read
      .option("header", true)
      .csv("/home/username/sampleflightdata")

    flightDF.printSchema()

Error: FileNotFoundException: File file:/home/gaurav/sampleflightdata does
not exist
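
For context, in cluster mode each executor resolves a file: path against its
own local filesystem, so a CSV that only exists on the master is not visible to
the workers. The usual fix is to place the file at the same path on every node
or on storage all nodes can reach (HDFS, S3, NFS). Below is a minimal sketch of
the HDFS variant, assuming a hypothetical namenode at hdfs://namenode:8020 and
that the file has already been uploaded there:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession
      .builder()
      .appName("SampleFlightsApp")
      .master("spark://masterIP:7077")
      .getOrCreate()

    // A path on shared storage can be resolved by every executor, unlike a
    // driver-local file: path, so the read no longer fails on the workers.
    val flightDF = spark.read
      .option("header", true)
      .csv("hdfs://namenode:8020/data/sampleflightdata") // hypothetical HDFS location

    flightDF.printSchema()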




