Place the file in HDFS and reference its HDFS path in your code.
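A minimal sketch follows. It assumes the file has already been copied into HDFS (for example with `hdfs dfs -put /home/username/sampleflightdata /user/username/`) and that the NameNode is reachable at the hypothetical address `namenode:8020`. Substitute your cluster's `fs.defaultFS` value. `HdfsPaths.hdfsUri` is an illustrative helper, not a Spark or Hadoop API.

```scala
import java.net.URI

// Hypothetical helper (not part of Spark or Hadoop): builds a fully qualified
// HDFS URI, so every executor resolves the same file wherever it runs.
object HdfsPaths {
  def hdfsUri(nameNodeHost: String, nameNodePort: Int, absolutePath: String): String = {
    require(absolutePath.startsWith("/"), s"expected an absolute path, got: $absolutePath")
    // java.net.URI(scheme, userInfo, host, port, path, query, fragment)
    new URI("hdfs", null, nameNodeHost, nameNodePort, absolutePath, null, null).toString
  }
}
```

You can then pass the result to the reader, e.g. `spark.read.option("header", true).csv(HdfsPaths.hdfsUri("namenode", 8020, "/user/username/sampleflightdata"))`, which reads `hdfs://namenode:8020/user/username/sampleflightdata`. If `fs.defaultFS` already points at HDFS on every node, the plain path `/user/username/sampleflightdata` should resolve the same way.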

Thanks,
Sathish

On Fri, Sep 29, 2017 at 3:31 PM, Gaurav1809 <gauravhpandya@gmail.com> wrote:
Hi All,

I have a multi-node Spark cluster (1 master, 2 workers). The job reads CSV
file data, and it works fine in local mode (local[*]). However, when the
same job is run in cluster mode (spark://HOST:PORT), it cannot read the
file. How should I reference the files, or where should I store them?
Currently the CSV data file is on the master (the machine from which the
job is submitted).

The following code works fine in local mode but not in cluster mode.

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("SampleFlightsApp")
  .master("spark://masterIP:7077") // change to .master("local[*]") for local mode
  .getOrCreate()

val flightDF = spark.read.option("header", true).csv("/home/username/sampleflightdata")
flightDF.printSchema()

Error: FileNotFoundException: File file:/home/gaurav/sampleflightdata does
not exist
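The `file:` prefix in the error is the clue. The path in the code has no scheme, so it is resolved against the default filesystem (`fs.defaultFS`, which is `file:///` unless configured otherwise). In cluster mode, the executors on the worker nodes then most likely look for it on their own local disks, where it does not exist. The sketch below is a simplified model of that resolution rule, loosely based on Hadoop's behaviour but not copied from it. `PathQualifier.qualify` is a hypothetical name.

```scala
import java.net.URI

// Simplified model of how a scheme-less path is qualified against a default
// filesystem URI. Illustrative only; this is not Hadoop's implementation.
object PathQualifier {
  def qualify(path: String, defaultFs: String): String = {
    val uri = new URI(path)
    if (uri.getScheme != null) {
      path // already qualified, e.g. hdfs://... or file:/...
    } else {
      require(uri.getPath.startsWith("/"), s"expected an absolute path, got: $path")
      val fs = new URI(defaultFs)
      // Borrow the scheme and authority from the default filesystem; keep the path.
      new URI(fs.getScheme, fs.getAuthority, uri.getPath, null, null).toString
    }
  }
}
```

For example, `PathQualifier.qualify("/home/gaurav/sampleflightdata", "file:///")` gives `file:/home/gaurav/sampleflightdata`, the same form as the path in the error. An explicit `hdfs://...` path is returned unchanged.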


