spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: [Spark-Submit] Where to store data files while running job in cluster mode?
Date Fri, 29 Sep 2017 10:14:10 GMT
You should use a distributed filesystem such as HDFS. If you want to use the local
filesystem, then you have to copy each file to the same path on every node.
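
For example, assuming the CSV has been uploaded to HDFS first, the read could look
roughly like the sketch below; the namenode host/port ("namenode:8020") and the paths
are placeholders, not values from your setup.

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("SampleFlightsApp")
  .master("spark://masterIP:7077")
  .getOrCreate()

// Read from HDFS so every executor can reach the same data.
// "namenode:8020" and the path are placeholders for your cluster.
val flightDF = spark.read
  .option("header", true)
  .csv("hdfs://namenode:8020/data/sampleflightdata")

// Alternatively, if the file has been copied to the same path on every
// worker node, an explicit file:// URI also works:
// val flightDF2 = spark.read.option("header", true)
//   .csv("file:///home/username/sampleflightdata")

flightDF.printSchema()

The file would first be put into HDFS with something like:

hdfs dfs -put /home/username/sampleflightdata /data/sampleflightdata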

> On 29. Sep 2017, at 12:05, Gaurav1809 <gauravhpandya@gmail.com> wrote:
> 
> Hi All,
> 
> I have a multi-node Spark cluster (1 master, 2 workers). The job reads CSV
> file data and works fine when run in local mode (local[*]).
> However, when the same job is run in cluster mode (spark://HOST:PORT), it is
> not able to read the file.
> I want to know how to reference the files, or where to store them. Currently
> the CSV data file is on the master (from where the job is submitted).
> 
> The following code works fine in local mode but not in cluster mode.
> 
> val spark = SparkSession
>   .builder()
>   .appName("SampleFlightsApp")
>   .master("spark://masterIP:7077") // change to .master("local[*]") for local mode
>   .getOrCreate()
> 
> val flightDF = spark.read
>   .option("header", true)
>   .csv("/home/username/sampleflightdata")
> 
> flightDF.printSchema()
> 
> Error: FileNotFoundException: File file:/home/username/sampleflightdata does not exist
> 
> 
> 
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
> 
> 

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org

