spark-user mailing list archives

From vaquar khan <vaquar.k...@gmail.com>
Subject Re: [Spark-Submit] Where to store data files while running job in cluster mode?
Date Sat, 30 Sep 2017 00:08:44 GMT
If you're running in cluster mode, the file must be readable at the same
path from every node, either because it has been copied to all of them or
because it sits on a shared file system.

1) Put it into a distributed filesystem such as HDFS, or

2) transfer the file (e.g. via sftp) to each worker node before running
the Spark job, and then pass the path of the file on the worker filesystem
as the argument to textFile.
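
As a rough illustration, the two options differ only in the URI handed to
the reader. The sketch below is an assumption-laden example, not a tested
recipe: the NameNode address "namenode:8020", the HDFS path
"/data/sampleflightdata", and the helpers hdfsUri and localUri are
hypothetical names made up for illustration; the master URL and the local
path are the placeholders from the original question.

```scala
import org.apache.spark.sql.SparkSession

object SampleFlightsApp {

  // Hypothetical helper: build a fully qualified HDFS URI so the path is
  // not resolved against the local file system.
  def hdfsUri(nameNode: String, path: String): String = {
    val absolute = if (path.startsWith("/")) path else "/" + path
    s"hdfs://$nameNode$absolute"
  }

  // Hypothetical helper: build a file:// URI for a path that must exist,
  // with the same name, on the driver and on every worker node.
  def localUri(path: String): String = {
    require(path.startsWith("/"), "local path must be absolute")
    s"file://$path"
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("SampleFlightsApp")
      .master("spark://masterIP:7077") // placeholder from the question
      .getOrCreate()

    // Option 1: the file was uploaded to HDFS beforehand
    // (assumed NameNode address and path).
    val input = hdfsUri("namenode:8020", "/data/sampleflightdata")

    // Option 2: the file was copied to the same path on every node.
    // val input = localUri("/home/username/sampleflightdata")

    val flightDF = spark.read.option("header", "true").csv(input)
    flightDF.printSchema()

    spark.stop()
  }
}
```

With option 2, the FileNotFoundException from the question would be
expected to come back if any single worker is missing its copy of the file.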

Regards,
Vaquar khan

On Fri, Sep 29, 2017 at 2:00 PM, JG Perrin <jperrin@lumeris.com> wrote:

> On a test system, you can also use something like
> Owncloud/Nextcloud/Dropbox to ensure that the files are synchronized. I
> would not do it for TB of data ;) ...
>
> -----Original Message-----
> From: Jörn Franke [mailto:jornfranke@gmail.com]
> Sent: Friday, September 29, 2017 5:14 AM
> To: Gaurav1809 <gauravhpandya@gmail.com>
> Cc: user@spark.apache.org
> Subject: Re: [Spark-Submit] Where to store data files while running job in
> cluster mode?
>
> You should use a distributed filesystem such as HDFS. If you want to use
> the local filesystem then you have to copy each file to each node.
>
> > On 29. Sep 2017, at 12:05, Gaurav1809 <gauravhpandya@gmail.com> wrote:
> >
> > Hi All,
> >
> > I have a multi-node Spark cluster (1 master, 2 workers).
> > The job reads CSV file data, and it works fine when run in local
> > mode (local[*]).
> > However, when the same job is run in cluster mode (spark://HOST:PORT),
> > it is not able to read the file.
> > I want to know how to reference the files, or where to store them.
> > Currently the CSV data file is on the master (from where the job is
> > submitted).
> >
> > Following code works fine in local mode but not in cluster mode.
> >
> > import org.apache.spark.sql.SparkSession
> >
> > val spark = SparkSession
> >      .builder()
> >      .appName("SampleFlightsApp")
> >      // change it to .master("local[*]") for local mode
> >      .master("spark://masterIP:7077")
> >      .getOrCreate()
> >
> > val flightDF =
> >   spark.read.option("header", true).csv("/home/username/sampleflightdata")
> > flightDF.printSchema()
> >
> > Error: FileNotFoundException: File
> > file:/home/username/sampleflightdata does not exist
> >
> >
> >
> > --
> > Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
> >
> > ---------------------------------------------------------------------
> > To unsubscribe e-mail: user-unsubscribe@spark.apache.org
> >


-- 
Regards,
Vaquar Khan
+1 -224-436-0783
Greater Chicago
