spark-user mailing list archives

From Tathagata Das <tathagata.das1...@gmail.com>
Subject Re: Reading files on a cluster / shared file system
Date Thu, 16 Jan 2014 00:54:15 GMT
If you are running a distributed Spark cluster across the nodes, then the
reading should be done in a distributed manner. If you give sc.textFile() a
"local path" to a directory in the shared file system, then each worker
should read a subset of the files in the directory by accessing them locally.
Nothing should be read on the master.
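
To make the idea concrete, here is a rough sketch in plain Python. It is only a
toy model of "each worker reads its own subset of a shared directory", not
Spark's actual scheduler or input-split logic. The helper names (assign_files,
read_assigned, demo), the round-robin assignment, and the part-N.txt file names
are all assumptions made up for illustration.

```python
import os
import tempfile


def assign_files(files, num_workers):
    """Toy round-robin assignment of file names to workers.

    Assumption for illustration only; Spark decides this differently.
    """
    ordered = sorted(files)
    return [ordered[i::num_workers] for i in range(num_workers)]


def read_assigned(directory, names):
    """What one hypothetical worker does: read only its own files,
    directly from the shared path, without going through a master."""
    lines = []
    for name in names:
        with open(os.path.join(directory, name)) as fh:
            lines.extend(fh.read().splitlines())
    return lines


def demo(num_files=5, num_workers=2):
    # A temporary directory stands in for the shared file system.
    with tempfile.TemporaryDirectory() as shared_dir:
        for i in range(num_files):
            with open(os.path.join(shared_dir, f"part-{i}.txt"), "w") as fh:
                fh.write(f"line from file {i}\n")
        assignment = assign_files(os.listdir(shared_dir), num_workers)
        per_worker = [read_assigned(shared_dir, names) for names in assignment]
    return assignment, per_worker


if __name__ == "__main__":
    assignment, per_worker = demo()
    for worker_id, names in enumerate(assignment):
        print(f"worker {worker_id} reads {names}")
```

Every file ends up read exactly once, by exactly one worker, and no single
process has to read the whole directory.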

TD


On Wed, Jan 15, 2014 at 3:56 PM, Ognen Duzlevski
<ognen@nengoiksvelzud.com> wrote:

> On a cluster where the nodes and the master all have access to a shared
> filesystem/files - does Spark read a file (like one resulting from
> sc.textFile()) in parallel/different sections on each node? Or is the file
> read on the master in sequence and chunks processed on the nodes afterwards?
>
> Thanks!
> Ognen
>
