spark-user mailing list archives

From Matan Safriel <dev.ma...@gmail.com>
Subject Re: Appending to an hdfs file
Date Thu, 29 Jan 2015 09:47:35 GMT
Thanks. Yesterday I actually looked up foreachPartition() in this context,
but couldn't find where it's documented in the Javadocs or elsewhere, probably
for some silly reason. Can you please point me in the right direction?

Many thanks!

By the way, I realize the better solution is probably to concatenate a
job's output into a file after the job is done, rather than to append small
pieces from individual workers.
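The "concatenate after the job" idea can be sketched locally. This is a minimal analogy on the local filesystem, not HDFS code: it assumes the job has already written its output as `part-*` files into one directory (the naming Spark's `saveAsTextFile` uses), and `concat_part_files` is a hypothetical helper name. For HDFS itself, `hadoop fs -getmerge` does a similar merge into a local file.

```python
import glob
import os
import shutil
import tempfile


def concat_part_files(output_dir: str, dest_path: str) -> int:
    """Hypothetical helper: append every part-* file in output_dir to dest_path,
    in file-name order, and return how many part files were merged.

    Local-filesystem analogy only; the "part-*" pattern skips marker files
    such as _SUCCESS.
    """
    parts = sorted(glob.glob(os.path.join(output_dir, "part-*")))
    # Append mode: the merged output is added after whatever dest_path already holds.
    with open(dest_path, "ab") as dest:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, dest)
    return len(parts)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out_dir = os.path.join(tmp, "job-output")
        os.makedirs(out_dir)
        # Pretend a finished job wrote two partitions (made-up contents).
        for i, lines in enumerate([["a", "b"], ["c"]]):
            with open(os.path.join(out_dir, f"part-{i:05d}"), "w") as f:
                f.write("\n".join(lines) + "\n")
        target = os.path.join(tmp, "existing.txt")
        with open(target, "w") as f:
            f.write("old\n")
        merged = concat_part_files(out_dir, target)
        with open(target) as f:
            print(merged, f.read().split())
```

Doing the append once, from a single process after the job finishes, sidesteps having many workers write to the same file concurrently.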

On Thu, Jan 29, 2015 at 1:12 AM, Sean Owen <sowen@cloudera.com> wrote:

> You can call any API you like in a Spark job, as long as the libraries
> are available, and Hadoop HDFS APIs will be available from the
> cluster. You could write a foreachPartition() that appends partitions
> of data to files, yes.
>
> Spark itself does not use appending. I think the biggest reason is
> that RDDs are immutable, and so their input and output are naturally
> immutable, not mutable.
>
> On Wed, Jan 28, 2015 at 10:39 PM, Matan Safriel <dev.matan@gmail.com>
> wrote:
> > Hi,
> >
> > Is it possible to append to an existing (HDFS) file through some Spark
> > action?
> > Is there any reason not to use a Hadoop append API within a Spark
> > job?
> >
> > Thanks,
> > Matan
> >
>
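Sean's foreachPartition() suggestion can be illustrated without a cluster. This is a hedged local sketch, not Spark code: `foreach_partition` is a hypothetical stand-in for `RDD.foreachPartition()` that runs partitions one after another, and plain local files replace HDFS. On a real cluster the per-partition function would instead obtain an output stream from Hadoop's `FileSystem.append(Path)`.

```python
import functools
import os
import tempfile
from typing import Callable, Iterable, Iterator, List


def foreach_partition(partitions: Iterable[List[str]],
                      fn: Callable[[Iterator[str]], None]) -> None:
    """Hypothetical local stand-in for RDD.foreachPartition(): calls fn once
    per partition with an iterator over that partition's records.
    Unlike Spark, partitions here run sequentially in one process."""
    for part in partitions:
        fn(iter(part))


def append_partition(path: str, records: Iterator[str]) -> None:
    """Append one partition's records to path, opening the file once per
    partition rather than once per record."""
    with open(path, "a", encoding="utf-8") as f:
        for record in records:
            f.write(record + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "log.txt")
        with open(target, "w", encoding="utf-8") as f:
            f.write("existing\n")
        # Two made-up partitions appended to the same existing file.
        foreach_partition([["a", "b"], ["c"]],
                          functools.partial(append_partition, target))
        with open(target, encoding="utf-8") as f:
            print(f.read().split())
```

Because this sketch runs partitions sequentially, appending them all to one file is safe here. On a cluster, partitions run concurrently and an HDFS file accepts only one writer at a time, so giving each partition its own file (or merging afterwards, as Matan suggests) is the safer design.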
